InsideOut (Riley)

Server Details

AI infrastructure design agent. Describe your app in plain English; Riley designs, prices, and deploys AWS or GCP infrastructure with generated Terraform.

Status: Healthy
Last Tested: 2026-05-25 09:32
Transport: Streamable HTTP

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A (4.5/5.0)

Tool Descriptions: A

Average 4.6/5 across 24 of 24 tools scored.

Server Coherence: A

Disambiguation: 5/5

Each tool has a clearly distinct purpose, grouped by prefix (convo-, tf-, aws-, gcp-, stack-). No two tools perform the same operation; batch variants are explicit about being faster batch alternatives.

Naming Consistency: 5/5

Consistent prefix-based naming with snake_case conventions throughout (e.g., awsinspect_batch, stackrollback). The prefixes (convo, tf, aws, gcp, stack) reliably indicate the tool's domain.

Tool Count: 4/5

24 tools is slightly above the ideal range but still well-scoped for a comprehensive infrastructure management server covering conversation, design, terraform, deployment, inspection, and feedback.

Completeness: 5/5

The tool set covers the full lifecycle from conversation start to deployment, destroy, monitoring, inspection, versioning, credential handling, and feedback. No obvious gaps in the domain.

Available Tools

24 tools

awsinspect: Inspect AWS Infrastructure (A)

Read-only

INSPECTION: Inspect AWS infrastructure for a deployed project.

⚠️ PREREQUISITE: This tool requires a prior deployment ATTEMPT (successful or failed). Check convostatus for hasDeployAttempt=true before calling. Works even after failed deploys to inspect orphaned resources.

Inspect deployed AWS resources after a deployment attempt. Use this tool when the user asks about the status or details of their deployed infrastructure. It fetches temporary read-only credentials securely and queries the AWS API directly.
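The prerequisite above can be sketched as a small guard that runs before any inspect call. This is a minimal illustration only: it assumes the convostatus result is available as a mapping with a top-level boolean `hasDeployAttempt`, which the description names but whose exact location in the response is not documented here. The function name `can_inspect` is hypothetical.

```python
from typing import Any, Mapping


def can_inspect(status: Mapping[str, Any]) -> bool:
    """Return True only when the session reports a deploy attempt.

    Assumption: convostatus yields a mapping with a top-level boolean
    "hasDeployAttempt". The real response may nest the field elsewhere.
    """
    return status.get("hasDeployAttempt") is True


# A failed deploy still counts as an attempt, so only the flag matters.
print(can_inspect({"hasDeployAttempt": True}))
print(can_inspect({}))
```

Checking for the literal boolean avoids treating a string such as "false" as a positive signal.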

RESPONSE TIERS (default is summary for token efficiency):

Summary (default): Key fields only (~500 tokens). Set detail=false, raw=false or omit both.
Detail: Full metadata for a specific resource. Set detail=true + resource filter.
Raw: Complete unprocessed API response. Set raw=true.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Supported services: account, acm, alb, apigateway, apprunner, backup, bedrock, cloudfront, cloudwatchlogs, cognito, cost-explorer, dynamodb, ebs, ec2, ecs, eks, elasticache, kms, lambda, msk, opensearch, rds, route53, s3, sagemaker, secretsmanager, sqs, vpc, waf. For a specific service's actions, call with action="list-actions".

METRICS: Use list-metrics to discover available metrics for a service (no credentials needed). Then use get-metrics to retrieve data (auto-discovers resources). Most services return CloudWatch time-series. KMS returns key health (rotation, state). SecretsManager returns secret health (rotation, last accessed/rotated). Optional filters JSON: {"hours":6,"period":300}.

BILLING: Use service=cost-explorer to inspect AWS costs. Actions: get-cost-summary (last 30 days by service, filters: {"days":7,"granularity":"DAILY"}), get-cost-forecast (projected spend through end of month), get-cost-by-tag (costs grouped by tag, filters: {"tag_key":"Environment","days":30}). Requires ce:GetCostAndUsage and ce:GetCostForecast IAM permissions.

EXAMPLES:

awsinspect(session_id=..., service="ec2", action="describe-instances")
awsinspect(session_id=..., service="cost-explorer", action="get-cost-summary")
awsinspect(session_id=..., service="ec2", action="get-metrics", filters='{"hours":6}')
awsinspect(session_id=..., service="rds", action="describe-db-instances", detail=true)
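The examples above pass filters as a JSON-encoded string and pick a response tier through the detail and raw flags. The following sketch shows one way a client might assemble such an argument dict. The helper name `build_awsinspect_args` and its `tier` parameter are hypothetical conveniences, not part of the tool; only the argument keys come from the parameter list.

```python
import json
from typing import Any, Dict, Optional


def build_awsinspect_args(
    session_id: str,
    service: str,
    action: str,
    filters: Optional[Dict[str, Any]] = None,
    tier: str = "summary",
) -> Dict[str, Any]:
    """Assemble an awsinspect argument dict for one of the three tiers."""
    if tier not in ("summary", "detail", "raw"):
        raise ValueError(f"unknown tier: {tier!r}")
    args: Dict[str, Any] = {
        "session_id": session_id,  # passed through verbatim
        "service": service,
        "action": action,
        "detail": tier == "detail",
        "raw": tier == "raw",
    }
    if filters:
        # filters travels as a JSON-encoded string, not a nested object.
        args["filters"] = json.dumps(filters, separators=(",", ":"))
    return args


example = build_awsinspect_args(
    "sess_v2_xxx?token=yyy", "ec2", "get-metrics", filters={"hours": 6}
)
print(example["filters"])  # {"hours":6}
```

Encoding with `json.dumps` sidesteps the nested-quote problems that hand-written filter strings tend to hit.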

Parameters

Name	Required	Description
`raw`	Yes	When true, returns the unprocessed AWS API response. Escape hatch for fields the summarized response doesn't surface.
`action`	Yes	Operation on the service. Examples: 'describe-instances' (ec2), 'list-buckets' (s3), 'list-keys' (kms), 'get-cost-summary' (cost-explorer), 'list-actions' (discovery), 'list-metrics' / 'get-metrics' (CloudWatch).
`detail`	Yes	When true, returns full metadata for a single resource (requires a resource ID in filters). When false (default), returns a summary.
`filters`	Yes	Optional JSON-encoded filter object passed through to the underlying AWS API. Examples: '{"hours":6}' for metric windows, '{"days":7,"granularity":"DAILY"}' for cost queries.
`service`	Yes	AWS service to query. Examples: 'ec2', 'rds', 'vpc', 's3', 'lambda', 'eks', 'ecs', 'cost-explorer'. Use action='list-actions' to discover the supported actions for a service.
`session_id`	Yes	Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_xxx?token=yyy). The suffix is part of the session credential; never strip it when summarizing. The session must have an AWS deploy attempt before inspect probes will succeed.
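Because a stripped token suffix is easy to miss, a client could run a cheap shape check before each call. The sketch below is a heuristic under stated assumptions: it only tests for the `sess_v2_` prefix and a non-empty `?token=` suffix as described in the table, and the function name `looks_complete` is hypothetical. It does not validate the credential itself.

```python
def looks_complete(session_id: str) -> bool:
    """Heuristic: does the session_id still carry its ?token= suffix?"""
    prefix, sep, token = session_id.partition("?token=")
    has_id = prefix.startswith("sess_v2_") and len(prefix) > len("sess_v2_")
    return has_id and sep == "?token=" and token != ""


print(looks_complete("sess_v2_xxx?token=yyy"))  # True
print(looks_complete("sess_v2_xxx"))            # False
```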

Tool Definition Quality

A (4.7/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations readOnlyHint=true and openWorldHint=true are consistent. The description adds that the tool fetches temporary read-only credentials securely and queries the AWS API directly. It explains response tiers (summary, detail, raw) and that it works even after failed deploys. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-structured with sections (PREREQUISITE, RESPONSE TIERS, REQUIRES, supported services, METRICS, BILLING, EXAMPLES). It is front-loaded with key information. Some redundancy could be trimmed (e.g., 'Inspection: Inspect AWS infrastructure' repeats), but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, many services/actions, response tiers, prerequisites), the description is thorough. It covers prerequisites, how to use metrics and billing, and provides multiple examples. Without an output schema, it still explains response tiers (summary, detail, raw) sufficiently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, so the description compensates. It explains session_id format (sess_v2_ from convoopen), lists supported services, describes action with examples, explains filters as JSON string with examples, and clarifies detail and raw booleans for response tiers. This provides rich meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Inspect AWS infrastructure for a deployed project.' It specifies the prerequisite of a prior deployment attempt, and distinguishes itself from sibling tools like convostatus (checks deployment attempt status) and tfdeploy (deployments). The verb 'Inspect' and resource 'AWS infrastructure' are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'Use this tool when the user asks about the status or details of their deployed infrastructure.' It details the prerequisite: requires a prior deployment attempt and advises checking convostatus for hasDeployAttempt=true. It also provides examples for various actions, offering clear context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

awsinspect_batch: Batch-Inspect AWS Infrastructure (A)

Read-only

BATCH INSPECTION: run up to 32 AWS inspect probes in one call.

⚠️ PREREQUISITE: Same as awsinspect (deploy attempt required). Check convostatus for hasDeployAttempt=true before calling.

Use this when you need to check more than ~3 resources. The backend fetches Oracle credentials ONCE per batch and fans out probes against a single AWS config — for a 12-resource health check this is ~5–8× faster and 12× fewer Oracle round-trips than calling awsinspect 12 times.

BUDGETS:

Up to 32 sub-probes per call (subs array length).
30s per-sub timeout; 60s total batch wall-clock.
Concurrency cap 8 — sub-probes run in parallel but never saturate AWS.
512 KB response cap: subs past the cap keep their envelope (index/service/action/ok) but have result replaced with truncated=true.
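Given the 32-probe limit above, a caller with a longer probe list has to split it across several calls. A minimal sketch of that chunking step follows; `chunk_subs` is a hypothetical helper name, and the probe dicts only reuse the service and action keys shown in the tool's examples.

```python
from typing import Dict, Iterator, List

MAX_SUBS = 32  # per-call limit stated in the tool description


def chunk_subs(subs: List[Dict], size: int = MAX_SUBS) -> Iterator[List[Dict]]:
    """Yield consecutive slices of at most `size` sub-probes, in order."""
    for start in range(0, len(subs), size):
        yield subs[start:start + size]


probes = [{"service": "ec2", "action": "describe-instances"}] * 70
print([len(batch) for batch in chunk_subs(probes)])  # [32, 32, 6]
```

Each yielded slice can then be sent as the subs array of one awsinspect_batch call.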

PARTIAL FAILURE IS EXPECTED. The response is an ordered results array; each entry has {index, service, action, ok, result, error}. Inspect each result — do NOT abort on the first error. A credential fetch failure leaves cred-less probes (list-actions, list-metrics) succeeding anyway.
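Handling the results array entry by entry, as advised above, might look like the sketch below. It relies on the documented entry fields (index, service, action, ok, result, error). Where exactly the truncated=true marker sits is not specified, so the code checks both the entry and its result, which is an assumption. The name `split_results` is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Entry = Dict[str, Any]


def split_results(results: List[Entry]) -> Tuple[List[Entry], List[Entry], List[Entry]]:
    """Sort batch entries into (ok, truncated, failed) without aborting early."""
    ok: List[Entry] = []
    truncated: List[Entry] = []
    failed: List[Entry] = []
    for entry in results:
        result = entry.get("result")
        # Assumption: the truncation marker is either on the entry itself
        # or inside a dict that replaced the result.
        is_truncated = bool(entry.get("truncated")) or (
            isinstance(result, dict) and bool(result.get("truncated"))
        )
        if not entry.get("ok"):
            failed.append(entry)
        elif is_truncated:
            truncated.append(entry)
        else:
            ok.append(entry)
    return ok, truncated, failed


sample = [
    {"index": 0, "service": "ec2", "action": "describe-instances", "ok": True, "result": {"count": 2}},
    {"index": 1, "service": "rds", "action": "describe-db-instances", "ok": False, "error": "timeout"},
    {"index": 2, "service": "s3", "action": "list-buckets", "ok": True, "result": {"truncated": True}},
]
good, cut, bad = split_results(sample)
print(len(good), len(cut), len(bad))  # 1 1 1
```

Truncated entries are candidates for a follow-up single awsinspect call; failed entries carry their own error and do not invalidate the rest of the batch.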

EXAMPLES:

awsinspect_batch(session_id=..., subs=[ {"service":"ec2","action":"describe-instances"}, {"service":"rds","action":"describe-db-instances"}, {"service":"vpc","action":"describe-vpcs"}, {"service":"s3","action":"list-buckets"}])
awsinspect_batch(session_id=..., subs=[ {"service":"ec2","action":"get-metrics","filters":"{\"hours\":6}"}, {"service":"rds","action":"get-metrics","filters":"{\"hours\":6}"}])

Parameters

Name	Required	Description	Default
`subs`	Yes	Up to 32 sub-probes, each with {service, action, filters?, detail?, raw?}. The backend fetches credentials once per batch and fans out probes in parallel (concurrency 8, 30s per-sub timeout, 60s total wall clock). Partial failure is expected — inspect each result.ok independently.
`session_id`	Yes	Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_xxx?token=yyy). The suffix is part of the session credential; never strip it when summarizing. The session must have an AWS deploy attempt before inspect probes will succeed.

Tool Definition Quality

A (4.9/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds significant context: budgets, timeouts, concurrency cap, response cap, partial failure expectation, credential handling, and response structure. This fully informs the agent of behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear headings (BUDGETS, PARTIAL FAILURE, etc.) and front-loaded with purpose and prerequisite. Every sentence adds value—no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (batch, multiple limits, partial failure) and no output schema, the description covers prereqs, limits, response format, alternative tools, and examples. It is fully adequate for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description carries the burden. It explains the session_id format and the subs array structure (service, action, optional filters) with examples. The filters parameter format is implied via JSON examples but not explicitly described. Still, agents can infer usage from examples and the supported services list.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs batch inspection of AWS infrastructure using up to 32 probes. It explicitly distinguishes from the sibling tool 'awsinspect' by noting that this is for batches and faster when checking more than ~3 resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use ('Use this when you need to check more than ~3 resources') and when-not-to-use ('For a specific service's actions, use awsinspect (singular)'). Also specifies prerequisite (check convostatus for hasDeployAttempt=true) and required session_id format.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convoawait: Await Pending Response (A)

Read-only

Wait for a pending response from Riley after a convoreply timeout.

🎯 USE THIS TOOL WHEN: convoreply returned a timeout error. This allows you to continue waiting for the response without resending the message.

REQUIRES:

session_id: from convoopen response

OPTIONAL:

message_id: if known (from convoreply timeout error)
timeout (integer): seconds to wait. For Cursor, use 50 (default). Max 55.

Returns the same format as convoreply when successful.
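The reply-then-await flow described above can be sketched as follows. Everything about the transport is assumed: `call_tool` stands in for whatever function your MCP client exposes, and `ToolTimeout` is a hypothetical exception representing a convoreply timeout error. How a real client surfaces that timeout, and whether convoawait can itself time out, is not stated in the source.

```python
from typing import Any, Callable, Dict, Optional

CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


class ToolTimeout(Exception):
    """Hypothetical error raised by the client when a tool call times out."""

    def __init__(self, message_id: Optional[str] = None) -> None:
        super().__init__("tool call timed out")
        self.message_id = message_id


def reply_with_await(
    call_tool: CallTool,
    session_id: str,
    text: str,
    timeout: int = 50,
    max_waits: int = 3,
) -> Dict[str, Any]:
    """Send one message; on timeout, keep waiting instead of resending it."""
    try:
        return call_tool(
            "convoreply",
            {"session_id": session_id, "text": text, "timeout": timeout},
        )
    except ToolTimeout as exc:
        message_id = exc.message_id

    await_args: Dict[str, Any] = {"session_id": session_id, "timeout": timeout}
    if message_id:
        await_args["message_id"] = message_id  # optional, only if known
    for _ in range(max_waits):
        try:
            return call_tool("convoawait", await_args)
        except ToolTimeout:
            continue  # assumption: waiting again is safe for a read-only tool
    raise TimeoutError("no response from Riley after waiting")
```

The message text is sent exactly once; every retry goes through convoawait, which matches the stated purpose of avoiding a resend.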

Parameters

Name	Required	Description
`timeout`	No	Max seconds to wait. Default 50, max 55.
`message_id`	No	Optional message ID from a convoreply timeout error. Not required for normal turn-based flow.
`session_id`	Yes	Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_xxx?token=yyy). The suffix is part of the session credential; never strip it when summarizing.

Tool Definition Quality

A (4.6/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint) are complemented by the description, which adds details on required session_id, optional message_id and timeout, and return format. Does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise and well-structured: front-loaded purpose, then usage, then parameter details. No redundant sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers when to use, required/optional parameters, constraints, and return format (same as convoreply). Highly complete for a waiting tool with no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema description coverage, the description explains each parameter: session_id from convoopen, message_id from timeout error, timeout integer with default 50 and max 55. This adds significant meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool waits for a pending response after a convoreply timeout. It specifies the verb 'wait' and resource 'pending response', and distinguishes from sibling tools like convoreply.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'USE THIS TOOL WHEN: convoreply returned a timeout error', providing clear context and conditions. Could mention when not to use, but the guidance is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convoinspect: Inspect Session Transcript (A)

Read-only

INSPECTION: View a session's conversation transcript and metadata

Returns the full message history (user / assistant / tool turns) plus the session's meta: workflow step, cloud, deployment status, drift state.

This is the transcript-reader companion to the other read tools. Combine it with:
• convostatus for the live stack / config / pricing
• tfruns for deployment history (apply / destroy / plan / drift)
• stackversions for the stack-version ladder

Use it when a user asks 'what did I say earlier?' or you need to retrace why the session ended up where it did. Read-only; never mutates session state.

REQUIRES: session_id (format: sess_v2_...).

Parameters

Name	Required	Description	Default
`session_id`	Yes	Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_xxx?token=yyy). The suffix is part of the session credential; never strip it when summarizing.

Tool Definition Quality

A (4.7/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description reinforces read-only behavior and adds details about the returned data (full message history, metadata). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise with clear sections (INSPECTION, Returns, companion tools, use case, Requires). Every sentence adds value; no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with one parameter and no output schema, the description fully explains purpose, return data, usage context, and parameter format, making it complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter session_id is not described in the schema (0% coverage), but the description adds format requirement 'sess_v2_...', which is essential for correct invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool inspects a session's conversation transcript and metadata, using specific verbs ('view', 'returns') and distinguishing itself from sibling tools like convostatus, tfruns, and stackversions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly provides when-to-use scenarios ('what did I say earlier?', retracing session state) and lists companion tools for different purposes, along with a read-only note.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convoopen: Start Design Session (A)

WORKFLOW: Step 1 of 4 - Start infrastructure design conversation

Open an InsideOut V2 session and receive the assistant's intro message. The response contains a clean message from Riley (the infrastructure advisor) - display it to the user.

⚠️ Riley will ask questions - forward these to the user, DO NOT answer on their behalf.

CRITICAL: This tool returns a session_id in the response metadata. You MUST use this session_id for ALL subsequent tool calls (convoreply, tfgenerate, tfdeploy, etc.).

⚠️ The session_id includes a ?token=... suffix (format: sess_v2_xxx?token=yyy) which is part of the session credential; without it, downstream tools fall back to a tokenless connect URL that 401s. Always pass session_id verbatim to subsequent tools and to the user; do NOT shorten, paraphrase, or strip the ?token= portion when summarizing the session in chat or in your own scratch notes.

Use when the user mentions keywords like: 'setup my cloud infra', 'provision infrastructure', 'deploy infra', 'start insideout', 'use insideout', or similar intent to begin infra setup.

OPTIONAL: project_context (string) - General tech stack summary so Riley can skip discovery questions and jump to recommendations. The agent should confirm this with the user before sending. Include whichever apply: language/framework, databases/services, container usage, existing IaC, CI/CD platform, cloud provider, Kubernetes usage, what the project does. Example: 'Next.js 14 + TypeScript, PostgreSQL, Redis, Docker Compose, deployed to AWS ECS, GitHub Actions CI/CD, ~50k MAU'. NEVER include credentials, secrets, API keys, PII, source code, or internal URLs/IPs -- only general metadata summaries useful to a cloud architect agent.

IMPORTANT: source (string) - You MUST set this to identify which IDE/tool you are. Auto-detect from your environment: 'claude-code', 'codex', 'antigravity', 'kiro', 'vscode', 'web', 'mcp'. If unsure, use the name of your IDE/tool in lowercase. Do NOT omit this: it controls the 'Open {IDE}' button on the credential connect screen.

OPTIONAL: github_username (string) - GitHub username for deploy commit attribution. Pre-populates the GitHub username field on the connect page.

💡 TIP: Examine workflow.usage prompt for more context on how to properly use these tools.
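As an illustration of the parameter guidance above, the sketch below assembles convoopen arguments with a lowercase source and leaves optional fields out when they are not supplied. The helper name `build_convoopen_args` is hypothetical; only the three argument keys come from the description. It does not screen project_context for secrets, which remains the caller's responsibility.

```python
from typing import Dict, Optional


def build_convoopen_args(
    source: str,
    project_context: Optional[str] = None,
    github_username: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble convoopen arguments; optional keys are omitted when unset."""
    source = source.strip().lower()
    if not source:
        # The description asks callers never to omit source.
        raise ValueError("source must identify the calling IDE or tool")
    args: Dict[str, str] = {"source": source}
    if project_context:
        args["project_context"] = project_context
    if github_username:
        args["github_username"] = github_username
    return args


print(build_convoopen_args("VSCode", project_context="Next.js 14 + TypeScript, PostgreSQL"))
```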

Parameters

Name	Required	Description
`source`	No	IDE/tool identifier so the connect screen can show the right 'Open {IDE}' button. Use lowercase: 'claude-code', 'codex', 'antigravity', 'kiro', 'cursor', 'vscode', 'windsurf', 'zed', 'aider', 'copilot', 'web', 'mcp'.
`github_username`	No	GitHub username used for deploy commit attribution; pre-fills the GitHub username field on the connect screen.
`project_context`	No	Optional tech-stack summary so Riley can skip discovery questions (e.g. 'Next.js 14 + Postgres on AWS, ~50k MAU'). No PII, secrets, file paths, or source code — only general metadata useful to a cloud architect.

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explains that the tool returns a session_id required for subsequent calls, that the response contains a message from the advisor Riley, and includes critical instructions like not answering on behalf of the user. It also details parameter behavior like auto-detecting source. This goes far beyond the minimal annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-structured with clear sections (WORKFLOW, CRITICAL, IMPORTANT, TIP) and bullet points. Every sentence provides essential information for correct usage, though it could be slightly more compact.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's role as the first step in a complex workflow, the description covers: purpose, return value (session_id, Riley's message), parameter details, usage triggers, and workflow coupling. It also references additional context in 'workflow.usage' prompt, making it thorough for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, but the tool description fully compensates by explaining each parameter: source (with examples of tool names), project_context (what to include/exclude, example), and github_username (purpose). This adds significant value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states that this tool starts an infrastructure design conversation by opening an InsideOut V2 session and receiving an intro message. It clearly names the specific verb 'open' and resource 'session,' and distinguishes it from sibling tools like 'convoreply' by noting it is step 1 of a 4-step workflow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit trigger keywords such as 'setup my cloud infra' and 'start insideout,' and indicates this is the initial step in a workflow. While it does not list when not to use the tool, it gives clear context for when to start the session.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convoreply: Send Message (A)

WORKFLOW: Step 2 of 4 - Continue infrastructure design conversation

Send a user message to the active InsideOut session and receive the assistant reply. The response contains a clean message from Riley - display it to the user.

⚠️ CRITICAL: DO NOT answer Riley's questions yourself! Forward questions to the user and wait for their response. NEVER fabricate or assume the user's answer, even if you think you know what they would say. Examples of questions Riley asks that YOU MUST forward to the user:

'Any questions or tweaks to these details?'
'Ready for the cost estimate?'
'Do you want to change the stack/config?'
'Ready to proceed to Terraform?'

When Riley asks ANY question, STOP and wait for the user's answer!

📋 WORKFLOW PHASES: The typical flow is conversation → tfgenerate → tfdeploy. When terraform_ready=true appears in THIS tool's response, THEN you can call tfgenerate.

⚠️ DO NOT call tfgenerate until this tool returns! Wait for the response first.

🎯 KEY SIGNALS IN RESPONSE:

[TERRAFORM_READY: true] → NOW you can call tfgenerate
[[BUTTON_TF_APPLY: ...]] → Deployment is ready! Ask user if they want to deploy, then use tfdeploy
[[BUTTON_TF_DESTROY: ...]] → User confirmed destroy intent! Ask user to confirm, then use tfdestroy
[[BUTTON_TF_PLAN: ...]] → User wants to preview changes! Use tfplan to run a plan, then tfdeploy with plan_id to apply
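The signal list above maps naturally onto a small lookup. The sketch below scans a reply for those markers and names the tool that would come next. It assumes the markers appear literally in the response text in the bracketed forms shown, which the source implies but does not guarantee; `next_tool` is a hypothetical helper. For the apply and destroy signals the result is only a suggestion, since the user must confirm first.

```python
import re
from typing import Optional

# Marker name -> tool the description associates with it.
_BUTTONS = {
    "BUTTON_TF_APPLY": "tfdeploy",
    "BUTTON_TF_DESTROY": "tfdestroy",
    "BUTTON_TF_PLAN": "tfplan",
}


def next_tool(reply_text: str) -> Optional[str]:
    """Return the follow-up tool hinted at by a convoreply response, if any."""
    for marker, tool in _BUTTONS.items():
        if re.search(r"\[\[" + marker + r":", reply_text):
            return tool
    if re.search(r"\[TERRAFORM_READY:\s*true\]", reply_text):
        return "tfgenerate"
    return None


print(next_tool("Design complete. [TERRAFORM_READY: true]"))  # tfgenerate
print(next_tool("Any questions or tweaks to these details?"))  # None
```

A None result means the reply is ordinary conversation, so the agent should show it to the user and wait.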

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: timeout (integer) - seconds to wait for response. For Cursor, use 50 (default). Max 55.

OPTIONAL: project_context (string) - Only pass genuinely NEW project details the user shares after convoopen. Do NOT resend context already provided in convoopen; Riley remembers it. Do NOT scan files or directories to gather this; only use what the user explicitly tells you. Example: user reveals a new constraint like 'we also need HIPAA compliance' mid-conversation.

💡 TIP: Use convostatus to check progress anytime. Examine workflow.usage prompt for more guidance.

Parameters

Name	Required	Description
`text`	Yes	User message to send to Riley. Forward verbatim what the user said — do not summarize or rewrite.
`retry`	No	When true, re-send the most recent user turn instead of submitting a new one.
`timeout`	No	Max seconds to wait for Riley's response. Default 50, max 55.
`session_id`	Yes	Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_xxx?token=yyy). The suffix is part of the session credential; never strip it when summarizing.
`project_context`	No	Only NEW project details revealed after convoopen (e.g. user mentions a new constraint mid-conversation). Don't re-send context already provided in convoopen. No PII or secrets.
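The timeout bounds in the table above (default 50, max 55) can be enforced client-side before a call is made. The following sketch is one possible shape. The helper names are hypothetical, and the lower bound of 1 second is an assumption, since the source states only the default and the maximum.

```python
from typing import Any, Dict, Optional

DEFAULT_TIMEOUT = 50  # default stated in the parameter table
MAX_TIMEOUT = 55      # maximum stated in the parameter table


def clamp_timeout(seconds: Optional[int] = None) -> int:
    """Return a timeout within the documented bounds."""
    if seconds is None:
        return DEFAULT_TIMEOUT
    # Assumption: 1 second is used as a floor; the source gives no minimum.
    return max(1, min(int(seconds), MAX_TIMEOUT))


def build_convoreply_args(
    session_id: str, text: str, timeout: Optional[int] = None
) -> Dict[str, Any]:
    """Assemble convoreply arguments with the user's text left untouched."""
    return {
        "session_id": session_id,  # verbatim, including the ?token= suffix
        "text": text,              # forwarded exactly as the user wrote it
        "timeout": clamp_timeout(timeout),
    }


print(clamp_timeout(120))  # 55
```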

Tool Definition Quality

A (4.8/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare openWorldHint and destructiveHint. The description adds significant behavioral context: it is a step in a workflow, returns assistant reply with signals, and warns against fabricating user answers. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (WORKFLOW, CRITICAL, etc.) and front-loads important information. While lengthy, every part adds necessary context for correct tool usage. Could be slightly trimmed but remains effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, no output schema, multi-step workflow), the description is highly complete. It covers return signals, workflow phases, relationships with sibling tools, and edge cases like when to call other tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so description carries full burden. It explains most parameters: session_id, timeout, project_context, auto_accept, and text (implied). However, the 'retry' parameter is not mentioned at all, which is a minor gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: sending a user message to an active InsideOut session and receiving the assistant reply. It also distinguishes itself from sibling tools like convoopen (session start) and convostatus (progress check).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use and when-not-to-use instructions: do not answer Riley's questions, forward them to the user. Specifies workflow phases and when to call tfgenerate. Notes requirement of session_id from convoopen and gives guidance on optional parameters like project_context and auto_accept.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convostatusView Session Stack StatusA

Read-only

Inspect

INSPECTION: View the current infrastructure stack for a session. Returns the current state of the user's infrastructure design, including:

Components - Selected infrastructure services (VPC, databases, caching, etc.)
• Shows what services the user has chosen (e.g., PostgreSQL, Redis, S3)
• Includes architecture decisions (EKS vs EC2, monolith vs microservices)

Config - Configuration details for each component
• Database sizes, replica counts, storage amounts
• Cache settings, queue configurations
• Backup schedules and retention policies

Pricing - Cost estimates (when available)
• Monthly cost estimates per component
• Total estimated monthly spend

Phase Indicators - Where the user is in the design workflow:
• hasComponents: User has selected infrastructure services
• hasConfig: User has configured component details
• hasPricing: Cost estimates have been calculated
• hasTerraform: Ready for Terraform generation

Use this tool when the user asks 'what is my current stack?', 'show my infrastructure', 'what have I selected?', or similar questions about their design progress. REQUIRES: session_id from convoopen response (format: sess_v2_...).
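
The phase indicators lend themselves to a simple "what comes next" check. The following is a minimal sketch, assuming the four flags arrive as top-level booleans in a dictionary; the actual response shape is not documented here, and `next_pending_phase` is a hypothetical helper.

```python
# Flags in the order the design workflow is described above.
PHASE_FLAGS = ["hasComponents", "hasConfig", "hasPricing", "hasTerraform"]


def next_pending_phase(status):
    """Return the first phase flag that is not yet true, or None when all are set."""
    for flag in PHASE_FLAGS:
        # A missing flag is treated the same as false (an assumption).
        if not status.get(flag, False):
            return flag
    return None
```

For example, a status with only `hasComponents` and `hasConfig` set yields `hasPricing` as the next pending phase.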

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	No	Optional. Specific job ID to inspect. When omitted, returns the status of the latest job for the session.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.

Tool Definition Quality

A4.3/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. Description goes beyond by detailing the four sections of output (Components, Config, Pricing, Phase Indicators) and their contents, effectively explaining behavior even without an output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with bullet points and clear sections. Front-loaded with purpose and usage guidance. Slightly verbose in listing detailed bullet items, but every sentence adds value relative to complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite lacking an output schema, the description provides a comprehensive breakdown of the return structure (Components, Config, Pricing, Phase Indicators) and their meanings. Also specifies prerequisite and use cases, making the tool self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage. Description only adds meaning for session_id (source/format) but completely omits include_code, leaving its purpose unclear. Insufficient compensation for undocumented parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'View the current infrastructure stack for a session' with detailed breakdown of components, config, pricing, and phase indicators. Differentiates from siblings like awsinspect or convoinspect by focusing on session stack status.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists example user queries that should trigger this tool, and mentions prerequisite (session_id from convoopen). Does not explicitly state when not to use or list alternatives, which prevents a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

credawaitAwait Cloud CredentialsA

Idempotent

Inspect

Wait for the user to securely connect their cloud account and subscribe to Luther Systems. Polls until credentials appear on the session.

🎯 USE THIS TOOL WHEN: tfdeploy returns an 'auth_required', 'no_credentials', or 'credentials_expired' error.

The user needs to visit the connect URL to:

Connect their cloud credentials (AWS or GCP)
Sign up and subscribe to a Luther Systems plan (required for deployment)

This secure connection allows InsideOut to deploy and manage infrastructure in the user's cloud account on their behalf. Credentials are handled securely and only used for deployment and management sessions.

WORKFLOW:

1. FIRST: Present the connect URL and explanation to the user (from the tfdeploy error response)
2. THEN: Call this tool to begin polling for credentials
3. The user opens the URL in their browser to subscribe and add credentials
4. When credentials are found, inform the user and call tfdeploy to deploy

IMPORTANT: Do NOT call this tool without first showing the connect URL to the user. The user needs to see the URL to complete the process.

REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: cloud ('aws' or 'gcp'), timeout (integer, seconds to wait, default 300, max 600).
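
The workflow above can be sketched as a small orchestration function. This is an illustration under stated assumptions, not the server's client code: `call_tool` and `show_to_user` are hypothetical callables supplied by the agent runtime, and the response fields `error`, `connect_url`, and `credentials_found` are assumed names, since no output schema is published.

```python
CREDENTIAL_ERRORS = {"auth_required", "no_credentials", "credentials_expired"}
MAX_AWAIT_S = 600  # documented maximum for credawait; the default is 300


def deploy_with_credentials(call_tool, show_to_user, session_id, cloud="aws", timeout=300):
    """Try tfdeploy; on a credential error, show the URL, await credentials, retry."""
    first = call_tool("tfdeploy", {"session_id": session_id})
    if first.get("error") not in CREDENTIAL_ERRORS:
        return first  # deployed, or failed for an unrelated reason

    # FIRST: the user must see the connect URL before polling starts.
    show_to_user(first.get("connect_url"))

    # THEN: poll until credentials appear or the timeout elapses.
    waited = call_tool("credawait", {
        "session_id": session_id,
        "cloud": cloud,
        "timeout": min(timeout, MAX_AWAIT_S),
    })
    if not waited.get("credentials_found"):
        return waited  # still no credentials; let the caller decide what to do

    show_to_user("Credentials connected; deploying.")
    return call_tool("tfdeploy", {"session_id": session_id})
```

Passing the two callables in keeps the ordering rule (URL first, polling second) testable without a live session.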

ParametersJSON Schema

Name	Required	Description
`cloud`	No	Cloud provider whose credentials are awaited: 'aws' or 'gcp'. Defaults to 'aws'.
`timeout`	No	Max seconds to wait for the user to complete the browser-based credential connect flow. Default 300, max 600.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare idempotentHint=true and destructiveHint=false. The description adds that the tool polls, requires user interaction, and handles credentials securely. It does not contradict annotations, and additional context (polling, user action) enhances transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with emojis, bold headers, and numbered steps, making it scannable. It front-loads the purpose and use case. Slightly verbose but earns each sentence; could trim minor redundancy (e.g., 'This secure connection allows...' is useful but a bit lengthy).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (polls external user action, multiple steps, no output schema), the description covers the workflow, prerequisites, and expected user flow. It lacks explicit mention of timeout behavior or return value, but the agent can infer success/failure from credential existence. Overall sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage, yet the description explains all three parameters: session_id (required, format), cloud (optional, 'aws' or 'gcp'), timeout (integer, default 300, max 600). This fully compensates for missing schema descriptions and adds crucial usage context like default and max values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool waits for cloud credentials to be connected, specifies the verb 'wait', and distinguishes it from siblings like tfdeploy and convoawait by detailing the exact use case (after auth_required errors). It uses specific resource 'cloud credentials' and 'Luther Systems subscription'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (after tfdeploy returns specific errors), provides a numbered workflow, and includes a strong 'do not call without showing URL' warning. Also references the required prerequisite (session_id from convoopen) and optional parameters.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gcpinspectInspect GCP InfrastructureA

Read-only

Inspect

INSPECTION: Inspect GCP infrastructure for a deployed project.

⚠️ PREREQUISITE: This tool requires a prior deployment ATTEMPT (successful or failed). Check convostatus for hasDeployAttempt=true before calling. Works even after failed deploys to inspect orphaned resources.

Inspect deployed GCP resources after a deployment attempt. Use this tool when the user asks about the status or details of their deployed GCP infrastructure. It fetches temporary read-only credentials securely and queries the GCP API directly.

RESPONSE TIERS (default is summary for token efficiency):

Summary (default): Key fields only (~500 tokens). Set detail=false, raw=false or omit both.
Detail: Full metadata for a specific resource. Set detail=true + resource filter.
Raw: Complete unprocessed API response. Set raw=true.

METRICS: Use list-metrics to see available Cloud Monitoring metrics for any service (no credentials needed; progressive disclosure). Use get-metrics to retrieve time-series data. Optional filters JSON: {"hours":6,"period":300}. Label breakdowns: Cloud Functions (by status), Load Balancer/API Gateway (by response_code_class), Cloud CDN (by cache_result). Secret Manager get-metrics returns operational health (version count, replication, create time), not time-series. Bastion is an alias for Compute Engine metrics (SSH connection count is not available as a GCP metric).

BILLING: Use service=billing to inspect GCP billing. Actions: get-billing-info (check whether billing is enabled and which billing account is used), get-budgets (list budget alerts for the project; auto-fetches the billing account). Requires the roles/billing.viewer IAM role.

Required IAM roles: Monitoring Viewer (roles/monitoring.viewer) for metrics, Secret Manager Viewer (roles/secretmanager.viewer) for secret health, Billing Viewer (roles/billing.viewer) for billing.

EXAMPLES:

gcpinspect(session_id=..., service="compute", action="list-instances")
gcpinspect(session_id=..., service="gke", action="list-clusters")
gcpinspect(session_id=..., service="cloudsql", action="get-metrics", filters='{"hours":6}')
gcpinspect(session_id=..., service="billing", action="get-billing-info")
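
Because `filters` is a JSON-encoded string rather than a nested object, quoting is easy to get wrong when writing it by hand. A minimal sketch of building the arguments programmatically follows; `gcpinspect_args` is a hypothetical helper, and sending `"{}"` when no filters are needed is an assumption.

```python
import json


def gcpinspect_args(session_id, service, action, filters=None, detail=False, raw=False):
    """Build a gcpinspect argument dict, serializing filters to a JSON string."""
    return {
        "session_id": session_id,
        "service": service,
        "action": action,
        # json.dumps handles the inner quotes; compact separators match the examples.
        "filters": json.dumps(filters or {}, separators=(",", ":")),
        "detail": detail,  # True: full metadata for one resource
        "raw": raw,        # True: unprocessed API response
    }
```

With `filters={"hours": 6, "period": 300}` the helper produces the string `{"hours":6,"period":300}`, matching the metric-window form shown in the description.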

ParametersJSON Schema

Name	Required	Description
`raw`	Yes	When true, returns the unprocessed GCP API response. Escape hatch for fields the summarized response doesn't surface.
`action`	Yes	Operation on the service. Examples: 'list-instances' (compute), 'list-buckets' (storage), 'list-clusters' (gke), 'list-actions' (discovery), 'list-metrics' / 'get-metrics' (Cloud Monitoring).
`detail`	Yes	When true, returns full metadata for a single resource. When false (default), returns a summary.
`filters`	Yes	Optional JSON-encoded filter object passed through to the underlying GCP API. Examples: '{"hours":6}' for metric windows, '{"zone":"us-central1-a"}' for zone-scoped queries.
`service`	Yes	GCP service to query. Examples: 'compute', 'storage', 'cloudsql', 'gke', 'cloudrun', 'pubsub', 'firestore'. Use action='list-actions' to discover supported actions for a service.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing. The session must have a GCP deploy attempt before inspect probes will succeed.

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint, but the description adds critical behavioral details: secure credential fetching, direct GCP API queries, response tiers (summary, detail, raw), and required IAM roles. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-structured with clear sections, front-loading the purpose and prerequisites. While dense, every part adds value, though it could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 6 parameters, no output schema, and high complexity, the description covers prerequisites, response tiers, supported services, metrics, billing, IAM roles, and examples. It is exceptionally complete for guiding correct tool usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage, but the description fully explains all parameters: session_id format, service list, action possibilities, filters JSON format, and detail/raw booleans. This compensates fully for the schema gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool inspects GCP infrastructure after a deployment attempt, distinguishing it from sibling tools like awsinspect and tfstatus. The specific verb 'inspect' and resource 'GCP infrastructure' make the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides clear prerequisites (prior deployment attempt), references convostatus to check hasDeployAttempt, and gives examples. While it doesn't explicitly state when not to use, the conditions are well outlined.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gcpinspect_batchBatch-Inspect GCP InfrastructureA

Read-only

Inspect

BATCH INSPECTION: run up to 32 GCP inspect probes in one call.

⚠️ PREREQUISITE: Same as gcpinspect: a deploy attempt is required. Check convostatus for hasDeployAttempt=true before calling.

Use this when you need to check more than ~3 resources. The backend fetches Oracle credentials ONCE per batch and fans out probes against a single GCP credentials blob — a 12-resource health check is ~5–8× faster and 12× fewer Oracle round-trips than calling gcpinspect 12 times.

BUDGETS:

Up to 32 sub-probes per call (subs array length).
30s per-sub timeout; 60s total batch wall-clock.
Concurrency cap 8.
512 KB response cap: subs past the cap keep their envelope (index/service/action/ok) but have result replaced with truncated=true.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Supported services: apigateway, bastion, billing, certificatemanager, cloudarmor, cloudbuild, cloudcdn, clouddeploy, clouddns, cloudfunctions, cloudkms, cloudlogging, cloudmonitoring, cloudrun, cloudsql, compute, firestore, gcs, gke, iam, identityplatform, loadbalancer, memorystore, pubsub, secretmanager, vertexai, vpc.

For a specific service's actions, use gcpinspect (singular) with action="list-actions"; batch is not the place for discovery. Batch responses are always summarized (no detail/raw per-sub); use singular gcpinspect when you need full metadata or raw API output for one resource.

EXAMPLES:

gcpinspect_batch(session_id=..., subs=[ {"service":"compute","action":"list-instances"}, {"service":"gke","action":"list-clusters"}, {"service":"cloudsql","action":"list-instances"}])
gcpinspect_batch(session_id=..., subs=[ {"service":"compute","action":"get-metrics","filters":"{\"hours\":6}"}, {"service":"cloudrun","action":"get-metrics","filters":"{\"hours\":6}"}])
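
When more than 32 probes are needed, they have to be split across several calls. A small sketch of that split, with the hypothetical helper name `chunk_probes`:

```python
MAX_SUBS_PER_CALL = 32  # documented limit on the subs array length


def chunk_probes(probes, size=MAX_SUBS_PER_CALL):
    """Split a list of sub-probes into consecutive batches of at most `size` items."""
    return [probes[i:i + size] for i in range(0, len(probes), size)]
```

Each returned chunk can be passed as the `subs` argument of one gcpinspect_batch call. Note that the 60s wall-clock budget applies per call, so splitting also spreads that budget.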

ParametersJSON Schema

Name	Required	Description	Default
`subs`	Yes	Up to 32 sub-probes, each with {service, action, filters?, detail?, raw?}. The backend fetches credentials once per batch and fans out probes in parallel (concurrency 8, 30s per-sub timeout, 60s total wall clock). Partial failure is expected — inspect each result.ok independently.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing. The session must have a GCP deploy attempt before inspect probes will succeed.
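
Since partial failure is expected, each sub-result needs to be examined on its own. The sketch below sorts results into succeeded, failed, and truncated groups. The envelope fields `index` and `ok` come from the description; where exactly `truncated` appears (on the envelope or in place of `result`) is not specified, so the code checks both as an assumption.

```python
def triage_batch(subs):
    """Group sub-result indexes by outcome: ok, failed, or truncated by the response cap."""
    groups = {"ok": [], "failed": [], "truncated": []}
    for sub in subs:
        result = sub.get("result")
        # Assumption: truncated=true may sit on the envelope or replace the result body.
        is_truncated = sub.get("truncated") is True or (
            isinstance(result, dict) and result.get("truncated") is True
        )
        if not sub.get("ok"):
            groups["failed"].append(sub["index"])
        elif is_truncated:
            groups["truncated"].append(sub["index"])
        else:
            groups["ok"].append(sub["index"])
    return groups
```

Truncated probes are natural candidates for a follow-up call to singular gcpinspect, which is not subject to the batch response cap.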

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true. The description adds extensive behavioral details: timeout limits (30s per-sub, 60s total), concurrency cap 8, 512KB response cap with truncated fields, credential fetching behavior, and summarized responses. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-structured with sections (BATCH INSPECTION, PREREQUISITE, BUDGETS, EXAMPLES). It is front-loaded with key information. However, some redundancy exists (e.g., repeating concurrency limits in both narrative and bullet list). Still, the structure aids readability for a complex tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (batch operations, partial failure, credential optimization, no output schema), the description is comprehensive. It covers all critical aspects: prerequisite, usage guidance, performance benefits, limits, failure modes, response structure, and examples. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description fully compensates by explaining the 'subs' array structure (service, action, optional filters), listing supported services, and providing examples. It clarifies that filters are optional and can be used for metrics with time ranges.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that this is a batch inspection tool for GCP, supporting up to 32 probes per call, and distinguishes it from the singular gcpinspect by noting it is faster for multiple resources. It uses specific verbs like 'run up to 32 GCP inspect probes' and explicitly says 'Use this when you need to check more than ~3 resources.'

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use: 'Use this when you need to check more than ~3 resources.' Also states prerequisites (deploy attempt required), notes partial failure is expected, and warns against using it for action discovery (use gcpinspect singular).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

helpWorkflow GuideA

Read-only

Inspect

Get workflow guidance for using InsideOut infrastructure tools. Call help() for a compact overview, or help(section=...) for a detailed guide. Sections: workflow, tools, examples, inspect. Responses include hints with next_actions and related_tools.

ParametersJSON Schema

Name	Required	Description	Default
`section`	No	Optional section to focus the response. One of: 'workflow', 'tools', 'examples', 'inspect'. When omitted, returns a compact overview (~500 tokens).

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that responses include hints with next_actions and related_tools, adding value beyond readOnlyHint and openWorldHint annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise 4-sentence description, front-loaded with purpose, efficiently covering usage and output.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a help tool with no output schema: covers invocation, sections, and response hints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Lists valid sections (workflow, tools, examples, inspect) for the 'section' parameter, which is not described in the input schema (0% coverage).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

States specific verb 'Get workflow guidance' for infrastructure tools, distinguishes from sibling tools which are specific operational tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Describes when to call help() vs help(section=...), providing clear usage context without explicit exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

stackdiffCompare Stack VersionsA

Read-only

Inspect

Structured diff showing what would be deployed if the user ran tfdeploy now. Returns component-level changes (added/removed/modified), field-level details, and pricing deltas.

Defaults (#1392): with no version arguments, compares the LAST SUCCESSFULLY DEPLOYED version against the user's CURRENT LIVE DESIGN (the same data the UI shows). Empty baseline if nothing has been deployed or after a destroy. Pending drafts are NOT used as the target — they go stale once the user edits past them; live IR via chat history is always current.

Pass explicit from_version and/or to_version integers to compare any two saved versions (e.g. v3 → v5).

REQUIRES: session_id from convoopen response (format: sess_v2_...).
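
The two calling modes (defaults versus explicit versions) can be sketched as follows. `stackdiff_args` is a hypothetical helper; the only behavior taken from the description is that the version arguments are optional integers.

```python
def stackdiff_args(session_id, from_version=None, to_version=None):
    """Build stackdiff arguments, including version bounds only when given."""
    args = {"session_id": session_id}
    for name, value in (("from_version", from_version), ("to_version", to_version)):
        if value is None:
            continue  # omitted: the server applies its documented default
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer version number")
        args[name] = value
    return args
```

Calling it with no versions requests the default comparison; `stackdiff_args(sid, 3, 5)` requests the v3 → v5 comparison mentioned above.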

ParametersJSON Schema

Name	Required	Description
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.
`to_version`	No	Ending stack version number for the diff. Defaults to the current draft.
`from_version`	No	Starting stack version number for the diff. Defaults to the latest applied version.

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the description adds no additional behavioral context (e.g., no mention of rate limits or auth beyond session_id). The description is consistent with annotations and adds minimal transparency value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with the core purpose, uses bullet points for key details, and every sentence is necessary. No filler words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (3 params, no output schema), the description adequately covers inputs and behavior. It mentions the diff output format (component-level, field-level, pricing deltas), though an explicit return type or structure would be helpful. No output schema exists, so the description partially fills the gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage, but description explains the default behaviors (from_version defaults to latest applied, to_version defaults to current draft) and the required session_id format (sess_v2_...). This adds meaningful context beyond the raw schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it provides a structured diff between two stack versions, showing component-level changes, field details, and pricing deltas. This verb+resource is distinct from siblings like stackversions (just lists) or tfplan (plan vs. diff).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies default values for optional parameters (from_version defaults to latest applied, to_version defaults to current draft) and notes that session_id must be in 'sess_v2_...' format from convoopen. However, no explicit guidance on when to use this tool versus alternatives like stackrollback or tfdrift.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

stackrollbackRollback Stack VersionA

Idempotent

Inspect

Create a draft version by reverting to a previous version's config. Copies components, config, and pricing from the target version. If a draft already exists, updates it in-place (single-draft rule).

Use stackversions first to find available version numbers.

REQUIRES: session_id from convoopen response (format: sess_v2_...), version (target version number).
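
A typical sequence is to list versions first and then pick a rollback target. The sketch below selects the newest version with a given status from a stackversions-style listing. The entry keys `version` and `status` are assumptions (no output schema is published), and `rollback_args` is a hypothetical helper.

```python
def rollback_args(session_id, versions, wanted_status="applied"):
    """Pick the newest version with `wanted_status` and build stackrollback arguments.

    `versions` is assumed to be ordered newest first, as stackversions lists them.
    Returns None when no version matches.
    """
    for entry in versions:
        if entry.get("status") == wanted_status:
            return {"session_id": session_id, "version": entry["version"]}
    return None
```

Because stackrollback only creates or updates a draft, the result still has to go through the normal deploy step before anything changes in the cloud account.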

ParametersJSON Schema

Name	Required	Description	Default
`version`	Yes	Target stack version number to roll back to. Use stackversions to list available versions.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotent and non-destructive behavior. The description adds that it creates a draft (not immediate effect) and updates in-place, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short paragraphs: purpose, behavior, prerequisites. All sentences are necessary and front-loaded. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers behavior and prerequisites well. Lacks explicit mention of return value or confirmation, but given the lack of output schema, the description is reasonably complete for an agent to infer the outcome.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by explaining 'version' is a target version number and 'session_id' is from convoopen with format sess_v2_..., giving essential meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: creating a draft version by reverting to a previous version's config, specifying what is copied and the single-draft rule. This is distinct from sibling tools like stackdiff and stackversions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear guidance: 'Use stackversions first to find available version numbers' and 'REQUIRES: session_id from convoopen response.' This directs the user to prerequisites, though it doesn't explicitly state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

stackversionsList Stack VersionsA

Read-only

Inspect

List all stack versions for a session (newest first). Shows version history including version number, status (draft/confirmed/applied), change summaries, and timestamps.

Use this tool to see the design history, review what changed between iterations, or find a version number to roll back to.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Parameters

Name	Required	Description	Default
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. Description adds sorting (newest first) and what fields are shown, but does not disclose pagination or rate limits. Does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded purpose, no fluff. Requirements are clearly separated. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with one required param and no output schema, the description covers purpose, usage, and output format. Minor omission: no mention of pagination if results are large, but generally complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, description adds meaning: it specifies the parameter must be a session_id from convoopen, format sess_v2_..., and is required. This compensates for the lack of schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it lists all stack versions for a session, newest first, and specifies the data shown (version number, status, change summaries, timestamps). This clearly distinguishes it from sibling tools like stackdiff or stackrollback.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It gives explicit use cases: see design history, review changes, find version to roll back to. Also states the required session_id format and source. Does not explicitly mention when not to use, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

submit_feedbackSubmit FeedbackAInspect

FEEDBACK: Submit feedback, bug reports, or feature requests to Luther Systems

Use this tool to forward user feedback directly to the Luther Systems team. This includes bug reports, feature requests, questions, or general feedback about InsideOut. The agent itself can also use this tool to report issues it encounters during operation.

REQUIRES: session_id, category, message

OPTIONAL: user_email (for follow-up), user_name, source (default: 'mcp'), initiator ('user' or 'agent')

Categories: bug_report, feature_request, general_feedback, question, security

The 'initiator' field tracks who triggered the report:

'user' — the user explicitly reported the issue or requested feedback submission
'agent' — Riley detected an issue and initiated the feedback flow

Examples:

User says 'the deploy button is broken' → submit_feedback(category='bug_report', message='...', initiator='user')
User says 'I wish it had dark mode' → submit_feedback(category='feature_request', message='...', initiator='user')
Deployment failed with Terraform error → submit_feedback(category='bug_report', message='Deployment failed: Terraform apply error on aws_alb resource — timeout waiting for ALB provisioning', initiator='agent')

Parameters

Name	Required	Description
`source`	No	Optional source channel: 'mcp', 'cli', or 'web'.
`message`	Yes	Feedback content. Free-form text describing the issue, request, or comment.
`category`	Yes	Feedback category. One of: bug_report, feature_request, general_feedback, question.
`initiator`	No	Optional originator: 'user' (human triggered) or 'agent' (automated).
`user_name`	No	Optional display name for attribution.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing. Identifies the conversation the feedback is about.
`user_email`	No	Optional email address for follow-up.
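
As a rough sketch of how a client might assemble the arguments for submit_feedback before sending them, the helper below validates the enumerated fields locally. The function name is hypothetical. The category set follows the tool description, which lists `security`; the parameter table lists only four categories, so treat `security` as unconfirmed.

```python
from typing import Dict, Optional

# From the tool description; the parameter table omits 'security' (unconfirmed).
CATEGORIES = {"bug_report", "feature_request", "general_feedback", "question", "security"}
INITIATORS = {"user", "agent"}


def build_feedback_args(session_id: str, category: str, message: str,
                        initiator: Optional[str] = None,
                        user_email: Optional[str] = None) -> Dict[str, str]:
    """Return a submit_feedback argument dict, rejecting unknown enum values."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if initiator is not None and initiator not in INITIATORS:
        raise ValueError(f"initiator must be 'user' or 'agent', got {initiator!r}")
    if not message.strip():
        raise ValueError("message must not be empty")
    args = {"session_id": session_id, "category": category, "message": message}
    # Optional fields are only sent when the caller supplied them.
    if initiator is not None:
        args["initiator"] = initiator
    if user_email is not None:
        args["user_email"] = user_email
    return args
```

Leaving `source` out relies on the documented default of 'mcp'.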

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds detail about the initiator field and categories but does not reveal behavioral traits beyond what annotations already provide (openWorldHint and destructiveHint). It is consistent with annotations but adds limited extra transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with headings, bullet points, and examples. It is concise enough for a tool with 7 parameters, though slightly longer than minimal.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of a feedback submission tool, the description covers all necessary aspects: required and optional parameters, categories, examples, and usage details. No output schema exists, but the description is self-sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates by listing all parameters, their optionality, and explaining categories and the initiator field. This adds significant meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool submits feedback, bug reports, or feature requests to Luther Systems. It uses a specific verb-resource combination and is distinct from sibling tools (infrastructure and conversation tools).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (for user feedback, bug reports, feature requests, etc.), provides examples, and notes agent-initiated usage. It does not explicitly state when not to use, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfdeployDeploy InfrastructureA

Destructive

Inspect

WORKFLOW: Step 4 of 4 - Deploy infrastructure to the cloud

Deploy infrastructure by starting a Terraform job for an InsideOut session. This tool initiates the actual deployment process after Terraform files have been generated.

IMPORTANT: This starts a long-running job (15+ minutes). Use tfstatus to monitor progress.

SINGLE-FLIGHT: only one TF job (apply/plan/destroy/drift) runs per session at a time. If another job is already in flight, tfdeploy returns tf_job_conflict with the live job_id - attach with tfstatus/tflogs instead of retrying, or pass force_new=true to override.

Returns confirmation that the deployment has started.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: plan_id (string) - Apply a previously created plan from tfplan. Preview-then-apply workflow: tfplan → tflogs (review) → tfdeploy(plan_id=...).

OPTIONAL: sandbox (boolean, default false) - deploys real generated Terraform. Set to true for cheap sandbox template (testing only).

OPTIONAL: ignore_drift (boolean, default false) - when true, proceeds with deploy even if infrastructure drift is detected. By default, deploys fail on drift. Use after reviewing drift details via tfdrift or tflogs.

OPTIONAL: force_new (boolean, default false) - bypass the session-level single-flight guard. Use only when the existing run is provably wedged.

CREDENTIAL FLOW (if credentials are missing):

Response includes a connect_url — present it to the user
Call credawait(session_id=...) to poll for credentials
When credawait returns success, retry tfdeploy

Do NOT call credawait without first showing the connect URL to the user.
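
The single-flight rule and the credential flow can be combined in one small control loop. The sketch below is illustrative only: `call_tool` and `show_to_user` are hypothetical callables, and the response shape (an `error` field equal to `tf_job_conflict`, a `job_id`, a `connect_url`, and a `success` flag from credawait) is an assumption, because the tool has no output schema.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: call_tool(name, arguments) -> parsed result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def start_deploy(call_tool: ToolCaller, session_id: str,
                 show_to_user: Callable[[str], None]) -> Dict[str, Any]:
    """Start a deploy, attaching to a live job or completing the credential flow."""
    args = {"session_id": session_id}
    for attempt in range(2):
        result = call_tool("tfdeploy", args)
        # ASSUMPTION: a conflict is reported as result["error"] == "tf_job_conflict".
        if result.get("error") == "tf_job_conflict":
            # Attach to the live job instead of retrying or forcing a new one.
            return {"attached": True, "job_id": result["job_id"]}
        if "connect_url" not in result:
            return {"attached": False, "job_id": result.get("job_id")}
        if attempt == 1:
            break
        # The connect URL must be shown before credawait is called.
        show_to_user(result["connect_url"])
        cred = call_tool("credawait", {"session_id": session_id})
        if not cred.get("success"):
            raise RuntimeError("credawait did not report success")
    raise RuntimeError("credentials still missing after retry")
```

The returned job_id would then be handed to tfstatus or tflogs; force_new is deliberately never set here, since the description reserves it for runs that are provably wedged.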

Parameters

Name	Required	Description
`plan_id`	No	Apply a previously created plan from tfplan. When set, project_id should also be provided.
`sandbox`	No	When true (default for MCP), deploys a small sandbox stack instead of the real generated Terraform. Set false to deploy the actual user stack.
`version`	No	Deploy a specific stack version number. Defaults to the current draft.
`force_new`	No	When true, bypass the session-level single-flight guard and start a new deploy even if another job is in flight. Use only when an existing run is provably wedged.
`project_id`	No	Project ID returned by tfplan. Required alongside plan_id.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.
`ignore_drift`	No	When true, proceed with deploy even if drift is detected on the existing stack.
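
The parameter interactions in this table can be checked before the call is made. The helper below is a hypothetical sketch, not part of the server: it enforces that plan_id comes with project_id, and it makes `sandbox` a required argument because the description and the parameter table state different defaults for it.

```python
from typing import Any, Dict, Optional


def build_tfdeploy_args(session_id: str, sandbox: bool,
                        plan_id: Optional[str] = None,
                        project_id: Optional[str] = None,
                        ignore_drift: bool = False) -> Dict[str, Any]:
    """Assemble tfdeploy arguments for a direct or preview-then-apply deploy."""
    if plan_id is not None and project_id is None:
        raise ValueError("project_id is required alongside plan_id")
    # sandbox is always sent explicitly so the outcome does not depend on a default.
    args: Dict[str, Any] = {"session_id": session_id, "sandbox": sandbox}
    if plan_id is not None:
        args["plan_id"] = plan_id
    if project_id is not None:
        args["project_id"] = project_id
    if ignore_drift:
        # Only meaningful after drift has been reviewed via tfdrift or tflogs.
        args["ignore_drift"] = True
    return args
```

For the preview-then-apply path, the plan_id and project_id would come from an earlier tfplan call whose logs the user has reviewed.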

Tool Definition Quality

A4.8/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses critical behavioral traits not covered by annotations: long-running job (15+ min), single-flight constraint (only one job per session), returns tf_job_conflict on conflict, and credential re-auth flow. Annotations indicate destructiveHint=true (deployment modifies infrastructure) and openWorldHint=true (external cloud resources), which the description aligns with, so no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with headings ('WORKFLOW', 'IMPORTANT', 'SINGLE-FLIGHT', 'OPTIONAL', 'CREDENTIAL FLOW') and bullet points, making it scannable. It front-loads the purpose and workflow position. While it is somewhat lengthy (approximately 300 words), every sentence serves a purpose, and the length is justified by the complexity of the tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (long-running, single-flight, credential flow, optional params, no output schema), the description is comprehensive. It covers the main purpose, prerequisites, parameter explanations, error handling (conflict scenario), workflow integration (tfplan->tfdeploy->tfstatus), and credential re-auth steps. No output schema exists, so the description compensates fully with context about what the tool returns (confirmation with job_id or conflict).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains most parameters: session_id (required), plan_id (apply a plan), sandbox (testing flag), ignore_drift (bypass drift check), force_new (override single-flight). However, 'version' and 'project_id' are not described, leaving a gap. Overall, the description adds substantial meaning beyond the schema for 5 of 7 parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states this is Step 4 of 4, deploying infrastructure by starting a Terraform job. It specifies the action ('deploy infrastructure') and distinguishes from sibling tools like tfplan (planning), tfdestroy (destroy), and tfstatus (monitoring) by positioning it as the deployment step after file generation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use (Step 4, after Terraform files generated), when-not-to-use (if another TF job is in flight, use tfstatus/tflogs instead unless force_new=true), and alternative tools (tfplan for preview, tfdestroy for teardown). It also covers prerequisites (session_id) and a detailed credential flow with connect_url and credawait.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfdestroyDestroy InfrastructureA

Destructive

Inspect

DESTROY: Tear down previously deployed infrastructure

Destroys infrastructure by calling the Oracle destroy endpoint for a session that has a prior successful deployment.

IMPORTANT: This starts a long-running job. Use tfstatus/tflogs to monitor progress.

SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfdestroy returns tf_job_conflict with the live job_id - attach with tfstatus/tflogs, or pass force_new=true to override.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: force_new (boolean, default false) - bypass the single-flight guard. Use only when the existing run is provably wedged.

PREREQUISITE: The session must have a prior successful deployment with a project_id. After destroy completes, the session is kept for historical record but hasDeployment is set to false.

Parameters

Name	Required	Description	Default
`force_new`	No	When true, bypass the single-flight guard and force a new destroy even if another job is in flight.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing. The deployed stack for this session will be torn down.

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate destructive and open-world hints. The description adds critical behavioral details: long-running job, single-flight mechanism, conflict return with job_id, force_new override, and that hasDeployment becomes false after completion. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is structured with clear sections (DESTROY, IMPORTANT, SINGLE-FLIGHT, etc.) and front-loads key information. While slightly verbose, it is organized and each sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's two parameters and no output schema, the description covers purpose, prerequisites, behavioral traits, error conditions, and post-destroy state. It is sufficiently complete for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 0% description coverage. The description explains session_id format and source, and force_new's purpose (bypass single-flight guard for wedged jobs) and default (false). This adds significant meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Destroy' and the resource 'infrastructure', specifying it tears down previously deployed infrastructure via an Oracle destroy endpoint. It is distinct from sibling tools like tfdeploy, tfplan, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use context: after a successful deployment. It lists prerequisites (session_id from convoopen, prior successful deployment), warns about long-running jobs and single-flight, and mentions alternatives for monitoring (tfstatus/tflogs) and overriding (force_new). It also describes post-destroy state.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfdriftCheck Infrastructure DriftA

Idempotent

Inspect

DRIFT CHECK: Run a read-only drift detection check

Checks whether deployed infrastructure has drifted from the expected Terraform state. This is a read-only operation - it does NOT modify any infrastructure. Returns job_id. Use tflogs to stream the drift check results.

SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfdrift returns tf_job_conflict with the live job_id - attach with tfstatus/tflogs, or pass force_new=true to override.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

PREREQUISITE: The session must have a prior deployment with a project_id.

OPTIONAL: force_new (boolean, default false) - bypass the single-flight guard. Use only when the existing run is provably wedged.

If drift is detected, the user can either fix the drift or use tfdeploy(ignore_drift=true) to proceed.

Parameters

Name	Required	Description	Default
`force_new`	No	When true, bypass the single-flight guard and force a new drift check even if another job is in flight.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond annotations by detailing the read-only nature, return value (job_id), single-flight limitation, conflict response format, and the force_new override. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with headings and bullet points, and the key info is front-loaded. However, it is somewhat verbose with extra details that could be trimmed without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers inputs, prerequisites, behavior (single-flight), error handling, and next steps. It is complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates fully. It specifies that session_id must be in the format 'sess_v2_...' from convoopen, and explains force_new's purpose and when to use it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Run a read-only drift detection check' to detect infrastructure drift. It distinguishes itself from siblings like tfdeploy and tfplan by explicitly mentioning it is read-only and providing context for when to use it.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool vs alternatives, including prerequisites (session_id, prior deployment), single-flight constraint, and how to handle conflicts. It also directs users to tflogs for results and mentions using tfdeploy with ignore_drift when drift is detected.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfgenerateGenerate TerraformA

Idempotent

Inspect

WORKFLOW: Step 3 of 4 - Generate Terraform files from completed design

Generate Terraform files from an InsideOut session that has completed infrastructure design.

⚠️ PREREQUISITE: Only call this AFTER convoreply returns with terraform_ready=true in the response metadata. DO NOT call this while convoreply is still running or before terraform_ready is confirmed! If you get 'session has not reached terraform-ready state', wait for convoreply to complete first.

🎯 USE THIS TOOL WHEN: convoreply has returned with terraform_ready=true, OR the user asks to 'see the terraforms', 'generate terraform', 'show me the code', etc.

DEFAULT RESPONSE: Returns summary table + download URL (keeps code out of LLM context).

FALLBACK: Set include_code: true to get full code inline if curl/unzip fails.

CRITICAL WORKFLOW (default mode):

Call this tool to get file summary and download URL
ASK the user: 'Where would you like me to save the Terraform files? Default: ./insideout-infra/'
WAIT for user confirmation before running the download command
Run the curl/unzip command with the user's chosen directory
If curl/unzip FAILS (sandbox, security, platform issues), retry with include_code: true

AFTER GENERATION: Ask user if they want to review the files and then deploy with tfdeploy

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: include_code (boolean) - set true to return full code inline as fallback.

💡 TIP: Examine workflow.usage prompt for more context on how to properly use these tools.

Parameters

Name	Required	Description	Default
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing. Riley must have signaled [TERRAFORM_READY: true] before calling this tool.
`include_code`	No	When true, the response inlines the full generated Terraform source. Use as a fallback when the host can't read the on-disk archive (sandbox or security restrictions).
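
The default-then-fallback workflow for tfgenerate can be sketched as below. Everything besides the tool name and its two parameters is hypothetical: `call_tool` stands in for an MCP client, `ask_directory` for the prompt that waits for the user's choice, and `download` for whatever runs the curl/unzip command and raises `DownloadFailed` when it cannot.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: call_tool(name, arguments) -> parsed result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

DEFAULT_DIR = "./insideout-infra/"


class DownloadFailed(Exception):
    """Raised by the (hypothetical) download helper when curl/unzip fails."""


def generate_terraform(call_tool: ToolCaller, session_id: str,
                       ask_directory: Callable[[str], str],
                       download: Callable[[Dict[str, Any], str], None]) -> Dict[str, Any]:
    """Fetch the summary, download to the user's directory, fall back to inline code."""
    summary = call_tool("tfgenerate", {"session_id": session_id})
    # Blocks until the user answers; an empty answer accepts the default directory.
    target = ask_directory(DEFAULT_DIR) or DEFAULT_DIR
    try:
        download(summary, target)
    except DownloadFailed:
        # Sandbox, security, or platform issue: ask for the code inline instead.
        return call_tool("tfgenerate", {"session_id": session_id, "include_code": True})
    return summary
```

Keeping include_code off on the first call matches the stated goal of keeping generated code out of the LLM context unless the download path is unavailable.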

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond annotations (idempotentHint, destructiveHint, openWorldHint) by detailing the default summary+download URL response, the include_code fallback, the critical workflow steps (ask user for directory, wait for confirmation), and error messages for precondition failures. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bold headings, bullet points, and emojis. Information is front-loaded with purpose and workflow step. Every sentence adds value; no redundancy. Despite length, it remains concise for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's role in a multi-step workflow, the description covers prerequisites (session_id, terraform_ready state), default and fallback responses, post-action steps (ask user for save location, deploy with tfdeploy), and error handling. No output schema is provided, but the response type is adequately described.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by explaining the required session_id format (sess_v2_...) and the optional include_code parameter's purpose (fallback for curl failures). While not exhaustive (no value constraints), it adds meaningful context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool generates Terraform files from a completed InsideOut session, with specific verb 'Generate' and resource 'Terraform files'. It distinguishes itself from siblings like tfdeploy (deployment) and convoreply (conversation) by placing it as step 3 of a 4-step workflow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states the prerequisite (call only after convoreply returns terraform_ready=true), and gives clear when-to-use conditions (user asks for 'see the terraforms' or similar). It also provides a fallback for curl/unzip failures.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tflogsFetch Deploy LogsA

Read-only

Inspect

MONITORING: Fetch Terraform deployment logs with pagination

Fetches logs from a running or completed Terraform deployment job.

For completed jobs: uses REST endpoint for instant retrieval (supports tail for server-side filtering).
For running jobs: streams via SSE with timeout-based pagination.

PAGINATION (running jobs only): Use last_event_id from the response to fetch more:

First call: tflogs(session_id='...') → get logs + last_event_id
Next call: tflogs(session_id='...', last_event_id='...') → get NEW logs only
Repeat until complete: true in response

RESPONSE FIELDS:

logs: Array of log messages collected
last_event_id: Pass this back to get more logs (pagination cursor, SSE only)
complete: true if job finished, false if more logs may be available
total_logs: total log entries before tail truncation

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: job_id to target a specific deployment (use tfruns to discover IDs), timeout (default 50s, max 55s), last_event_id (for pagination), tail (return only last N entries)

⚠️ CONTEXT WARNING: Deploy logs can be hundreds of lines. Use tail: 50 for completed jobs to avoid blowing up the context window.

Parameters

Name	Required	Description
`tail`	No	Return only the last N log entries. Use 0 (or omit) for all available entries.
`job_id`	No	Optional. Target a specific job. Use tfruns to discover job IDs. When omitted, streams the latest job for the session.
`timeout`	No	Max seconds to collect logs. Default 50, max 55.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.
`last_event_id`	No	Resume cursor for pagination. Pass back the last_event_id from a previous tflogs response to fetch only newer entries.
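
The pagination steps above can be sketched as a polling loop. This is a minimal illustration, not the connector's own client code: `call_tool` is a hypothetical callable standing in for whatever MCP client invokes the tool and returns the parsed response as a dict, and `max_calls` is an added safety cap.

```python
from typing import Any, Callable, Dict, List

# ASSUMPTION: call_tool(name, arguments) -> parsed response dict (hypothetical helper).
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def collect_logs(call_tool: ToolCaller, session_id: str, max_calls: int = 20) -> List[str]:
    """Call tflogs repeatedly until the response reports complete: true."""
    logs: List[str] = []
    # Pass session_id back exactly as returned, including any ?token=... suffix.
    args: Dict[str, Any] = {"session_id": session_id}
    for _ in range(max_calls):
        resp = call_tool("tflogs", args)
        logs.extend(resp.get("logs", []))
        if resp.get("complete"):
            break
        cursor = resp.get("last_event_id")
        if cursor is None:
            break  # nothing to resume from; stop rather than refetch the same logs
        # The next call returns only entries newer than the cursor.
        args = {"session_id": session_id, "last_event_id": cursor}
    return logs
```

For a job that has already finished, a single call with `tail` set (for example 50, as the context warning suggests) avoids the loop entirely.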

Tool Definition Quality

A 4.7/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint and openWorldHint, and the description adds rich behavioral details: different mechanisms for completed vs running jobs, pagination with last_event_id, SSE streaming, timeout limits, and context window warning. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is organized into clear sections (general, pagination steps, response fields, requirements, context warning). Each sentence adds value, though slightly lengthy; front-loading the purpose helps.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description details response fields (logs, last_event_id, complete, total_logs). It covers pagination thoroughly, context window concerns, and required inputs. Nothing critical is missing for an agent to use the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, but the description fully explains every parameter: session_id required, job_id optional, timeout with defaults, last_event_id for pagination, and tail for truncation. It also provides usage patterns for pagination.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'MONITORING: Fetch Terraform deployment logs with pagination' and distinguishes between running and completed jobs. Among siblings like tfstatus and tfruns, this is the specific logs tool, so purpose is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains prerequisites (session_id from convoopen), optional parameters, and context warnings. It implicitly tells when to use (monitoring deployments) but lacks explicit when-not-to-use or direct sibling comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfoutputs: Get Deploy Outputs (A)

Read-only

Inspect

INSPECTION: Retrieve Terraform outputs from a completed deployment

Returns structured output values (VPC IDs, endpoints, cluster names, etc.) after a successful deploy. Sensitive outputs are redacted (shown as '(sensitive)').

By default returns outputs for the latest successful deploy. Optionally specify job_id to get outputs for a specific deployment.

REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: job_id (specific deployment), lifecycle (filter by step e.g. 'cloud-provision').

Parameters

Name	Required	Description
`job_id`	No	Optional. Specific job ID to fetch outputs from. When omitted, returns outputs from the latest successful apply.
`lifecycle`	No	Optional Oracle deploy-step filter for the outputs. Common values are 'provision', 'cloud-provision', 'k8s-provision' — these correspond to the lifecycle stages of the deployed stack. When omitted, returns outputs from all lifecycle steps.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.
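
Because sensitive outputs come back as the literal string '(sensitive)', a caller may want to separate usable values from redacted ones before acting on them. The sketch below assumes the outputs arrive as a flat name-to-value mapping; the listing documents no output schema, so that shape and the function name `split_outputs` are assumptions.

```python
from typing import Any, Dict, List, Tuple

REDACTED = "(sensitive)"  # placeholder the description says replaces sensitive values


def split_outputs(outputs: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (usable outputs, sorted names of outputs that were redacted)."""
    # ASSUMPTION: outputs is a flat {name: value} mapping.
    usable = {name: value for name, value in outputs.items() if value != REDACTED}
    redacted = sorted(name for name, value in outputs.items() if value == REDACTED)
    return usable, redacted
```

Keeping the redacted names lets an agent tell the user which values exist but cannot be shown, instead of silently dropping them.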

Tool Definition Quality

A 4.4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true. The description adds that sensitive outputs are redacted and that outputs are structured values. This provides additional context beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a bold header, inline examples, and bullet-like format for requirements/options. While slightly verbose, it is efficient and front-loaded with purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having no output schema, the description covers default behavior, optional parameters, redaction, and expected return types. It fully compensates for schema gaps and provides comprehensive guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description explains each parameter: session_id format and source, job_id for specific deploy, and lifecycle with an example. This adds significant meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'INSPECTION: Retrieve Terraform outputs from a completed deployment' and lists example outputs (VPC IDs, endpoints). This distinguishes it from sibling tools like tfplan, tflogs, and tfdeploy.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explains that by default it returns outputs for the latest successful deploy, and optionally allows specifying a job_id for a specific deployment. It also notes required session_id format, but does not explicitly exclude alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfplan: Preview Infrastructure Plan (A)

Idempotent

Inspect

PREVIEW: Run terraform plan to preview infrastructure changes

Runs a terraform plan for an InsideOut session without applying any changes. This lets the user review what will be created/changed/destroyed before committing. Returns job_id, plan_id, and project_id. Use tflogs to stream the plan output. After the plan completes, use tfdeploy with plan_id to apply the exact plan.

SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfplan returns tf_job_conflict with the live job_id; attach with tfstatus/tflogs, or pass force_new=true to override.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: sandbox (boolean, default false). When false, plans the real generated Terraform. Set to true for the cheap sandbox template (testing only).

OPTIONAL: force_new (boolean, default false). Bypasses the single-flight guard. Use only when the existing run is provably wedged.

CREDENTIAL HANDLING: Same as tfdeploy; credentials must be configured first.

Parameters

Name	Required	Description
`sandbox`	No	When true, plan against the sandbox stack; when false (default), plan the real generated Terraform.
`force_new`	No	When true, bypass the single-flight guard and force a new plan even if one is already running.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.
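
The single-flight rule can be handled by attaching to the live job instead of forcing a new one. The following is a sketch under stated assumptions: `call_tool` is a hypothetical helper that invokes a tool and returns a dict, and the conflict is assumed to surface as an `error` field equal to `tf_job_conflict`, since the listing does not document the exact response shape.

```python
from typing import Any, Callable, Dict

# ASSUMPTION: call_tool(name, arguments) -> parsed response dict (hypothetical helper).
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def plan_or_attach(call_tool: ToolCaller, session_id: str) -> Dict[str, Any]:
    """Start a plan, or attach to the job that is already in flight."""
    resp = call_tool("tfplan", {"session_id": session_id})
    # ASSUMPTION: a conflict looks like {"error": "tf_job_conflict", "job_id": "..."}.
    if resp.get("error") == "tf_job_conflict":
        live_job = resp["job_id"]
        status = call_tool("tfstatus", {"session_id": session_id, "job_id": live_job})
        return {"attached": True, "job_id": live_job, "status": status}
    # No conflict: keep plan_id so tfdeploy can later apply this exact plan.
    return {"attached": False, "job_id": resp["job_id"], "plan_id": resp["plan_id"]}
```

The sketch never sets force_new, in line with the guidance to reserve it for runs that are provably wedged.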

Tool Definition Quality

A 4.9/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotent and non-destructive, which description reinforces by stating no changes applied. Discloses single-flight behavior, optional force_new override, and credential handling. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections (PREVIEW, SINGLE-FLIGHT, REQUIRES, OPTIONAL). Concise yet comprehensive, no redundant sentences. Front-loaded with key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all necessary context: return values (job_id, plan_id, project_id), error handling (tf_job_conflict), how to get output (tflogs), and prerequisites (session_id format). Adequate given no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, description compensates well by explaining each parameter: session_id required with format hint, sandbox (default false, testing vs real), force_new (overrides guard). Minor lack of explicit default values, but context is clear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it runs terraform plan to preview infrastructure changes without applying. It distinguishes from sibling tools like tfdeploy (apply) and tfdestroy, and specifies the preview nature upfront with 'PREVIEW' labeling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (preview before applying), when not (use tfdeploy instead), and provides alternatives (tflogs to stream output). Also covers single-flight rule and how to handle conflicts with force_new.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfruns: List Deploy Runs (A)

Read-only

Inspect

INSPECTION: List all Terraform deployment runs for a session

Returns job IDs, statuses, types (apply/destroy), and timestamps for every run. Use this to see deployment history, find job IDs for log inspection, or check which deployments succeeded or failed.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Parameters

Name	Required	Description	Default
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing. Returns the deployment-job history (apply / destroy / plan / drift) for this session.
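
One use case named above, finding a job ID for log inspection, might look like the sketch below. The field names `job_id`, `status`, and `created_at` are assumptions, as is the use of uniformly formatted ISO 8601 timestamps (which sort correctly as strings); the listing says only that job IDs, statuses, types, and timestamps are returned.

```python
from typing import Any, Dict, List, Optional


def latest_failed_job(runs: List[Dict[str, Any]]) -> Optional[str]:
    """Return the job_id of the most recent failed run, or None if none failed."""
    # ASSUMPTION: each run is a dict with "job_id", "status", and "created_at" keys.
    failed = [run for run in runs if str(run.get("status", "")).lower() == "failed"]
    if not failed:
        return None
    # ASSUMPTION: timestamps share one ISO 8601 format, so string order is time order.
    newest = max(failed, key=lambda run: run["created_at"])
    return newest["job_id"]
```

The returned ID can then be passed as `job_id` to tflogs or tfstatus to inspect that specific run.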

Tool Definition Quality

A 4.6/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses read-only inspection behavior, which aligns with annotations (readOnlyHint, openWorldHint). Adds specifics on returned data fields. Does not address potential edge cases like empty history or rate limits, but the tool is simple.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Efficient single paragraph with front-loaded 'INSPECTION' label. Each sentence adds value: purpose, output, use cases, parameter requirement. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 1 parameter and no output schema, the description adequately covers what it does, what it returns, and how to provide the parameter. No missing information needed for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds critical meaning: specifies the session_id format (sess_v2_...) and its source (convoopen response). Since schema coverage is 0%, this provides necessary guidance for correct parameter use.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists all Terraform deployment runs for a session, specifying the returned fields (job IDs, statuses, types, timestamps). This distinguishes it from sibling tools like tfstatus (current state) or tfplan (pending changes).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists use cases: viewing history, finding job IDs, checking success/failure. Also notes the required session_id format. However, it lacks explicit guidance on when not to use it or mention alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfstatus: Check Deploy Status (A)

Read-only

Inspect

MONITORING: Quick status check for Terraform deployments

Check the current status of a Terraform deployment job. Use this tool to quickly check if a deployment is running, completed, or failed. Returns job status, job_id, and other metadata without streaming logs. Use tflogs to stream the actual deployment logs.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: job_id to target a specific deployment (use tfruns to discover IDs).

LIVENESS: The response carries two distinct timestamps:

updated_at: last semantic change (only bumped when status / drift / version actually differ). Useful for sorting deployments; NOT a per-poll heartbeat.
last_refresh_at: last successful Oracle decode (stamped on every poll where reliable reached Oracle, even if nothing in the row changed). Use this to confirm reliable is still actively talking to Oracle for a long-running RUNNING job. Absent on rows that haven't been refreshed since the column was added.

💡 TIP: Examine workflow.usage prompt for more context on how to properly use these tools.

Parameters

Name	Required	Description	Default
`job_id`	No	Optional. Specific job ID to inspect. When omitted, returns the status of the latest job for the session.
`session_id`	Yes	Session ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_?token=). The suffix is part of the session credential; never strip it when summarizing.
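
The liveness guidance above suggests checking last_refresh_at, not updated_at, when deciding whether a long-running job is still being polled. A minimal sketch follows, assuming the timestamp is an ISO 8601 string (the listing does not state the format); the five-minute threshold and the function name `refresh_is_stale` are illustrative choices, not values from the connector.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def refresh_is_stale(
    status: Dict[str, Any],
    now: datetime,
    max_age: timedelta = timedelta(minutes=5),  # ASSUMPTION: illustrative threshold
) -> Optional[bool]:
    """True if last_refresh_at is older than max_age, None if the field is absent."""
    raw = status.get("last_refresh_at")
    if raw is None:
        # Absent on rows not refreshed since the column was added: no signal either way.
        return None
    # ASSUMPTION: ISO 8601 text; a trailing "Z" is normalized for fromisoformat.
    refreshed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    if refreshed.tzinfo is None:
        refreshed = refreshed.replace(tzinfo=timezone.utc)  # treat naive values as UTC
    return now - refreshed > max_age
```

Pass a timezone-aware `now`, for example `datetime.now(timezone.utc)`. A `None` result means liveness cannot be judged from this field, which is different from the job being stale.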

Tool Definition Quality

A 4.7/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint and openWorldHint. The description adds detailed behavioral context: it does not stream logs, explains liveness timestamps (updated_at vs last_refresh_at) and what they indicate, and notes that last_refresh_at is absent on unrefreshed rows. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long but well-structured: a header, then purpose, usage, requirements, and detailed timestamp behavior. Every sentence serves a purpose, though the timestamp section could be slightly tighter. Still, it's efficient for a tool with behavioral nuance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description covers purpose, usage, parameters, and behavioral details (timestamps, no log streaming). It does not list every possible return field but mentions 'job status, job_id, and other metadata', which is sufficient for most use cases. The tip to examine workflow.usage adds completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 0% description coverage, so description must compensate. It does: explains that session_id is required and must be in format sess_v2_..., and job_id is optional with guidance to use tfruns to discover IDs. This adds clear meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'MONITORING: Quick status check for Terraform deployments' and clearly states 'Check the current status of a Terraform deployment job.' It distinguishes itself from the sibling tool 'tflogs' by noting it returns only status and metadata without streaming logs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use this tool ('quickly check if a deployment is running, completed, or failed') and when not ('Use tflogs to stream the actual deployment logs'). Also provides required session_id and optional job_id sourcing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
