
Server Details

AI infrastructure design agent. Describe your app in plain English; Riley designs, prices, and deploys AWS or GCP infrastructure with generated Terraform.

Status
Healthy
Last Tested
Transport
Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.7/5 across 24 of 24 tools scored.

Server Coherence: A
Disambiguation: 5/5

Each tool has a distinct purpose with clear boundaries: workflow tools (convoopen, convoreply, tfgenerate, tfdeploy), inspection tools (awsinspect, gcpinspect, convostatus), and utility tools (help, submit_feedback, credawait). Despite multiple 'inspect' variants, descriptions clearly differentiate between cloud providers and batch vs. single queries.

Naming Consistency: 4/5

Tool names follow a predictable pattern with prefix-based grouping (convo*, tf*, aws*, gcp*, stack*). Most names are a domain prefix followed by a verb (e.g., convoopen, tfdeploy, awsinspect). A minor inconsistency comes from mixing prefixes, but naming is fully consistent within each group.

Tool Count: 4/5

24 tools is reasonable for a comprehensive infrastructure platform covering conversation, design, generation, deployment, destroy, monitoring, and inspection across multiple clouds. Slightly high but well-justified by the scope; each tool serves a specific need.

Completeness: 5/5

The tool set covers the full lifecycle: conversation, design, terraform generation, planning, deployment, destroy, drift detection, monitoring, and inspection for both AWS and GCP. Includes credential handling, version management, rollback, and feedback submission. No obvious gaps for the stated purpose.

Available Tools

24 tools
awsinspect: Inspect AWS Infrastructure (A)
Read-only
Inspect

INSPECTION: Inspect AWS infrastructure for a deployed project.

⚠️ PREREQUISITE: This tool requires a prior deployment ATTEMPT (successful or failed). Check convostatus for hasDeployAttempt=true before calling. Works even after failed deploys to inspect orphaned resources.

Inspect deployed AWS resources after a deployment attempt. Use this tool when the user asks about the status or details of their deployed infrastructure. It fetches temporary read-only credentials securely and queries the AWS API directly.

RESPONSE TIERS (default is summary for token efficiency):

  • Summary (default): Key fields only (~500 tokens). Set detail=false, raw=false or omit both.

  • Detail: Full metadata for a specific resource. Set detail=true + resource filter.

  • Raw: Complete unprocessed API response. Set raw=true.
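
The three tiers above map onto two boolean flags. A minimal sketch of that mapping follows; the helper name `tier_flags` is hypothetical, and the choice to send detail=false alongside raw=true is an assumption, since the description only says to set raw=true for the raw tier.

```python
def tier_flags(tier: str) -> dict:
    """Map a response tier name to the detail/raw flags described above."""
    flags = {
        "summary": {"detail": False, "raw": False},  # default tier
        "detail": {"detail": True, "raw": False},    # also needs a resource filter
        "raw": {"detail": False, "raw": True},       # assumption: detail left false
    }
    if tier not in flags:
        raise ValueError(f"unknown tier: {tier!r}")
    return dict(flags[tier])


print(tier_flags("summary"))
print(tier_flags("detail"))
```

Keeping the mapping in one place avoids sending contradictory flag combinations by accident.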

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Supported services: account, acm, alb, apigateway, apprunner, backup, bedrock, cloudfront, cloudwatchlogs, cognito, cost-explorer, dynamodb, ebs, ec2, ecs, eks, elasticache, kms, lambda, msk, opensearch, rds, route53, s3, sagemaker, secretsmanager, sqs, vpc, waf. For a specific service's actions, call with action="list-actions".

METRICS: Use list-metrics to discover available metrics for a service (no credentials needed). Then use get-metrics to retrieve data (auto-discovers resources). Most services return CloudWatch time-series. KMS returns key health (rotation, state). SecretsManager returns secret health (rotation, last accessed/rotated). Optional filters JSON: {"hours":6,"period":300}.

BILLING: Use service=cost-explorer to inspect AWS costs. Actions:

  • get-cost-summary (last 30 days by service, filters: {"days":7,"granularity":"DAILY"})

  • get-cost-forecast (projected spend through end of month)

  • get-cost-by-tag (costs grouped by tag, filters: {"tag_key":"Environment","days":30})

Requires ce:GetCostAndUsage and ce:GetCostForecast IAM permissions.

EXAMPLES:

  • awsinspect(session_id=..., service="ec2", action="describe-instances")

  • awsinspect(session_id=..., service="cost-explorer", action="get-cost-summary")

  • awsinspect(session_id=..., service="ec2", action="get-metrics", filters='{"hours":6}')

  • awsinspect(session_id=..., service="rds", action="describe-db-instances", detail=true)
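
Because filters is a JSON-encoded string rather than a nested object, it is easy to get the quoting wrong when writing calls by hand. The sketch below builds the argument dictionary for a call like the examples above and lets `json.dumps` handle the encoding. The helper name `awsinspect_args` and the placeholder session ID are hypothetical, and omitting filters when none are given is an assumption based on the examples.

```python
import json


def awsinspect_args(session_id, service, action, filters=None,
                    detail=False, raw=False):
    """Build an awsinspect argument dict with filters JSON-encoded."""
    args = {
        "session_id": session_id,  # passed through verbatim, token suffix included
        "service": service,
        "action": action,
        "detail": detail,
        "raw": raw,
    }
    if filters is not None:
        # Compact separators reproduce the '{"hours":6}' style used in the examples.
        args["filters"] = json.dumps(filters, separators=(",", ":"))
    return args


# Placeholder session ID for illustration only.
args = awsinspect_args("sess_v2_example?token=example", "ec2", "get-metrics",
                       filters={"hours": 6})
print(args["filters"])
```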

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| raw | Yes | When true, returns the unprocessed AWS API response. Escape hatch for fields the summarized response doesn't surface. | |
| action | Yes | Operation on the service. Examples: 'describe-instances' (ec2), 'list-buckets' (s3), 'list-keys' (kms), 'get-cost-summary' (cost-explorer), 'list-actions' (discovery), 'list-metrics' / 'get-metrics' (CloudWatch). | |
| detail | Yes | When true, returns full metadata for a single resource (requires a resource ID in filters). When false (default), returns a summary. | |
| filters | Yes | Optional JSON-encoded filter object passed through to the underlying AWS API. Examples: '{"hours":6}' for metric windows, '{"days":7,"granularity":"DAILY"}' for cost queries. | |
| service | Yes | AWS service to query. Examples: 'ec2', 'rds', 'vpc', 's3', 'lambda', 'eks', 'ecs', 'cost-explorer'. Use action='list-actions' to discover the supported actions for a service. | |
| session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. The session must have an AWS deploy attempt before inspect probes will succeed. | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, openWorldHint), the description explains it fetches temporary read-only credentials, queries AWS API directly, and details response tiers (summary, detail, raw). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections (PREREQUISITE, RESPONSE TIERS, etc.) and examples. It is comprehensive but slightly verbose; could be tighter.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for a complex tool: covers prerequisites, response types, supported services, metrics/billing, and examples. No output schema, but description explains return formats.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds value by explaining session_id format, service list, action types, filters usage, and response tiers with examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it inspects AWS infrastructure for a deployed project, using the verb 'inspect' and specifying the resource. It distinguishes from sibling tools like awsinspect_batch and gcpinspect.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states prerequisites (prior deployment attempt, checking convostatus), when to use (user asks about infrastructure status), and provides alternatives (awsinspect_batch for batch operations).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

awsinspect_batch: Batch-Inspect AWS Infrastructure (A)
Read-only
Inspect

BATCH INSPECTION: run up to 32 AWS inspect probes in one call. ⚠️ PREREQUISITE: Same as awsinspect — deploy attempt required. Check convostatus for hasDeployAttempt=true before calling.

Use this when you need to check more than ~3 resources. The backend fetches Oracle credentials ONCE per batch and fans out probes against a single AWS config — for a 12-resource health check this is ~5–8× faster and 12× fewer Oracle round-trips than calling awsinspect 12 times.

BUDGETS:

  • Up to 32 sub-probes per call (subs array length).

  • 30s per-sub timeout; 60s total batch wall-clock.

  • Concurrency cap 8 — sub-probes run in parallel but never saturate AWS.

  • 512 KB response cap: subs past the cap keep their envelope (index/service/action/ok) but have result replaced with truncated=true.
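
Given the 32-probe cap above, a larger health check has to be split across several calls. A minimal sketch of that chunking follows; the helper name `chunk_subs` and the generated probe list are illustrative only.

```python
MAX_SUBS = 32  # per-call sub-probe limit stated in the budgets above


def chunk_subs(subs, size=MAX_SUBS):
    """Split a list of sub-probes into batches no larger than the cap."""
    return [subs[i:i + size] for i in range(0, len(subs), size)]


# 70 illustrative probes become three batches of 32, 32, and 6.
probes = [{"service": "ec2", "action": "describe-instances"} for _ in range(70)]
batches = chunk_subs(probes)
print([len(batch) for batch in batches])
```

Each batch would then be sent as the subs argument of its own awsinspect_batch call.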

PARTIAL FAILURE IS EXPECTED. The response is an ordered results array; each entry has {index, service, action, ok, result, error}. Inspect each result — do NOT abort on the first error. A credential fetch failure leaves cred-less probes (list-actions, list-metrics) succeeding anyway.
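
Handling partial failure means walking the whole results array instead of stopping at the first error. The sketch below separates successful and failed entries using the {index, service, action, ok, result, error} envelope described above. The sample entries, including the error string, are made up for illustration.

```python
def split_results(results):
    """Separate batch entries into successes and failures, keeping input order."""
    succeeded = [entry for entry in results if entry.get("ok")]
    failed = [entry for entry in results if not entry.get("ok")]
    return succeeded, failed


# Made-up sample shaped like the documented envelope.
sample = [
    {"index": 0, "service": "ec2", "action": "describe-instances",
     "ok": True, "result": {"count": 2}, "error": None},
    {"index": 1, "service": "rds", "action": "describe-db-instances",
     "ok": False, "result": None, "error": "example error"},
    {"index": 2, "service": "s3", "action": "list-buckets",
     "ok": True, "result": {"count": 5}, "error": None},
]

succeeded, failed = split_results(sample)
for entry in failed:
    print(f"probe {entry['index']} ({entry['service']}) failed: {entry['error']}")
```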

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Supported services: account, acm, alb, apigateway, apprunner, backup, bedrock, cloudfront, cloudwatchlogs, cognito, cost-explorer, dynamodb, ebs, ec2, ecs, eks, elasticache, kms, lambda, msk, opensearch, rds, route53, s3, sagemaker, secretsmanager, sqs, vpc, waf. For a specific service's actions, use awsinspect (singular) with action="list-actions"; batch is not the place for discovery.

Batch responses are always summarized (no detail/raw per-sub); use singular awsinspect when you need full metadata or raw API output for one resource.

EXAMPLES:

  • awsinspect_batch(session_id=..., subs=[ {"service":"ec2","action":"describe-instances"}, {"service":"rds","action":"describe-db-instances"}, {"service":"vpc","action":"describe-vpcs"}, {"service":"s3","action":"list-buckets"}])

  • awsinspect_batch(session_id=..., subs=[ {"service":"ec2","action":"get-metrics","filters":"{\"hours\":6}"}, {"service":"rds","action":"get-metrics","filters":"{\"hours\":6}"}])

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| subs | Yes | Up to 32 sub-probes, each with {service, action, filters?, detail?, raw?}. The backend fetches credentials once per batch and fans out probes in parallel (concurrency 8, 30s per-sub timeout, 60s total wall clock). Partial failure is expected; inspect each result.ok independently. | |
| session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. The session must have an AWS deploy attempt before inspect probes will succeed. | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true. The description adds extensive behavioral context: partial failure expected, credential fetch behavior, response structure with truncation, budgets (max 32, timeouts, concurrency cap, 512KB cap). It explains that batch responses are always summarized and singular tool for detail/raw. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections: purpose, prerequisite, when to use, budgets, partial failure, required fields, supported services, examples. Front-loaded with key info. Each sentence adds value; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description explains the response structure (ordered array with index, service, action, ok, result, error), error handling (partial failure), truncation behavior, and limitations (summarized only). Lists all supported services. Covers all necessary context for correct usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters, but the description adds significant meaning: explains that sub-probes run in parallel with concurrency 8, 30s per-sub timeout, 60s total, partial failure expected. Provides examples and explains the credential fetch optimization. The description enriches the schema's definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs batch inspection of AWS resources, running up to 32 probes in one call. It distinguishes from the singular awsinspect tool by explicitly noting when to use each: use this for >3 resources, use singular for discovery or full metadata. The verb 'Batch-Inspect' and resource 'AWS Infrastructure' are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when to use this tool vs alternatives: 'Use this when you need to check more than ~3 resources.' Also states when not to use (for discovery or full metadata, use singular awsinspect). Includes prerequisite (deploy attempt required, check convostatus). Gives examples of valid calls.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convoawait: Await Pending Response (A)
Read-only
Inspect

Wait for a pending response from Riley after a convoreply timeout.

🎯 USE THIS TOOL WHEN: convoreply returned a timeout error. This allows you to continue waiting for the response without resending the message.

REQUIRES:

  • session_id: from convoopen response

OPTIONAL:

  • message_id: if known (from convoreply timeout error)

  • timeout (integer): seconds to wait. For Cursor, use 50 (default). Max 55.

Returns the same format as convoreply when successful.
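
The recovery flow described above (send with convoreply, then keep waiting with convoawait on timeout instead of resending) can be sketched as follows. Everything client-side here is an assumption: `call_tool` stands in for whatever MCP client function invokes a tool, `ReplyTimeout` is a hypothetical exception for how that client surfaces a convoreply timeout, and the message ID and session ID are placeholders.

```python
MAX_TIMEOUT = 55  # documented maximum, in seconds


class ReplyTimeout(Exception):
    """Hypothetical error raised by the client when convoreply times out."""

    def __init__(self, message_id=None):
        super().__init__("convoreply timed out")
        self.message_id = message_id


def reply_with_recovery(call_tool, session_id, text, timeout=50):
    """Send a message; on timeout, wait for the pending reply without resending."""
    timeout = min(timeout, MAX_TIMEOUT)
    try:
        return call_tool("convoreply",
                         {"session_id": session_id, "text": text, "timeout": timeout})
    except ReplyTimeout as exc:
        args = {"session_id": session_id, "timeout": timeout}
        if exc.message_id is not None:
            args["message_id"] = exc.message_id  # optional, only if known
        return call_tool("convoawait", args)


# Fake client that always times out on convoreply, for demonstration.
calls = []


def fake_call_tool(name, args):
    calls.append(name)
    if name == "convoreply":
        raise ReplyTimeout(message_id="msg_placeholder")
    return {"message": "pending reply delivered", "args": args}


result = reply_with_recovery(fake_call_tool, "sess_v2_example?token=example",
                             "Yes, go ahead", timeout=90)
print(calls)
```

The requested 90-second timeout is clamped to 55 before either call is made.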

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| timeout | No | Max seconds to wait. Default 50, max 55. | |
| message_id | No | Optional message ID from a convoreply timeout error. Not required for normal turn-based flow. | |
| session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, and description adds behavioral details: polling behavior, timeout constraints, return format (same as convoreply). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with emoji, sections, bullet points. Every sentence is meaningful and concise. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool has 3 parameters, all described. Context of why it exists (timeout recovery) is clear. No output schema, but return format is referenced via convoreply. Complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, providing baseline 3. Description adds extra value: explains session_id source, message_id from timeout error, timeout defaults and Cursor-specific guidance, and format requirements.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool waits for a pending response after a convoreply timeout, with specific verb and resource. It explicitly distinguishes from convoreply by stating it is used when convoreply returned a timeout error.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (after convoreply timeout error), what it does (continue waiting without resending), and provides requirements and optional parameters. No ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convoinspect: Inspect Session Transcript (A)
Read-only
Inspect

INSPECTION: View a session's conversation transcript and metadata.

Returns the full message history (user / assistant / tool turns) plus the session's meta: workflow step, cloud, deployment status, drift state.

This is the transcript-reader companion to the other read tools. Combine it with:

  • convostatus for the live stack / config / pricing

  • tfruns for deployment history (apply / destroy / plan / drift)

  • stackversions for the stack-version ladder

Use it when a user asks 'what did I say earlier?' or you need to retrace why the session ended up where it did. Read-only; never mutates session state.

REQUIRES: session_id (format: sess_v2_...).

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses behavioral traits beyond annotations: it states 'Read-only; never mutates session state,' which aligns with readOnlyHint=true. It also warns about the session_id format and the importance of preserving the token suffix as part of session credentials.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured, with clear sections (INSPECTION, returns, companion tools, usage guidance, requirements). Every sentence adds value without redundancy. It is front-loaded with the primary action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having no output schema, the description provides complete context: it details exactly what is returned (message history + metadata), when to use the tool, the parameter requirement, and its role among sibling tools. No gaps are present.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description adds significant meaning to the parameter session_id: it explains the origin (from convoopen), the exact format (sess_v2_*?token=*), and a critical behavioral note to never strip the token suffix. This goes well beyond the schema's pattern and description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'View a session's conversation transcript and metadata.' It specifies the return value includes full message history and session meta. It differentiates from siblings by naming companion tools (convostatus, tfruns, stackversions) and positioning itself as the 'transcript-reader companion.'

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guidance is provided: 'Use it when a user asks what did I say earlier? or you need to retrace why the session ended up where it did.' It also indicates when not to use it (read-only, never mutates state) and suggests combining with other tools for broader context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convoopen: Start Design Session (A)
Inspect

WORKFLOW: Step 1 of 4 - Start infrastructure design conversation.

Open an InsideOut V2 session and receive the assistant's intro message. The response contains a clean message from Riley (the infrastructure advisor) - display it to the user.

⚠️ Riley will ask questions - forward these to the user, DO NOT answer on their behalf.

CRITICAL: This tool returns a session_id in the response metadata. You MUST use this session_id for ALL subsequent tool calls (convoreply, tfgenerate, tfdeploy, etc.).

⚠️ The session_id includes a ?token=... suffix (format: sess_v2_xxx?token=yyy) which is part of the session credential; without it, downstream tools fall back to a tokenless connect URL that 401s. Always pass session_id verbatim to subsequent tools and to the user; do NOT shorten, paraphrase, or strip the ?token= portion when summarizing the session in chat or in your own scratch notes.

Use when the user mentions keywords like: 'setup my cloud infra', 'provision infrastructure', 'deploy infra', 'start insideout', 'use insideout', or similar intent to begin infra setup.
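
Since a stripped token suffix leads to 401s downstream, a client can check the session ID before reusing it. The sketch below is one way to do that, assuming only the documented shape sess_v2_*?token=*; the helper name and the exact characters the regular expression allows are assumptions, not part of the server's contract.

```python
import re

# Assumed shape: "sess_v2_" + non-empty ID, then "?token=" + non-empty token.
SESSION_ID_PATTERN = re.compile(r"^sess_v2_[^?\s]+\?token=\S+$")


def has_token_suffix(session_id: str) -> bool:
    """Return True if the session ID still carries its ?token=... suffix."""
    return bool(SESSION_ID_PATTERN.match(session_id))


# Placeholder IDs for illustration only.
print(has_token_suffix("sess_v2_xxx?token=yyy"))
print(has_token_suffix("sess_v2_xxx"))
```

A failed check is a signal to go back to the original convoopen response rather than to reconstruct the ID.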

OPTIONAL: project_context (string) - General tech stack summary so Riley can skip discovery questions and jump to recommendations. The agent should confirm this with the user before sending. Include whichever apply: language/framework, databases/services, container usage, existing IaC, CI/CD platform, cloud provider, Kubernetes usage, what the project does. Example: 'Next.js 14 + TypeScript, PostgreSQL, Redis, Docker Compose, deployed to AWS ECS, GitHub Actions CI/CD, ~50k MAU'. NEVER include credentials, secrets, API keys, PII, source code, or internal URLs/IPs -- only general metadata summaries useful to a cloud architect agent.

IMPORTANT: source (string) - You MUST set this to identify which IDE/tool you are. Auto-detect from your environment: 'claude-code', 'codex', 'antigravity', 'kiro', 'vscode', 'web', 'mcp'. If unsure, use the name of your IDE/tool in lowercase. Do NOT omit this; it controls the 'Open {IDE}' button on the credential connect screen.

OPTIONAL: github_username (string) - GitHub username for deploy commit attribution. Pre-populates the GitHub username field on the connect page.

💡 TIP: Examine workflow.usage prompt for more context on how to properly use these tools.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| source | No | IDE/tool identifier so the connect screen can show the right 'Open {IDE}' button. Use lowercase: 'claude-code', 'codex', 'antigravity', 'kiro', 'cursor', 'vscode', 'windsurf', 'zed', 'aider', 'copilot', 'web', 'mcp'. | |
| github_username | No | GitHub username used for deploy commit attribution; pre-fills the GitHub username field on the connect screen. | |
| project_context | No | Optional tech-stack summary so Riley can skip discovery questions (e.g. 'Next.js 14 + Postgres on AWS, ~50k MAU'). No PII, secrets, file paths, or source code; only general metadata useful to a cloud architect. | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses critical behavior: returns a session_id with '?token=...' suffix, warns that omitting it causes 401 errors, and states that Riley asks questions that must be forwarded. These details go beyond the annotations (openWorldHint, destructiveHint) and add actionable transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with sections, bold warnings, and a tip; front-loads critical workflow info. However, some repetition about the token suffix could be condensed. Still efficiently communicates necessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema and 3 parameters, the description fully covers purpose, usage, parameters, and behavioral constraints, tying into the larger workflow and sibling tools. Meets all needs for an AI agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description enriches each parameter: source has auto-detection instructions, project_context includes security prohibitions and examples, github_username explains pre-filling. This adds significant context beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the verb 'Open' and resource 'InsideOut V2 session', and positions it as 'Step 1 of 4' in a workflow, distinguishing it from sibling tools like convoreply and tfgenerate. It states the action and expected output (assistant's intro message).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists triggering user intents (e.g., 'setup my cloud infra'), instructs the agent to forward questions to the user, and mentions the need to examine workflow.usage prompt. It also clarifies when not to answer on behalf of the user, providing clear usage boundaries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convoreply: Send Message (A)
Inspect

WORKFLOW: Step 2 of 4 - Continue infrastructure design conversation.

Send a user message to the active InsideOut session and receive the assistant reply. The response contains a clean message from Riley - display it to the user.

⚠️ CRITICAL: DO NOT answer Riley's questions yourself! Forward questions to the user and wait for their response. NEVER fabricate or assume the user's answer, even if you think you know what they would say. Examples of questions Riley asks that YOU MUST forward to the user:

  • 'Any questions or tweaks to these details?'

  • 'Ready for the cost estimate?'

  • 'Do you want to change the stack/config?'

  • 'Ready to proceed to Terraform?'

When Riley asks ANY question, STOP and wait for the user's answer!

📋 WORKFLOW PHASES: The typical flow is conversation → tfgenerate → tfdeploy. When terraform_ready=true appears in THIS tool's response, THEN you can call tfgenerate.

⚠️ DO NOT call tfgenerate until this tool returns! Wait for the response first.

🎯 KEY SIGNALS IN RESPONSE:

  • [TERRAFORM_READY: true] → NOW you can call tfgenerate

  • [[BUTTON_TF_APPLY: ...]] → Deployment is ready! Ask user if they want to deploy, then use tfdeploy

  • [[BUTTON_TF_DESTROY: ...]] → User confirmed destroy intent! Ask user to confirm, then use tfdestroy

  • [[BUTTON_TF_PLAN: ...]] → User wants to preview changes! Use tfplan to run a plan, then tfdeploy with plan_id to apply
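
The signals listed above are plain-text markers inside Riley's reply, so a client can scan for them before deciding which tool may come next. The following is a sketch under that assumption: the marker spellings are taken from the list, the helper name is hypothetical, and the sample messages are made up. A suggested tool is still subject to the user's confirmation.

```python
import re

# Tool suggested by each button marker, per the list above.
NEXT_TOOL = {"APPLY": "tfdeploy", "DESTROY": "tfdestroy", "PLAN": "tfplan"}


def parse_signals(message: str) -> dict:
    """Scan a reply for the TERRAFORM_READY flag and BUTTON_TF_* markers."""
    ready = re.search(r"\[TERRAFORM_READY:\s*true\]", message) is not None
    buttons = re.findall(r"\[\[BUTTON_TF_(APPLY|DESTROY|PLAN):", message)
    return {
        "terraform_ready": ready,
        # Candidates only: ask the user before calling any of these.
        "suggested_tools": [NEXT_TOOL[name] for name in buttons],
    }


# Made-up sample replies.
print(parse_signals("Stack is final. [TERRAFORM_READY: true]"))
print(parse_signals("Preview first? [[BUTTON_TF_PLAN: ...]]"))
```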

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: timeout (integer) - seconds to wait for response. For Cursor, use 50 (default). Max 55.

OPTIONAL: project_context (string) - Only pass genuinely NEW project details the user shares after convoopen. Do NOT resend context already provided in convoopen; Riley remembers it. Do NOT scan files or directories to gather this; only use what the user explicitly tells you. Example: user reveals a new constraint like 'we also need HIPAA compliance' mid-conversation.

💡 TIP: Use convostatus to check progress anytime. Examine workflow.usage prompt for more guidance.

Parameters (JSON Schema)

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| text | Yes | User message to send to Riley. Forward verbatim what the user said; do not summarize or rewrite. | |
| retry | No | When true, re-send the most recent user turn instead of submitting a new one. | |
| timeout | No | Max seconds to wait for Riley's response. Default 50, max 55. | |
| session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. | |
| project_context | No | Only NEW project details revealed after convoopen (e.g. user mentions a new constraint mid-conversation). Don't re-send context already provided in convoopen. No PII or secrets. | |

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate openWorldHint and destructiveHint, but the description adds significant behavioral context: workflow steps, critical warnings about not fabricating answers, key signals in the response, and parameter behaviors. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-structured with sections (WORKFLOW, CRITICAL, KEY SIGNALS). Every section serves a purpose, though the word count could be slightly reduced without losing clarity. Front-loading key signals earlier would improve conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 5 parameters, the description is very complete: covers workflow context, response signals, parameter details, and references sibling tools like convostatus. It leaves no ambiguity for an agent to misuse the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions, but the description adds nuance: explaining the session_id token suffix, timeout defaults, project_context constraints, and the text forwarding rule. This adds value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's role as Step 2 of 4 in the workflow, sending a user message to an InsideOut session and receiving the reply. It distinguishes itself from siblings like convoopen (starts session) and convostatus (checks progress) by specifying its position and function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly provides when to use (after convoopen, before tfgenerate), when not to (do not answer Riley's questions), and workflow phases. It also mentions alternatives like convostatus for checking progress and gives critical warnings about forwarding user questions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convostatus: View Session Stack Status (A)
Read-only
Inspect

INSPECTION: View the current infrastructure stack for a session. Returns the current state of the user's infrastructure design, including:

Components - Selected infrastructure services (VPC, databases, caching, etc.)

  • Shows what services the user has chosen (e.g., PostgreSQL, Redis, S3)

  • Includes architecture decisions (EKS vs EC2, monolith vs microservices)

Config - Configuration details for each component

  • Database sizes, replica counts, storage amounts

  • Cache settings, queue configurations

  • Backup schedules and retention policies

Pricing - Cost estimates (when available)

  • Monthly cost estimates per component

  • Total estimated monthly spend

Phase Indicators - Where the user is in the design workflow:

  • hasComponents: User has selected infrastructure services

  • hasConfig: User has configured component details

  • hasPricing: Cost estimates have been calculated

  • hasTerraform: Ready for Terraform generation

Use this tool when the user asks 'what is my current stack?', 'show my infrastructure', 'what have I selected?', or similar questions about their design progress.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| job_id | No | Optional. Specific job ID to inspect. When omitted, returns the status of the latest job for the session. | |
| session_id | Yes | Session ID from convoopen. Pass it back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. | |
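The phase indicators can be read as an ordered progression. The sketch below assumes, purely for illustration, that the convostatus response exposes them as top-level booleans in a dict; the page does not document the response shape, and `design_phase` is a hypothetical helper.

```python
def design_phase(status: dict) -> str:
    """Map phase indicators to a short label (response shape is an assumption)."""
    # Check the most advanced phase first; later phases imply earlier ones.
    if status.get("hasTerraform"):
        return "ready for Terraform generation"
    if status.get("hasPricing"):
        return "cost estimates calculated"
    if status.get("hasConfig"):
        return "component details configured"
    if status.get("hasComponents"):
        return "infrastructure services selected"
    return "no components selected yet"
```

For example, `design_phase({"hasComponents": True, "hasConfig": True})` reports that component details are configured but pricing has not been calculated.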
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds valuable context about the return content (components, config, pricing, phase indicators) without contradicting the annotations. It is transparent about what the tool inspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bullet points and a clear purpose statement at the top. It is slightly verbose but all sentences add value. No redundancy, and it is organized for quick scanning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains the return value (components, config, pricing, phase indicators). It covers the key aspects of the tool's output and usage context, making it complete for an inspection tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add significant semantics beyond the schema descriptions for the two parameters. The mention of 'REQUIRES: session_id from convoopen' slightly reinforces but does not add new meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool views the current infrastructure stack for a session. It starts with 'INSPECTION: View the current infrastructure stack for a session' and elaborates with bullet points on what is returned. This distinguishes it from sibling tools like convoinspect which likely inspect other aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guidance is provided: 'Use this tool when the user asks...' with example queries. It also states the prerequisite requirement for session_id from convoopen response. This is comprehensive and helps the agent decide when to invoke this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

credawait: Await Cloud Credentials (A)
Idempotent
Inspect

Wait for the user to securely connect their cloud account and subscribe to Luther Systems. Polls until credentials appear on the session.

🎯 USE THIS TOOL WHEN: tfdeploy returns an 'auth_required', 'no_credentials', or 'credentials_expired' error.

The user needs to visit the connect URL to:

  1. Connect their cloud credentials (AWS or GCP)

  2. Sign up and subscribe to a Luther Systems plan (required for deployment)

This secure connection allows InsideOut to deploy and manage infrastructure in the user's cloud account on their behalf. Credentials are handled securely and only used for deployment and management sessions.

WORKFLOW:

  1. FIRST: Present the connect URL and explanation to the user (from the tfdeploy error response)

  2. THEN: Call this tool to begin polling for credentials

  3. The user opens the URL in their browser to subscribe and add credentials

  4. When credentials are found, inform the user and call tfdeploy to deploy

IMPORTANT: Do NOT call this tool without first showing the connect URL to the user. The user needs to see the URL to complete the process.

REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: cloud ('aws' or 'gcp'), timeout (integer, seconds to wait, default 300, max 600).

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| cloud | No | Cloud provider whose credentials are awaited: 'aws' or 'gcp'. Defaults to 'aws'. | |
| timeout | No | Max seconds to wait for the user to complete the browser-based credential connect flow. Default 300, max 600. | |
| session_id | Yes | Session ID from convoopen. Pass it back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. | |
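The trigger condition and parameter limits for credawait can be sketched as follows. Both helpers are hypothetical, and how tfdeploy surfaces its error code is not documented here, so the plain-string `error_code` argument is an assumption.

```python
# Error codes that the description says should lead to credawait.
CREDENTIAL_ERRORS = {"auth_required", "no_credentials", "credentials_expired"}

DEFAULT_WAIT_S = 300  # stated default
MAX_WAIT_S = 600      # stated maximum


def needs_credawait(error_code: str) -> bool:
    """True when a tfdeploy error means the user must connect credentials."""
    return error_code in CREDENTIAL_ERRORS


def build_credawait_args(session_id: str, cloud: str = "aws",
                         timeout: int = DEFAULT_WAIT_S) -> dict:
    """Assemble credawait arguments (hypothetical helper, not a server API)."""
    if cloud not in ("aws", "gcp"):
        raise ValueError("cloud must be 'aws' or 'gcp'")
    # Assumption: values above the maximum are capped rather than rejected.
    return {"session_id": session_id, "cloud": cloud,
            "timeout": min(timeout, MAX_WAIT_S)}
```

The connect URL still has to be shown to the user before the tool is called; these helpers only cover the decision and the payload.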
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds that the tool polls until credentials appear, requires session_id, and is non-destructive. Annotations already indicate idempotentHint=true and destructiveHint=false, so the description aligns and adds context about secure credential handling and the URL flow. Could mention blocking behavior but overall good.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is slightly long but well-structured with emoji, sections, and bullet points. It front-loads the purpose and usage. Some repetition of workflow could be trimmed, but remains clear and organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a polling tool with no output schema, the description fully covers prerequisites (session_id), workflow (present URL, call tool, wait, then call tfdeploy), important notes, and required parameters. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds valuable context: session_id format and warning not to strip suffix, cloud enum defaults, timeout defaults and max. These go beyond the schema's basic descriptions and ensure correct usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Explicitly states the tool waits for user to connect cloud account and subscribe, with a clear verb ('Wait') and resource ('Cloud Credentials'). It distinguishes from siblings like convoawait and awsinspect by specifying it is used after auth_required errors from tfdeploy.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use (when tfdeploy returns 'auth_required', 'no_credentials', or 'credentials_expired') and workflow steps, including the important instruction not to call before showing the connect URL. Alternatives are implied by the context of sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gcpinspect: Inspect GCP Infrastructure (A)
Read-only
Inspect

INSPECTION: Inspect GCP infrastructure for a deployed project.

⚠️ PREREQUISITE: This tool requires a prior deployment ATTEMPT (successful or failed). Check convostatus for hasDeployAttempt=true before calling. Works even after failed deploys to inspect orphaned resources.

Inspect deployed GCP resources after a deployment attempt. Use this tool when the user asks about the status or details of their deployed GCP infrastructure. It fetches temporary read-only credentials securely and queries the GCP API directly.

RESPONSE TIERS (default is summary for token efficiency):

  • Summary (default): Key fields only (~500 tokens). Set detail=false, raw=false or omit both.

  • Detail: Full metadata for a specific resource. Set detail=true + resource filter.

  • Raw: Complete unprocessed API response. Set raw=true.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Supported services: apigateway, bastion, billing, certificatemanager, cloudarmor, cloudbuild, cloudcdn, clouddeploy, clouddns, cloudfunctions, cloudkms, cloudlogging, cloudmonitoring, cloudrun, cloudsql, compute, firestore, gcs, gke, iam, identityplatform, loadbalancer, memorystore, pubsub, secretmanager, vertexai, vpc. For a specific service's actions, call with action="list-actions".

METRICS: Use list-metrics to see available Cloud Monitoring metrics for any service (no credentials needed; this supports progressive disclosure). Use get-metrics to retrieve time-series data. Optional filters JSON: {"hours":6,"period":300}. Label breakdowns: Cloud Functions (by status), Load Balancer/API Gateway (by response_code_class), Cloud CDN (by cache_result). Secret Manager get-metrics returns operational health (version count, replication, create time), not time-series data. Bastion is an alias for Compute Engine metrics (SSH connection count is not available as a GCP metric).

BILLING: Use service=billing to inspect GCP billing. Actions: get-billing-info (check if billing is enabled and which billing account is used), get-budgets (list budget alerts for the project; auto-fetches the billing account). Requires the roles/billing.viewer IAM role.

Required IAM roles: Monitoring Viewer (roles/monitoring.viewer) for metrics, Secret Manager Viewer (roles/secretmanager.viewer) for secret health, Billing Viewer (roles/billing.viewer) for billing.

EXAMPLES:

  • gcpinspect(session_id=..., service="compute", action="list-instances")

  • gcpinspect(session_id=..., service="gke", action="list-clusters")

  • gcpinspect(session_id=..., service="cloudsql", action="get-metrics", filters='{"hours":6}')

  • gcpinspect(session_id=..., service="billing", action="get-billing-info")

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| raw | Yes | When true, returns the unprocessed GCP API response. Escape hatch for fields the summarized response doesn't surface. | |
| action | Yes | Operation on the service. Examples: 'list-instances' (compute), 'list-buckets' (storage), 'list-clusters' (gke), 'list-actions' (discovery), 'list-metrics' / 'get-metrics' (Cloud Monitoring). | |
| detail | Yes | When true, returns full metadata for a single resource. When false (default), returns a summary. | |
| filters | Yes | Optional JSON-encoded filter object passed through to the underlying GCP API. Examples: '{"hours":6}' for metric windows, '{"zone":"us-central1-a"}' for zone-scoped queries. | |
| service | Yes | GCP service to query. Examples: 'compute', 'storage', 'cloudsql', 'gke', 'cloudrun', 'pubsub', 'firestore'. Use action='list-actions' to discover supported actions for a service. | |
| session_id | Yes | Session ID from convoopen. Pass it back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. The session must have a GCP deploy attempt before inspect probes will succeed. | |
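The three response tiers map onto the detail and raw flags, and filters travel as a JSON-encoded string rather than a nested object. A minimal sketch of that mapping follows; `build_gcpinspect_args` is a hypothetical helper, and sending "{}" when no filters are given is an assumption made because the table lists filters as required.

```python
import json
from typing import Optional

# Tier name -> (detail, raw), following the RESPONSE TIERS list.
TIER_FLAGS = {
    "summary": (False, False),
    "detail": (True, False),
    "raw": (False, True),
}


def build_gcpinspect_args(session_id: str, service: str, action: str,
                          tier: str = "summary",
                          filters: Optional[dict] = None) -> dict:
    """Assemble gcpinspect arguments (hypothetical helper, not a server API)."""
    if tier not in TIER_FLAGS:
        raise ValueError("tier must be 'summary', 'detail', or 'raw'")
    detail, raw = TIER_FLAGS[tier]
    return {
        "session_id": session_id,
        "service": service,
        "action": action,
        "detail": detail,
        "raw": raw,
        # json.dumps handles the quoting, so the value is a valid JSON string.
        "filters": json.dumps(filters or {}, separators=(",", ":")),
    }
```

Letting `json.dumps` produce the filters value avoids hand-written nested quotes, which are easy to get wrong in a call such as the Cloud SQL metrics example.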
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true. Description adds significant behavioral context: uses temporary read-only credentials, queries GCP API directly, details response tiers, session_id format, IAM roles required, and specific behaviors for metrics and billing. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with prerequisite warning and response tiers. Structured with clear sections, emojis for readability. Every sentence adds unique value—examples, IAM roles, edge cases (bastion alias, no time-series for Secret Manager). Efficient despite length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all necessary context: prerequisites, response formats, supported services, action patterns, metrics, billing, IAM roles, and multiple examples. No output schema, but response tiers are described. Comprehensive for a complex tool with 6 required parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage 100% already, but description adds layered meaning: explains response tiers (summary/detail/raw), lists supported services and action conventions, details session_id format and suffix, gives filter examples for metrics. Significantly enhances understanding beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Inspect GCP infrastructure' with specific verb 'inspect' and resource 'GCP infrastructure'. Distinguishes from sibling 'awsinspect' by being GCP-specific. Lists many services and actions, making purpose unmistakable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states prerequisite: requires prior deployment attempt and to check convostatus. Provides context for when to use (user asks about status/details). Does not explicitly list when not to use, but sibling tools imply alternatives. Clear usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gcpinspect_batch: Batch-Inspect GCP Infrastructure (A)
Read-only
Inspect

BATCH INSPECTION: Run up to 32 GCP inspect probes in one call.

⚠️ PREREQUISITE: Same as gcpinspect: a deploy attempt is required. Check convostatus for hasDeployAttempt=true before calling.

Use this when you need to check more than ~3 resources. The backend fetches Oracle credentials ONCE per batch and fans out probes against a single GCP credentials blob, so a 12-resource health check is ~5–8× faster and makes 12× fewer Oracle round-trips than calling gcpinspect 12 times.

BUDGETS:

  • Up to 32 sub-probes per call (subs array length).

  • 30s per-sub timeout; 60s total batch wall-clock.

  • Concurrency cap 8.

  • 512 KB response cap: subs past the cap keep their envelope (index/service/action/ok) but have result replaced with truncated=true.

PARTIAL FAILURE IS EXPECTED. The response is an ordered results array; each entry has {index, service, action, ok, result, error}. Inspect each result — do NOT abort on the first error. A credential fetch failure leaves cred-less probes (list-actions, list-metrics) succeeding anyway.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Supported services: apigateway, bastion, billing, certificatemanager, cloudarmor, cloudbuild, cloudcdn, clouddeploy, clouddns, cloudfunctions, cloudkms, cloudlogging, cloudmonitoring, cloudrun, cloudsql, compute, firestore, gcs, gke, iam, identityplatform, loadbalancer, memorystore, pubsub, secretmanager, vertexai, vpc.

For a specific service's actions, use gcpinspect (singular) with action="list-actions"; batch is not the place for discovery. Batch responses are always summarized (no detail/raw per-sub); use singular gcpinspect when you need full metadata or raw API output for one resource.

EXAMPLES:

  • gcpinspect_batch(session_id=..., subs=[ {"service":"compute","action":"list-instances"}, {"service":"gke","action":"list-clusters"}, {"service":"cloudsql","action":"list-instances"}])

  • gcpinspect_batch(session_id=..., subs=[ {"service":"compute","action":"get-metrics","filters":"{\"hours\":6}"}, {"service":"cloudrun","action":"get-metrics","filters":"{\"hours\":6}"}])

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| subs | Yes | Up to 32 sub-probes, each with {service, action, filters?, detail?, raw?}. The backend fetches credentials once per batch and fans out probes in parallel (concurrency 8, 30s per-sub timeout, 60s total wall clock). Partial failure is expected; inspect each result.ok independently. | |
| session_id | Yes | Session ID from convoopen. Pass it back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. The session must have a GCP deploy attempt before inspect probes will succeed. | |
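Because partial failure is expected, a caller should partition the results rather than stop at the first error. The sketch below assumes the results arrive as a list of dicts with the {index, service, action, ok, result, error} keys named in the description; both helpers are hypothetical.

```python
MAX_SUBS = 32  # stated per-call limit on sub-probes


def build_batch_args(session_id: str, subs: list) -> dict:
    """Assemble gcpinspect_batch arguments, enforcing the 32-probe budget."""
    if not 1 <= len(subs) <= MAX_SUBS:
        raise ValueError(f"subs must contain between 1 and {MAX_SUBS} probes")
    for sub in subs:
        # service and action are the two fields every sub-probe needs.
        if "service" not in sub or "action" not in sub:
            raise ValueError("each sub-probe needs 'service' and 'action'")
    return {"session_id": session_id, "subs": list(subs)}


def split_results(results: list) -> tuple:
    """Partition batch results into (succeeded, failed) without aborting early."""
    succeeded = [r for r in results if r.get("ok")]
    failed = [r for r in results if not r.get("ok")]
    return succeeded, failed
```

A health check can then report the failed probes by their index and error while still using every result that came back ok.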
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, openWorldHint), description adds operational details: backend fetches credentials once, fans out probes with concurrency 8, 30s per-sub timeout, 60s wall clock, 512KB response cap, and partial failure behavior (credential fetch failure leaves cred-less probes succeeding). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with headers, bullet points, and warnings. Front-loaded key purpose. Some redundancy (subs schema repeated in examples) but overall efficient for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, description explains response format (ordered array with index/service/action/ok/result/error), truncation behavior, and partial failure. Covers prerequisites, limits, and examples. Complete for a batch inspection tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3. Description adds value with examples showing subs array structure and session_id format, plus practical usage notes like warning about ?token= suffix. Extra context justifies a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'BATCH INSPECTION: run up to 32 GCP inspect probes in one call', clearly specifying the verb (batch-inspect), resource (GCP infrastructure), and scope (up to 32). It distinguishes from sibling gcpinspect by contrasting batch vs singular.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance: 'Use this when you need to check more than ~3 resources', prerequisite check for hasDeployAttempt, and when not to use: 'For a specific service's actions, use gcpinspect (singular)'. Also warns about partial failure and not aborting on first error.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

help: Workflow Guide (A)
Read-only
Inspect

Get workflow guidance for using InsideOut infrastructure tools. Call help() for a compact overview, or help(section=...) for a detailed guide. Sections: workflow, tools, examples, inspect. Responses include hints with next_actions and related_tools.

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| section | No | Optional section to focus the response. One of: 'workflow', 'tools', 'examples', 'inspect'. When omitted, returns a compact overview (~500 tokens). | |
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes response structure: 'Responses include hints with next_actions and related_tools.' Annotations already indicate readOnlyHint=true and openWorldHint=true, so description adds useful context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: first states purpose, second details usage, third describes response hints. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete information for a help tool: purpose, usage modes, section options, and response content (hints). No output schema needed as description explains return type. Covers all necessary context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage for the single optional parameter 'section'. Description mentions the sections but does not add meaning beyond the schema's enum and description. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Get workflow guidance for using InsideOut infrastructure tools.' and distinguishes from sibling tools by being a meta-guidance tool. It specifies the verb 'get' and resource 'workflow guidance'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit instructions: 'Call help() for a compact overview, or help(section=...) for a detailed guide.' Lists available sections, providing clear when-to-use guidance. Distinguishes from siblings which are specific infrastructure tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

stackdiff: Compare Stack Versions (A)
Read-only
Inspect

Structured diff showing what would be deployed if the user ran tfdeploy now. Returns component-level changes (added/removed/modified), field-level details, and pricing deltas.

Defaults (#1392): with no version arguments, compares the LAST SUCCESSFULLY DEPLOYED version against the user's CURRENT LIVE DESIGN (the same data the UI shows). Empty baseline if nothing has been deployed or after a destroy. Pending drafts are NOT used as the target — they go stale once the user edits past them; live IR via chat history is always current.

Pass explicit from_version and/or to_version integers to compare any two saved versions (e.g. v3 → v5).

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Parameters

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| session_id | Yes | Session ID from convoopen. Pass it back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. | |
| to_version | No | Ending stack version number for the diff. Defaults to the current draft. | |
| from_version | No | Starting stack version number for the diff. Defaults to the latest applied version. | |
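The two calling modes, default comparison versus explicit versions, differ only in whether the version arguments are present. A minimal sketch with a hypothetical `build_stackdiff_args` helper:

```python
from typing import Optional


def build_stackdiff_args(session_id: str,
                         from_version: Optional[int] = None,
                         to_version: Optional[int] = None) -> dict:
    """Assemble stackdiff arguments (hypothetical helper, not a server API)."""
    args = {"session_id": session_id}
    # Omitting a version leaves the server-side default in effect for that end.
    if from_version is not None:
        args["from_version"] = int(from_version)
    if to_version is not None:
        args["to_version"] = int(to_version)
    return args
```

Calling it with only a session ID requests the default comparison, while `build_stackdiff_args(session_id, 3, 5)` corresponds to the v3 → v5 example.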
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true. The description adds that the tool uses 'live IR via chat history' and that pending drafts are not used as the target, providing context beyond the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (a few sentences) with the main purpose stated first, followed by details on defaults and requirements. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 3 parameters (1 required), no output schema, and the tool's complexity, the description covers defaults, version behavior, and session requirements. Could be slightly improved with an example output but is sufficiently complete for tool selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All parameters have schema descriptions (100% coverage). The description adds clarification on defaults (from_version defaults to last applied, to_version defaults to current draft) and the exact format of session_id, enhancing meaning beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies a 'structured diff showing what would be deployed', including component-level changes, field-level details, and pricing deltas. This clearly distinguishes it from sibling tools like stackversions (list versions) and tfplan (full plan).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains default behavior (comparing last deployed vs current live design), explicit version arguments for arbitrary comparison, and the required session_id. It mentions the empty baseline scenario but does not explicitly state when not to use (e.g., if no deployment history).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

stackrollback: Rollback Stack Version (A)
Idempotent
Inspect

Create a draft version by reverting to a previous version's config. Copies components, config, and pricing from the target version. If a draft already exists, updates it in-place (single-draft rule).

Use stackversions first to find available version numbers.

REQUIRES: session_id from convoopen response (format: sess_v2_...), version (target version number).
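The lookup-then-rollback sequence described above can be sketched as follows. This is only an illustration: `call_tool` is a hypothetical callable standing in for whatever MCP client is in use, and the shape of the stackversions response (a list of dicts with `version` and `status` keys, newest first) is an assumption, since the page documents no output schema.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical signature: call_tool(tool_name, arguments) -> decoded response.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def rollback_to_last_applied(
    call_tool: ToolCaller, session_id: str
) -> Optional[Dict[str, Any]]:
    """List versions, pick the newest one marked 'applied', and roll back to it."""
    # Assumed response shape: [{"version": int, "status": str}, ...], newest first.
    versions: List[Dict[str, Any]] = call_tool(
        "stackversions", {"session_id": session_id}
    )
    target = next((v for v in versions if v.get("status") == "applied"), None)
    if target is None:
        return None  # nothing has been applied yet, so there is nothing to revert to
    # stackrollback creates the draft, or updates the existing one in place.
    return call_tool(
        "stackrollback",
        {"session_id": session_id, "version": target["version"]},
    )
```

Passing the client in as a parameter keeps the sketch independent of any particular MCP library.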

ParametersJSON Schema
NameRequiredDescriptionDefault
versionYesTarget stack version number to roll back to. Use stackversions to list available versions.
session_idYesSession ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotent and non-destructive behavior. The description adds details: copies components/config/pricing, and updates draft in-place if it exists (single-draft rule). This aligns with annotations and provides behavioral context beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three sentences plus a usage note, front-loaded with the main action. Every sentence adds essential information without fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers prerequisites, behavior, and parameter details. It could explicitly state what is returned (e.g., draft version ID), but overall it is sufficiently complete for a tool with good annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the session_id format and warning not to strip the token suffix, and clarifies that version is from stackversions. This provides useful semantics beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Create a draft version by reverting to a previous version's config.' It specifies the verb (create), resource (draft version), and distinguishes it from sibling tools like stackversions (list versions) and stackdiff (diff).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises using stackversions first to find version numbers and mentions required session_id from convoopen. This provides clear context on when to use the tool and prerequisites, though it does not explicitly state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

stackversionsList Stack VersionsA
Read-only
Inspect

List all stack versions for a session (newest first). Shows version history including version number, status (draft/confirmed/applied), change summaries, and timestamps.

Use this tool to see the design history, review what changed between iterations, or find a version number to roll back to.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

ParametersJSON Schema
NameRequiredDescriptionDefault
session_idYesSession ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
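Because a stripped token suffix is an easy mistake to make, a client might check the ID before calling any tool. The sketch below is an assumption-laden illustration: the regular expression is derived only from the documented format `sess_v2_*?token=*`, and the characters actually permitted in each part are not specified on this page.

```python
import re

# Assumed pattern based on the documented format "sess_v2_*?token=*".
# The real character set for the ID and token is not specified in the source.
_SESSION_ID = re.compile(r"^sess_v2_[^?\s]+\?token=\S+$")


def looks_like_full_session_id(session_id: str) -> bool:
    """Return True when the ID still carries its '?token=...' suffix."""
    return _SESSION_ID.match(session_id) is not None
```

A check like this only catches truncation; it cannot tell whether the token itself is valid.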
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the agent knows it's safe. The description adds behavioral context: lists newest first, includes status and change summaries. It also mentions the required input format. No contradictions. The description provides useful behavioral details beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three sentences plus a requirements line. It front-loads the primary purpose, then adds details and usage guidance. Every sentence adds value, with no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there is no output schema, the description adequately explains what the tool returns: version number, status, change summaries, timestamps. The single parameter is fully documented. Complexity is low, and the description covers all necessary context for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a detailed description for session_id, including pattern and instruction not to strip token. The description reinforces that the session_id must come from convoopen and specifies the format. This adds significant value beyond the schema by explaining the source and handling.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all stack versions for a session (newest first)' using a specific verb and resource. It also lists the information included: version number, status, change summaries, and timestamps. This distinguishes it from sibling tools like stackdiff (compares versions) and stackrollback (rolls back).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this tool to see the design history, review what changed between iterations, or find a version number to roll back to.' This provides clear use cases. It also notes a prerequisite (session_id from convoopen). It does not specify when not to use the tool or list alternatives, although given the narrow scope this is a minor gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

submit_feedbackSubmit FeedbackAInspect

FEEDBACK: Submit feedback, bug reports, or feature requests to Luther Systems

Use this tool to forward user feedback directly to the Luther Systems team. This includes bug reports, feature requests, questions, or general feedback about InsideOut. The agent itself can also use this tool to report issues it encounters during operation.

REQUIRES: session_id, category, message

OPTIONAL: user_email (for follow-up), user_name, source (default: 'mcp'), initiator ('user' or 'agent')

Categories: bug_report, feature_request, general_feedback, question, security

The 'initiator' field tracks who triggered the report:

  • 'user' — the user explicitly reported the issue or requested feedback submission

  • 'agent' — Riley detected an issue and initiated the feedback flow

Examples:

  • User says 'the deploy button is broken' → submit_feedback(category='bug_report', message='...', initiator='user')

  • User says 'I wish it had dark mode' → submit_feedback(category='feature_request', message='...', initiator='user')

  • Deployment failed with Terraform error → submit_feedback(category='bug_report', message='Deployment failed: Terraform apply error on aws_alb resource — timeout waiting for ALB provisioning', initiator='agent')
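The examples above can be turned into a small argument builder that rejects values the description does not list. This is a sketch, not the server's validation logic: the function name is hypothetical, and the category set is copied from the description, which includes 'security' even though the parameter table does not.

```python
from typing import Dict, Optional

# Categories as listed in the tool description. The parameter table omits
# 'security', so whether the server accepts it is not confirmed here.
CATEGORIES = {"bug_report", "feature_request", "general_feedback", "question", "security"}
INITIATORS = {"user", "agent"}


def build_feedback_args(
    session_id: str,
    category: str,
    message: str,
    initiator: Optional[str] = None,
    user_email: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble submit_feedback arguments, failing early on unknown values."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if initiator is not None and initiator not in INITIATORS:
        raise ValueError(f"initiator must be 'user' or 'agent', got {initiator!r}")
    if not message.strip():
        raise ValueError("message must not be empty")
    args = {"session_id": session_id, "category": category, "message": message}
    # Optional fields are only sent when the caller supplied them.
    if initiator is not None:
        args["initiator"] = initiator
    if user_email is not None:
        args["user_email"] = user_email
    return args
```

For the first example in the list, the call would be `build_feedback_args(session_id, "bug_report", "the deploy button is broken", initiator="user")`.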

ParametersJSON Schema
NameRequiredDescriptionDefault
sourceNoOptional source channel: 'mcp', 'cli', or 'web'.
messageYesFeedback content. Free-form text describing the issue, request, or comment.
categoryYesFeedback category. One of: bug_report, feature_request, general_feedback, question.
initiatorNoOptional originator: 'user' (human triggered) or 'agent' (automated).
user_nameNoOptional display name for attribution.
session_idYesSession ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. Identifies the conversation the feedback is about.
user_emailNoOptional email address for follow-up.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate openWorldHint=true and destructiveHint=false. The description adds behavioral details: it forwards feedback to the Luther Systems team, requires certain parameters, and explains the initiator field. There is no contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections (FEEDBACK, REQUIRES, OPTIONAL, Categories, Initiator, Examples) and front-loaded. It is somewhat verbose but not wasteful. Minor redundancy (e.g., 'This includes...') prevents a perfect score.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no output schema, and moderate complexity, the description covers all necessary aspects: parameter explanations, usage patterns, categories, initiator semantics, and examples. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage (all 7 parameters described). The description adds value by listing required vs optional parameters, explaining the initiator field with examples, and detailing categories. This goes beyond the schema's descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for submitting feedback, bug reports, or feature requests to Luther Systems. It distinguishes itself from sibling tools (none of which are feedback-related) and uses a specific verb-resource combination (submit feedback).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use the tool (forward user feedback or report agent-detected issues) and includes examples for both user and agent initiators. It does not explicitly state when not to use it, but the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfdeployDeploy InfrastructureA
Destructive
Inspect

WORKFLOW: Step 4 of 4 - Deploy infrastructure to the cloud

Deploy infrastructure by starting a Terraform job for an InsideOut session. This tool initiates the actual deployment process after Terraform files have been generated.

IMPORTANT: This starts a long-running job (15+ minutes). Use tfstatus to monitor progress.

SINGLE-FLIGHT: only one TF job (apply/plan/destroy/drift) runs per session at a time. If another job is already in flight, tfdeploy returns tf_job_conflict with the live job_id; attach with tfstatus/tflogs instead of retrying, or pass force_new=true to override.

Returns confirmation that the deployment has started.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: plan_id (string) - apply a previously created plan from tfplan. Preview-then-apply workflow: tfplan → tflogs (review) → tfdeploy(plan_id=...).

OPTIONAL: sandbox (boolean, default false) - deploys real generated Terraform. Set to true for cheap sandbox template (testing only).

OPTIONAL: ignore_drift (boolean, default false) - when true, proceeds with deploy even if infrastructure drift is detected. By default, deploys fail on drift. Use after reviewing drift details via tfdrift or tflogs.

OPTIONAL: force_new (boolean, default false) - bypass the session-level single-flight guard. Use only when the existing run is provably wedged.

CREDENTIAL FLOW (if credentials are missing):

  1. Response includes a connect_url — present it to the user

  2. Call credawait(session_id=...) to poll for credentials

  3. When credawait returns success, retry tfdeploy

Do NOT call credawait without first showing the connect URL to the user.
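The credential flow and the single-flight rule can be combined into one call sequence, sketched below. Everything about the response shapes is an assumption made for illustration: the page names `connect_url`, `tf_job_conflict`, and `job_id` but does not say which keys carry them, and the `success` flag on credawait is likewise a guess. `call_tool` and `show_to_user` are hypothetical callables.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> decoded response.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def start_deploy(
    call_tool: ToolCaller,
    show_to_user: Callable[[str], None],
    session_id: str,
) -> Dict[str, Any]:
    """Start a deploy, handling missing credentials and an in-flight job."""
    result = call_tool("tfdeploy", {"session_id": session_id})

    # Assumed field: a 'connect_url' key signals missing credentials.
    if "connect_url" in result:
        show_to_user(result["connect_url"])  # step 1: show the URL first
        waited = call_tool("credawait", {"session_id": session_id})  # step 2
        if not waited.get("success"):  # assumed success flag
            return waited
        result = call_tool("tfdeploy", {"session_id": session_id})  # step 3

    # Assumed field: an 'error' key carrying the tf_job_conflict code.
    if result.get("error") == "tf_job_conflict":
        # Attach to the job already in flight instead of retrying or forcing.
        return call_tool(
            "tflogs", {"session_id": session_id, "job_id": result["job_id"]}
        )

    return result
```

The sketch never sets force_new, in line with the guidance that it is reserved for runs that are provably wedged.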

ParametersJSON Schema
NameRequiredDescriptionDefault
plan_idNoApply a previously created plan from tfplan. When set, project_id should also be provided.
sandboxNoWhen true (default for MCP), deploys a small sandbox stack instead of the real generated Terraform. Set false to deploy the actual user stack.
versionNoDeploy a specific stack version number. Defaults to the current draft.
force_newNoWhen true, bypass the session-level single-flight guard and start a new deploy even if another job is in flight. Use only when an existing run is provably wedged.
project_idNoProject ID returned by tfplan. Required alongside plan_id.
session_idYesSession ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
ignore_driftNoWhen true, proceed with deploy even if drift is detected on the existing stack.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond the annotations (destructiveHint, openWorldHint). It notes that the job runs 15+ minutes, enforces single-flight (only one TF job per session), returns a conflict job_id when blocked, and explains credential handling with connect_url and credawait. This fully informs the agent of what to expect.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly long but well-structured: it starts with workflow context, then highlights critical notes (long-running, single-flight), followed by optional parameters and credential flow. Most sentences add value, though some redundancy exists (e.g., repeating 'single-flight' in multiple places). Overall, it's efficient for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers return values (confirmation, connect_url, conflict job_id). It includes all parameters, workflow steps, restrictions, and error handling. The credential flow is fully documented. An agent can use this information to correctly invoke the tool and handle responses.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, baseline is 3, but the description enriches each parameter with usage context. For example, 'plan_id' is linked to the preview-then-apply workflow, 'sandbox' explains real vs test deployment, 'ignore_drift' clarifies when to use after reviewing drift, and 'force_new' describes the single-flight bypass. This goes well beyond the schema's brief descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Deploy infrastructure by starting a Terraform job for an InsideOut session.' It also places it as 'Step 4 of 4' in a workflow, distinguishing it from sibling tools like tfplan, tflogs, and tfstatus. The verb 'deploy' and resource 'infrastructure' are specific, and the scope is well-defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use (step 4 after file generation) and when-not-to-use (if another job is in flight, use tfstatus/tflogs). It offers alternatives like 'pass force_new=true' for wedged runs and details the preview-then-apply workflow using tfplan and tflogs. Credential flow is also outlined, guiding the agent on using credawait after showing connect_url.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfdestroyDestroy InfrastructureA
Destructive
Inspect

DESTROY: Tear down previously deployed infrastructure

Destroys infrastructure by calling the Oracle destroy endpoint for a session that has a prior successful deployment.

IMPORTANT: This starts a long-running job. Use tfstatus/tflogs to monitor progress.

SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfdestroy returns tf_job_conflict with the live job_id; attach with tfstatus/tflogs, or pass force_new=true to override.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: force_new (boolean, default false) - bypass the single-flight guard. Use only when the existing run is provably wedged.

PREREQUISITE: The session must have a prior successful deployment with a project_id. After destroy completes, the session is kept for historical record but hasDeployment is set to false.

ParametersJSON Schema
NameRequiredDescriptionDefault
force_newNoWhen true, bypass the single-flight guard and force a new destroy even if another job is in flight.
session_idYesSession ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. The deployed stack for this session will be torn down.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant context beyond annotations: it's a long-running job, single-flight behavior, return of tf_job_conflict, and post-destroy session state (hasDeployment set to false). Annotations only indicate destructiveness, so description carries full behavioral disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear headings and bullet points, front-loaded with the purpose, and every sentence adds value. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description covers prerequisites, post-destroy state, monitoring references, and error handling (tf_job_conflict). It is complete for a destructive tool with these usage patterns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds critical meaning: session_id must be exact from convoopen with ?token= suffix, never stripped; force_new explained as bypass guard only when wedged. This goes beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool's purpose: 'DESTROY: Tear down previously deployed infrastructure' and specifies it calls the Oracle destroy endpoint. It clearly distinguishes from siblings like tfdeploy and tflogs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use guidance (after successful deployment, requires session_id) and when-not-to-use (single-flight conflict). It names alternatives (tfstatus/tflogs for monitoring) and conditions for force_new override.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfdriftCheck Infrastructure DriftA
Idempotent
Inspect

DRIFT CHECK: Run a read-only drift detection check

Checks whether deployed infrastructure has drifted from the expected Terraform state. This is a read-only operation: it does NOT modify any infrastructure.

Returns job_id. Use tflogs to stream the drift check results.

SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfdrift returns tf_job_conflict with the live job_id; attach with tfstatus/tflogs, or pass force_new=true to override.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

PREREQUISITE: The session must have a prior deployment with a project_id.

OPTIONAL: force_new (boolean, default false) - bypass the single-flight guard. Use only when the existing run is provably wedged.

If drift is detected, the user can either fix the drift or use tfdeploy(ignore_drift=true) to proceed.

ParametersJSON Schema
NameRequiredDescriptionDefault
force_newNoWhen true, bypass the single-flight guard and force a new drift check even if another job is in flight.
session_idYesSession ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses read-only/non-destructive nature (matching annotations), single-flight conflict return, and drift detection outcome handling. Adds context beyond annotations, e.g., that force_new should be used only when the current run is provably wedged.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is moderately long but well-structured with bold headings and bullet points. Every sentence adds value, though it could be slightly more concise. Front-loaded with purpose and key constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description explains the return value (job_id) and how to retrieve results (tflogs). It covers all prerequisites, constraints (single-flight), and post-drift actions. For a two-parameter tool, this is thoroughly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds meaningful guidance: for 'force_new' it explains when to use (bypass single-flight guard when wedged), and for 'session_id' it emphasizes exact formatting and retaining the token suffix. This exceeds schema-only info.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'DRIFT CHECK: Run a read-only drift detection check' and explains it checks deployed infrastructure against Terraform state. It distinguishes itself from sibling tools (e.g., tfdeploy, tflogs) by focusing on drift detection only.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use (detect drift), when-not-to-use (single-flight constraint, force_new only if wedged), prerequisites (session_id from convoopen, prior deployment with project_id), and alternatives (tflogs for streaming, tfdeploy with ignore_drift to proceed).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfgenerateGenerate TerraformA
Idempotent
Inspect

WORKFLOW: Step 3 of 4 - Generate Terraform files from completed design

Generate Terraform files from an InsideOut session that has completed infrastructure design.

⚠️ PREREQUISITE: Only call this AFTER convoreply returns with terraform_ready=true in the response metadata. DO NOT call this while convoreply is still running or before terraform_ready is confirmed! If you get 'session has not reached terraform-ready state', wait for convoreply to complete first.

🎯 USE THIS TOOL WHEN: convoreply has returned with terraform_ready=true, OR the user asks to 'see the terraforms', 'generate terraform', 'show me the code', etc.

DEFAULT RESPONSE: Returns summary table + download URL (keeps code out of LLM context).

FALLBACK: Set include_code: true to get full code inline if curl/unzip fails.

CRITICAL WORKFLOW (default mode):

  1. Call this tool to get file summary and download URL

  2. ASK the user: 'Where would you like me to save the Terraform files? Default: ./insideout-infra/'

  3. WAIT for user confirmation before running the download command

  4. Run the curl/unzip command with the user's chosen directory

  5. If curl/unzip FAILS (sandbox, security, platform issues), retry with include_code: true

AFTER GENERATION: Ask user if they want to review the files and then deploy with tfdeploy

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: include_code (boolean) - set true to return full code inline as fallback.

💡 TIP: Examine workflow.usage prompt for more context on how to properly use these tools.
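The five-step default workflow, including the include_code fallback, might look like the sketch below. All three callables are hypothetical: `call_tool` stands in for the MCP client, `ask_user_for_dir` for whatever prompt mechanism the host offers, and `run_download` for the curl/unzip step, whose exact command comes from the tool response and is not reproduced here.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: call_tool(tool_name, arguments) -> decoded response.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

DEFAULT_DIR = "./insideout-infra/"


def fetch_terraform(
    call_tool: ToolCaller,
    ask_user_for_dir: Callable[[str], str],
    run_download: Callable[[Dict[str, Any], str], bool],
    session_id: str,
) -> Dict[str, Any]:
    """Run the default download workflow, falling back to inline code."""
    # Step 1: get the file summary and download URL.
    summary = call_tool("tfgenerate", {"session_id": session_id})
    # Steps 2-3: ask where to save and wait for the answer (empty means default).
    target_dir = ask_user_for_dir(DEFAULT_DIR) or DEFAULT_DIR
    # Step 4: run the download; the callback reports success or failure.
    if run_download(summary, target_dir):
        return summary
    # Step 5: the download failed, so request the code inline instead.
    return call_tool(
        "tfgenerate", {"session_id": session_id, "include_code": True}
    )
```

Keeping the user prompt as a blocking callback makes the "wait for confirmation" step explicit: the download cannot start until the callback returns.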

ParametersJSON Schema
NameRequiredDescriptionDefault
session_idYesSession ID from convoopen — pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. Riley must have signaled [TERRAFORM_READY: true] before calling this tool.
include_codeNoWhen true, the response inlines the full generated Terraform source. Use as a fallback when the host can't read the on-disk archive (sandbox or security restrictions).
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotency and non-destructiveness. The description adds many behavioral details: default response returns summary+download URL, fallback with include_code, session_id format requirement, and the critical workflow. No annotation contradiction. Slightly lacking on error handling for invalid sessions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is longer but well-structured with sections (WORKFLOW, PREREQUISITE, etc.) and front-loaded with purpose. Each section adds necessary workflow context. Could be slightly more concise, but the structure aids comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is comprehensive for a multi-step workflow tool: it covers prerequisites, default response, fallback, critical workflow steps, and post-generation actions. No output schema, but the description explains the default representation. All relevant context is provided.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds important context: session_id must include the '?token=' suffix and never be stripped, and include_code is a fallback for sandbox issues. This adds clear semantic guidance beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is 'Step 3 of 4' and 'Generate Terraform files from completed design'. It distinguishes from sibling tools by specifying it is part of the InsideOut workflow and requires terraform_ready=true from convoreply, which is unique among the sibling list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to call (after convoreply returns with terraform_ready=true) and when not to (while convoreply is running). Provides a step-by-step critical workflow and mentions post-generation steps, but does not explicitly contrast with sibling tools like tfplan or tfdeploy.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tflogs: Fetch Deploy Logs (grade A)
Read-only
Inspect

MONITORING: Fetch Terraform deployment logs with pagination

Fetches logs from a running or completed Terraform deployment job. For completed jobs: uses REST endpoint for instant retrieval (supports tail for server-side filtering). For running jobs: streams via SSE with timeout-based pagination.

PAGINATION (running jobs only): Use last_event_id from the response to fetch more:

  1. First call: tflogs(session_id='...') → get logs + last_event_id

  2. Next call: tflogs(session_id='...', last_event_id='...') → get NEW logs only

  3. Repeat until complete: true in response

RESPONSE FIELDS:

  • logs: Array of log messages collected

  • last_event_id: Pass this back to get more logs (pagination cursor, SSE only)

  • complete: true if job finished, false if more logs may be available

  • total_logs: total log entries before tail truncation
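
The pagination steps and response fields above can be sketched as a small loop. This is a minimal sketch, not the server's client code: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, and `max_calls` is an assumed safety cap that the tool description does not mention.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical client hook: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def collect_logs(call_tool: ToolCaller, session_id: str, max_calls: int = 20) -> List[str]:
    """Call tflogs repeatedly until the response reports complete: true."""
    logs: List[str] = []
    last_event_id: Optional[str] = None
    for _ in range(max_calls):  # assumed cap so a stuck job cannot loop forever
        args: Dict[str, Any] = {"session_id": session_id}
        if last_event_id is not None:
            # Resume from the cursor so only NEW entries come back.
            args["last_event_id"] = last_event_id
        response = call_tool("tflogs", args)
        logs.extend(response.get("logs", []))
        if response.get("complete"):
            break
        # Keep the previous cursor if this response did not carry one.
        last_event_id = response.get("last_event_id", last_event_id)
    return logs
```

The session ID is passed through untouched on every call, which keeps the `?token=` suffix intact.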

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: job_id to target a specific deployment (use tfruns to discover IDs), timeout (default 50s, max 55s), last_event_id (for pagination), tail (return only last N entries).

⚠️ CONTEXT WARNING: Deploy logs can be hundreds of lines. Use tail: 50 for completed jobs to avoid blowing up the context window.
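
One way to apply the limits above when building a tflogs call is sketched here. The helper name `build_tflogs_args` and the lower bound of 1 second are assumptions made for illustration; only the 55-second maximum and the `tail: 50` suggestion come from the description.

```python
from typing import Any, Dict, Optional

MAX_TIMEOUT_S = 55  # documented maximum for timeout
SUGGESTED_TAIL = 50  # value suggested by the context warning


def build_tflogs_args(session_id: str, job_completed: bool,
                      job_id: Optional[str] = None,
                      timeout: Optional[int] = None) -> Dict[str, Any]:
    """Assemble tflogs arguments, clamping timeout and tailing completed jobs."""
    args: Dict[str, Any] = {"session_id": session_id}
    if job_id is not None:
        args["job_id"] = job_id
    if timeout is not None:
        # Lower bound of 1 is an assumption; the description only states the max.
        args["timeout"] = max(1, min(timeout, MAX_TIMEOUT_S))
    if job_completed:
        # Completed jobs can return hundreds of lines, so keep only the tail.
        args["tail"] = SUGGESTED_TAIL
    return args
```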

Parameters
Name | Required | Description | Default
tail | No | Return only the last N log entries. Use 0 (or omit) for all available entries. | -
job_id | No | Optional. Target a specific job. Use tfruns to discover job IDs. When omitted, streams the latest job for the session. | -
timeout | No | Max seconds to collect logs. Default 50, max 55. | -
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. | -
last_event_id | No | Resume cursor for pagination. Pass back the last_event_id from a previous tflogs response to fetch only newer entries. | -

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds substantial behavioral context: different modes for running vs completed jobs (REST vs SSE), timeout-based pagination for running jobs, response fields including complete flag, and a warning about context window. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (MONITORING, PAGINATION, RESPONSE FIELDS, REQUIRES/OPTIONAL, CONTEXT WARNING). It is concise, every sentence adds value, and the most critical information (purpose and required parameter) appears first.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite lacking an output schema, the description thoroughly explains response fields (logs, last_event_id, complete, total_logs). It covers both running and completed job scenarios, pagination mechanics, and potential issues like context window overload. No gaps for a log-fetching tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 5 parameters have schema descriptions (100% coverage), but the description adds extra value: session_id includes a critical warning about preserving the ?token= suffix, tail is recommended for completed jobs, and last_event_id is explained in the pagination flow. This goes beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it fetches Terraform deployment logs with pagination, distinguishing it from sibling tools like tfruns (which lists runs) and tfstatus (which shows status). It uses a specific verb ('Fetch') and resource ('Deploy Logs'), and the MONITORING heading reinforces the purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use pagination, how to paginate with last_event_id, and the required session_id from convoopen. It also recommends using tail:50 for completed jobs to avoid context overload, and mentions optional job_id from tfruns. This covers both when and how to use the tool effectively.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfoutputs: Get Deploy Outputs (grade A)
Read-only
Inspect

INSPECTION: Retrieve Terraform outputs from a completed deployment

Returns structured output values (VPC IDs, endpoints, cluster names, etc.) after a successful deploy. Sensitive outputs are redacted (shown as '(sensitive)').

By default returns outputs for the latest successful deploy. Optionally specify job_id to get outputs for a specific deployment.

REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: job_id (specific deployment), lifecycle (filter by step e.g. 'cloud-provision').
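
Because sensitive outputs come back as the placeholder '(sensitive)', a caller may want to separate usable values from redacted ones before acting on them. The sketch below assumes the outputs arrive as a flat name-to-value mapping, which the description does not confirm (there is no output schema); `split_outputs` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

REDACTED = "(sensitive)"  # placeholder string quoted in the tool description


def split_outputs(outputs: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (usable outputs, sorted names of redacted outputs).

    Assumes a flat mapping of output name to value; the real response
    shape may differ.
    """
    usable = {name: value for name, value in outputs.items() if value != REDACTED}
    redacted = sorted(name for name, value in outputs.items() if value == REDACTED)
    return usable, redacted
```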

Parameters
Name | Required | Description | Default
job_id | No | Optional. Specific job ID to fetch outputs from. When omitted, returns outputs from the latest successful apply. | -
lifecycle | No | Optional Oracle deploy-step filter for the outputs. Common values are 'provision', 'cloud-provision', 'k8s-provision'; these correspond to the lifecycle stages of the deployed stack. When omitted, returns outputs from all lifecycle steps. | -
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. | -

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, but the description adds valuable detail: sensitive outputs are redacted, default returns latest deploy, and job_id/lifecycle behavior. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (INSPECTION, Returns, Sensitive, By default, REQUIRES, OPTIONAL). Every sentence adds value with no redundancy. It is appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, so the description explains that outputs are structured values like VPC IDs and endpoints, and mentions redaction. It covers the main aspects but lacks detail on error handling or empty results. Overall, it is fairly complete for a simple inspection tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, baseline 3. The description adds meaning beyond the schema: it explains that job_id fetches outputs from a specific deployment, lifecycle filters by deploy step, and session_id must include the token suffix and not be stripped. This provides practical guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it retrieves Terraform outputs from a completed deployment, using specific verb 'Retrieve' and resource 'Terraform outputs'. It distinguishes from sibling tools like awsinspect and gcpinspect by focusing on deploy outputs with structured values like VPC IDs and endpoints.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly requires session_id from convoopen and optionally job_id and lifecycle, providing clear context for when to use. It labels the tool as 'INSPECTION', indicating read-only purpose. However, it does not explicitly state when not to use or compare to other tools for exclusion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfplan: Preview Infrastructure Plan (grade A)
Idempotent
Inspect

PREVIEW: Run terraform plan to preview infrastructure changes

Runs a terraform plan for an InsideOut session without applying any changes. This lets the user review what will be created/changed/destroyed before committing. Returns job_id, plan_id, and project_id. Use tflogs to stream the plan output. After the plan completes, use tfdeploy with plan_id to apply the exact plan.

SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfplan returns tf_job_conflict with the live job_id; attach with tfstatus/tflogs, or pass force_new=true to override.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: sandbox (boolean, default false). When false, plans the real generated Terraform. Set to true for the cheap sandbox template (testing only).

OPTIONAL: force_new (boolean, default false). Bypasses the single-flight guard. Use only when the existing run is provably wedged.

CREDENTIAL HANDLING: Same as tfdeploy; credentials must be configured first.
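
The single-flight behavior described for tfplan suggests attaching to the live job rather than reaching for force_new. The sketch below shows that flow under loud assumptions: `call_tool` is a hypothetical client hook, and the conflict is assumed to surface as an `error` field equal to `tf_job_conflict` alongside `job_id`. The description names the conflict and the live job_id but does not document the exact response shape.

```python
from typing import Any, Callable, Dict

# Hypothetical client hook: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def plan_or_attach(call_tool: ToolCaller, session_id: str) -> Dict[str, Any]:
    """Start a plan, or attach to the in-flight job on a single-flight conflict."""
    response = call_tool("tfplan", {"session_id": session_id})
    # Assumed shape: {"error": "tf_job_conflict", "job_id": "<live job>"}.
    if response.get("error") == "tf_job_conflict":
        live_job_id = response["job_id"]
        # Attach to the live job instead of forcing a new one.
        status = call_tool("tfstatus", {"session_id": session_id, "job_id": live_job_id})
        return {"attached": True, "job_id": live_job_id, "status": status}
    return {
        "attached": False,
        "job_id": response.get("job_id"),
        "plan_id": response.get("plan_id"),
    }
```

Leaving force_new out of the helper is deliberate: the description reserves it for runs that are provably wedged, which is a judgment better made explicitly by the caller.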

Parameters
Name | Required | Description | Default
sandbox | No | When true, plan against the sandbox stack; when false (default), plan the real generated Terraform. | -
force_new | No | When true, bypass the single-flight guard and force a new plan even if one is already running. | -
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. | -

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds significant context beyond annotations: single-flight, return values (job_id, plan_id, project_id), conflict handling, and credential requirements. Annotations already indicate idempotent and non-destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with sections, front-loaded with 'PREVIEW', each sentence adds value, no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all aspects: prerequisites, behavior, error handling, return values, and references related tools. No output schema, but return info is stated. Complete for a plan tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description explains each parameter in detail: sandbox (testing), force_new (bypass guard), session_id formats and warnings. Schema coverage is 100% but description adds meaningful usage context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool previews infrastructure changes by running terraform plan without applying, using specific verbs and resource. It distinguishes from siblings like tfdeploy and tfdestroy.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to use before tfdeploy, mentions tflogs for streaming and force_new for wedged runs. Also states prerequisites and single-flight guard.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfruns: List Deploy Runs (grade A)
Read-only
Inspect

INSPECTION: List all Terraform deployment runs for a session

Returns job IDs, statuses, types (apply/destroy), and timestamps for every run. Use this to see deployment history, find job IDs for log inspection, or check which deployments succeeded or failed.
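
Finding a job ID for log inspection usually means picking the most recent run with a given status. The sketch below assumes each run is a dict with `job_id`, `status`, and a `created_at` timestamp in a uniform ISO 8601 format (so string comparison orders them correctly). Those field names and status values are assumptions; the description only says that job IDs, statuses, types, and timestamps are returned.

```python
from typing import Any, Dict, List, Optional


def latest_run_with_status(runs: List[Dict[str, Any]], status: str) -> Optional[Dict[str, Any]]:
    """Return the newest run whose status matches, or None if there is none.

    Assumes 'status' and 'created_at' keys; 'created_at' values must share
    one ISO 8601 format for lexicographic comparison to be valid.
    """
    matches = [run for run in runs if run.get("status") == status]
    if not matches:
        return None
    return max(matches, key=lambda run: run["created_at"])
```

The returned run's `job_id` could then be passed to tflogs or tfstatus to inspect that specific deployment.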

REQUIRES: session_id from convoopen response (format: sess_v2_...).
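
Since every tool here takes the same session_id and the `?token=` suffix is part of the credential, a cheap guard before each call can catch a stripped ID early. This is a sketch based only on the documented pattern `sess_v2_*?token=*`; the allowed characters inside each wildcard are an assumption, and `has_token_suffix` is a hypothetical helper.

```python
import re

# Pattern derived from the documented format sess_v2_*?token=*.
# Which characters the wildcards may contain is an assumption.
SESSION_ID_RE = re.compile(r"sess_v2_[^?\s]+\?token=\S+")


def has_token_suffix(session_id: str) -> bool:
    """Return True if the ID looks complete, including the ?token= suffix."""
    return SESSION_ID_RE.fullmatch(session_id) is not None
```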

Parameters
Name | Required | Description | Default
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. Returns the deployment-job history (apply / destroy / plan / drift) for this session. | -

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the description's 'INSPECTION' label and privacy of session-scoped results are consistent but add limited new behavioral context. No contradictions, but the description does not disclose additional traits like rate limits or auth details beyond the session token requirement.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise: a one-line purpose header ('INSPECTION: List all Terraform deployment runs for a session'), a sentence on return data, a sentence on use cases, and a brief prerequisite. No wasted words; front-loaded effectively.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with one required parameter, no output schema, and annotations covering read-only and open-world, the description is largely complete. It covers purpose, return data, and usage guidance. The only minor gap is not explicitly stating that results are scoped to the session, but that is implied by the single parameter.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema (100% coverage) already describes the session_id parameter with pattern and notes. The description adds valuable usage guidance by emphasizing to pass the ID EXACTLY as returned, including the '?token=...' suffix, which is crucial for correctness and not clear from schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list runs) and resource (Terraform deployment runs for a session), and specifies the returned data (job IDs, statuses, types, timestamps). It effectively differentiates from sibling action-oriented tools like tfdeploy or tfstatus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases ('see deployment history, find job IDs for log inspection, or check which deployments succeeded or failed') and mentions a prerequisite (session_id from convoopen). It does not explicitly state when not to use it or suggest alternatives, but the context makes it clear, earning a 4.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfstatus: Check Deploy Status (grade A)
Read-only
Inspect

MONITORING: Quick status check for Terraform deployments

Check the current status of a Terraform deployment job. Use this tool to quickly check if a deployment is running, completed, or failed. Returns job status, job_id, and other metadata without streaming logs. Use tflogs to stream the actual deployment logs.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: job_id to target a specific deployment (use tfruns to discover IDs).

LIVENESS: The response carries two distinct timestamps:

  • updated_at: last semantic change (only bumped when status / drift / version actually differ). Useful for sorting deployments; NOT a per-poll heartbeat.

  • last_refresh_at: last successful Oracle decode (stamped on every poll where reliable reached Oracle, even if nothing in the row changed). Use this to confirm reliable is still actively talking to Oracle for a long-running RUNNING job. Absent on rows that haven't been refreshed since the column was added.

💡 TIP: Examine workflow.usage prompt for more context on how to properly use these tools.
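
The distinction between the two timestamps matters most when deciding whether a long-running job is still being polled. A minimal sketch of that check follows, assuming `last_refresh_at` is an ISO 8601 string with a UTC offset or trailing `Z` (the description does not state the format) and using an arbitrary five-minute staleness threshold chosen purely for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def is_refresh_stale(status: Dict[str, Any], now: datetime,
                     max_age: timedelta = timedelta(minutes=5)) -> Optional[bool]:
    """Report whether last_refresh_at is older than max_age.

    Returns None when the field is absent, which the description says can
    happen for rows not refreshed since the column was added.
    Assumes an ISO 8601 timestamp carrying a UTC offset or trailing 'Z'.
    """
    raw = status.get("last_refresh_at")
    if raw is None:
        return None
    refreshed = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    return now - refreshed > max_age
```

Note that `updated_at` is deliberately not consulted: it only moves on semantic changes, so a healthy but quiet RUNNING job can have an old `updated_at` and a fresh `last_refresh_at`.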

Parameters
Name | Required | Description | Default
job_id | No | Optional. Specific job ID to inspect. When omitted, returns the status of the latest job for the session. | -
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. | -

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are readOnlyHint=true and openWorldHint=true; description adds detailed behavioral context about two timestamps (updated_at vs last_refresh_at) and their significance, which is beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is somewhat lengthy but well-structured with headings, each sentence adds value; could be slightly more concise but still efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description explains returns job status, job_id, metadata, and two timestamps; sufficient for a monitoring tool with good annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions; description adds critical context: session_id must include ?token= suffix, job_id is optional and can be found via tfruns, and warns not to strip the token suffix.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it checks deployment status (running/completed/failed), distinguishes from tflogs (stream logs) and tfruns (discover IDs), and specifies it returns metadata without streaming logs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to use for quick status check, names alternatives tflogs and tfruns, and provides requirements (session_id, optional job_id) with format details.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.


Resources