Glama

Server Details

Designs, prices, and deploys AWS/GCP cloud infrastructure from plain-English requirements.

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:
Repository: luthersystems/insideout-agent-skills
GitHub Stars: 0
Server Listing: insideout-mcp

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool Descriptions: A

Average 4.7/5 across 24 of 24 tools scored.

Server Coherence: A
Disambiguation: 5/5

Every tool has a clearly distinct purpose: conversation flow (convoopen, convoreply, convoawait), inspection (awsinspect, gcpinspect, convostatus, etc.), Terraform operations (tfgenerate, tfdeploy, tfdestroy, etc.), and auxiliary (credawait, submit_feedback, help). No overlapping functionality.

Naming Consistency: 4/5

Tools mostly follow a consistent prefix-plus-verb pattern written as a single lowercase word (e.g., convoopen, tfdeploy, awsinspect). However, 'submit_feedback' uses an underscore, breaking the pattern, and 'convoawait' is less clearly parallel to 'open'/'reply'. Overall very consistent.

Tool Count: 4/5

24 tools is slightly high but acceptable for a comprehensive infrastructure management server covering conversation, design, inspection, deployment, and maintenance. The scope justifies the count, though some tools (e.g., the batch variants) could be merged.

Completeness: 5/5

The tool set covers the full lifecycle: conversation, design, generate, plan, apply, destroy, inspect, drift detection, rollback, and feedback. No obvious gaps like missing session management or output retrieval.

Available Tools

24 tools
awsinspect: Inspect AWS Infrastructure (grade A)
Read-only

INSPECTION: Inspect AWS infrastructure for a deployed project. ⚠️ PREREQUISITE: This tool requires a prior deployment ATTEMPT (successful or failed). Check convostatus for hasDeployAttempt=true before calling. Works even after failed deploys, so it can inspect orphaned resources.

Inspect deployed AWS resources after a deployment attempt. Use this tool when the user asks about the status or details of their deployed infrastructure. It fetches temporary read-only credentials securely and queries the AWS API directly.

RESPONSE TIERS (default is summary for token efficiency):

  • Summary (default): Key fields only (~500 tokens). Set detail=false and raw=false, or omit both.

  • Detail: Full metadata for a specific resource. Set detail=true and pass a resource filter.

  • Raw: Complete unprocessed API response. Set raw=true.
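The prerequisite and the three response tiers can be folded into one argument builder. The sketch below is a minimal illustration under stated assumptions, not the server's own client code. `build_awsinspect_args` and its `tier` values are hypothetical names. The sketch also assumes the convostatus result is a dict with `hasDeployAttempt` at the top level, and that sending `"{}"` means "no filter", which is a guess.

```python
import json
from typing import Any, Optional


def build_awsinspect_args(
    status: dict[str, Any],
    session_id: str,
    service: str,
    action: str,
    tier: str = "summary",
    resource_filter: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build awsinspect arguments for one response tier."""
    # Assumption: the convostatus result exposes hasDeployAttempt at the top level.
    if status.get("hasDeployAttempt") is not True:
        raise RuntimeError("no deploy attempt recorded; awsinspect would fail")
    if tier not in ("summary", "detail", "raw"):
        raise ValueError(f"unknown tier: {tier!r}")
    if tier == "detail" and not resource_filter:
        raise ValueError("the detail tier needs a resource filter")
    return {
        "session_id": session_id,  # forwarded verbatim, ?token= suffix included
        "service": service,
        "action": action,
        "detail": tier == "detail",
        "raw": tier == "raw",
        # filters is a JSON-encoded string; "{}" for "no filter" is an assumption.
        "filters": json.dumps(resource_filter or {}),
    }
```

Raising early on a missing deploy attempt mirrors the PREREQUISITE note: the probe would fail anyway, so the helper avoids a wasted call. Filter keys for the detail tier are service-specific and are not documented in this listing.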

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Supported services: account, alb, apigateway, backup, bedrock, cloudfront, cloudwatchlogs, cognito, cost-explorer, dynamodb, ebs, ec2, ecs, eks, elasticache, kms, lambda, msk, opensearch, rds, s3, secretsmanager, sqs, vpc, waf. For a specific service's actions, call with action="list-actions".

METRICS: Use list-metrics to discover available metrics for a service (no credentials needed), then use get-metrics to retrieve data (resources are auto-discovered). Most services return CloudWatch time series. KMS returns key health (rotation, state). SecretsManager returns secret health (rotation, last accessed/rotated). Optional filters JSON: {"hours":6,"period":300}.

BILLING: Use service=cost-explorer to inspect AWS costs. Actions:

  • get-cost-summary: last 30 days by service; filters: {"days":7,"granularity":"DAILY"}

  • get-cost-forecast: projected spend through the end of the month

  • get-cost-by-tag: costs grouped by tag; filters: {"tag_key":"Environment","days":30}

Requires the ce:GetCostAndUsage and ce:GetCostForecast IAM permissions.

EXAMPLES:

  • awsinspect(session_id=..., service="ec2", action="describe-instances")

  • awsinspect(session_id=..., service="cost-explorer", action="get-cost-summary")

  • awsinspect(session_id=..., service="ec2", action="get-metrics", filters='{"hours":6}')

  • awsinspect(session_id=..., service="rds", action="describe-db-instances", detail=true)
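Since filters is passed as a JSON-encoded string, building it with `json.dumps` avoids the nested-quote problem of writing JSON inside a quoted literal. This is a minimal sketch. session_id, detail, and raw are omitted for brevity, and the dictionaries only reuse filter shapes quoted in the description.

```python
import json

# Filter objects copied from the METRICS and BILLING notes above.
metric_window = {"hours": 6, "period": 300}
cost_by_tag = {"tag_key": "Environment", "days": 30}

# json.dumps produces the string form the filters parameter expects,
# so there is no hand-escaping of nested double quotes.
metrics_args = {
    "service": "ec2",
    "action": "get-metrics",
    "filters": json.dumps(metric_window),
}
cost_args = {
    "service": "cost-explorer",
    "action": "get-cost-by-tag",
    "filters": json.dumps(cost_by_tag),
}

print(metrics_args["filters"])  # {"hours": 6, "period": 300}
```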

Parameters (JSON Schema)
  • raw (required): When true, returns the unprocessed AWS API response. Escape hatch for fields the summarized response doesn't surface.
  • action (required): Operation on the service. Examples: 'describe-instances' (ec2), 'list-buckets' (s3), 'list-keys' (kms), 'get-cost-summary' (cost-explorer), 'list-actions' (discovery), 'list-metrics' / 'get-metrics' (CloudWatch).
  • detail (required): When true, returns full metadata for a single resource (requires a resource ID in filters). When false (default), returns a summary.
  • filters (required): Optional JSON-encoded filter object passed through to the underlying AWS API. Examples: '{"hours":6}' for metric windows, '{"days":7,"granularity":"DAILY"}' for cost queries.
  • service (required): AWS service to query. Examples: 'ec2', 'rds', 'vpc', 's3', 'lambda', 'eks', 'ecs', 'cost-explorer'. Use action='list-actions' to discover the supported actions for a service.
  • session_id (required): Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. The session must have an AWS deploy attempt before inspect probes will succeed.
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only (readOnlyHint=true). Description adds credential mechanism, API query, response tiers, and specific actions for metrics/billing. Does not mention rate limits or errors, but general behavior is well-covered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with sections, examples, and key info front-loaded. Slightly long but every sentence adds value. IAM permissions for cost explorer are specific but useful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, description covers response tiers, prerequisites, supported services, parameter formats, and edge cases (failed deploys, orphaned resources). Comprehensive for agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, description fully explains all 6 parameters: session_id (format and source), service (full supported list), action (examples), filters (JSON examples for metrics/billing), detail and raw (tier boolean meanings). Also provides examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb 'inspect' and resource 'AWS infrastructure'. Its AWS-specific scope sets it apart from siblings like gcpinspect and awsinspect_batch, although the description does not name them. States it's for deployed projects after a deployment attempt.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use it (the user asks about deployed infrastructure status/details) and the prerequisite (a prior deployment attempt; check convostatus). Details response tier usage for different needs. It names no exclusions, but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

awsinspect_batch: Batch-Inspect AWS Infrastructure (grade A)
Read-only

BATCH INSPECTION: run up to 32 AWS inspect probes in one call. ⚠️ PREREQUISITE: Same as awsinspect — deploy attempt required. Check convostatus for hasDeployAttempt=true before calling.

Use this when you need to check more than ~3 resources. The backend fetches Oracle credentials ONCE per batch and fans out probes against a single AWS config — for a 12-resource health check this is ~5–8× faster and 12× fewer Oracle round-trips than calling awsinspect 12 times.

BUDGETS:

  • Up to 32 sub-probes per call (subs array length).

  • 30s per-sub timeout; 60s total batch wall-clock.

  • Concurrency cap 8 — sub-probes run in parallel but never saturate AWS.

  • 512 KB response cap: subs past the cap keep their envelope (index/service/action/ok) but have result replaced with truncated=true.
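When a health check needs more than 32 probes, a client has to spread them over several calls. This is a minimal client-side sketch that assumes probe order should be preserved. `chunk_probes` is a hypothetical helper, and each slice would become the subs array of a separate awsinspect_batch call.

```python
from typing import Any, Iterator

MAX_SUBS = 32  # per-call limit from the BUDGETS list


def chunk_probes(
    probes: list[dict[str, Any]], size: int = MAX_SUBS
) -> Iterator[list[dict[str, Any]]]:
    """Yield consecutive, order-preserving slices of at most `size` sub-probes."""
    if not 1 <= size <= MAX_SUBS:
        raise ValueError(f"size must be between 1 and {MAX_SUBS}")
    for start in range(0, len(probes), size):
        yield probes[start:start + size]


probes = [{"service": "ec2", "action": "describe-instances"}] * 70
print([len(batch) for batch in chunk_probes(probes)])  # [32, 32, 6]
```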

PARTIAL FAILURE IS EXPECTED. The response is an ordered results array; each entry has {index, service, action, ok, result, error}. Inspect each result; do NOT abort on the first error. If the credential fetch fails, probes that need no credentials (list-actions, list-metrics) still succeed.
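Handling partial failure means looking at every entry before deciding what to report. The hypothetical `triage_results` helper below sorts entries into buckets by outcome. It assumes each entry is a dict with the documented index/ok/result fields. The description does not say where the truncated=true marker lives, so the helper checks both the entry and its result.

```python
from typing import Any


def _is_truncated(entry: dict[str, Any]) -> bool:
    # Assumption: the truncated=true marker may sit on the entry or inside result.
    if entry.get("truncated") is True:
        return True
    result = entry.get("result")
    return isinstance(result, dict) and result.get("truncated") is True


def triage_results(results: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Bucket every batch entry by outcome instead of stopping at the first error."""
    buckets: dict[str, list[dict[str, Any]]] = {"ok": [], "truncated": [], "failed": []}
    for entry in sorted(results, key=lambda e: e["index"]):
        if not entry.get("ok"):
            buckets["failed"].append(entry)
        elif _is_truncated(entry):
            buckets["truncated"].append(entry)
        else:
            buckets["ok"].append(entry)
    return buckets
```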

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Supported services: account, alb, apigateway, backup, bedrock, cloudfront, cloudwatchlogs, cognito, cost-explorer, dynamodb, ebs, ec2, ecs, eks, elasticache, kms, lambda, msk, opensearch, rds, s3, secretsmanager, sqs, vpc, waf.

For a specific service's actions, use awsinspect (singular) with action="list-actions"; batch is not the place for discovery. Batch responses are always summarized (no detail/raw per sub-probe); use singular awsinspect when you need full metadata or raw API output for one resource.

EXAMPLES:

  • awsinspect_batch(session_id=..., subs=[ {"service":"ec2","action":"describe-instances"}, {"service":"rds","action":"describe-db-instances"}, {"service":"vpc","action":"describe-vpcs"}, {"service":"s3","action":"list-buckets"}])

  • awsinspect_batch(session_id=..., subs=[ {"service":"ec2","action":"get-metrics","filters":"{\"hours\":6}"}, {"service":"rds","action":"get-metrics","filters":"{\"hours\":6}"}])

Parameters (JSON Schema)
  • subs (required): Up to 32 sub-probes, each with {service, action, filters?, detail?, raw?}. The backend fetches credentials once per batch and fans out probes in parallel (concurrency 8, 30s per-sub timeout, 60s total wall clock). Partial failure is expected; inspect each result.ok independently.
  • session_id (required): Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. The session must have an AWS deploy attempt before inspect probes will succeed.
Behavior: 5/5

Annotations indicate read-only and open-world hints. The description adds significant behavioral context: partial failure is expected, credential failure behavior, budget limits (concurrency cap of 8, timeouts, response cap of 512 KB), and the response format. This complements annotations with actionable details.

Conciseness: 4/5

The description is thorough and well-organized with headings and bullet points, but slightly verbose. Every section adds value, but some details (e.g., full list of supported services) could be streamlined. Still, it is effectively structured and front-loaded with key prerequisites.

Completeness: 5/5

Given the tool's complexity and lack of output schema, the description covers all essential aspects: prerequisite, usage context, budgets, failure modes, input requirements, supported services, and examples. It leaves no obvious gaps for an agent to understand correct invocation.

Parameters: 5/5

The input schema has 0% description coverage, but the description compensates fully. It explains the session_id format (sess_v2_...), lists supported services, and provides examples of the subs array structure. This adds crucial meaning beyond the bare schema.

Purpose: 5/5

The description clearly states the tool's purpose: running up to 32 AWS inspect probes in one call. It specifies the scope (batch inspection of multiple resources) and distinguishes itself from the sibling tool 'awsinspect' by noting that batch is for checking >3 resources and is not for discovery.

Usage Guidelines: 5/5

The description provides explicit guidance on when to use the tool (when checking more than ~3 resources) and when not to (for service discovery, use awsinspect). It also lists prerequisites (deploy attempt required, check convostatus for hasDeployAttempt=true) and includes budgets and partial failure handling.

convoawait: Await Pending Response (grade A)
Read-only

Wait for a pending response from Riley after a convoreply timeout.

🎯 USE THIS TOOL WHEN: convoreply returned a timeout error. This allows you to continue waiting for the response without resending the message.

REQUIRES:

  • session_id: from convoopen response

OPTIONAL:

  • message_id: if known (from convoreply timeout error)

  • timeout (integer): seconds to wait. For Cursor, use 50 (default). Max 55.

Returns the same format as convoreply when successful.
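The convoreply-then-convoawait pattern can be sketched as a small retry loop. Everything here is an assumption about the client side. `call_tool` stands in for your MCP client. `ToolTimeout` is a hypothetical exception that models "returned a timeout error", because the listing does not specify the error shape or what convoawait does when its own wait expires. The key property is that the user's message is sent once and never resent.

```python
from typing import Any, Callable, Optional

DEFAULT_TIMEOUT = 50
MAX_TIMEOUT = 55


class ToolTimeout(Exception):
    """Hypothetical: raised by the client when a tool call times out."""

    def __init__(self, message_id: Optional[str] = None) -> None:
        super().__init__("tool call timed out")
        self.message_id = message_id


def reply_and_wait(
    call_tool: Callable[[str, dict[str, Any]], dict[str, Any]],
    session_id: str,
    text: str,
    timeout: int = DEFAULT_TIMEOUT,
    max_awaits: int = 3,
) -> dict[str, Any]:
    """Send one message, then keep waiting via convoawait instead of resending."""
    timeout = max(1, min(timeout, MAX_TIMEOUT))  # documented maximum is 55s
    try:
        return call_tool("convoreply", {"session_id": session_id, "text": text, "timeout": timeout})
    except ToolTimeout as exc:
        message_id = exc.message_id  # optional; convoawait works without it
    for _ in range(max_awaits):
        args: dict[str, Any] = {"session_id": session_id, "timeout": timeout}
        if message_id:
            args["message_id"] = message_id
        try:
            return call_tool("convoawait", args)
        except ToolTimeout:
            continue  # still pending: wait again, never resend the text
    raise TimeoutError("no reply from Riley after repeated convoawait calls")
```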

Parameters (JSON Schema)
  • timeout (optional): Max seconds to wait. Default 50, max 55.
  • message_id (optional): Optional message ID from a convoreply timeout error. Not required for normal turn-based flow.
  • session_id (required): Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
Behavior: 4/5

Annotations declare readOnlyHint=true and openWorldHint=true, indicating no mutation and external interaction. Description adds waiting behavior, timeout max, and return format matching convoreply. However, it doesn't specify what happens when the wait itself times out (e.g., whether an error is returned).

Conciseness: 5/5

Description is well-structured: succinct first sentence, then usage callout, then bullet points for params. No fluff, all sentences add value.

Completeness: 4/5

Covers main scenario and return format (same as convoreply). Lacks error handling details (e.g., timeout reached, invalid session) but given simplicity and sibling context, it is mostly complete.

Parameters: 5/5

Schema has zero description coverage, but description fully compensates: session_id from convoopen, message_id from convoreply timeout error, timeout with default 50 and max 55. This adds essential meaning beyond type/null constraints.

Purpose: 5/5

Description clearly states it waits for a pending response after a convoreply timeout. It specifies the verb 'Wait for' and resource 'pending response from Riley', distinguishing it from siblings like convoreply (which initiates) and convoinspect (which inspects).

Usage Guidelines: 5/5

Explicitly tells when to use: 'USE THIS TOOL WHEN: convoreply returned a timeout error'. It also explains it continues waiting without resending, giving clear context for its purpose.

convoinspect: Inspect Session Transcript (grade A)
Read-only

INSPECTION: View a session's conversation transcript and metadata. Returns the full message history (user / assistant / tool turns) plus the session's meta: workflow step, cloud, deployment status, drift state.

This is the transcript-reader companion to the other read tools. Combine it with:

  • convostatus for the live stack / config / pricing

  • tfruns for deployment history (apply / destroy / plan / drift)

  • stackversions for the stack-version ladder

Use it when a user asks 'what did I say earlier?' or you need to retrace why the session ended up where it did. Read-only; never mutates session state.

REQUIRES: session_id (format: sess_v2_...).

Parameters (JSON Schema)
  • session_id (required): Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
Behavior: 5/5

Reinforces readOnlyHint annotation with 'Read-only; never mutates session state.' Adds behavioral constraint on session_id format (sess_v2_...) beyond what schema provides. No contradictions with annotations.

Conciseness: 4/5

Well-structured with header, bullet list, usage note, and requirement. Slightly verbose but each sentence adds value. Front-loaded with main purpose.

Completeness: 5/5

Comprehensively covers what the tool returns (transcript and metadata) and how it relates to siblings. No output schema, but description sufficiently describes results. All relevant context is present.

Parameters: 4/5

Schema has 0% description coverage for the sole parameter session_id. Description adds format requirement (sess_v2_...) and states it is required, adding meaningful context beyond the raw schema.

Purpose: 5/5

Explicitly states 'INSPECTION: View a session's conversation transcript and metadata' and lists returned data (full message history, session meta). Clearly distinguishes from siblings like convostatus, tfruns, stackversions by describing their complementary roles.

Usage Guidelines: 5/5

Provides explicit usage scenarios ('what did I say earlier?' or retracing session state), declares read-only nature, and lists sibling tools with their purposes for alternative use cases.

convoopen: Start Design Session (grade A)

WORKFLOW: Step 1 of 4 - Start infrastructure design conversation. Open an InsideOut V2 session and receive the assistant's intro message. The response contains a clean message from Riley (the infrastructure advisor); display it to the user. ⚠️ Riley will ask questions: forward these to the user, DO NOT answer on their behalf.

CRITICAL: This tool returns a session_id in the response metadata. You MUST use this session_id for ALL subsequent tool calls (convoreply, tfgenerate, tfdeploy, etc.). ⚠️ The session_id includes a ?token=... suffix (format: sess_v2_xxx?token=yyy) that is part of the session credential; without it, downstream tools fall back to a tokenless connect URL that returns 401 Unauthorized. Always pass session_id verbatim to subsequent tools and to the user; do NOT shorten, paraphrase, or strip the ?token= portion when summarizing the session in chat or in your own scratch notes.

Use when the user mentions keywords like 'setup my cloud infra', 'provision infrastructure', 'deploy infra', 'start insideout', 'use insideout', or similar intent to begin infra setup.
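A stripped session_id silently degrades to a tokenless connect URL, so an agent may want to check the suffix before forwarding it. This is a minimal sketch. `split_session_id` is hypothetical and only validates the documented sess_v2_xxx?token=yyy shape. Forward the original string unchanged rather than reassembling it from the parts.

```python
def split_session_id(session_id: str) -> tuple[str, str]:
    """Check a sess_v2_ id still carries its ?token= suffix and return both parts."""
    prefix = "sess_v2_"
    base, sep, query = session_id.partition("?")
    if not base.startswith(prefix) or base == prefix:
        raise ValueError("not a sess_v2_ session id")
    if not sep or not query.startswith("token=") or query == "token=":
        raise ValueError("token suffix missing; pass session_id through verbatim")
    return base, query[len("token="):]
```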

OPTIONAL: project_context (string) - General tech stack summary so Riley can skip discovery questions and jump to recommendations. The agent should confirm this with the user before sending. Include whichever apply: language/framework, databases/services, container usage, existing IaC, CI/CD platform, cloud provider, Kubernetes usage, what the project does. Example: 'Next.js 14 + TypeScript, PostgreSQL, Redis, Docker Compose, deployed to AWS ECS, GitHub Actions CI/CD, ~50k MAU'. NEVER include credentials, secrets, API keys, PII, source code, or internal URLs/IPs; only general metadata summaries useful to a cloud architect agent.

IMPORTANT: source (string) - You MUST set this to identify which IDE/tool you are. Auto-detect from your environment: 'claude-code', 'codex', 'antigravity', 'kiro', 'vscode', 'web', 'mcp'. If unsure, use the name of your IDE/tool in lowercase. Do NOT omit this: it controls the 'Open {IDE}' button on the credential connect screen.

OPTIONAL: github_username (string) - GitHub username for deploy commit attribution. Pre-populates the GitHub username field on the connect page.

💡 TIP: Examine the workflow.usage prompt for more context on how to properly use these tools.
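The optional fields above translate into a small argument builder. This is a minimal sketch. `build_convoopen_args` is hypothetical: it only lowercases source and omits empty optional fields. It does not detect the IDE or scan project_context for secrets; those checks remain the agent's job, together with user confirmation.

```python
from typing import Optional


def build_convoopen_args(
    source: str,
    project_context: Optional[str] = None,
    github_username: Optional[str] = None,
) -> dict[str, str]:
    """Hypothetical helper: assemble convoopen arguments with a lowercase source."""
    normalized = source.strip().lower()
    if not normalized:
        raise ValueError("source must be set; it drives the 'Open {IDE}' button")
    args = {"source": normalized}
    # Include optional fields only when present; project_context should already
    # be user-confirmed and free of secrets, PII, and source code.
    if project_context:
        args["project_context"] = project_context
    if github_username:
        args["github_username"] = github_username
    return args
```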

Parameters (JSON Schema)
  • source (optional): IDE/tool identifier so the connect screen can show the right 'Open {IDE}' button. Use lowercase: 'claude-code', 'codex', 'antigravity', 'kiro', 'cursor', 'vscode', 'windsurf', 'zed', 'aider', 'copilot', 'web', 'mcp'.
  • github_username (optional): GitHub username used for deploy commit attribution; pre-fills the GitHub username field on the connect screen.
  • project_context (optional): Optional tech-stack summary so Riley can skip discovery questions (e.g. 'Next.js 14 + Postgres on AWS, ~50k MAU'). No PII, secrets, file paths, or source code; only general metadata useful to a cloud architect.
Behavior: 5/5

The description fully reveals behavioral traits: it returns a session_id and an intro message, warns that Riley asks questions which must be forwarded to the user, and provides critical usage instructions. Annotations (openWorldHint, destructiveHint) add no contradiction; description goes far beyond.

Conciseness: 4/5

The description is lengthy but well-structured with clear sections, warnings, and tips. Every sentence adds value; minor redundancy could be trimmed, but overall efficient.

Completeness: 5/5

Given the tool's role in a multi-step workflow, the description fully covers: the response content, parameter semantics, required next steps (session_id usage), and behavior expectations. No output schema is mentioned, but the response structure is clearly described.

Parameters: 5/5

Despite 0% schema coverage, the description thoroughly explains each parameter: source (auto-detect IDE), project_context (tech stack summary, never secrets), and github_username (optional for attribution). It adds constraints and examples, making up for the missing schema descriptions.

Purpose: 5/5

The description clearly states it is 'Step 1 of 4 - Start infrastructure design conversation' and 'Open an InsideOut V2 session'. It distinguishes from sibling tools by specifying the workflow and the need to capture session_id for subsequent calls like convoreply, tfgenerate, etc.

Usage Guidelines: 4/5

The description explicitly lists trigger keywords for when to use this tool (e.g., 'setup my cloud infra', 'provision infrastructure'). It provides context as the first step in a workflow, but does not explicitly list when not to use it.

convoreply: Send Message (grade A)

WORKFLOW: Step 2 of 4 - Continue infrastructure design conversation. Send a user message to the active InsideOut session and receive the assistant reply. The response contains a clean message from Riley; display it to the user.

⚠️ CRITICAL: DO NOT answer Riley's questions yourself! Forward questions to the user and wait for their response. NEVER fabricate or assume the user's answer, even if you think you know what they would say. Examples of questions Riley asks that YOU MUST forward to the user:

  • 'Any questions or tweaks to these details?'

  • 'Ready for the cost estimate?'

  • 'Do you want to change the stack/config?'

  • 'Ready to proceed to Terraform?'

When Riley asks ANY question, STOP and wait for the user's answer!

📋 WORKFLOW PHASES: The typical flow is conversation → tfgenerate → tfdeploy. When terraform_ready=true appears in THIS tool's response, THEN you can call tfgenerate. ⚠️ DO NOT call tfgenerate until this tool returns! Wait for the response first.

🎯 KEY SIGNALS IN RESPONSE:

  • [TERRAFORM_READY: true] → NOW you can call tfgenerate

  • [[BUTTON_TF_APPLY: ...]] → Deployment is ready! Ask user if they want to deploy, then use tfdeploy

  • [[BUTTON_TF_DESTROY: ...]] → User confirmed destroy intent! Ask user to confirm, then use tfdestroy

  • [[BUTTON_TF_PLAN: ...]] → User wants to preview changes! Use tfplan to run a plan, then tfdeploy with plan_id to apply
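The KEY SIGNALS list amounts to a mapping from markers in Riley's reply to follow-up tools. The sketch below scans a reply for the bracketed forms shown above. `next_tools` and the regexes are hypothetical, and they assume a button payload never contains a closing bracket. A match only suggests the next tool; apply and destroy still need explicit user confirmation.

```python
import re

# Signals named in the KEY SIGNALS list, mapped to the follow-up tool.
SIGNAL_TOOLS = {
    "TERRAFORM_READY": "tfgenerate",
    "BUTTON_TF_APPLY": "tfdeploy",
    "BUTTON_TF_DESTROY": "tfdestroy",
    "BUTTON_TF_PLAN": "tfplan",
}

_READY = re.compile(r"\[TERRAFORM_READY:\s*true\]", re.IGNORECASE)
_BUTTON = re.compile(r"\[\[(BUTTON_TF_(?:APPLY|DESTROY|PLAN)):[^\]]*\]\]")


def next_tools(reply: str) -> list[str]:
    """Return candidate follow-up tools in the order their signals appear."""
    hits: list[tuple[int, str]] = []
    for m in _READY.finditer(reply):
        hits.append((m.start(), SIGNAL_TOOLS["TERRAFORM_READY"]))
    for m in _BUTTON.finditer(reply):
        hits.append((m.start(), SIGNAL_TOOLS[m.group(1)]))
    # Candidates only: tfdeploy and tfdestroy still require the user's explicit yes.
    return [tool for _, tool in sorted(hits)]
```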

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: timeout (integer) - seconds to wait for a response. For Cursor, use 50 (default). Max 55.

OPTIONAL: project_context (string) - Only pass genuinely NEW project details the user shares after convoopen. Do NOT resend context already provided in convoopen; Riley remembers it. Do NOT scan files or directories to gather this; only use what the user explicitly tells you. Example: the user reveals a new constraint like 'we also need HIPAA compliance' mid-conversation.

💡 TIP: Use convostatus to check progress anytime. Examine the workflow.usage prompt for more guidance.

Parameters (JSON Schema)
  • text (required): User message to send to Riley. Forward verbatim what the user said; do not summarize or rewrite.
  • retry (optional): When true, re-send the most recent user turn instead of submitting a new one.
  • timeout (optional): Max seconds to wait for Riley's response. Default 50, max 55.
  • session_id (required): Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
  • project_context (optional): Only NEW project details revealed after convoopen (e.g. user mentions a new constraint mid-conversation). Don't re-send context already provided in convoopen. No PII or secrets.
Behavior: 5/5

The description discloses key behaviors: it sends a user message and waits for a response, the response contains a clean message to display, and response signals indicate next workflow steps. This adds value beyond annotations (openWorldHint, destructiveHint) without contradiction.

Conciseness: 4/5

The description is lengthy but well-structured with headings and bullet points. It front-loads the workflow step and main purpose, and each section adds necessary detail. A small reduction in verbosity could improve conciseness.

Completeness: 5/5

Given the tool's complexity (multiple parameters, sibling tools, workflow dependencies), the description is thorough. It covers when to use, how to handle responses, parameter usage, integration signals, and provides a tip for checking progress via convostatus. All critical aspects are addressed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by explaining session_id, text (implied), timeout (max 55), project_context (only new details), and auto_accept (default true). However, the 'retry' parameter is not mentioned, leaving a minor gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Send a user message to the active InsideOut session and receive the assistant reply.' It positions it as Step 2 of a 4-step workflow, distinguishing it from siblings like convoopen and convostatus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides extensive guidance: it must be used after convoopen, before tfgenerate; warns against answering Riley's questions; explains when to call other tools based on response signals (e.g., [TERRAFORM_READY: true]); and details appropriate use of optional parameters like project_context and auto_accept.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

convostatus: View Session Stack Status (A)
Read-only

INSPECTION: View the current infrastructure stack for a session. Returns the current state of the user's infrastructure design, including:

Components - Selected infrastructure services (VPC, databases, caching, etc.)
  • Shows what services the user has chosen (e.g., PostgreSQL, Redis, S3)
  • Includes architecture decisions (EKS vs EC2, monolith vs microservices)

Config - Configuration details for each component
  • Database sizes, replica counts, storage amounts
  • Cache settings, queue configurations
  • Backup schedules and retention policies

Pricing - Cost estimates (when available)
  • Monthly cost estimates per component
  • Total estimated monthly spend

Phase Indicators - Where the user is in the design workflow:
  • hasComponents: User has selected infrastructure services
  • hasConfig: User has configured component details
  • hasPricing: Cost estimates have been calculated
  • hasTerraform: Ready for Terraform generation

Use this tool when the user asks 'what is my current stack?', 'show my infrastructure', 'what have I selected?', or similar questions about their design progress. REQUIRES: session_id from convoopen response (format: sess_v2_...).
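A sketch of how a client might turn the four phase indicators into a progress label, assuming the convostatus response exposes hasComponents, hasConfig, hasPricing and hasTerraform as top-level booleans. The response shape is not documented here, and design_phase is a hypothetical helper.

```python
def design_phase(status):
    """Return the furthest design phase reached, judged from the phase flags."""
    # Checked from the latest phase back to the earliest (flag names as documented).
    phases = [
        ("hasTerraform", "ready for Terraform generation"),
        ("hasPricing", "cost estimates calculated"),
        ("hasConfig", "component details configured"),
        ("hasComponents", "infrastructure services selected"),
    ]
    for flag, label in phases:
        if status.get(flag):
            return label
    return "no design yet"


print(design_phase({"hasComponents": True, "hasConfig": True, "hasPricing": False}))
# component details configured
```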

Parameters
Name | Required | Description | Default
job_id | No | Optional. Specific job ID to inspect. When omitted, returns the status of the latest job for the session.
session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint: true and openWorldHint: true, and the description confirms a read-only inspection role with 'INSPECTION' and 'Returns the current state'. It adds behavioral context by listing the required session_id format and the tool's purpose, fully aligning with annotations without contradiction.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear headline, bullet points, and logical sections. It is concise yet comprehensive, with no redundant sentences. The front-loaded 'INSPECTION' label immediately signals the tool's nature.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For an inspection tool with no output schema, the description adequately explains returned data (components, config, pricing, phase). It includes prerequisites (session_id) and usage examples. However, it does not document the job_id parameter, leaving a completeness gap.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, the description must compensate. It explains session_id semantics (required, format from convoopen) but completely omits the job_id parameter, leaving its purpose unclear. This significant gap lowers the score.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool as 'INSPECTION: View the current infrastructure stack for a session' and details the specific categories (Components, Config, Pricing, Phase Indicators) returned. This distinguishes it from sibling tools like awsinspect or convoinspect which likely inspect different scopes.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists example user queries that trigger this tool ('what is my current stack?', 'show my infrastructure') and mentions the required session_id from convoopen. However, it does not clarify when NOT to use it or compare to alternatives like convoinspect.

credawait: Await Cloud Credentials (A)
Idempotent

Wait for the user to securely connect their cloud account and subscribe to Luther Systems. Polls until credentials appear on the session.

🎯 USE THIS TOOL WHEN: tfdeploy returns an 'auth_required', 'no_credentials', or 'credentials_expired' error.

The user needs to visit the connect URL to:

  1. Connect their cloud credentials (AWS or GCP)

  2. Sign up and subscribe to a Luther Systems plan (required for deployment)

This secure connection allows InsideOut to deploy and manage infrastructure in the user's cloud account on their behalf. Credentials are handled securely and only used for deployment and management sessions.

WORKFLOW:

  1. FIRST: Present the connect URL and explanation to the user (from the tfdeploy error response)

  2. THEN: Call this tool to begin polling for credentials

  3. The user opens the URL in their browser to subscribe and add credentials

  4. When credentials are found, inform the user and call tfdeploy to deploy

IMPORTANT: Do NOT call this tool without first showing the connect URL to the user. The user needs to see the URL to complete the process.

REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: cloud ('aws' or 'gcp'), timeout (integer, seconds to wait, default 300, max 600).
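The WORKFLOW above can be sketched in Python. call_tool and show_user are hypothetical callbacks supplied by the agent runtime; the "error" and "connect_url" keys in the tfdeploy response are assumptions, as is calling tfdeploy with only session_id. The ordering (show the URL first, then poll, then redeploy) follows the steps as written.

```python
AUTH_ERRORS = {"auth_required", "no_credentials", "credentials_expired"}
MAX_WAIT = 600  # documented credawait maximum, in seconds


def deploy_with_credentials(call_tool, show_user, session_id, cloud="aws", timeout=300):
    """Run tfdeploy; on a credentials error, follow the credawait workflow once."""
    result = call_tool("tfdeploy", {"session_id": session_id})
    if result.get("error") not in AUTH_ERRORS:
        return result  # deployed, or failed for a reason credawait cannot fix
    # Step 1: show the connect URL BEFORE polling.
    show_user(f"Connect your {cloud} account and subscribe here: {result.get('connect_url')}")
    # Step 2: poll for credentials while the user completes the browser flow (step 3).
    waited = call_tool("credawait", {"session_id": session_id, "cloud": cloud,
                                     "timeout": min(timeout, MAX_WAIT)})
    if waited.get("error"):
        return waited  # e.g. the user did not finish before the timeout
    # Step 4: tell the user, then deploy.
    show_user("Credentials found; deploying now.")
    return call_tool("tfdeploy", {"session_id": session_id})
```

Passing the callbacks in keeps the sketch independent of any particular MCP client library.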

Parameters
Name | Required | Description | Default
cloud | No | Cloud provider whose credentials are awaited: 'aws' or 'gcp'. Defaults to 'aws'.
timeout | No | Max seconds to wait for the user to complete the browser-based credential connect flow. Default 300, max 600.
session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already cover safety (non-destructive, idempotent). The description adds polling behavior, parameter constraints (timeout max), and secure credential handling. It does not say what happens when the timeout elapses without credentials, but this is adequate.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with bold headings, bullet points, and a clear workflow. Slightly verbose but effective, and front-loaded with the key use case.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Comprehensive for a polling tool: covers prerequisites, workflow, parameter details, and security context. No output schema but not needed.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage, but description fully explains each parameter: session_id is required and format (sess_v2_...), cloud is optional with values 'aws' or 'gcp', timeout is integer seconds with defaults and max.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to wait for cloud credentials to appear after the user connects their account. It distinguishes itself from sibling tools by specifying that it is for polling credentials after tfdeploy errors.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (after tfdeploy errors like 'auth_required'), and includes a workflow with a required precondition (show URL to user first). No ambiguity.

gcpinspect: Inspect GCP Infrastructure (A)
Read-only

INSPECTION: Inspect GCP infrastructure for a deployed project. ⚠️ PREREQUISITE: This tool requires a prior deployment ATTEMPT (successful or failed). Check convostatus for hasDeployAttempt=true before calling. Works even after failed deploys to inspect orphaned resources.

Inspect deployed GCP resources after a deployment attempt. Use this tool when the user asks about the status or details of their deployed GCP infrastructure. It fetches temporary read-only credentials securely and queries the GCP API directly.

RESPONSE TIERS (default is summary for token efficiency):

  • Summary (default): Key fields only (~500 tokens). Set detail=false, raw=false or omit both.

  • Detail: Full metadata for a specific resource. Set detail=true + resource filter.

  • Raw: Complete unprocessed API response. Set raw=true.
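A sketch of mapping the three response tiers onto the detail and raw flags, assuming the argument names from the parameter table. The format of a "resource filter" for the detail tier is not documented here, so the hypothetical helper inspect_args only checks that some filters string is present.

```python
TIERS = ("summary", "detail", "raw")


def inspect_args(session_id, service, action, tier="summary", filters=None):
    """Build gcpinspect arguments for one of the three documented response tiers."""
    if tier not in TIERS:
        raise ValueError(f"tier must be one of {TIERS}")
    if tier == "detail" and not filters:
        # Detail returns full metadata for ONE resource, so insist on a resource filter.
        raise ValueError("the detail tier needs a resource filter")
    args = {
        "session_id": session_id,
        "service": service,
        "action": action,
        "detail": tier == "detail",  # summary: both flags False
        "raw": tier == "raw",
    }
    if filters is not None:
        args["filters"] = filters  # JSON-encoded string
    return args
```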

REQUIRES: session_id from convoopen response (format: sess_v2_...). Supported services: apigateway, bastion, billing, cloudarmor, cloudbuild, cloudcdn, cloudfunctions, cloudkms, cloudlogging, cloudmonitoring, cloudrun, cloudsql, compute, firestore, gcs, gke, identityplatform, loadbalancer, memorystore, pubsub, secretmanager, vertexai, vpc. For a specific service's actions, call with action="list-actions".

METRICS: Use list-metrics to see available Cloud Monitoring metrics for any service (no credentials needed; progressive disclosure). Use get-metrics to retrieve time-series data. Optional filters JSON: {"hours":6,"period":300}. Label breakdowns: Cloud Functions (by status), Load Balancer/API Gateway (by response_code_class), Cloud CDN (by cache_result). Secret Manager get-metrics returns operational health (version count, replication, create time) rather than time series. Bastion is an alias for Compute Engine metrics (SSH connection count is not available as a GCP metric).

BILLING: Use service=billing to inspect GCP billing. Actions: get-billing-info (whether billing is enabled and which billing account is linked), get-budgets (lists budget alerts for the project; the billing account is fetched automatically).

Required IAM roles: Monitoring Viewer (roles/monitoring.viewer) for metrics, Secret Manager Viewer (roles/secretmanager.viewer) for secret health, Billing Viewer (roles/billing.viewer) for billing.

EXAMPLES:

  • gcpinspect(session_id=..., service="compute", action="list-instances")

  • gcpinspect(session_id=..., service="gke", action="list-clusters")

  • gcpinspect(session_id=..., service="cloudsql", action="get-metrics", filters='{"hours":6}')

  • gcpinspect(session_id=..., service="billing", action="get-billing-info")
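Because filters is a JSON-encoded string, building it with json.dumps avoids the nested-quote mistakes that hand-written examples invite. A short sketch; the argument dict mirrors the gcpinspect parameters, and the session_id value is a placeholder for the exact ID returned by convoopen.

```python
import json

# json.dumps produces a correctly quoted JSON string for the filters parameter.
filters = json.dumps({"hours": 6, "period": 300})

args = {
    "session_id": "sess_v2_...?token=...",  # placeholder: use the exact ID from convoopen
    "service": "cloudsql",
    "action": "get-metrics",
    "filters": filters,
}
print(args["filters"])  # {"hours": 6, "period": 300}
```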

Parameters
Name | Required | Description | Default
raw | Yes | When true, returns the unprocessed GCP API response. Escape hatch for fields the summarized response doesn't surface.
action | Yes | Operation on the service. Examples: 'list-instances' (compute), 'list-buckets' (storage), 'list-clusters' (gke), 'list-actions' (discovery), 'list-metrics' / 'get-metrics' (Cloud Monitoring).
detail | Yes | When true, returns full metadata for a single resource. When false (default), returns a summary.
filters | Yes | Optional JSON-encoded filter object passed through to the underlying GCP API. Examples: '{"hours":6}' for metric windows, '{"zone":"us-central1-a"}' for zone-scoped queries.
service | Yes | GCP service to query. Examples: 'compute', 'storage', 'cloudsql', 'gke', 'cloudrun', 'pubsub', 'firestore'. Use action='list-actions' to discover supported actions for a service.
session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. The session must have a GCP deploy attempt before inspect probes will succeed.
Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations already indicate readOnlyHint and openWorldHint, the description adds substantial behavioral information: it fetches temporary read-only credentials, queries GCP API directly, works after failed deploys, lists required IAM roles, and details response tiers (summary/detail/raw) with token efficiency considerations. No contradiction with annotations.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-structured with clear sections (prerequisites, response tiers, supported services, examples). Every sentence adds value, though some redundancy exists (e.g., purpose repeated). The front-loading of the purpose and prerequisite is effective.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, but the description covers response tiers, IAM roles, supported services, actions, metrics, billing, and prerequisites. It provides comprehensive guidance for an agent to use the tool correctly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates fully by explaining each parameter: session_id format and source, service list, action examples, filters as JSON with examples, detail and raw booleans with tier definitions. Examples further clarify usage.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Inspect GCP infrastructure for a deployed project' and elaborates on the tool's function. It distinguishes from siblings like gcpinspect_batch by focusing on synchronous inspection after deployment, and from awsinspect by targeting GCP.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly mentions the prerequisite (prior deployment attempt), how to check it via convostatus, and when to use the tool (when user asks about deployed GCP infrastructure). It does not explicitly contrast with gcpinspect_batch, but the context cues are strong.

gcpinspect_batch: Batch-Inspect GCP Infrastructure (A)
Read-only

BATCH INSPECTION: run up to 32 GCP inspect probes in one call. ⚠️ PREREQUISITE: Same as gcpinspect — deploy attempt required. Check convostatus for hasDeployAttempt=true before calling.

Use this when you need to check more than ~3 resources. The backend fetches Oracle credentials ONCE per batch and fans out probes against a single GCP credentials blob — a 12-resource health check is ~5–8× faster and 12× fewer Oracle round-trips than calling gcpinspect 12 times.

BUDGETS:

  • Up to 32 sub-probes per call (subs array length).

  • 30s per-sub timeout; 60s total batch wall-clock.

  • Concurrency cap 8.

  • 512 KB response cap: subs past the cap keep their envelope (index/service/action/ok) but have result replaced with truncated=true.
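When there are more than 32 probes, they have to be split across several calls. A minimal sketch of that chunking, assuming probes are the {service, action, ...} dicts described under subs; chunk_probes is a hypothetical helper, not part of the tool.

```python
MAX_SUBS = 32  # documented per-call limit for gcpinspect_batch


def chunk_probes(probes, size=MAX_SUBS):
    """Split probes ({service, action, ...} dicts) into lists of at most `size`."""
    if not 1 <= size <= MAX_SUBS:
        raise ValueError(f"size must be between 1 and {MAX_SUBS}")
    return [probes[i:i + size] for i in range(0, len(probes), size)]


probes = [{"service": "compute", "action": "list-instances"}] * 70
print([len(batch) for batch in chunk_probes(probes)])  # [32, 32, 6]
```

A smaller size may make it easier to stay inside the 60s wall clock and the 512 KB response cap, at the cost of more calls; that trade-off is a judgment call, not a documented recommendation.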

PARTIAL FAILURE IS EXPECTED. The response is an ordered results array; each entry has {index, service, action, ok, result, error}. Inspect each result; do NOT abort on the first error. If the credential fetch fails, probes that need no credentials (list-actions, list-metrics) still succeed.
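A sketch of walking the results array without stopping at the first failure. Entry keys follow the envelope listed above; how a truncated entry is marked is an assumption (the text says result is "replaced with truncated=true"), so is_truncated accepts either a top-level flag or one inside result. Both helper names are hypothetical.

```python
def is_truncated(entry):
    """True if a sub-result was dropped by the 512 KB cap (exact shape assumed)."""
    result = entry.get("result")
    return bool(entry.get("truncated")) or (isinstance(result, dict) and bool(result.get("truncated")))


def summarize_batch(results):
    """Sort batch entries into ok, failed and truncated without stopping at errors."""
    ok, failed, truncated = [], [], []
    for entry in results:  # the results array is ordered by index
        label = f"{entry['index']}:{entry['service']}/{entry['action']}"
        if not entry.get("ok"):
            failed.append((label, entry.get("error")))  # record it and keep going
        elif is_truncated(entry):
            truncated.append(label)  # candidate for a singular gcpinspect re-run
        else:
            ok.append(label)
    return ok, failed, truncated
```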

REQUIRES: session_id from convoopen response (format: sess_v2_...). Supported services: apigateway, bastion, billing, cloudarmor, cloudbuild, cloudcdn, cloudfunctions, cloudkms, cloudlogging, cloudmonitoring, cloudrun, cloudsql, compute, firestore, gcs, gke, identityplatform, loadbalancer, memorystore, pubsub, secretmanager, vertexai, vpc. For a specific service's actions, use gcpinspect (singular) with action="list-actions"; batch is not the place for discovery. Batch responses are always summarized (no detail/raw per-sub); use singular gcpinspect when you need full metadata or raw API output for one resource.

EXAMPLES:

  • gcpinspect_batch(session_id=..., subs=[ {"service":"compute","action":"list-instances"}, {"service":"gke","action":"list-clusters"}, {"service":"cloudsql","action":"list-instances"}])

  • gcpinspect_batch(session_id=..., subs=[ {"service":"compute","action":"get-metrics","filters":"{\"hours\":6}"}, {"service":"cloudrun","action":"get-metrics","filters":"{\"hours\":6}"}])

Parameters
Name | Required | Description | Default
subs | Yes | Up to 32 sub-probes, each with {service, action, filters?, detail?, raw?}. The backend fetches credentials once per batch and fans out probes in parallel (concurrency 8, 30s per-sub timeout, 60s total wall clock). Partial failure is expected; inspect each result.ok independently.
session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. The session must have a GCP deploy attempt before inspect probes will succeed.
Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds critical details beyond annotations: budgets (32 sub-probes, timeouts, concurrency cap, response cap with truncation), partial failure expectations, credential fetch behavior, and summarized response nature.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear headings (BUDGETS, PARTIAL FAILURE, REQUIRES, EXAMPLES). Every sentence adds value, no redundancy. Efficient despite length due to density of useful information.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all essential context: prerequisites, response structure, error handling, performance benefits, and limits. Includes examples for different use cases. No output schema, but description adequately explains response format.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, the description compensates by explaining session_id format, subs structure (service/action/filters), filters as JSON string, and supported services via examples. Slightly less explicit for individual parameters but still highly informative.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool runs up to 32 GCP inspect probes in one call and distinguishes it from the singular gcpinspect for individual probes.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use (more than ~3 resources), provides prerequisites (check convostatus), and warns when not to use (not for listing actions, use singular gcpinspect for full metadata).

help: Workflow Guide (A)
Read-only

Get workflow guidance for using InsideOut infrastructure tools. Call help() for a compact overview, or help(section=...) for a detailed guide. Sections: workflow, tools, examples, inspect. Responses include hints with next_actions and related_tools.
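A tiny sketch of the two calling forms, validating section against the four documented values before the call is made. help_args is a hypothetical client-side helper, not part of the server.

```python
HELP_SECTIONS = ("workflow", "tools", "examples", "inspect")


def help_args(section=None):
    """Arguments for help(): {} for the compact overview, else one documented section."""
    if section is None:
        return {}
    if section not in HELP_SECTIONS:
        raise ValueError(f"section must be one of {HELP_SECTIONS}")
    return {"section": section}
```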

Parameters
Name | Required | Description | Default
section | No | Optional section to focus the response. One of: 'workflow', 'tools', 'examples', 'inspect'. When omitted, returns a compact overview (~500 tokens).
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating a safe read operation. The description adds that responses include hints with next_actions and related_tools, providing useful behavioral context beyond the annotations.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: four short sentences. It communicates purpose, usage variations, and expected response content without unnecessary words.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple help tool, the description adequately explains the parameter usage and response content (hints with next_actions and related_tools). No output schema exists, but the description sufficiently covers what to expect. Given the complexity, it is mostly complete.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one optional parameter 'section' with 0% description coverage. The description compensates by explaining that help() gives an overview and help(section=...) gives a detailed guide, listing the valid sections (workflow, tools, examples, inspect), effectively adding semantic meaning.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides workflow guidance for InsideOut infrastructure tools. It specifies calling with no arguments for an overview or with a section parameter for a detailed guide, listing the available sections (workflow, tools, examples, inspect). This distinguishes it from sibling tools that perform specific infrastructure actions.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says to call help() for a compact overview or help(section=...) for a detailed guide, and lists the sections. However, it does not provide explicit guidance on when to use this tool versus sibling tools like awsinspect or tfplan, leaving that implicit.

stackdiff: Compare Stack Versions (A)
Read-only

Structured diff showing what would be deployed if the user ran tfdeploy now. Returns component-level changes (added/removed/modified), field-level details, and pricing deltas.

Defaults (#1392): with no version arguments, compares the LAST SUCCESSFULLY DEPLOYED version against the user's CURRENT LIVE DESIGN (the same data the UI shows). Empty baseline if nothing has been deployed or after a destroy. Pending drafts are NOT used as the target — they go stale once the user edits past them; live IR via chat history is always current.

Pass explicit from_version and/or to_version integers to compare any two saved versions (e.g. v3 → v5).

REQUIRES: session_id from convoopen response (format: sess_v2_...).
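A sketch of the two ways to call stackdiff: leave both versions out to get the default comparison, or pass explicit integers. It assumes omitted arguments should simply not be sent; stackdiff_args is a hypothetical helper and the session_id values are placeholders.

```python
def stackdiff_args(session_id, from_version=None, to_version=None):
    """Build stackdiff arguments, leaving out versions so the server defaults apply."""
    args = {"session_id": session_id}
    for name, value in (("from_version", from_version), ("to_version", to_version)):
        if value is None:
            continue  # omitted: the server picks the default described above
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer version number")
        args[name] = value
    return args


# No versions: last successful deploy vs. current live design.
print(stackdiff_args("sess_v2_...?token=..."))
# Explicit comparison of two saved versions, e.g. v3 -> v5.
print(stackdiff_args("sess_v2_...?token=...", from_version=3, to_version=5))
```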

Parameters
Name | Required | Description | Default
session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
to_version | No | Ending stack version number for the diff. Defaults to the user's current live design (pending drafts are not used as the target).
from_version | No | Starting stack version number for the diff. Defaults to the last successfully deployed version.
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, so the description's claim of showing diffs aligns. It adds the required session_id format but does not elaborate on other behaviors like authentication or rate limits. Some value beyond annotations.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short paragraphs: purpose and output, default comparison behavior, explicit version arguments, and the session_id prerequisite. Front-loaded and no extraneous content. Every sentence earns its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a diff tool, the description covers what it does, what it shows, parameter defaults, and a prerequisite. No output schema exists, but the description sufficiently describes the output. Could mention limitations but overall complete.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by explaining defaults for from_version and to_version and specifying the required format for session_id. This adds significant meaning beyond the schema's empty descriptions.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it produces a 'Structured diff showing what would be deployed if the user ran tfdeploy now' and lists what it returns (component-level changes, field-level details, pricing deltas). It distinguishes itself from siblings like stackversions (list versions) and tfplan (plan changes) by focusing on diffing stack versions.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains optional parameters with defaults and explicitly requires 'session_id from convoopen response', providing clear context for when to use the tool. It does not explicitly contrast with siblings but sufficient context is given.

stackrollback: Rollback Stack Version (A)
Idempotent

Create a draft version by reverting to a previous version's config. Copies components, config, and pricing from the target version. If a draft already exists, updates it in-place (single-draft rule).

Use stackversions first to find available version numbers.

REQUIRES: session_id from convoopen response (format: sess_v2_...), version (target version number).
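To make the single-draft rule concrete, here is a toy in-memory model. It is not the backend's implementation: StackModel and its storage layout are hypothetical, and it only illustrates that rolling back copies components, config and pricing from the target version and updates an existing draft in place rather than creating a second one.

```python
from copy import deepcopy


class StackModel:
    """Toy model of the documented single-draft rollback rule (not the real backend)."""

    def __init__(self):
        self.versions = {}  # version number -> saved snapshot
        self.draft = None   # at most one draft at a time

    def save(self, version, components, config, pricing):
        self.versions[version] = {"components": components, "config": config, "pricing": pricing}

    def rollback(self, version):
        if version not in self.versions:
            raise KeyError(f"unknown version {version}; list versions with stackversions first")
        # Copy components, config and pricing so later draft edits cannot alter history.
        snapshot = deepcopy(self.versions[version])
        if self.draft is None:
            self.draft = snapshot
        else:
            # Single-draft rule: update the existing draft in place.
            self.draft.clear()
            self.draft.update(snapshot)
        return self.draft
```

Repeating the same rollback leaves the draft unchanged, which matches the tool's Idempotent label.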

Parameters
Name | Required | Description | Default
version | Yes | Target stack version number to roll back to. Use stackversions to list available versions.
session_id | Yes | Session ID from convoopen; pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description explains that it copies components, config, and pricing, and has a single-draft rule (updates if exists). Annotations (idempotentHint true, destructiveHint false) are consistent with this behavior, and the description adds useful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three short paragraphs, front-loaded with purpose, and no wasted words. Each sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple 2-parameter tool, the description covers the operation, behavior (single-draft rule), required context, and pre-requisite tool usage. It lacks mention of return value or error handling, but this is acceptable given the lack of output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by explaining the format and source of session_id ('sess_v2_...') and that version is the target version number. Though brief, it provides essential context for correct usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it creates a draft version by reverting to a previous version's config. The verb 'revert' is specific, and the tool is distinct from siblings like stackversions (list versions) and stackdiff (compare stacks).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance to use stackversions first to find version numbers and specifies the requirement for session_id from convoopen. Does not explicitly state when not to use, but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

stackversions: List Stack Versions (grade A)
Read-only

List all stack versions for a session (newest first). Shows version history including version number, status (draft/confirmed/applied), change summaries, and timestamps.

Use this tool to see the design history, review what changed between iterations, or find a version number to roll back to.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

Parameters (JSON Schema)
Name | Required | Description
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
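Every tool in this section takes the same session_id, and each parameter description insists that the `?token=...` suffix be passed back intact. A minimal client-side guard, sketched in Python, can catch a stripped ID before the call is made. The regular expression only checks the documented shape `sess_v2_*?token=*`; the actual character set of the ID and token is not documented, and `stackversions_args` is a hypothetical helper.

```python
import re

# Documented shape: sess_v2_<id>?token=<token>. Only the overall structure is
# checked, because the allowed characters in <id> and <token> are not documented.
SESSION_ID_RE = re.compile(r"sess_v2_[^?]+\?token=.+")


def check_session_id(session_id: str) -> str:
    """Return the session ID unchanged, or raise if the ?token= suffix is missing."""
    if not SESSION_ID_RE.fullmatch(session_id):
        raise ValueError(
            "session_id must be passed back exactly as convoopen returned it, "
            "including the ?token=... suffix"
        )
    return session_id


def stackversions_args(session_id: str) -> dict:
    # stackversions takes a single parameter.
    return {"session_id": check_session_id(session_id)}
```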
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, consistent with listing. The description adds key behavioral details: ordering (newest first), output fields, and a crucial requirement that session_id must come from convoopen response with format 'sess_v2_...'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three brief sentences plus a requirement line. Each sentence adds distinct value: what it does, what it shows, and when to use it. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool with one parameter and no output schema, the description covers purpose, usage, parameter requirement, and output fields. It does not mention pagination or limits, but that is adequate for a list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description compensates by specifying the session_id parameter's source and format, adding semantic value beyond the raw schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists all stack versions for a session, ordered newest first, and details the output fields (version number, status, change summaries, timestamps). It is distinct from sibling tools like stackdiff or stackrollback.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases: 'see the design history, review what changed between iterations, or find a version number to roll back to.' It doesn't explicitly mention when not to use it, but the context of sibling tools helps differentiate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

submit_feedback: Submit Feedback (grade A)

FEEDBACK: Submit feedback, bug reports, or feature requests to Luther Systems. Use this tool to forward user feedback directly to the Luther Systems team. This includes bug reports, feature requests, questions, or general feedback about InsideOut. The agent itself can also use this tool to report issues it encounters during operation.

REQUIRES: session_id, category, message

OPTIONAL: user_email (for follow-up), user_name, source (default: 'mcp'), initiator ('user' or 'agent')

Categories: bug_report, feature_request, general_feedback, question, security

The 'initiator' field tracks who triggered the report:

  • 'user' — the user explicitly reported the issue or requested feedback submission

  • 'agent' — Riley detected an issue and initiated the feedback flow

Examples:

  • User says 'the deploy button is broken' → submit_feedback(category='bug_report', message='...', initiator='user')

  • User says 'I wish it had dark mode' → submit_feedback(category='feature_request', message='...', initiator='user')

  • Deployment failed with Terraform error → submit_feedback(category='bug_report', message='Deployment failed: Terraform apply error on aws_alb resource — timeout waiting for ALB provisioning', initiator='agent')

Parameters (JSON Schema)
Name | Required | Description
source | No | Optional source channel: 'mcp', 'cli', or 'web'.
message | Yes | Feedback content. Free-form text describing the issue, request, or comment.
category | Yes | Feedback category. One of: bug_report, feature_request, general_feedback, question.
initiator | No | Optional originator: 'user' (human triggered) or 'agent' (automated).
user_name | No | Optional display name for attribution.
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. Identifies the conversation the feedback is about.
user_email | No | Optional email address for follow-up.
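The REQUIRES/OPTIONAL lists and allowed values above map naturally onto a small argument builder. This Python sketch is hypothetical: `feedback_args` is not part of any SDK, and the category set follows the five values in the description text (the schema's category description lists only four, omitting security).

```python
from typing import Optional

# Allowed values as listed in the tool description. "security" appears in the
# description's Categories line but not in the schema's category text.
CATEGORIES = {"bug_report", "feature_request", "general_feedback", "question", "security"}
INITIATORS = {"user", "agent"}
SOURCES = {"mcp", "cli", "web"}


def feedback_args(
    session_id: str,
    category: str,
    message: str,
    *,
    initiator: Optional[str] = None,
    source: Optional[str] = None,
    user_email: Optional[str] = None,
    user_name: Optional[str] = None,
) -> dict:
    """Build a submit_feedback argument dict, dropping unset optional fields."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if initiator is not None and initiator not in INITIATORS:
        raise ValueError("initiator must be 'user' or 'agent'")
    if source is not None and source not in SOURCES:
        raise ValueError("source must be 'mcp', 'cli', or 'web'")
    if not message.strip():
        raise ValueError("message must not be empty")
    args = {"session_id": session_id, "category": category, "message": message}
    optional = {"initiator": initiator, "source": source,
                "user_email": user_email, "user_name": user_name}
    # Leave unset fields out so the server applies its own defaults (e.g. source='mcp').
    args.update({k: v for k, v in optional.items() if v is not None})
    return args
```

For the agent-initiated example above, `feedback_args(session_id, "bug_report", "Deployment failed: ...", initiator="agent")` yields the three required fields plus `initiator`.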
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate openWorldHint=true (tool may have side effects beyond output) and destructiveHint=false (not destructive). The description supplements this by disclosing that feedback is sent to the team, requiring session_id, category, and message. It explains the initiator field and its semantics, providing context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections (FEEDBACK, REQUIRES, OPTIONAL, Categories, initiator, Examples). It is slightly verbose but every sentence adds value, including examples that illustrate usage. It is front-loaded with purpose. Could be trimmed minimally but is effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters (3 required), no output schema, and no schema descriptions, the description covers all necessary context: required/optional fields, valid categories, initiator semantics, and illustrative examples. It is complete for an agent to correctly select and invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% (no descriptions in schema properties). The description compensates fully by listing required and optional parameters, explaining each (e.g., user_email for follow-up, source default 'mcp'), defining categories (bug_report, feature_request, etc.), and detailing the initiator field with examples. This adds significant meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool submits feedback, bug reports, or feature requests to Luther Systems. It uses specific verbs ('submit', 'forward') and identifies the resource ('feedback to Luther Systems team'). It distinguishes itself from sibling tools, which focus on infrastructure inspection, conversations, and Terraform operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: for forwarding user feedback, bug reports, feature requests, questions, or general feedback. It also notes the agent can use it to report issues. It provides clear context for use cases but does not explicitly mention when not to use it or alternatives, which is acceptable given siblings are unrelated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfdeploy: Deploy Infrastructure (grade A)
Destructive

WORKFLOW: Step 4 of 4 - Deploy infrastructure to the cloud. Deploy infrastructure by starting a Terraform job for an InsideOut session. This tool initiates the actual deployment process after Terraform files have been generated.

IMPORTANT: This starts a long-running job (15+ minutes). Use tfstatus to monitor progress.

SINGLE-FLIGHT: only one TF job (apply/plan/destroy/drift) runs per session at a time. If another job is already in flight, tfdeploy returns tf_job_conflict with the live job_id; attach with tfstatus/tflogs instead of retrying, or pass force_new=true to override. Returns confirmation that the deployment has started.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: plan_id (string): apply a previously created plan from tfplan. Preview-then-apply workflow: tfplan → tflogs (review) → tfdeploy(plan_id=...).

OPTIONAL: sandbox (boolean, default false): when false, deploys the real generated Terraform. Set to true for a cheap sandbox template (testing only).

OPTIONAL: ignore_drift (boolean, default false): when true, proceeds with deploy even if infrastructure drift is detected. By default, deploys fail on drift. Use after reviewing drift details via tfdrift or tflogs.

OPTIONAL: force_new (boolean, default false): bypass the session-level single-flight guard. Use only when the existing run is provably wedged.

CREDENTIAL FLOW (if credentials are missing):

  1. Response includes a connect_url — present it to the user

  2. Call credawait(session_id=...) to poll for credentials

  3. When credawait returns success, retry tfdeploy.

Do NOT call credawait without first showing the connect URL to the user.

Parameters (JSON Schema)
Name | Required | Description
plan_id | No | Apply a previously created plan from tfplan. When set, project_id should also be provided.
sandbox | No | When true (default for MCP), deploys a small sandbox stack instead of the real generated Terraform. Set false to deploy the actual user stack.
version | No | Deploy a specific stack version number. Defaults to the current draft.
force_new | No | When true, bypass the session-level single-flight guard and start a new deploy even if another job is in flight. Use only when an existing run is provably wedged.
project_id | No | Project ID returned by tfplan. Required alongside plan_id.
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
ignore_drift | No | When true, proceed with deploy even if drift is detected on the existing stack.
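The conflict and credential rules above amount to a small control loop. The Python sketch below assumes a generic `call_tool(name, arguments)` callable supplied by the MCP client and assumes the response field names `error`, `job_id`, `connect_url`, and `success`; none of these response shapes are documented here, so treat it as an outline rather than a client for this server.

```python
from typing import Callable, Dict, Optional

# Assumed transport: call_tool(tool_name, arguments) -> response dict.
CallTool = Callable[[str, Dict], Dict]


def deploy(
    call_tool: CallTool,
    session_id: str,
    show_user: Callable[[str], None],
    plan_id: Optional[str] = None,
    project_id: Optional[str] = None,
    sandbox: Optional[bool] = None,
) -> Dict:
    """Start tfdeploy, handling single-flight conflicts and missing credentials."""
    if (plan_id is None) != (project_id is None):
        # Schema: project_id is required alongside plan_id.
        raise ValueError("pass plan_id and project_id together")
    args: Dict = {"session_id": session_id}
    if plan_id is not None:
        args.update(plan_id=plan_id, project_id=project_id)
    if sandbox is not None:
        args["sandbox"] = sandbox  # explicit, since the documented defaults disagree

    resp = call_tool("tfdeploy", args)
    if resp.get("error") == "tf_job_conflict":
        # A job is already in flight: attach via tfstatus/tflogs, do not retry.
        return {"attached_job_id": resp["job_id"]}
    if "connect_url" in resp:
        # Credential flow: show the URL before polling, then retry once.
        show_user(resp["connect_url"])
        if not call_tool("credawait", {"session_id": session_id}).get("success"):
            raise RuntimeError("credentials were not connected")
        resp = call_tool("tfdeploy", args)
    return resp
```

Passing `sandbox` explicitly avoids depending on its default, which the description gives as false and the schema describes as true for MCP.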
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (destructiveHint, openWorldHint), the description discloses long-running nature (15+ min), single-flight constraint, conflict return behavior, credential requirement with connect_url, and drift handling. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with a workflow header, detailed parameter explanations, and a separate credential flow section. Every sentence adds value; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Extremely thorough, covering workflow, conflict handling, credential flow, and drift behavior. Minor gap: the description never mentions the version and project_id parameters. Otherwise complete for a complex deployment tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description explains key parameters (session_id, plan_id, sandbox, ignore_drift, force_new) including formats and defaults. However, version and project_id are not mentioned in the description, leaving a gap for those parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Deploy infrastructure by starting a Terraform job' and positions it as Step 4 of 4. It distinguishes itself from siblings like tfplan (planning) and tfdestroy (teardown) by specifying its deployment role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit workflow guidance (tfplan -> tflogs -> tfdeploy), when to use (after Terraform files generated), when not to use (single-flight conflict, recommends tfstatus/tflogs), and credential flow with credawait. Also explains optional params like plan_id, sandbox, ignore_drift.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfdestroy: Destroy Infrastructure (grade A)
Destructive

DESTROY: Tear down previously deployed infrastructure. Destroys infrastructure by calling the Oracle destroy endpoint for a session that has a prior successful deployment.

IMPORTANT: This starts a long-running job. Use tfstatus/tflogs to monitor progress.

SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfdestroy returns tf_job_conflict with the live job_id; attach with tfstatus/tflogs, or pass force_new=true to override.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: force_new (boolean, default false): bypass the single-flight guard. Use only when the existing run is provably wedged.

PREREQUISITE: The session must have a prior successful deployment with a project_id. After destroy completes, the session is kept for historical record but hasDeployment is set to false.

Parameters (JSON Schema)
Name | Required | Description
force_new | No | When true, bypass the single-flight guard and force a new destroy even if another job is in flight.
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. The deployed stack for this session will be torn down.
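tfdeploy, tfdestroy, and tfdrift all describe the same single-flight guard. A toy Python model makes the rule concrete; it is an illustration, not the server's code, and both the job-id format and the treatment of the displaced job under force_new are assumptions (the description does not say whether the old run is cancelled).

```python
import itertools
from typing import Dict


class SingleFlight:
    """Toy model of a per-session single-flight guard (not the server's code).

    At most one TF job (apply/plan/destroy/drift) runs per session. A second
    start returns tf_job_conflict with the live job_id unless force_new is set.
    """

    def __init__(self) -> None:
        self._running: Dict[str, str] = {}  # session_id -> live job_id
        self._ids = itertools.count(1)

    def start(self, session_id: str, kind: str, force_new: bool = False) -> Dict:
        live = self._running.get(session_id)
        if live is not None and not force_new:
            return {"error": "tf_job_conflict", "job_id": live}
        job_id = f"{kind}-{next(self._ids)}"
        # With force_new the model simply tracks the new job instead of the old one.
        self._running[session_id] = job_id
        return {"job_id": job_id}

    def finish(self, session_id: str) -> None:
        self._running.pop(session_id, None)
```

The guard is keyed per session, so a job in one session never blocks another session.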
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description fully discloses behavioral traits: long-running job, single-flight guard with conflict return, optional force_new override, session kept after destroy with hasDeployment set to false. Annotations already mark destructiveHint=true, but the description adds rich context without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections and a prominent 'DESTROY:' label. It is thorough and could be slightly more concise, though every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's destructive nature and dependencies, the description covers preconditions (session with prior success), runtime behavior (long job, single-flight), monitoring instructions, and post-destroy state. No output schema exists, but return behavior (conflict or job start) is described.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. It explains the session_id format (sess_v2_...) and the force_new parameter (boolean, default false, purpose to bypass single-flight guard). This adds significant meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that tfdestroy destroys previously deployed infrastructure via the Oracle destroy endpoint. It uses a strong verb ('DESTROY') and distinguishes itself from sibling tools like tfdeploy (which deploys) and tfplan (which plans).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use (after a successful deployment), the single-flight constraint (only one job per session at a time), and alternatives (use tfstatus/tflogs to monitor, use force_new only if wedged). It also lists prerequisites and required session_id format.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfdrift: Check Infrastructure Drift (grade A)
Idempotent

DRIFT CHECK: Run a read-only drift detection check. Checks whether deployed infrastructure has drifted from the expected Terraform state. This is a read-only operation: it does NOT modify any infrastructure. Returns job_id. Use tflogs to stream the drift check results.

SINGLE-FLIGHT: only one TF job per session at a time. If another job is already in flight, tfdrift returns tf_job_conflict with the live job_id; attach with tfstatus/tflogs, or pass force_new=true to override.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

PREREQUISITE: The session must have a prior deployment with a project_id.

OPTIONAL: force_new (boolean, default false): bypass the single-flight guard. Use only when the existing run is provably wedged.

If drift is detected, the user can either fix the drift or use tfdeploy(ignore_drift=true) to proceed.

Parameters (JSON Schema)
Name | Required | Description
force_new | No | When true, bypass the single-flight guard and force a new drift check even if another job is in flight.
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
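The drift flow above can be outlined as two helpers: one starts the check and reports whether the returned job_id belongs to a new drift job, and one builds tfdeploy arguments after the results have been reviewed. As before, `call_tool` and the response fields are assumptions, and how a caller decides `drift_found` from the tflogs output is not documented here.

```python
from typing import Callable, Dict, Tuple

# Assumed transport: call_tool(tool_name, arguments) -> response dict.
CallTool = Callable[[str, Dict], Dict]


def start_drift_check(call_tool: CallTool, session_id: str) -> Tuple[str, bool]:
    """Return (job_id, is_new_drift_job).

    On tf_job_conflict the job_id is whatever job is already in flight, which
    may be an apply or destroy rather than a drift check, so the caller should
    follow it with tfstatus/tflogs and start the drift check again afterwards.
    """
    resp = call_tool("tfdrift", {"session_id": session_id})
    if resp.get("error") == "tf_job_conflict":
        return resp["job_id"], False
    return resp["job_id"], True


def deploy_args_after_drift(session_id: str, drift_found: bool, user_accepts: bool) -> Dict:
    """Build tfdeploy arguments once drift results have been reviewed."""
    if drift_found and not user_accepts:
        raise RuntimeError("fix the drift first, or get the user's OK to ignore it")
    args: Dict = {"session_id": session_id}
    if drift_found:
        # Deploys fail on drift by default; ignore_drift overrides that.
        args["ignore_drift"] = True
    return args
```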
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly and idempotent, and the description adds significant behavioral details: single-flight constraint, conflict response with job_id, force_new usage, and that it returns a job_id. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clearly labeled sections (DRIFT CHECK, SINGLE-FLIGHT, REQUIRES, PREREQUISITE, OPTIONAL). Every sentence adds value, covering purpose, behavioral notes, prerequisites, and parameter details without being overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description explains the return value (job_id) and how to stream results (tflogs). It covers all essential aspects: operation, preconditions, single-flight behavior, error scenarios, and post-check actions. Complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although schema description coverage is 0%, the description thoroughly explains both parameters: session_id (required, format from convoopen) and force_new (optional boolean with meaning and when to use). This compensates fully for the schema's lack of descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'DRIFT CHECK: Run a read-only drift detection check' and explains it checks whether deployed infrastructure has drifted from expected Terraform state. It uses specific verb and resource, distinguishing it from siblings like tfdeploy, tfdestroy, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly provides when to use (drift detection) and when not to (read-only, no modifications). It mentions prerequisites (session_id, prior deployment), alternatives (tflogs for streaming), and handling of single-flight conflicts. Also explains what to do if drift is detected (fix or use tfdeploy with ignore_drift=true).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfgenerate: Generate Terraform (grade A)
Idempotent

WORKFLOW: Step 3 of 4 - Generate Terraform files from completed design. Generate Terraform files from an InsideOut session that has completed infrastructure design.

⚠️ PREREQUISITE: Only call this AFTER convoreply returns with terraform_ready=true in the response metadata. DO NOT call this while convoreply is still running or before terraform_ready is confirmed! If you get 'session has not reached terraform-ready state', wait for convoreply to complete first.

🎯 USE THIS TOOL WHEN: convoreply has returned with terraform_ready=true, OR the user asks to 'see the terraforms', 'generate terraform', 'show me the code', etc.

DEFAULT RESPONSE: Returns summary table + download URL (keeps code out of LLM context).

FALLBACK: Set include_code: true to get full code inline if curl/unzip fails.

CRITICAL WORKFLOW (default mode):

  1. Call this tool to get file summary and download URL

  2. ASK the user: 'Where would you like me to save the Terraform files? Default: ./insideout-infra/'

  3. WAIT for user confirmation before running the download command

  4. Run the curl/unzip command with the user's chosen directory

  5. If curl/unzip FAILS (sandbox, security, platform issues), retry with include_code: true

AFTER GENERATION: Ask user if they want to review the files and then deploy with tfdeploy

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: include_code (boolean): set true to return full code inline as fallback.

💡 TIP: Examine the workflow.usage prompt for more context on how to properly use these tools.

Parameters (JSON Schema)
Name | Required | Description
session_id | Yes | Session ID from convoopen: pass back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. Riley must have signaled [TERRAFORM_READY: true] before calling this tool.
include_code | No | When true, the response inlines the full generated Terraform source. Use as a fallback when the host can't read the on-disk archive (sandbox or security restrictions).
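Step 4 of the workflow runs a curl/unzip command. Where no shell is available, the same step can be sketched with Python's standard library. This assumes the download URL serves a ZIP archive (implied by the unzip step but not stated) and that the user has already confirmed the destination directory; the function names are hypothetical.

```python
import io
import urllib.request
import zipfile
from pathlib import Path

DEFAULT_DIR = "./insideout-infra/"  # default suggested in the workflow above


def extract_terraform_zip(data: bytes, dest: str = DEFAULT_DIR) -> list:
    """Unpack a Terraform ZIP archive into dest and return the file names."""
    out = Path(dest)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = [n for n in zf.namelist() if not n.endswith("/")]
        # zipfile strips absolute paths and '..' components when extracting.
        zf.extractall(out)
    return names


def download_terraform(download_url: str, dest: str = DEFAULT_DIR) -> list:
    """Python stand-in for the curl/unzip step; call only after the user confirms dest."""
    with urllib.request.urlopen(download_url, timeout=60) as resp:
        return extract_terraform_zip(resp.read(), dest)
```

If this step also fails in a restricted sandbox, the description's fallback still applies: call tfgenerate again with include_code: true.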
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes default vs fallback behavior (include_code), workflow steps, and prerequisite. Annotations already indicate idempotent and non-destructive; description adds context about return format (summary+URL) and fallback logic. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with sections, emojis, bold. Every sentence adds value. Front-loaded with workflow step and prerequisite. Efficient despite thoroughness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers prerequisite, usage triggers, default response, fallback, full workflow steps, and post-generation actions. No output schema but describes return sufficiently. Complete for a complex multi-step tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema coverage, the description fully explains both parameters: session_id (required, with its format) and include_code (a fallback). It adds meaning beyond the schema by clarifying when to use include_code.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: generating Terraform files from a completed InsideOut design, and positions it as step 3 of 4. It distinguishes itself from sibling tools like tfdeploy and tfplan by focusing on generation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit prerequisite: only call after convoreply returns with terraform_ready=true. Gives clear when-to-use and when-not-to instructions, and outlines the full workflow including asking user for save location. No ambiguity for the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tflogs: Fetch Deploy Logs (grade A)
Read-only

MONITORING: Fetch Terraform deployment logs with pagination. Fetches logs from a running or completed Terraform deployment job. For completed jobs, it uses a REST endpoint for instant retrieval (supports tail for server-side filtering). For running jobs, it streams via SSE with timeout-based pagination.

PAGINATION (running jobs only): Use last_event_id from the response to fetch more:

  1. First call: tflogs(session_id='...') → get logs + last_event_id

  2. Next call: tflogs(session_id='...', last_event_id='...') → get NEW logs only

  3. Repeat until complete: true in response

RESPONSE FIELDS:

  • logs: Array of log messages collected

  • last_event_id: Pass this back to get more logs (pagination cursor, SSE only)

  • complete: true if job finished, false if more logs may be available

  • total_logs: total log entries before tail truncation

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: job_id to target a specific deployment (use tfruns to discover IDs), timeout (default 50s, max 55s), last_event_id (for pagination), tail (return only last N entries).

⚠️ CONTEXT WARNING: Deploy logs can be hundreds of lines. Use tail: 50 for completed jobs to avoid blowing up the context window.

Parameters

  • tail (optional): Return only the last N log entries. Use 0 (or omit) for all available entries.

  • job_id (optional): Target a specific job. Use tfruns to discover job IDs. When omitted, streams the latest job for the session.

  • timeout (optional): Max seconds to collect logs. Default 50, max 55.

  • session_id (required): Session ID from convoopen. Pass it back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.

  • last_event_id (optional): Resume cursor for pagination. Pass back the last_event_id from a previous tflogs response to fetch only newer entries.
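A client-side argument builder can enforce the documented constraints before calling tflogs. This is a hypothetical helper, not part of the server: the regular expression mirrors the stated sess_v2_*?token=* format, and clamping timeout to 55 (rather than rejecting larger values) is a design choice assumed here.

```python
import re
from typing import Any, Dict, Optional

# Mirrors the documented format sess_v2_*?token=*; the token suffix must survive.
SESSION_ID_RE = re.compile(r"^sess_v2_[^?]+\?token=.+$")
MAX_TIMEOUT_S = 55  # documented maximum (default is 50)


def tflogs_args(
    session_id: str,
    *,
    job_id: Optional[str] = None,
    tail: Optional[int] = None,
    timeout: Optional[int] = None,
    last_event_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a tflogs argument dict, omitting optional fields that are unset."""
    if not SESSION_ID_RE.match(session_id):
        raise ValueError("session_id must be passed back exactly as returned, including ?token=...")
    if tail is not None and tail < 0:
        raise ValueError("tail must be >= 0 (0 means all entries)")
    args: Dict[str, Any] = {"session_id": session_id}
    if job_id is not None:
        args["job_id"] = job_id
    if tail is not None:
        args["tail"] = tail
    if timeout is not None:
        # Clamp into [1, 55] instead of sending a value the server may reject.
        args["timeout"] = max(1, min(timeout, MAX_TIMEOUT_S))
    if last_event_id is not None:
        args["last_event_id"] = last_event_id
    return args
```

For a completed job, `tflogs_args(sid, tail=50)` follows the context warning; for a running job, pass the cursor from the previous response as `last_event_id`.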
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses detailed behavioral traits beyond annotations: REST vs SSE endpoints, pagination mechanism, response fields, and timeout behavior. The readOnlyHint is consistent; the description adds context about potential large logs and server-side filtering, enhancing transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections (MONITORING, PAGINATION, RESPONSE FIELDS, REQUIRES, OPTIONAL, CONTEXT WARNING), is front-loaded with the core purpose, and uses bold for emphasis. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Comprehensive for a tool with no output schema: it details all response fields (logs, last_event_id, complete, total_logs), explains pagination, and warns about context window. It also links to sibling tools for session_id and job_id, covering typical usage scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema coverage, the description thoroughly explains all five parameters: session_id (required), job_id, timeout, last_event_id, and tail. It provides defaults, limits, and usage context (e.g., tail for completed jobs), adding significant meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it fetches Terraform deployment logs with specific verbs like 'Fetch', and distinguishes between completed and running jobs. It is not a tautology and differentiates from sibling tools like tfruns by focusing on log retrieval rather than run listing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use guidance: contrasts completed vs running jobs, details pagination steps with concrete examples, and gives context warning about using tail to avoid large context windows. It also references required parameters like session_id from convoopen, aiding appropriate selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfoutputs: Get Deploy Outputs (A)
Read-only
Inspect

INSPECTION: Retrieve Terraform outputs from a completed deployment. Returns structured output values (VPC IDs, endpoints, cluster names, etc.) after a successful deploy. Sensitive outputs are redacted (shown as '(sensitive)').

By default returns outputs for the latest successful deploy. Optionally specify job_id to get outputs for a specific deployment.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: job_id (specific deployment), lifecycle (filter by step, e.g. 'cloud-provision').
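Because sensitive outputs come back as the literal placeholder '(sensitive)', an agent should not treat that string as a real value. The sketch below assumes, hypothetically, that the outputs arrive as a flat name-to-value mapping; the actual response shape is not documented here. It also builds the request arguments with the optional job_id and lifecycle filters.

```python
from typing import Any, Dict, List, Optional, Tuple

REDACTED = "(sensitive)"  # placeholder named in the tfoutputs description


def tfoutputs_args(
    session_id: str, job_id: Optional[str] = None, lifecycle: Optional[str] = None
) -> Dict[str, Any]:
    """Build tfoutputs arguments; omitted filters mean latest deploy, all steps."""
    args: Dict[str, Any] = {"session_id": session_id}
    if job_id is not None:
        args["job_id"] = job_id
    if lifecycle is not None:
        # e.g. 'provision', 'cloud-provision', 'k8s-provision'
        args["lifecycle"] = lifecycle
    return args


def split_outputs(outputs: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Separate usable values from redacted ones (assumes a flat name -> value map)."""
    visible = {name: value for name, value in outputs.items() if value != REDACTED}
    redacted = sorted(name for name, value in outputs.items() if value == REDACTED)
    return visible, redacted
```

Reporting the redacted names separately lets the agent tell the user which values must be read from the cloud console or secret store instead.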

Parameters

  • job_id (optional): Specific job ID to fetch outputs from. When omitted, returns outputs from the latest successful apply.

  • lifecycle (optional): Oracle deploy-step filter for the outputs. Common values are 'provision', 'cloud-provision', and 'k8s-provision'; these correspond to the lifecycle stages of the deployed stack. When omitted, returns outputs from all lifecycle steps.

  • session_id (required): Session ID from convoopen. Pass it back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint. Description adds valuable context: sensitive outputs are redacted, returns structured values. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-organized with labeled sections (INSPECTION, REQUIRES, OPTIONAL). It front-loads the purpose, then covers behavior, then parameters. No unnecessary words; each sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description explains the return type (structured output values, with sensitive ones redacted). It covers all parameters, the default behavior, and prerequisites, which is sufficient for an AI agent to understand and invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 0% description coverage, so description compensates fully. Explains session_id format and source, job_id for specific deployment, and lifecycle filter with example ('cloud-provision'). All three parameters are clearly defined.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Retrieve Terraform outputs from a completed deployment', specifying verb and resource. Includes examples of output types (VPC IDs, endpoints) and distinguishes from sibling tools like tfplan and tfdeploy by focusing on post-deployment inspection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear prerequisites (session_id from convoopen), default behavior (latest successful deploy), and optional parameters for specific deployment or lifecycle step. Does not explicitly state when not to use, but context makes it clear it's for inspection after a deploy.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfplan: Preview Infrastructure Plan (A)
Idempotent
Inspect

PREVIEW: Run terraform plan to preview infrastructure changes. Runs a terraform plan for an InsideOut session without applying any changes. This lets the user review what will be created/changed/destroyed before committing. Returns job_id, plan_id, and project_id. Use tflogs to stream the plan output. After the plan completes, use tfdeploy with plan_id to apply the exact plan.

SINGLE-FLIGHT: Only one TF job per session at a time. If another job is already in flight, tfplan returns tf_job_conflict with the live job_id; attach with tfstatus/tflogs, or pass force_new=true to override.

REQUIRES: session_id from convoopen response (format: sess_v2_...).

OPTIONAL: sandbox (boolean, default false). By default, plans the real generated Terraform; set to true to plan the cheap sandbox template instead (testing only).

OPTIONAL: force_new (boolean, default false). Bypasses the single-flight guard. Use only when the existing run is provably wedged.

CREDENTIAL HANDLING: Same as tfdeploy; credentials must be configured first.
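The plan-then-apply flow and the single-flight rule can be sketched as follows. Several details are assumptions, not documented behavior: `call_tool` is a hypothetical client hook, the conflict is assumed to surface as `{"error": "tf_job_conflict", "job_id": ...}`, tfstatus is assumed to return a `status` key, and tfdeploy is assumed to take a `plan_id` argument, as the description implies. The sketch attaches to the live job rather than forcing, matching the guidance to reserve force_new for wedged runs.

```python
from typing import Any, Callable, Dict

# Hypothetical client hook: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def start_plan(call_tool: ToolCaller, session_id: str, sandbox: bool = False) -> Dict[str, Any]:
    """Start a plan; on a single-flight conflict, attach to the live job instead of forcing."""
    resp = call_tool("tfplan", {"session_id": session_id, "sandbox": sandbox})
    if resp.get("error") == "tf_job_conflict":  # assumed conflict shape
        live_job = resp["job_id"]
        # Attach with tfstatus; force_new=True is reserved for provably wedged runs.
        status = call_tool("tfstatus", {"session_id": session_id, "job_id": live_job})
        return {"attached": True, "job_id": live_job, "status": status.get("status")}
    return {"attached": False, "job_id": resp["job_id"], "plan_id": resp["plan_id"]}


def deploy_plan_args(session_id: str, plan_id: str) -> Dict[str, Any]:
    """Arguments for tfdeploy so it applies exactly the plan the user reviewed."""
    return {"session_id": session_id, "plan_id": plan_id}
```

Between `start_plan` and `deploy_plan_args`, the agent would stream the plan with tflogs and wait for the user's approval.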

Parameters

  • sandbox (optional): When true, plan against the sandbox stack; when false (default), plan the real generated Terraform.

  • force_new (optional): When true, bypass the single-flight guard and force a new plan even if one is already running.

  • session_id (required): Session ID from convoopen. Pass it back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare idempotentHint=true and destructiveHint=false. Description adds useful behavioral details: no changes applied, returns job_id/plan_id/project_id, single-flight behavior with conflict handling, credential handling, and sandbox mode. While not all facets are exhaustively detailed (e.g., rate limits), it significantly augments the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured with bolded sections (PREVIEW, SINGLE-FLIGHT, REQUIRES, OPTIONAL) and front-loaded with main action. Slightly verbose but each sentence adds value. Could be trimmed for extreme conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description covers return values (job_id, plan_id, project_id) and integration with other tools (tflogs, tfdeploy). Also explains single-flight, sandbox, and credentials. For a moderately complex tool, the description is complete and actionable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage, so description must compensate. It does so thoroughly: explains sandbox as 'cheap sandbox template (testing only)', force_new as bypass for wedged runs, and session_id as required from convoopen response. This adds essential meaning beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Starts with 'PREVIEW: Run terraform plan to preview infrastructure changes', clearly stating the verb and resource. Distinguishes from siblings like tfdeploy, tfdestroy, and tfgenerate by focusing on preview without application.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: preview changes before committing. Provides instructions to use tflogs for streaming, and to apply with tfdeploy. Also explains single-flight constraints and when to use force_new, giving clear context about conflicts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfruns: List Deploy Runs (A)
Read-only
Inspect

INSPECTION: List all Terraform deployment runs for a session. Returns job IDs, statuses, types (apply/destroy), and timestamps for every run. Use this to see deployment history, find job IDs for log inspection, or check which deployments succeeded or failed.

REQUIRES: session_id from convoopen response (format: sess_v2_...).
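A common use of the run list is finding the job ID to hand to tflogs or tfstatus. The sketch assumes hypothetical field names (`job_id`, `status`, `type`, `created_at` as an ISO 8601 string) and a `FAILED` status value; the description names these kinds of data but not their exact keys or values.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def _parse_ts(ts: str) -> datetime:
    # Accept a trailing "Z", which datetime.fromisoformat rejects before Python 3.11.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def latest_failed_job(runs: List[Dict[str, Any]], run_type: Optional[str] = None) -> Optional[str]:
    """Return the job_id of the newest failed run, optionally filtered by type."""
    failed = [
        r for r in runs
        if str(r.get("status", "")).upper() == "FAILED"
        and (run_type is None or r.get("type") == run_type)
    ]
    if not failed:
        return None
    newest = max(failed, key=lambda r: _parse_ts(r["created_at"]))
    return newest["job_id"]
```

The returned ID can then be passed as job_id to tflogs (with a small tail) to inspect why that run failed.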

Parameters

  • session_id (required): Session ID from convoopen. Pass it back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing. Returns the deployment-job history (apply / destroy / plan / drift) for this session.
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the tool is safe. The description adds context by listing the types of data returned (job IDs, statuses, types, timestamps) and the required session_id format. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each informative: first states purpose, second describes return data, third gives usage context and prerequisite. No filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with one parameter and no output schema, the description covers input requirement, return fields, and use cases. It does not specify ordering or pagination, but that is acceptable for such a tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, but the description compensates by specifying that session_id comes from convoopen and must be in format sess_v2_..., adding critical meaning beyond the schema's type definition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists all Terraform deployment runs for a session, with specific return fields (job IDs, statuses, types, timestamps). This differentiates it from sibling inspect tools like awsinspect or convoinspect which focus on other resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage scenarios: see deployment history, find job IDs for logs, check successes/failures. It also states the requirement for session_id from convoopen. However, it does not mention when not to use this tool versus alternatives like tfstatus or tflogs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tfstatus: Check Deploy Status (A)
Read-only
Inspect

MONITORING: Quick status check for Terraform deployments. Check the current status of a Terraform deployment job. Use this tool to quickly check whether a deployment is running, completed, or failed. Returns job status, job_id, and other metadata without streaming logs. Use tflogs to stream the actual deployment logs.

REQUIRES: session_id from convoopen response (format: sess_v2_...). OPTIONAL: job_id to target a specific deployment (use tfruns to discover IDs).

LIVENESS: The response carries two distinct timestamps:

  • updated_at: last semantic change (only bumped when status / drift / version actually differ). Useful for sorting deployments; NOT a per-poll heartbeat.

  • last_refresh_at: last successful Oracle decode (stamped on every poll where reliable reached Oracle, even if nothing in the row changed). Use this to confirm reliable is still actively talking to Oracle for a long-running RUNNING job. Absent on rows that haven't been refreshed since the column was added.

💡 TIP: Examine the workflow.usage prompt for more context on how to properly use these tools.
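The LIVENESS distinction suggests a simple staleness check: judge a RUNNING job by last_refresh_at, never by updated_at. In this sketch the five-minute threshold is an arbitrary assumption, timestamps are assumed to be ISO 8601 strings (naive ones treated as UTC), and the function returns None when last_refresh_at is absent, as it can be on older rows.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional


def _parse_ts(ts: str) -> datetime:
    dt = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    # Assumption: a timestamp without an offset is UTC.
    return dt if dt.tzinfo is not None else dt.replace(tzinfo=timezone.utc)


def refresh_is_stale(
    status: Dict[str, Any], now: datetime, max_age: timedelta = timedelta(minutes=5)
) -> Optional[bool]:
    """True if a RUNNING job's last_refresh_at is older than max_age.

    updated_at is deliberately ignored: it only moves on semantic changes,
    so a healthy long-running job can leave it untouched for a long time.
    Returns None when last_refresh_at is missing, leaving the decision to the caller.
    """
    if status.get("status") != "RUNNING":
        return False
    ts = status.get("last_refresh_at")
    if ts is None:
        return None
    return now - _parse_ts(ts) > max_age
```

A stale result is a signal to look closer with tflogs, not by itself proof that the run is wedged enough to justify force_new.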

Parameters

  • job_id (optional): Specific job ID to inspect. When omitted, returns the status of the latest job for the session.

  • session_id (required): Session ID from convoopen. Pass it back EXACTLY as returned, including the ?token=... suffix (format: sess_v2_*?token=*). The suffix is part of the session credential; never strip it when summarizing.
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses it's a read operation (readOnlyHint true) and adds detail about two timestamps (updated_at and last_refresh_at) and their meaning, going beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise yet informative, with clear sections (MONITORING, then body, LIVENESS, TIP). Front-loaded with purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers return values (job status, metadata, timestamps) adequately for a simple tool. Missing explicit error handling or response structure, but sufficient given low complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, description compensates by explaining session_id format and source, and job_id purpose and source. Could be more explicit about job_id format.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it's for checking Terraform deployment status, distinguishes from tflogs (stream logs) and tfruns (discover IDs), and specifies it returns job status and metadata.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use it (quick status check), when not to (for logs, use tflogs), and provides prerequisites (session_id from convoopen, optional job_id from tfruns). It also explains the two liveness timestamps and points to the workflow.usage prompt for further guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
