plith

by ai.plith

Ownership verified

Server Details

AI agent infrastructure: dedup, cost prediction, validation, governance, failure intelligence.

Status: Healthy
Last Tested: 2026-05-21 13:33
Transport: Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A (4.2/5.0)

Tool Descriptions: A

Average 4.2/5 across 19 of 19 tools scored. Lowest: 3.5/5.

Server Coherence: A

Disambiguation: 5/5

Each tool has a distinct purpose with clear boundaries, organized into functional groups (burnrate, dedupq, guardrail, pitfalldb, qualitygate, rigor). There is no overlap in functionality; for example, burnrate_estimate, burnrate_optimize, and burnrate_track serve different stages of cost management without ambiguity.

Naming Consistency: 5/5

Tool names follow a consistent snake_case pattern with clear prefixes (e.g., burnrate_, dedupq_, guardrail_) and descriptive suffixes (e.g., _budget, _estimate, _check). This structured naming makes it easy to identify tool groups and purposes at a glance.

Tool Count: 4/5

With 19 tools, the count is slightly high but reasonable given the server's broad scope covering cost management, caching, governance, failure analysis, quality validation, and workflow execution. Each tool appears necessary for its domain, though the rigor group (6 tools) might be streamlined.

Completeness: 5/5

The toolset provides comprehensive coverage for agent lifecycle management, including cost estimation and tracking, caching, policy enforcement, failure reporting, output validation, and structured workflow execution. There are no obvious gaps; tools support planning, execution, monitoring, and optimization phases effectively.

Available Tools

15 tools

burnrate_budget (A)

Read-only, Idempotent

Get today's tracked LLM spend, per-model breakdown, projection, and budget alerts. Free — no credits charged.

Parameters (JSON Schema)

Name	Required	Description	Default
`daily_limit`	No	Optional. Daily budget in USD (e.g., 10.0 for a $10/day cap). Enables budget alerts and remaining-balance calculation.
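
As a rough illustration of how a client might invoke this tool, the sketch below builds a JSON-RPC 2.0 request for the MCP `tools/call` method. The helper name `build_tool_call` is hypothetical, and transport details (the Streamable HTTP session, headers, credentials) are left out. The listing does not document the response format, so none is shown.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Build a JSON-RPC 2.0 request body for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# daily_limit is optional; per the table above it enables budget alerts
# and the remaining-balance calculation.
with_limit = build_tool_call("burnrate_budget", {"daily_limit": 10.0})

# Without daily_limit the call still reports tracked spend.
without_limit = build_tool_call("burnrate_budget", {})

print(json.dumps(with_limit, indent=2))
```

The same helper shape would apply to the other tools in this listing; only `name` and `arguments` change.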

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, non-destructive, and idempotent behavior. The description adds valuable context beyond annotations by specifying that it's 'Free — no credits charged' (addressing cost implications) and mentioning 'projection' and 'budget alerts' (clarifying output scope). It does not contradict annotations, as 'Get' aligns with readOnlyHint=true.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first clause and efficiently adds context in subsequent phrases. Every sentence ('Get today's tracked LLM spend, per-model breakdown, projection, and budget alerts. Free — no credits charged.') earns its place by providing essential information without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (budget tracking with alerts), rich annotations (readOnlyHint, idempotentHint, etc.), and no output schema, the description is mostly complete. It covers purpose, cost, and output scope but could benefit from more detail on return values (e.g., format of breakdown or alerts) since there's no output schema. However, it adequately supports agent usage with the provided context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the parameter 'daily_limit' fully documented. The description does not add any additional semantic details about parameters beyond what the schema provides (e.g., it doesn't explain how 'daily_limit' affects alerts in more depth). With high schema coverage, the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Get', 'tracked', 'projection') and resources ('LLM spend', 'per-model breakdown', 'budget alerts'). It distinguishes itself from siblings like burnrate_estimate, burnrate_optimize, and burnrate_track by focusing on daily tracking and budget alerts rather than estimation, optimization, or general tracking.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Get today's tracked LLM spend') and includes a cost benefit ('Free — no credits charged'), but it does not explicitly state when not to use it or name specific alternatives among the sibling tools (e.g., burnrate_estimate for projections or burnrate_track for broader tracking).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

burnrate_estimate (A)

Idempotent

Before executing a multi-step agent plan, estimate the total LLM cost. Returns per-step breakdown and optimization suggestions. If the estimate exceeds your budget, pipe the same plan into burnrate_optimize. Costs 1 credit.

Parameters (JSON Schema)

Name	Required	Description	Default
`plan`	Yes	Array of plan steps with provider, model, and token estimates.
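
To make the `plan` shape concrete, here is a minimal sketch of a two-step plan and a local per-step cost calculation. The step field names follow the burnrate_optimize schema note in this listing (step, provider, model, estimated_input_tokens, estimated_output_tokens). The price table is invented for illustration and is not real provider pricing, and the arithmetic is only a guess at the kind of calculation the server performs.

```python
# Hypothetical per-million-token prices in USD; NOT real provider pricing.
PRICES = {
    ("openai", "gpt-4o"): {"input": 2.0, "output": 8.0},
    ("google", "gemini-2.0-flash"): {"input": 0.5, "output": 1.0},
}

# A plan is an array of steps with provider, model, and token estimates.
plan = [
    {"step": "summarize", "provider": "openai", "model": "gpt-4o",
     "estimated_input_tokens": 1_000_000, "estimated_output_tokens": 500_000},
    {"step": "classify", "provider": "google", "model": "gemini-2.0-flash",
     "estimated_input_tokens": 2_000_000, "estimated_output_tokens": 1_000_000},
]


def estimate_step(step):
    """Cost of one step in USD under the hypothetical price table."""
    price = PRICES[(step["provider"], step["model"])]
    return (step["estimated_input_tokens"] * price["input"]
            + step["estimated_output_tokens"] * price["output"]) / 1_000_000


per_step = {s["step"]: estimate_step(s) for s in plan}
total = sum(per_step.values())
print(per_step, total)
```

If `total` exceeded a budget, the same `plan` list could be passed unchanged to burnrate_optimize, as the description suggests.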

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide idempotentHint=true and destructiveHint=false, indicating safe, repeatable operations. The description adds valuable context beyond annotations by specifying it's for pre-execution estimation and includes optimization suggestions, though it doesn't detail rate limits or authentication needs. No contradiction with annotations exists.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded and concise: four short sentences cover when to use the tool, what it returns, the follow-up tool (burnrate_optimize), and the credit cost. Every sentence earns its place, making the description easy to scan and understand.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (estimation with optimization suggestions), rich annotations, and full schema coverage, the description is largely complete. However, the lack of an output schema means the description doesn't detail return values (e.g., cost breakdown format), leaving a minor gap in contextual information.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, thoroughly documenting the 'plan' parameter and its nested properties. The description adds minimal semantic value beyond the schema, only implying the parameter is for estimation input without providing additional syntax or format details, meeting the baseline for high schema coverage.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('estimate the total LLM cost') and resource ('multi-step agent plan'), distinguishing it from siblings like burnrate_budget, burnrate_optimize, and burnrate_track by focusing on pre-execution estimation rather than budgeting, optimization, or tracking.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('Before executing a multi-step agent plan') and what it provides ('per-step breakdown and optimization suggestions'), clearly differentiating it from alternatives such as burnrate_optimize (which might focus on optimization without estimation) or burnrate_track (which likely tracks costs during/after execution).

burnrate_optimize (A)

Idempotent

Get a cheaper equivalent plan by substituting models with lower-cost alternatives. Call after burnrate_estimate if the estimated cost exceeds your budget. Returns the optimized plan with substituted models, new per-step costs, total savings, and whether the target_budget is met. Optionally set target_budget to constrain the optimization. Costs 1 credit.

Parameters (JSON Schema)

Name	Required	Description	Default
`plan`	Yes	Array of plan steps. Same schema as burnrate_estimate: each step needs step, provider, model, estimated_input_tokens, estimated_output_tokens.
`target_budget`	No	Optional. Target total cost in USD.
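
The idea of model substitution under an optional budget can be sketched locally as follows. Everything here is an assumption made for illustration: the model names, the price table, the `CHEAPER` substitution map, and the keys of the returned dictionary are all hypothetical. The listing says the tool returns the optimized plan, per-step costs, total savings, and whether target_budget is met, but does not give field names.

```python
# Hypothetical (input, output) prices per million tokens in USD.
PRICES = {"big-model": (4.0, 16.0), "small-model": (0.5, 2.0)}

# Hypothetical map from a model to a lower-cost alternative.
CHEAPER = {"big-model": "small-model"}


def step_cost(step):
    """Cost of one step in USD under the hypothetical price table."""
    p_in, p_out = PRICES[step["model"]]
    return (step["estimated_input_tokens"] * p_in
            + step["estimated_output_tokens"] * p_out) / 1_000_000


def optimize(plan, target_budget=None):
    """Swap each model for its cheaper alternative, if one is known."""
    optimized = [dict(s, model=CHEAPER.get(s["model"], s["model"])) for s in plan]
    before = sum(step_cost(s) for s in plan)
    after = sum(step_cost(s) for s in optimized)
    return {
        "plan": optimized,
        "total_cost": after,
        "savings": before - after,
        # None means no target was supplied, mirroring the optional parameter.
        "target_met": None if target_budget is None else after <= target_budget,
    }


plan = [{"step": "draft", "provider": "example", "model": "big-model",
         "estimated_input_tokens": 1_000_000, "estimated_output_tokens": 500_000}]
result = optimize(plan, target_budget=2.0)
print(result)
```

A real optimizer would presumably also weigh quality trade-offs between models; this sketch only compares prices.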

Tool Definition Quality

A (4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations cover key traits: readOnlyHint=false (mutation), idempotentHint=true (safe to retry), destructiveHint=false (non-destructive). The description adds value by clarifying the optimization method ('model substitution') and optional budget fitting, but doesn't disclose rate limits, auth needs, or detailed behavioral impacts beyond annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by when to call the tool, what it returns, the optional target_budget, and the credit cost. Each sentence earns its place by clarifying functionality without redundancy, making the description efficient and well-structured.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (optimization with model substitution), rich annotations, and full schema coverage, the description is mostly complete. It lacks output schema details (e.g., what the optimized plan looks like), but annotations and context provide sufficient guidance for an agent to use it effectively.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents parameters. The description adds minimal semantics: 'plan' is for optimization via substitution, and 'target_budget' is optional for cost checking. This aligns with the schema but doesn't provide significant extra meaning, warranting the baseline 3.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get a cheaper equivalent plan by substituting models with lower-cost alternatives' specifies the verb ('get'), resource ('cheaper equivalent plan'), and method (model substitution). It distinguishes the tool from siblings like burnrate_estimate (which estimates costs) and burnrate_budget (which reports tracked spend and budget alerts) by focusing on optimization through substitution.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: use this tool to find cheaper alternatives via model substitution, optionally constrained by target_budget. It says to call the tool after burnrate_estimate when the estimate exceeds the budget, but it does not state when not to use it, which keeps the score at 4.

burnrate_track (A)

Idempotent

Log the actual cost of an LLM call after execution. Call this after every LLM request to build calibration data that improves burnrate_estimate accuracy over time. Free — no credits charged. Returns the recorded cost entry with computed margin versus the prior estimate when one exists for this model and token range.

Parameters (JSON Schema)

Name	Required	Description
`model`	Yes	Model identifier as returned by the provider. Examples: claude-sonnet-4-6, gpt-4o, gemini-2.0-flash, mistral-large-latest. Unknown models are accepted but cost may show as $0.
`task_id`	No	Optional task ID for cross-referencing spend with DedupQ deduplication results. Use the same task_id passed to dedupq_check to link cost tracking with deduplication.
`provider`	Yes	LLM provider identifier. Supported: anthropic, openai, google, mistral, cohere, deepseek, together, fireworks, groq. Must match the provider of the model used.
`input_tokens`	Yes	Actual prompt tokens used. Must be >= 0.
`output_tokens`	Yes	Actual completion tokens used. Must be >= 0.
`cache_read_tokens`	No	Optional. Cache-read tokens.
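
The constraints in the table above (supported providers, non-negative token counts, optional fields) can be checked on the client before the call is sent. The following is a minimal sketch; the function name `build_track_arguments` and the example `task_id` value are hypothetical, and the server may well apply additional validation of its own.

```python
# Providers listed as supported in the parameter table above.
SUPPORTED_PROVIDERS = {
    "anthropic", "openai", "google", "mistral", "cohere",
    "deepseek", "together", "fireworks", "groq",
}


def build_track_arguments(provider, model, input_tokens, output_tokens,
                          cache_read_tokens=None, task_id=None):
    """Validate and assemble the arguments object for burnrate_track."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider}")
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be >= 0")
    args = {
        "provider": provider,
        "model": model,
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
    }
    # Optional fields are only included when supplied.
    if cache_read_tokens is not None:
        args["cache_read_tokens"] = cache_read_tokens
    if task_id is not None:
        args["task_id"] = task_id
    return args


# "task-42" is a made-up ID; reuse the ID passed to dedupq_check to link the two.
args = build_track_arguments("openai", "gpt-4o", 1200, 350, task_id="task-42")
print(args)
```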

Tool Definition Quality

A (3.8/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable context beyond annotations: it clarifies that cost is computed server-side and the tool is free with no credits charged. Annotations provide hints (e.g., idempotentHint: true, readOnlyHint: false), but the description enhances understanding by explaining the cost aspect and free nature, which isn't covered by annotations. No contradiction with annotations is present.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded: four sentences cover the purpose, when to call the tool, that it is free, and what it returns. Every sentence adds value without redundancy, so an agent can quickly grasp what the tool does.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (6 parameters, no output schema), the description is reasonably complete. It covers the core purpose, cost handling, and pricing, which complements the well-documented input schema and annotations. However, it summarizes the return value only briefly and says nothing about error cases, which matters because there is no output schema.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all 6 parameters (e.g., provider, model, input_tokens). The description doesn't add any parameter-specific details beyond what's in the schema, such as explaining relationships between parameters or usage nuances. With high schema coverage, the baseline score of 3 is appropriate as the description doesn't compensate but doesn't need to.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Log the actual cost of an LLM call after execution.' It specifies the action (log) and the resource (actual cost), along with the timing (after execution). However, beyond noting that the logged data calibrates burnrate_estimate, it doesn't explicitly differentiate the tool from siblings such as burnrate_budget, which serve different functions in the burnrate group.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states the usage context ('Call this after every LLM request') and mentions that the tool is free, which provides some guidance. However, it doesn't explicitly state when to use this tool versus alternatives like burnrate_estimate (for estimation) or burnrate_budget (for budgeting), nor does it mention any exclusions or prerequisites beyond the post-call timing.

dedupq_check (A)

Idempotent

Before executing any LLM task, check if an identical or semantically similar task has already been completed. Returns cached result on hit, saving one LLM call. On a miss, execute your task and call dedupq_complete to cache the result for future hits. Costs 1 credit.

Parameters (JSON Schema)

Name	Required	Description
`content`	Yes	The task content to check for duplicates. This is hashed and embedded for matching.
`task_id`	No	Optional caller task ID for tracing and cross-referencing with BurnRate.
`hash_only`	No	If true, skip vector similarity search and use exact hash matching only. Default: false.
`similarity_threshold`	No	Cosine similarity threshold for semantic matching, 0.0 to 1.0. Default: 0.80.
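
The interplay of `hash_only` and `similarity_threshold` can be illustrated with a small local matcher: an exact hash match always counts as a hit, and otherwise cosine similarity is compared against the threshold unless `hash_only` is set. This is a sketch under stated assumptions. SHA-256 as the hash function and the two-dimensional toy embeddings are illustrative choices, and the listing does not say which hash or embedding model the server uses.

```python
import hashlib
import math


def content_hash(content):
    """Exact-match key; SHA-256 is an assumed choice, not the server's documented one."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_hit(query, cached, hash_only=False, similarity_threshold=0.80):
    """Exact hash match first; fall back to semantic matching unless hash_only."""
    if query["hash"] == cached["hash"]:
        return True
    if hash_only:
        return False
    similarity = cosine_similarity(query["embedding"], cached["embedding"])
    return similarity >= similarity_threshold


# Toy embeddings standing in for real ones.
cached = {"hash": content_hash("Summarize the Q3 report"), "embedding": [1.0, 0.0]}
query = {"hash": content_hash("Summarise the Q3 report"), "embedding": [0.9, 0.1]}

print(is_hit(query, cached))
print(is_hit(query, cached, hash_only=True))
```

Raising `similarity_threshold` trades cache hits for stricter matching; `hash_only=True` removes semantic matching entirely.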

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it explains that the tool 'Returns cached result on hit, saving one LLM call', which clarifies the performance benefit and caching behavior. Annotations provide hints (readOnlyHint=false, openWorldHint=true, idempotentHint=true, destructiveHint=false), but the description enhances this by detailing the outcome (cached results) and purpose (avoiding duplicate LLM calls). No contradiction with annotations is present.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded: four sentences state the purpose, the benefit of a cache hit (saving one LLM call), the follow-up on a miss, and the credit cost. No unnecessary details or redundancy are present.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (checking for duplicates with semantic matching) and rich annotations (readOnlyHint=false, openWorldHint=true, etc.), the description is mostly complete. It covers the core functionality and behavioral outcome. However, without an output schema, it doesn't detail return values (e.g., structure of cached results), leaving a minor gap. The annotations help compensate, making it sufficient but not exhaustive.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all parameters. The description does not add any parameter-specific details beyond what the schema provides (e.g., it doesn't explain 'content' further or provide examples). However, it implicitly references parameters by mentioning 'hashed and embedded for matching', which aligns with the schema but doesn't offer new semantics. Baseline score of 3 is appropriate given high schema coverage.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('check if an identical or semantically similar task has already been completed') and resource ('LLM task'), and distinguishes it from siblings by focusing on duplicate checking rather than budget tracking or guardrail management. It explicitly mentions returning cached results on hit, which differentiates it from execution tools.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance with 'Before executing any LLM task', indicating when to use this tool. By noting that a cache hit saves one LLM call, it also signals that the tool should be tried before executing a task directly. It names dedupq_complete as the follow-up on a miss, which clarifies the tool's role in the workflow.

dedupq_complete (A)

Idempotent

After executing a task, store the result so future identical or similar tasks return a cache hit via dedupq_check. Costs 2 credits.

Parameters (JSON Schema)

Name	Required	Description
`result`	Yes	The task result to cache. Can be any JSON value.
`content`	Yes	Original task content. Used to compute hash and embedding for future matching.
`task_id`	No	Optional task ID. Used as the database row ID if provided.
`hash_only`	No	If true, skip embedding generation. Default: false.
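
The check, execute, complete workflow described for the dedupq tools can be sketched with an in-memory stand-in. The `InMemoryDedup` class below is hypothetical and only does exact matching; it is not the server's API, and the `hit`/`result` response keys are assumptions, since the listing documents no output schema.

```python
class InMemoryDedup:
    """Local stand-in for dedupq_check / dedupq_complete (exact match only)."""

    def __init__(self):
        self._store = {}

    def check(self, content):
        # Assumed response shape: a hit flag plus the cached result on a hit.
        if content in self._store:
            return {"hit": True, "result": self._store[content]}
        return {"hit": False}

    def complete(self, content, result):
        self._store[content] = result


def run_with_dedup(dedup, content, execute):
    """Check the cache first; on a miss, execute the task and cache its result."""
    found = dedup.check(content)
    if found["hit"]:
        return found["result"]
    result = execute(content)
    dedup.complete(content, result)
    return result


calls = []


def fake_llm(content):
    """Placeholder for a real LLM call; records each invocation."""
    calls.append(content)
    return {"answer": content.upper()}


dedup = InMemoryDedup()
first = run_with_dedup(dedup, "hello", fake_llm)
second = run_with_dedup(dedup, "hello", fake_llm)  # served from the cache
print(first, second, len(calls))
```

With the real tools, each check costs 1 credit and each complete costs 2, so the pattern pays off when repeated tasks are common.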

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable context beyond annotations: it explains the caching mechanism (hash and embedding for future matching) and the purpose of storing results. Annotations cover idempotency and non-destructive behavior, but the description complements this by detailing the tool's role in a deduplication workflow. No contradiction with annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences: the first front-loads the key action ('store the result') and explains its purpose without waste, and the second states the credit cost. Every word contributes to understanding the tool's role in caching.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (caching with hash/embedding), annotations provide good coverage (idempotent, non-destructive), and the description explains the caching mechanism. However, without an output schema, it doesn't describe return values or potential errors, leaving a minor gap. Overall, it's mostly complete for a caching tool.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents parameters. The description mentions 'content' and 'result' but doesn't add meaning beyond the schema's descriptions. It implies parameter usage (e.g., content for hash/embedding) but doesn't provide additional syntax or format details. Baseline 3 is appropriate given high schema coverage.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('store') and resource ('result'), and distinguishes it from its sibling dedupq_check by focusing on caching completed tasks rather than checking for duplicates. It explains the caching mechanism for future identical or similar tasks.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: 'After executing a task, store the result so future identical or similar tasks return a cache hit.' It provides clear context for usage (post-task execution) and names dedupq_check as the tool that later serves the cached result.

guardrail_check (A)

Idempotent

Evaluate a proposed agent action against your governance policies. Returns allow or deny with the matched policy reason. Requires at least one active policy created via guardrail_create_policy. Deterministic rule evaluation — no LLM. Costs 1 credit.

Parameters (JSON Schema)

Name	Required	Description	Default
`agent_id`	Yes	Agent identifier.
`proposed_action`	Yes	Action to evaluate. Must contain a 'type' field. Example: {"type": "http_request", "url": "https://external.example.com"} or {"type": "file_write", "path": "/etc/config"}.
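
Because `proposed_action` must contain a 'type' field, a client can reject malformed actions before spending a credit on the call. A minimal sketch follows; the function name `build_guardrail_check` and the agent ID "agent-1" are hypothetical.

```python
def build_guardrail_check(agent_id, proposed_action):
    """Assemble guardrail_check arguments, enforcing the required 'type' field."""
    if not isinstance(proposed_action, dict) or "type" not in proposed_action:
        raise ValueError("proposed_action must be an object with a 'type' field")
    return {"agent_id": agent_id, "proposed_action": proposed_action}


# Action taken from the example in the parameter table above.
args = build_guardrail_check(
    "agent-1",  # hypothetical agent identifier
    {"type": "file_write", "path": "/etc/config"},
)
print(args)
```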

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide hints (idempotent, non-destructive, etc.), but the description adds valuable behavioral context: it specifies that the evaluation is deterministic (no LLM involvement) and returns a binary outcome (allow or deny). This goes beyond annotations by clarifying the decision-making process and output format, though it doesn't detail rate limits or auth needs.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded and concise: five short sentences convey the tool's purpose, outcome, prerequisite, evaluation method, and credit cost. Every sentence adds value without redundancy, making it easy to understand quickly.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (policy evaluation with deterministic rules), annotations cover safety aspects, and the description adds key behavioral details. However, without an output schema, the description only mentions 'returns allow or deny' but doesn't elaborate on the response structure or potential error cases. It's mostly complete but could benefit from more output context.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description doesn't add meaning beyond what the schema provides (e.g., it doesn't explain how 'agent_id' or 'proposed_action' are used in evaluation). Baseline 3 is appropriate as the schema handles parameter documentation adequately.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('evaluate', 'returns') and resources ('proposed agent action', 'governance policies'), and distinguishes it from siblings by focusing on policy evaluation rather than budget tracking, deduplication, or other functions. It explicitly mentions the deterministic rule evaluation approach, which further clarifies its scope.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies that the tool evaluates proposed agent actions against policies and that at least one policy must first be created via guardrail_create_policy. It gives clear guidance on the tool's function, but it doesn't state when not to use it or how it relates to siblings such as guardrail_list_policies.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

guardrail_create_policy (A)

Create a persistent governance policy that guardrail_check evaluates on every subsequent call. Define rules using and/or/not operators over action types, resource patterns, and budget thresholds. Call this before using guardrail_check — checks require at least one active policy. Policies persist until explicitly deleted. Duplicate policy names return an error. Returns the created policy with its ID and active status.

Parameters

Name	Required	Description
`name`	Yes	Unique policy name per org. Examples: 'no-delete-in-prod', 'budget-cap-50', 'pii-block'.
`rules`	Yes	Array of rule objects evaluated against the proposed_action in guardrail_check. Leaf operators: eq, starts_with, contains, gt, lt (compare field to value). Compound operators: and, or, not (nest sub-rules in a rules array). Example: [{operator:'eq', field:'type', value:'file_write'}] blocks all file writes. Nested example: [{operator:'and', rules:[{operator:'eq',field:'type',value:'api_call'},{operator:'contains',field:'url',value:'prod'}]}] blocks prod API calls.
`priority`	No	Optional. Evaluation order. Default: 0.
`description`	No	Optional human-readable summary of what this policy enforces. Returned in guardrail_check responses and guardrail_list_policies output for auditability.
`action_types`	No	Optional. Restrict this policy to only evaluate when proposed_action.type matches one of these values. Examples: ['file_write', 'api_call', 'db_delete']. Omit to apply the policy to all action types regardless of type field.
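
The rule grammar described for `rules` can be illustrated with a small evaluator. This is a minimal sketch under stated assumptions, not the server's implementation: the function name `matches` is hypothetical, a missing field is assumed to never match, and `not` is assumed to be true when none of its sub-rules match. The source confirms only the operator names and the nesting of sub-rules in a `rules` array.

```python
from typing import Any, Callable

NUMBER = (int, float)

# Leaf operators compare action[field] to value (operator names from the schema).
LEAF_OPS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": lambda actual, expected: actual == expected,
    "starts_with": lambda actual, expected: isinstance(actual, str) and actual.startswith(expected),
    "contains": lambda actual, expected: isinstance(actual, str) and expected in actual,
    "gt": lambda actual, expected: isinstance(actual, NUMBER) and actual > expected,
    "lt": lambda actual, expected: isinstance(actual, NUMBER) and actual < expected,
}


def matches(rule: dict[str, Any], action: dict[str, Any]) -> bool:
    """Return True when the rule matches the proposed action."""
    op = rule["operator"]
    if op in LEAF_OPS:
        # Assumption: a field absent from the action yields None and does not match.
        return LEAF_OPS[op](action.get(rule["field"]), rule["value"])
    sub_results = [matches(sub, action) for sub in rule["rules"]]
    if op == "and":
        return all(sub_results)
    if op == "or":
        return any(sub_results)
    if op == "not":
        # Assumption: "not" holds when no nested sub-rule matches.
        return not any(sub_results)
    raise ValueError(f"unknown operator: {op}")


# The nested example from the schema description: matches prod API calls.
PROD_API_RULE = {
    "operator": "and",
    "rules": [
        {"operator": "eq", "field": "type", "value": "api_call"},
        {"operator": "contains", "field": "url", "value": "prod"},
    ],
}
```

Under these assumptions, `matches(PROD_API_RULE, {"type": "api_call", "url": "https://api.prod.example.com"})` is true, while the same call with a staging URL is false.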

Tool Definition Quality

A 3.5/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations: it specifies that names must be unique per org (a constraint not in annotations) and that the operation is free with no credits charged (cost implication). Annotations cover basic hints (e.g., readOnlyHint: false, destructiveHint: false), but the description enhances this with practical details. No contradiction with annotations exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two sentences that are front-loaded and waste no words. The first sentence states the core action, and the second adds key constraints and cost information. Every sentence earns its place by providing essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (creation with 5 parameters, no output schema), the description is somewhat complete but has gaps. It covers uniqueness and cost, but lacks details on what a governance policy does, how it's used, or what happens after creation. Annotations provide basic hints, but more behavioral context (e.g., error handling, typical use cases) would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all 5 parameters. The description does not add any parameter-specific semantics beyond what's in the schema (e.g., it doesn't explain 'rules' or 'priority' further). Baseline is 3 as the schema handles parameter documentation adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Create' and resource 'governance policy', making the purpose explicit. It distinguishes from sibling tools like 'guardrail_list_policies' by focusing on creation rather than listing. However, it doesn't specify what a 'governance policy' governs in this context, leaving some ambiguity about the domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'guardrail_check' or other sibling tools. It mentions that names must be unique per org, which is a constraint but not usage guidance. There's no indication of prerequisites, typical scenarios, or comparisons to other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pitfalldb_query (A)

Idempotent

Check for known failure patterns before executing a task type. Returns pitfalls with severity, fix suggestions, and confidence scores. After your agent runs, submit failures via pitfalldb_report so others benefit. Costs 2 credits.

Parameters

Name	Required	Description
`filters`	No	Optional filters.
`task_type`	Yes	Task category: code_generation, web_search, data_analysis, etc.
`task_description`	No	Optional. Natural-language task description for semantic search.

Tool Definition Quality

A 4.2/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide hints (readOnlyHint=false, openWorldHint=true, idempotentHint=true, destructiveHint=false), but the description adds valuable context beyond this: it specifies the tool is for 'checking' failure patterns, which aligns with non-destructive behavior, and mentions the return format (severity, suggestions, confidence), giving insight into output behavior not covered by annotations. No contradiction with annotations is present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence and efficiently details the return values in the second, with no wasted words. Every sentence adds value, making it appropriately sized and well-structured for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (3 parameters, nested objects, no output schema) and rich annotations, the description is mostly complete: it explains the purpose, usage context, and return format. However, it could benefit from more detail on behavioral aspects like error handling or rate limits, which are not covered by annotations or the description, leaving a minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters (task_type, filters, task_description) thoroughly. The description does not add meaning beyond the schema, such as explaining how parameters interact or providing usage examples, but it implies the tool uses 'task_type' for categorization and optional inputs for filtering, which is consistent with the schema. Baseline 3 is appropriate given high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Check for known failure patterns') and resources ('before executing a task type'), and distinguishes it from siblings like 'pitfalldb_report' or 'pitfalldb_stats' by focusing on querying rather than reporting or statistical analysis. It explicitly mentions what it returns ('pitfalls with severity, fix suggestions, and confidence scores'), making the purpose highly specific and differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for usage ('before executing a task type'), implying it should be used proactively to identify potential issues. However, it does not explicitly state when not to use it or name alternatives among siblings (e.g., 'guardrail_check' might be a related tool), so it lacks full exclusion or comparative guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pitfalldb_report (A)

Idempotent

Report an agent failure. PII-scrubbed before storage. Linked to existing pitfalls if similar. Free — no credits charged.

Parameters

Name	Required	Description
`failure`	Yes	Failure details.
`task_type`	Yes	Task category.
`task_description`	Yes	Description of the failed task.
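
The query-before, report-after pattern that the two pitfalldb descriptions suggest could be wrapped roughly as follows. This is a sketch only: `call_tool` is a hypothetical callable standing in for whatever MCP client invokes the tools, `run_with_pitfall_check` is an illustrative name, and the fields inside `failure` are assumptions, since the schema says only "Failure details."

```python
from typing import Any, Callable

# Hypothetical adapter: (tool_name, arguments) -> tool result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_with_pitfall_check(
    call_tool: ToolCaller,
    task_type: str,
    task_description: str,
    task: Callable[[], Any],
) -> tuple[dict[str, Any], Any]:
    """Query known pitfalls, run the task, and report the failure if it raises."""
    # Look up known failure patterns before doing any work.
    known = call_tool(
        "pitfalldb_query",
        {"task_type": task_type, "task_description": task_description},
    )
    try:
        return known, task()
    except Exception as exc:
        call_tool(
            "pitfalldb_report",
            {
                "task_type": task_type,
                "task_description": task_description,
                # Assumed shape: the real structure of "failure" is not documented here.
                "failure": {"error_type": type(exc).__name__, "message": str(exc)},
            },
        )
        raise
```

The caller still sees the original exception; the report is a side effect on the failure path only.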

Tool Definition Quality

A 4.2/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond what annotations provide: it discloses that data is 'PII-scrubbed before storage' (a privacy/security behavior), mentions 'Linked to existing pitfalls if similar' (a system behavior), and states 'Free — no credits charged' (a cost implication). While annotations cover idempotency and non-destructive nature, the description adds meaningful operational details that help an agent understand what happens when invoking this tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise at 3 short sentences with zero wasted words. Each sentence adds distinct value: the core purpose, data handling behavior, and cost implication. The structure is front-loaded with the primary function, followed by important operational details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a reporting tool with good annotations (idempotent, non-destructive, open world) and complete schema coverage, the description provides adequate context. It covers the key behavioral aspects (PII scrubbing, linking to existing data, no cost) that an agent needs to understand. The main gap is the lack of output schema, but the description compensates somewhat by indicating what happens to the data after submission.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the input schema already provides complete parameter documentation. The description doesn't add any specific parameter semantics beyond what's in the schema: it doesn't explain the structure of the 'failure' object or provide examples of valid 'task_type' values. The baseline of 3 is appropriate since the schema does the heavy lifting for parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Report an agent failure') and resource (the pitfalldb system), distinguishing it from sibling tools like 'pitfalldb_query' or 'pitfalldb_stats' which query or analyze data rather than submit reports. The description explicitly mentions PII-scrubbing and linking to existing pitfalls, providing additional context about what the tool does beyond just the name.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: when an agent failure occurs that needs to be reported. It mentions 'Linked to existing pitfalls if similar' which implies this tool is for reporting new failures that may connect to known issues. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools, though the context suggests it's for reporting rather than querying existing data.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

qualitygate_validate (A)

Idempotent

After your agent generates output, validate it against your rules before shipping. Runs deterministic checks (regex, JSON schema, syntax) plus optional LLM-powered tone and factual analysis. Returns a structured verdict (pass, warn, or fail) with a 0-100 score and per-check issue details. Use qualitygate_trends to spot recurring failure patterns over time. Variable cost: 1 credit per deterministic check, 8 credits per LLM check.

Parameters

Name	Required	Description
`output`	Yes	The agent output text to validate.
`schema`	No	JSON Schema to validate output against.
`language`	No	Code language for syntax check: json, python, javascript, typescript.
`override`	No	Force pass. Requires override_reason.
`directives`	No	Directive objects. Types: must_include, must_not_include, must_match, must_not_match, must_contain, must_not_contain, min_length, max_length.
`check_types`	No	Checks to run. Auto-inferred if omitted.
`override_reason`	No	Required when override is true.
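
The stated pricing (1 credit per deterministic check, 8 credits per LLM check) and the `override` / `override_reason` dependency can be checked client-side before calling the tool. The sketch below is illustrative: the check-type identifiers (`regex`, `json_schema`, `syntax`, `tone`, `factual`) are assumed names derived from the description, not confirmed `check_types` values, and both function names are hypothetical.

```python
from typing import Any

DETERMINISTIC_COST = 1  # credits per deterministic check (from the description)
LLM_COST = 8            # credits per LLM check (from the description)

# Assumed identifiers; the actual check_types values are not listed in the schema.
DETERMINISTIC_CHECKS = {"regex", "json_schema", "syntax"}
LLM_CHECKS = {"tone", "factual"}


def estimate_credits(check_types: list[str]) -> int:
    """Estimate the credit cost of a qualitygate_validate call."""
    total = 0
    for check in check_types:
        if check in DETERMINISTIC_CHECKS:
            total += DETERMINISTIC_COST
        elif check in LLM_CHECKS:
            total += LLM_COST
        else:
            raise ValueError(f"unknown check type: {check}")
    return total


def check_override(arguments: dict[str, Any]) -> None:
    """Reject arguments that set override without an override_reason."""
    if arguments.get("override") and not arguments.get("override_reason"):
        raise ValueError("override_reason is required when override is true")
```

With these assumed names, two deterministic checks plus one LLM check would cost 1 + 1 + 8 = 10 credits.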

Tool Definition Quality

A 4.2/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond what annotations provide. While annotations indicate non-destructive, idempotent operations with open-world hints, the description reveals this is a validation tool with both deterministic (regex, JSON schema, syntax) and optional LLM-powered analysis capabilities. It doesn't mention rate limits or authentication needs, but provides useful operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise and front-loaded in a single sentence that immediately communicates the core functionality. Every word earns its place, with no redundant information or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters including nested objects) and lack of output schema, the description provides good context about what the tool does. However, it doesn't explain what the validation results look like or how to interpret them, which would be helpful since there's no output schema. The annotations help with behavioral understanding, but result interpretation remains unclear.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the input schema already documents all 7 parameters thoroughly. The description mentions 'deterministic checks (regex, JSON schema, syntax) plus optional LLM-powered tone and factual analysis' which maps to the check_types parameter, but doesn't add significant semantic value beyond what's already in the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('validate agent output against your rules') and resources ('deterministic checks plus optional LLM-powered tone and factual analysis'). It distinguishes itself from sibling tools like guardrail_check or dedupq_check by focusing specifically on output validation with both deterministic and LLM-powered analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool ('validate agent output against your rules') and what types of checks it performs. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools (e.g., guardrail_check might be for different types of validation).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rigor_execute (A)

Execute a structured workflow end-to-end. Call rigor_plan first (free) to preview the step sequence and cost estimate before committing credits. Classifies the task, selects the optimal tool sequence, and executes each step with the right LLM model. Returns a complete deliverable — solution designs, competitive analyses, governance documents, and more. Supports SSE streaming for real-time progress, webhook callback, or polling.

Parameters

Name	Required	Description
`context`	No	Additional context for the workflow.
`delivery`	No	Delivery method. Default: polling (MCP clients typically can't handle SSE).
`task_type`	No	Optional hint to bypass automatic classification. Values: solution_design, requirements_analysis, code_implementation, code_review, bug_fix, root_cause_analysis, incident_response, deployment_execution, competitive_scan, financial_analysis, research_task, documentation, governance_change, compliance_audit, data_security_assessment, performance_optimization, user_story_definition, implementation_prompt_generation.
`preferences`	No	Optional workflow preferences.
`task_description`	Yes	Natural language description of the task. Be specific — include what you want produced, constraints, and context. Example: 'Design a caching layer for our API gateway with Redis integration.'

Tool Definition Quality

A 4.2/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond annotations, such as real-time progress via SSE streaming, webhook callback, or polling options, and the tool's ability to classify tasks and select optimal tool sequences. Annotations cover basic hints (e.g., readOnlyHint: false, openWorldHint: true), but the description enriches this with execution details and delivery methods, though it doesn't mention rate limits or auth needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by supporting details in a logical flow. Every sentence adds value—explaining functionality, outputs, and delivery options—with zero waste. It's appropriately sized for a complex tool, making it highly efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, rich annotations, and 100% schema coverage, the description is mostly complete, covering purpose, behavior, and outputs. However, the lack of an output schema means the description doesn't detail return values (e.g., format of deliverables), leaving a minor gap. It compensates well but isn't fully exhaustive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the input schema already documents all 5 parameters thoroughly. The description adds minimal parameter semantics, only implying task_description usage ('Natural language description of the task') and delivery methods. It doesn't provide additional meaning beyond the schema, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Execute a structured workflow end-to-end') and resources ('workflow'), distinguishing it from siblings like rigor_plan (planning) and rigor_status (status checking). It explicitly mentions what it produces ('complete deliverable — solution designs, competitive analyses, governance documents, and more'), making the purpose highly specific and differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Execute a structured workflow end-to-end') and implies alternatives through sibling tool names (e.g., rigor_plan for planning instead of execution). However, it lacks explicit guidance on when not to use it or detailed comparisons with alternatives like rigor_status, which prevents a perfect score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rigor_plan (A)

Read-only, Idempotent

Before executing a complex task, get a structured workflow plan with per-step cost estimates. Classifies your task, selects the optimal tool sequence from 110+ validated tools, and returns the full plan without executing anything. Free — no credits charged.

Parameters

Name	Required	Description
`task_type`	No	Optional hint to bypass automatic classification. Values: solution_design, requirements_analysis, code_implementation, code_review, bug_fix, root_cause_analysis, incident_response, deployment_execution, competitive_scan, financial_analysis, research_task, documentation, governance_change, compliance_audit, data_security_assessment, performance_optimization, user_story_definition, implementation_prompt_generation.
`preferences`	No	Optional workflow preferences.
`task_description`	Yes	Natural language description of the task. Be specific — include what you want produced, constraints, and context. Example: 'Design a caching layer for our API gateway with Redis integration.'

Tool Definition Quality

A 4.4/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide key behavioral hints (readOnlyHint: true, destructiveHint: false, idempotentHint: true, openWorldHint: true), covering safety and idempotency. The description adds valuable context beyond this: it specifies that the tool 'classifies your task, selects the optimal tool sequence,' and notes it's 'Free — no credits charged,' which are not covered by annotations. No contradictions with annotations are present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured, with three sentences that efficiently convey purpose, functionality, and key benefits. Each sentence adds value: the first explains the core function, the second details the process, and the third highlights cost and execution status. There is no wasted or redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (planning with cost estimates and tool selection) and the absence of an output schema, the description provides good contextual completeness. It covers the tool's purpose, usage timing, and behavioral aspects like being free and non-executing. However, it does not detail the output format or plan structure, which could be helpful since there's no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, providing detailed documentation for all parameters. The description does not add any parameter-specific semantics beyond what the schema already explains, such as the purpose of 'task_description' or 'preferences.' With high schema coverage, the baseline score of 3 is appropriate, as the description relies on the schema for parameter details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('get a structured workflow plan with per-step cost estimates') and resources ('from 110+ validated tools'). It distinguishes itself from sibling tools like 'rigor_execute' by emphasizing it 'returns the full plan without executing anything,' making its role distinct in the workflow planning phase.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: 'Before executing a complex task.' It also distinguishes it from alternatives by noting it 'returns the full plan without executing anything,' implying that 'rigor_execute' is for execution. The mention of 'Free — no credits charged' further clarifies usage context by highlighting cost implications.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rigor_status (A)

Read-only, Idempotent

Check the status of a running or completed Rigor workflow. Returns progress, step results, and the full deliverable when complete. Use after rigor_execute with polling delivery to retrieve results.

Parameters

Name	Required	Description
`workflow_id`	Yes	The workflow ID returned by rigor_execute (format: wr_xxx).
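
The plan, execute, then poll sequence described for the rigor tools might look like the sketch below. Several parts are assumptions: `call_tool` is a hypothetical callable wrapping an MCP client, the response fields `workflow_id` and `status` are guesses because no output schema is published, and the set of states that stop polling is inferred from the status names listed for the `rigor_workflows` filter.

```python
import time
from typing import Any, Callable

# Hypothetical adapter: (tool_name, arguments) -> tool result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]

# Assumption: these states mean polling will not make further progress on its own.
STOP_STATES = {"completed", "failed", "cancelled", "halted", "pending_approval"}


def run_workflow(
    call_tool: ToolCaller,
    task_description: str,
    poll_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Preview with rigor_plan, start rigor_execute, then poll rigor_status."""
    # Free preview of the step sequence and cost estimate.
    call_tool("rigor_plan", {"task_description": task_description})
    # "delivery" is omitted so the documented default (polling) applies.
    started = call_tool("rigor_execute", {"task_description": task_description})
    workflow_id = started["workflow_id"]  # assumed field name, format wr_xxx
    for _ in range(max_polls):
        status = call_tool("rigor_status", {"workflow_id": workflow_id})
        if status.get("status") in STOP_STATES:  # assumed field name
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"workflow {workflow_id} did not finish after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable without real delays; a real client would likely also inspect the plan's cost estimate before committing credits.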

Tool Definition Quality

A 4.4/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds valuable behavioral context beyond the annotations: it explains that this tool retrieves results from previously executed workflows, describes what information is returned (progress, step results, deliverables), and specifies the polling use case. While annotations cover read-only, non-destructive, and idempotent properties, the description enhances understanding of the tool's role in workflow monitoring.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in two sentences: the first explains the tool's purpose and outputs, and the second provides clear usage guidelines. Every word serves a purpose with no redundancy, making it easy to parse while being comprehensive.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with comprehensive annotations (read-only, non-destructive, idempotent) and clear sibling differentiation, the description is nearly complete. It explains the tool's role in the workflow, what it returns, and when to use it. The only minor gap is the lack of an output schema, but the description compensates by detailing the return values (progress, step results, deliverables).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the input schema fully documents the single required parameter. The description doesn't add additional parameter semantics beyond what's in the schema, but it does reinforce the parameter's purpose by mentioning 'workflow ID returned by rigor_execute,' which aligns with the schema description. This meets the baseline expectation for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Check the status'), target resource ('a running or completed Rigor workflow'), and output details ('progress, step results, and the full deliverable when complete'). It distinguishes from sibling tools like rigor_execute and rigor_plan by focusing on status retrieval rather than execution or planning.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool ('Use after rigor_execute with polling delivery to retrieve results'), which clearly differentiates it from rigor_execute and other siblings. It establishes a clear workflow dependency without being misleading about alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
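
The usage guidance above ("Use after rigor_execute with polling delivery to retrieve results") could be sketched as a simple polling loop. This is a minimal illustration under stated assumptions, not the server's documented client: `call_tool` is a hypothetical callable standing in for whatever MCP client you use, `workflow_id` is an assumed argument name, and the response is assumed to be a dict with a `status` field using the same status values listed for rigor_workflows.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these are treated as final states. The source lists them only as
# filter values for rigor_workflows and does not say which ones are terminal.
TERMINAL_STATES = {"completed", "failed", "halted", "cancelled"}


def poll_workflow(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    workflow_id: str,
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call rigor_status repeatedly until the workflow reaches a terminal state."""
    for _ in range(max_attempts):
        # "workflow_id" is a hypothetical argument name for the single
        # required parameter (the ID returned by rigor_execute).
        result = call_tool("rigor_status", {"workflow_id": workflow_id})
        if result.get("status") in TERMINAL_STATES:
            return result
        sleep(interval_s)
    raise TimeoutError(
        f"workflow {workflow_id} did not finish after {max_attempts} polls"
    )
```

Because the tool is described as read-only and idempotent, repeating the call is safe. A workflow in `pending_approval` would keep this loop waiting until the attempt limit, so a real client may prefer to stop and surface that state instead.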

rigor_workflows A

Read-only, Idempotent

List all Rigor workflows for your organization with filtering and pagination. Returns status, progress, capacity usage, and available actions per workflow. Use to monitor workflow state, understand concurrent limit usage, and identify stuck or completed workflows.

Parameters

Name	Required	Description
`limit`	No	Page size (default 20, max 100)
`cursor`	No	Pagination cursor (created_at timestamp from previous page)
`status`	No	Filter by status (comma-separated). Valid values: executing, step_executing, completed, failed, halted, pending_approval, cancelled. E.g. "halted,failed,pending_approval"
`task_type`	No	Filter by classified task type
`counts_toward_limit`	No	Filter to workflows counting toward the concurrent limit
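
As a rough illustration of how the parameters in the table above combine, the following sketch builds an arguments object for a rigor_workflows call. The helper name `build_workflows_args` is hypothetical, the lower bound of 1 on `limit` is an assumption (the table states only the default and maximum), and `counts_toward_limit` is assumed to be a boolean.

```python
from typing import Any, Dict, Iterable, Optional

# Status values copied from the `status` parameter description.
VALID_STATUSES = {
    "executing", "step_executing", "completed", "failed",
    "halted", "pending_approval", "cancelled",
}


def build_workflows_args(
    statuses: Optional[Iterable[str]] = None,
    limit: int = 20,
    cursor: Optional[str] = None,
    task_type: Optional[str] = None,
    counts_toward_limit: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build the arguments dict for rigor_workflows, omitting unset filters."""
    # Max 100 comes from the table; the minimum of 1 is an assumption.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    args: Dict[str, Any] = {"limit": limit}
    if statuses is not None:
        wanted = list(statuses)
        unknown = [s for s in wanted if s not in VALID_STATUSES]
        if unknown:
            raise ValueError(f"unknown status values: {unknown}")
        # The filter is documented as a comma-separated string.
        args["status"] = ",".join(wanted)
    if cursor is not None:
        # Documented as the created_at timestamp from the previous page.
        args["cursor"] = cursor
    if task_type is not None:
        args["task_type"] = task_type
    if counts_toward_limit is not None:
        args["counts_toward_limit"] = counts_toward_limit
    return args
```

For example, `build_workflows_args(statuses=["halted", "failed", "pending_approval"], limit=50)` produces the same status filter shown in the parameter description. Validating on the client side is a design choice here; the server may apply its own checks.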

Tool Definition Quality

A 4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided. The description implies a safe read operation but does not explicitly declare read-only behavior or disclose side effects, rate limits, or data freshness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description uses three sentences that efficiently convey purpose, return value, and use cases, with no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the returned information (status, progress, capacity, actions) even though there is no output schema. It does not mention authentication or error handling, but the parameters are fully documented in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with individual parameter descriptions. The tool description adds general context about filtering and pagination but does not add specific meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List all Rigor workflows' with a specific verb and resource. It distinguishes itself from siblings like rigor_execute and rigor_plan, which perform other actions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool: monitoring workflow state, understanding concurrent limit usage, and identifying stuck or completed workflows. It does not say when not to use it or name alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
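
One way to produce that file is sketched below. It is only an illustration: `write_glama_json` is a hypothetical helper, and `web_root` stands for whatever directory your server actually serves static files from.

```python
import json
from pathlib import Path


def write_glama_json(web_root: str, email: str) -> Path:
    """Write <web_root>/.well-known/glama.json and return its path."""
    doc = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        # Use the email associated with your Glama account.
        "maintainers": [{"email": email}],
    }
    target = Path(web_root) / ".well-known" / "glama.json"
    # Create the .well-known directory if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(doc, indent=2) + "\n", encoding="utf-8")
    return target
```

After writing the file, confirm that it is reachable at /.well-known/glama.json on the server's domain, since that is the path the verification described above relies on.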
