Wever Labs Agentic Rails
Server Details
Agentic rails for complex workflows with receipts, fees, and MCP tool access.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 2.4/5 across 66 of 66 tools scored. Lowest: 1.1/5.
Many tools share similar action prefixes like 'build_', 'run_', 'read_', 'verify_', and 'validate_', leading to potential confusion. For example, 'build_agent_discovery_distribution_plan', 'build_external_agent_client_package', and 'build_external_agent_client_runner' are all distinct but have overlapping scopes. Long tool names with multiple qualifiers still create ambiguity.
Naming is inconsistent: some tools follow a short verb_noun pattern (e.g., 'activate_external_directory_credentials', 'list_rails'), while others use longer descriptive phrases (e.g., 'bind_x402_facilitator_settlement', 'run_multi_agent_rail_benchmark'). The mix of short and verbose names, along with varied word order, reduces predictability.
66 tools is excessive for a single server, especially given the apparent domain of agentic rails and credentials. This likely indicates that the server is trying to cover too many sub-domains or granular operations, making it unwieldy for an agent. The tool surface should be split into smaller, focused servers.
The tool set appears to cover a wide range of functionalities—directory, credentials, payments, receipts, verification, benchmarks, etc. However, the extreme granularity and overlap suggest that the surface may be overcomplete, with many unnecessary tools. Gaps are hard to identify due to the sheer number, but the lack of clear CRUD patterns (e.g., no 'delete' tools for many resources) indicates incomplete lifecycle coverage.
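The prefix overlap and uneven name lengths described above can be made concrete with a small tally. This is a minimal sketch that uses only the seven tool names quoted in this review, not the full list of 66; the variable names are illustrative.

```python
from collections import defaultdict

# Tool names quoted in the server-level review (a sample, not all 66 tools).
tool_names = [
    "build_agent_discovery_distribution_plan",
    "build_external_agent_client_package",
    "build_external_agent_client_runner",
    "activate_external_directory_credentials",
    "list_rails",
    "bind_x402_facilitator_settlement",
    "run_multi_agent_rail_benchmark",
]

# Group names by their leading action prefix (the text before the first underscore).
by_prefix = defaultdict(list)
for name in tool_names:
    by_prefix[name.split("_", 1)[0]].append(name)

# Count underscore-separated words per name to show the short/verbose mix.
word_counts = {name: len(name.split("_")) for name in tool_names}

for prefix, names in by_prefix.items():
    print(f"{prefix}_: {len(names)} tool(s)")
print("shortest:", min(word_counts.values()), "words; longest:", max(word_counts.values()), "words")
```

Even in this small sample, three names share the 'build_' prefix, and name length ranges from two words to five.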
Available Tools
66 tools
activate_external_directory_credentials: Activate external directory credentials (Grade C)
Check target credential environment and create credential activation records.
| Name | Required | Description | Default |
|---|---|---|---|
| targets | No | | |
| submission_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=false, so the description adds no behavioral context beyond stating it creates records. Missing details like side effects, prerequisites, or what 'checking' entails.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely short, but the single sentence is grammatically incomplete and lacks structure. Could be improved by breaking into clearer statements.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
No output schema, so description should explain return values or side effects. It does not mention what 'checking' results in or what 'activation records' contain.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, and the description fails to explain the two parameters 'targets' and 'submission_target'. No meaning added beyond schema names.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
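The "schema description coverage" figure cited in these reviews appears to be the share of parameters that carry a description in the input schema. The sketch below works under that assumption; the function name and the dict layout are illustrative, and the zero-parameter convention (coverage counts as 100%) follows how this listing treats tools with no parameters.

```python
def description_coverage(properties: dict) -> float:
    """Return the fraction of parameters with a non-empty 'description'.

    A tool with no parameters is treated as fully covered (1.0).
    """
    if not properties:
        return 1.0
    described = sum(
        1 for spec in properties.values() if spec.get("description", "").strip()
    )
    return described / len(properties)


# Parameter names come from the table above; the empty specs mirror the
# blank Description column.
activate_params = {"targets": {}, "submission_target": {}}
print(description_coverage(activate_params))  # 0.0

# Hypothetical: describing one of the two parameters would raise coverage to 0.5.
partly_described = {
    "targets": {"description": "placeholder text, not from the server"},
    "submission_target": {},
}
print(description_coverage(partly_described))  # 0.5
```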
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses verbs 'check' and 'create' and specifies resources 'credential environment' and 'credential activation records', providing a clear purpose. However, it does not differentiate from sibling tools like 'issue_agent_credential' or 'register_issued_credential'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives. No exclusions or context provided, making it hard for an agent to decide when to invoke this tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
adapt_directory_submission: Adapt directory submission (Grade C, Read-only)
Return target-specific directory adapter payload.
| Name | Required | Description | Default |
|---|---|---|---|
| submission_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, so the read-only nature is covered. However, the description adds no extra behavioral context (e.g., what 'adapter payload' entails, auth needs, or side effects), failing to add value beyond structured data.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single efficient sentence with no wasted words. However, it sacrifices informativeness for brevity, as it lacks detail needed for a tool with many siblings.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of many sibling tools and the tool's low complexity (1 param, no output schema), the description is too minimal. It does not explain what the payload is for, how to use the target, or what distinguishes this from similar tools, leaving the agent underinformed.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the single parameter 'submission_target' is only hinted at by the word 'target-specific' in the description. No format, allowable values, or examples are provided, insufficiently compensating for the schema gap.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Return target-specific directory adapter payload' states the verb and resource but is vague. It does not clearly differentiate from siblings like 'run_directory_submission_adapter' or 'build_external_discovery_submission_pack', making it ambiguous in a large sibling list.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus the many related siblings. No when-to-use, when-not-to-use, or alternative recommendations, leaving the agent without decision-making context.
bind_x402_facilitator_settlement: Bind x402 facilitator settlement (Grade C)
Bind x402 challenge metadata to facilitator readiness or sandbox settlement state.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| challenge_id | No | | |
| movement_type | No | | |
| movement_amount_units | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description indicates a write operation (consistent with readOnlyHint: false) but fails to disclose any behavioral traits like side effects, required permissions, or whether changes are reversible.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence but is vague and not front-loaded with the essential purpose. It could be more concise if it contained clearer information.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema and zero parameter descriptions, the description is critically incomplete, leaving the agent without sufficient context to use the tool correctly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, yet the description provides no explanation for any of the four parameters, including the meaning of 'rail_key', 'challenge_id', 'movement_type', or 'movement_amount_units'.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states a specific verb 'bind' and mentions 'x402 challenge metadata to facilitator readiness or sandbox settlement state', but the terms are technical and undefined, leaving ambiguity about the exact operation.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus siblings such as 'create_x402_payment_challenge' or 'get_movement_fee_schedule', nor any prerequisites or limitations.
build_a2a_callback_envelope: Build A2A callback envelope (Grade D)
Build callback packet.
| Name | Required | Description | Default |
|---|---|---|---|
| event_type | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false (i.e., not read-only), but the description adds no behavioral details such as persistence requirements, side effects, or authorization needs. No contradiction with annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely short but under-specified; conciseness should not sacrifice necessary detail. The description fails to earn its place by providing any meaningful information.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of output schema, no parameter explanations, and no behavioral context, the description is completely insufficient for an agent to understand the tool's purpose and usage.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0% and the description does not explain the single parameter 'event_type' or the allowed additional properties. The agent has no information on how to correctly populate the input.
Does the description clearly state what the tool does and how it differs from similar tools?
Description is 'Build callback packet,' which is a tautology of the tool name 'Build A2A callback envelope.' It does not specify what a callback packet or envelope is, nor does it distinguish from siblings like build_external_agent_client_package or create_a2a_task_lifecycle.
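One rough way to spot a description that mostly restates the tool name is to measure how many of its words already appear in the name. The sketch below is a crude heuristic under that assumption, not the scoring method used for this listing; the function name is illustrative. It misses synonyms such as 'packet' for 'envelope', so it tends to undercount restatement.

```python
import re


def restated_share(tool_name: str, description: str) -> float:
    """Share of description words that already appear in the tool name."""
    name_words = set(tool_name.lower().split("_"))
    desc_words = re.findall(r"[a-z0-9]+", description.lower())
    if not desc_words:
        return 0.0
    return sum(word in name_words for word in desc_words) / len(desc_words)


# Names and descriptions are taken from this listing.
envelope = restated_share("build_a2a_callback_envelope", "Build callback packet.")
settlement = restated_share(
    "bind_x402_facilitator_settlement",
    "Bind x402 challenge metadata to facilitator readiness or sandbox settlement state.",
)
print(round(envelope, 2))    # 0.67
print(round(settlement, 2))  # 0.36
```

Two of the three words in 'Build callback packet.' come straight from the name, which matches the reviewer's point that the description adds almost nothing.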
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool vs. alternatives. With over 60 sibling tools, the lack of usage context makes it impossible for an agent to decide correctly.
build_agent_discovery_distribution_plan: Build agent discovery distribution plan (Grade C, Read-only)
Return target surfaces, required files, verification checks, and cadence.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotation readOnlyHint: true indicates read-only behavior, and the description states 'Return...' which aligns. However, the description adds no further behavioral context beyond the annotation, so it earns a baseline score of 3.
Is the description appropriately sized, front-loaded, and free of redundancy?
While extremely short, the description is under-specified and lacks essential context. It is not concise in a helpful way; it omits critical information needed for correct tool selection.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity indicated by many sibling tools and no output schema, the description is insufficient. It does not explain what the plan includes, how to interpret the output, or when to use this tool over similar build tools.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
There are no parameters, so the schema coverage is 100%. The description does not need to add parameter details. Baseline 4 applies per the rule for zero parameters.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Return target surfaces, required files, verification checks, and cadence' is vague and does not specify what tool this is for or how it differs from siblings like build_external_discovery_submission_pack. It lacks a clear verb-resource relationship.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. The description does not mention any context, prerequisites, or exclusions.
build_external_agent_client_package: Build external agent client package (Grade D, Read-only)
Return client manifest and smoke sequence.
| Name | Required | Description | Default |
|---|---|---|---|
| package_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds no behavioral insight beyond the annotations. The readOnlyHint is already present, and the description does not clarify whether 'build' implies mutation or if the tool simply reads existing data. The name 'build' and description 'Return' create ambiguity.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is overly concise to the point of being uninformative. While it is short, it sacrifices clarity and actionable detail.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity implied by the tool name, one parameter with no schema descriptions, and no output schema, the description is severely incomplete. It fails to explain what 'client manifest and smoke sequence' are or what the tool accomplishes.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, and the description does not mention the 'package_target' parameter at all. The agent receives no guidance on what this parameter means or how to use it.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Return client manifest and smoke sequence' uses the vague verb 'return' and does not clearly state the tool's action (build vs. return). It names the outputs but fails to differentiate from siblings like build_external_agent_client_runner or build_external_discovery_submission_pack.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. The description lacks any context, preconditions, or exclusions.
build_external_agent_client_runner: Build external agent client runner (Grade C, Read-only)
Return external client files and pass markers.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true, and 'Return' aligns with a read-only operation. However, no additional behavioral traits are disclosed beyond what annotations provide, so the description adds minimal value.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single short sentence, which is concise. However, it is too sparse to be informative.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema and sparse annotations, the description is incomplete. It does not explain what the returned files or markers represent, nor the tool's role in the build context.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
There are zero parameters, and schema coverage is 100%. The baseline for 0 parameters is 4, and the description does not need to add parameter semantics.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Return external client files and pass markers' is vague. It does not specify what 'external client files' or 'pass markers' are, nor does it distinguish this tool from siblings like 'build_external_agent_client_package' or 'build_public_agent_client_repo'.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. The description lacks context for when it is appropriate to invoke this tool.
build_external_discovery_submission_pack: Build external discovery pack (Grade D, Read-only)
Build directory and registry objects.
| Name | Required | Description | Default |
|---|---|---|---|
| submission_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description 'Build directory and registry objects' implies creating or modifying state, which contradicts the annotation 'readOnlyHint: true' indicating the tool is read-only. No other behavioral traits (e.g., side effects, auth needs) are disclosed.
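A mismatch like this one, where the description opens with a verb that suggests a state change while the annotation says read-only, can be caught with a simple lint. The sketch below is one possible check, not the reviewer's method; the verb list is a hypothetical starting point and the function name is illustrative.

```python
# Hypothetical list of verbs that suggest a state change; not taken from any spec.
MUTATING_VERBS = {"build", "create", "bind", "activate", "update", "delete", "write"}


def contradicts_read_only(description: str, read_only_hint: bool) -> bool:
    """Flag a read-only tool whose description starts with a mutating verb."""
    words = description.split()
    first_word = words[0].lower().strip(".,") if words else ""
    return read_only_hint and first_word in MUTATING_VERBS


# (name, description, readOnlyHint) triples as shown in this listing.
tools = [
    ("build_external_discovery_submission_pack", "Build directory and registry objects.", True),
    ("build_external_agent_client_package", "Return client manifest and smoke sequence.", True),
    ("build_a2a_callback_envelope", "Build callback packet.", False),
]

flagged = [name for name, desc, read_only in tools if contradicts_read_only(desc, read_only)]
print(flagged)  # ['build_external_discovery_submission_pack']
```

Only the first tool is flagged: the second describes itself with 'Return', and the third is not marked read-only.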
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise (4 words) but at the cost of being under-specified. It is not informative enough to guide tool invocation.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite a simple parameter set, the description fails to explain the tool's purpose, parameter meaning, return values, or how it fits among many sibling build tools. It is completely inadequate for effective use.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The sole parameter 'submission_target' has no schema description (0% coverage) and the tool description adds no explanation of its purpose, allowed values, or behavior.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the verb 'build' and targets 'directory and registry objects', but fails to specify what a 'discovery submission pack' is or how it differs from sibling build tools like 'build_agent_discovery_distribution_plan'.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus its siblings. There are no exclusions or context cues to help an agent choose between similar build tools.
build_mcp_package_metadata: Build MCP package metadata (Grade C, Read-only)
Return registry-grade MCP package metadata and public files.
| Name | Required | Description | Default |
|---|---|---|---|
| package_version | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true, consistent with 'Return'. However, the description adds no further behavioral context (e.g., permissions, rate limits, or what 'public files' entails).
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence is concise but overly minimal. No structure or logical grouping, though acceptable for simple tools.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With only one parameter and no output schema, the description should at least document the parameter. It fails to do so, leaving the tool incompletely specified.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage for parameters is 0%, and the description does not explain the 'package_version' parameter: neither its expected format nor its role. This leaves the agent guessing.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states 'Return registry-grade MCP package metadata and public files', clearly indicating the verb and resource. It distinguishes from sibling tools like 'build_external_agent_client_package' by specifying metadata focus, but does not explicitly differentiate.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives. The description does not mention context, prerequisites, or exclusions, leaving the agent to infer usage.
build_public_agent_client_repo: Build public agent client repo (Grade C, Read-only)
Return GitHub-ready external client repo package and smoke tests.
| Name | Required | Description | Default |
|---|---|---|---|
| release_version | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true, which aligns with the 'Return' verb. The description adds that the output is a package and smoke tests, but no further behavioral context (e.g., caching, state changes) is provided.
Is the description appropriately sized, front-loaded, and free of redundancy?
A single sentence with 10 words, no wasted verbiage. It is front-loaded, but its brevity sacrifices necessary detail.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool name suggests a build process and the sibling tools imply a complex ecosystem, the description omits expected output details, success criteria, and parameter explanation, leaving the agent under-informed.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has one parameter (release_version) with 0% description coverage, and the description does not mention it at all. The agent has no clue what the parameter means or how to use it.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Return GitHub-ready external client repo package and smoke tests' clearly states the action (return) and the resource (repo package and tests). However, it does not distinguish from sibling tools like 'build_external_agent_client_package' or 'build_external_agent_client_runner', which have overlapping purposes.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives. The description lacks context on prerequisites or scenarios, leaving the agent without decision support.
create_a2a_task_lifecycle: Create A2A task lifecycle (Grade D)
Create, update, complete, or read A2A task lifecycle records.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| receipt_id | No | | |
| a2a_task_id | No | | |
| lifecycle_mode | No | | |
| production_run_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations only provide readOnlyHint=false, so the description carries the burden of behavioral disclosure. It mentions multiple operations but does not clarify side effects, authentication needs, or rate limits. The inclusion of 'read' alongside mutations is not contradictory but adds ambiguity.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence, which is concise, but it lacks necessary detail. It does not effectively communicate the tool's capabilities, making it under-specified rather than efficiently informative.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (5 parameters, no output schema, no behavioral annotations), the description is severely incomplete. It fails to define what a lifecycle record is, how to use the parameters, or what the tool returns.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, meaning no parameter descriptions exist. The description does not explain any of the five parameters (rail_key, receipt_id, etc.), leaving the agent without semantic understanding beyond the raw schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description lists multiple operations (create, update, complete, read) but the tool name suggests solely creation. It does not differentiate from sibling tools like build_a2a_callback_envelope or read_callback_receipt_records, leaving the agent uncertain about when to use this tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives, nor are there any prerequisites or exclusions. With many sibling tools, the agent lacks context for appropriate invocation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
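The "schema coverage" figure cited above can be read as the share of parameters that carry a description. The following is a minimal sketch of that calculation, assuming a JSON Schema style input schema. The function name `description_coverage` is hypothetical, only the parameter names come from the table above, and this is not necessarily how the score on this page is computed.

```python
from typing import Any


def description_coverage(input_schema: dict[str, Any]) -> float:
    """Return the fraction of top-level properties with a non-empty description."""
    properties = input_schema.get("properties", {})
    if not properties:
        # A tool with no parameters has nothing left undocumented.
        return 1.0
    described = sum(
        1
        for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# Parameter names are taken from the table above. Their types are not
# given in the source, so each property spec is left empty here.
lifecycle_schema = {
    "type": "object",
    "properties": {
        "rail_key": {},
        "receipt_id": {},
        "a2a_task_id": {},
        "lifecycle_mode": {},
        "production_run_id": {},
    },
}

print(f"coverage: {description_coverage(lifecycle_schema):.0%}")
```

Under this reading, adding a description to even one of the five parameters would raise the coverage to 20%.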
create_payment_challenge: Create payment challenge (Grade C)
Return payment-required object for rail movement.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| movement_amount_units | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false, suggesting mutation. The description only says 'Return', which is ambiguous about whether it creates or reads. It does not disclose effects like whether a payment is initiated or if the operation is idempotent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise at one sentence, but it is too minimal and lacks key information. It is not front-loaded with the most critical details.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema and minimal annotations, the description should describe the return structure and required parameters. It only vaguely mentions a 'payment-required object' and does not address the two parameters or behavior.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% and the description entirely omits parameter details. The parameters 'rail_key' (enum) and 'movement_amount_units' (number) are not explained, leaving the agent without necessary guidance.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the tool returns a 'payment-required object for rail movement,' which clearly specifies the verb and resource. However, it does not differentiate from the sibling tool 'create_x402_payment_challenge', which has a similar purpose.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
There is no guidance on when to use this tool versus alternatives like 'create_x402_payment_challenge'. The description lacks context for selecting this tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
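To make the gaps listed above concrete, here is a hedged sketch of what a fuller definition for create_payment_challenge might look like. The tool name, title, parameter names, parameter types, and the readOnlyHint value come from this page. Every description string, including the guidance about create_x402_payment_challenge, is assumed wording for illustration only, and the helper `undocumented_parameters` is hypothetical.

```python
import json

# Hypothetical rewrite of the create_payment_challenge definition.
# The description texts are assumptions: the source does not document
# what the parameters mean or which rail_key values are valid.
tool_definition = {
    "name": "create_payment_challenge",
    "title": "Create payment challenge",
    "description": (
        "Create a payment-required object for a rail movement. "
        "Use create_x402_payment_challenge instead when the caller "
        "needs an HTTP 402/x402-compatible envelope (assumed guidance)."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "rail_key": {
                "type": "string",
                "description": "Rail the movement runs on (valid values are not listed in the source).",
            },
            "movement_amount_units": {
                "type": "number",
                "description": "Size of the movement in the rail's units (assumed meaning).",
            },
        },
    },
    "annotations": {"readOnlyHint": False},
}


def undocumented_parameters(definition: dict) -> list[str]:
    """List parameter names whose schema entry has no description."""
    properties = definition.get("inputSchema", {}).get("properties", {})
    return [
        name
        for name, spec in properties.items()
        if not str(spec.get("description", "")).strip()
    ]


print(json.dumps(tool_definition, indent=2))
print("undocumented:", undocumented_parameters(tool_definition))
```

The point of the sketch is the shape: a verb that matches the tool name, one sentence of "use X instead of Y when Z" guidance, and a description on every parameter.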
create_stripe_movement_fee_checkout: Create Stripe movement fee checkout (Grade C)
Create Stripe Checkout Session for movement fee collection, or return env setup requirements.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| movement_type | No | | |
| idempotency_key | No | | |
| movement_amount_units | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds some behavioral context by indicating conditional behavior (returns env setup if prerequisites missing), but does not disclose side effects, authorization needs, or error states. Annotations already indicate it is a write operation (readOnlyHint false), so the description adds limited value beyond that.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is very concise (one sentence), but this conciseness sacrifices essential details about parameters, return values, and edge cases, making it less useful for an AI agent.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 4 parameters, no output schema, and conditional behavior, the description is incomplete. It fails to explain the success return value (e.g., session URL) and provides no parameter details, leaving significant gaps for correct invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must compensate but provides no explanation for any of the four parameters (rail_key, movement_type, idempotency_key, movement_amount_units). The agent has no semantic guidance on how to populate them.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool creates a Stripe Checkout Session for movement fee collection and mentions an alternative outcome (returning env setup requirements). This provides a specific verb and resource, but does not explicitly differentiate from sibling tools like create_payment_challenge.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by hinting that it may return environment setup requirements instead of creating a session, but it does not provide explicit guidance on when to use this tool vs. alternatives, nor does it list prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
create_x402_payment_challenge: Create x402 payment challenge (Grade D)
Return optional HTTP 402/x402-compatible payment challenge envelope with payment_reference fallback.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| movement_type | No | | |
| movement_amount_units | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false (write operation), but the description says 'Return ... envelope' which implies no side effects. It does not disclose if state changes occur (e.g., record creation, payment reference storage). The description does not add behavioral context beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence but vague and ambiguous. It sacrifices clarity for brevity. 'Optional' and 'fallback' are unclear. The structure does not front-load the key action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With a complex tool (3 params, no output schema, additionalProperties allowed), the description is critically incomplete. It omits purpose, parameter meanings, return value structure, and side effects. Essential context is missing.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%. The description provides no explanation of the three parameters (rail_key, movement_type, movement_amount_units) or the additional properties allowed. The agent gets no help understanding what values to provide.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses 'Return', which implies a passive read, while the name says 'Create'. It does not clearly state what action the tool performs (generate, request, store?) and does not differentiate it from the sibling 'create_payment_challenge', which likely has a similar purpose.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is given on when to use this tool versus alternatives like 'create_payment_challenge' or other payment-related siblings. There is no context about prerequisites or conditional use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
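Several reviews above flag a mismatch between the verb in a tool's name and the verb that opens its description (for example 'Create' in the name but 'Return' in the description). A rough heuristic for spotting this could look like the sketch below. The function names and the set of read-like verbs are assumptions for illustration, and a first-word comparison is much cruder than the review on this page.

```python
# Verbs treated as interchangeable for read-style tools (an assumed list).
READ_VERBS = {"get", "fetch", "list", "read", "return"}


def leading_verb(text: str) -> str:
    """Return the first word of a snake_case name or a sentence, lowercased."""
    tokens = text.replace("_", " ").split()
    return tokens[0].lower().strip(".,") if tokens else ""


def verbs_disagree(tool_name: str, description: str) -> bool:
    """Flag tools whose name and description open with different kinds of verb."""
    name_verb = leading_verb(tool_name)
    description_verb = leading_verb(description)
    if name_verb in READ_VERBS and description_verb in READ_VERBS:
        # e.g. "get_..." described as "Return ..." is not a real conflict.
        return False
    return name_verb != description_verb


# Names and descriptions below are quoted from this page.
examples = [
    ("create_x402_payment_challenge",
     "Return optional HTTP 402/x402-compatible payment challenge envelope with payment_reference fallback."),
    ("execute_directory_submission",
     "Create target-specific directory submission record."),
    ("get_active_rail_catalog",
     "Return all active rails and required fields."),
]

for name, description in examples:
    print(name, "->", "mismatch" if verbs_disagree(name, description) else "ok")
```

A check like this only catches the surface symptom. It says nothing about whether the tool actually writes state, which is what the behavioral-disclosure dimension asks the description to spell out.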
credential_aware_mcp_write: Credential-aware MCP write (Grade C)
Validate credential before returning a write envelope.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| agent_key | No | | |
| write_tool | No | | |
| idempotency_key | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description only mentions credential validation and returning a write envelope, but does not disclose side effects, required permissions, or what the write envelope represents. Annotations only provide readOnlyHint=false, so the description should add more behavioral context but fails to do so.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence, which is concise, but it lacks structure and does not front-load enough critical information. It could benefit from being slightly longer to include parameter context.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 4 parameters, no output schema, and no schema descriptions, the description is severely incomplete. It does not provide enough context for an agent to correctly select and invoke the tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description should explain the meaning and usage of the four parameters (rail_key, agent_key, write_tool, idempotency_key). It provides no information, leaving the agent without guidance.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the verb 'validate credential' and the resource 'write envelope', which is specific. However, it does not distinguish the tool from sibling tools like enforce_agent_credential or validate_production_write, which have similar purposes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. The description lacks any context about prerequisites, when to invoke it, or when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
enforce_agent_credential: Enforce agent credential (Grade C)
Accept or reject scoped write.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| agent_key | No | | |
| idempotency_key | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations set readOnlyHint=false, confirming a write operation. The description adds that it accepts or rejects a write, but does not disclose side effects, authorization needs, or what 'scoped write' entails.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence, making it concise but too brief to convey necessary details for a 3-parameter tool.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has 3 parameters, no output schema, and no guidance, the description is completely inadequate for an AI agent to select and invoke correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 3 parameters (rail_key, agent_key, idempotency_key) with 0% description coverage. The description provides no parameter explanations, failing to add value beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Accept or reject scoped write' indicates a decision action on a write, but 'scoped write' is vague and does not differentiate from siblings like 'issue_agent_credential' or 'register_issued_credential'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool vs alternatives. The description lacks context for tool selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
execute_directory_submission: Execute directory submission (Grade C)
Create target-specific directory submission record.
| Name | Required | Description | Default |
|---|---|---|---|
| submission_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false, implying mutation. Description adds 'Create' but no details on side effects, permissions, or what 'target-specific' entails. Lacks behavioral depth.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence is concise but lacks necessary detail. Being too brief reduces effectiveness; it is not well-structured for agent comprehension.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of many sibling tools and no output schema, the description is insufficient. It does not explain return values, prerequisites, or how this relates to other directory submission tools.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% and description does not explain the 'submission_target' parameter. The term 'target-specific' is vague; agent cannot infer valid values or constraints. AdditionalProperties allowed but undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description uses verb 'Create' and resource 'directory submission record', clearly indicating its action. The qualifier 'target-specific' distinguishes it from general submission records. However, the title says 'Execute' while description says 'Create', causing slight inconsistency.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus sibling tools like adapt_directory_submission or run_directory_submission_adapter. No exclusions or context provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
execute_external_registry_submission_run: Execute external registry submission run (Grade C)
Create target-specific registry submission and directory status records.
| Name | Required | Description | Default |
|---|---|---|---|
| live_submit | No | | |
| target_endpoint | No | | |
| submission_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotation confirms a write operation (readOnlyHint=false), which aligns with the description. However, no additional behavioral details are given, such as side effects, authentication needs, or rate limits. The description adds minimal context beyond what the annotation already provides.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise, consisting of a single sentence that conveys the core purpose without extraneous information. It is front-loaded but lacks structural elements like examples or parameter details.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of parameter documentation and output schema, the description is insufficient for an agent to correctly invoke the tool. Essential information about parameter values and behavior is missing.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, yet the description provides no explanation for any of the three parameters ('live_submit', 'target_endpoint', 'submission_target'). The agent has no way to understand their semantics or required format.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool creates target-specific registry submission and directory status records, using a specific verb and resource. However, it does not differentiate from similar sibling tools like 'execute_directory_submission' beyond the 'external' context implied by the name.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. There is no mention of prerequisites, context, or exclusions, leaving the agent to infer usage solely from the name.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
execute_first_paid_production_run: Execute paid production run (Grade D)
Execute proof run through quote, package, receipt, and verification.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| idempotency_key | No | | |
| payment_reference | No | | |
| movement_amount_units | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false, confirming it's a write operation, but description provides no additional behavioral context. Does not disclose side effects, like what resources are created (e.g., a run record, receipt), or any required authentication or payment flow.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single short sentence, but it lacks clarity and structure. It is under-informative rather than concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 4 parameters, no output schema, and complexity of a paid production run involving quote, package, receipt, and verification, the description is entirely inadequate. Does not explain the workflow, expected inputs/outputs, or how it relates to other tools like 'quote_movement_fee' or 'record_receipt'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 4 parameters with 0% description coverage. Description adds no meaning; does not explain any parameter (e.g., what 'rail_key' enum represents, how 'idempotency_key' works, format of 'payment_reference', or units of 'movement_amount_units').
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description states 'Execute proof run through quote, package, receipt, and verification' but is vague and does not clearly differentiate the tool from siblings like 'execute_directory_submission' or 'start_rail_run'. The term 'proof run' is confusing and is likely internal jargon left unexplained.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives. Does not specify prerequisites, such as requiring a quote or payment method, or when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
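The review above notes that the description never says how 'idempotency_key' works. For orientation only, the sketch below shows the common meaning of such a key: repeating a call with the same key returns the stored result instead of running the action again. Whether this server behaves that way is not stated in the source, and the class `IdempotentExecutor` and its in-memory store are hypothetical.

```python
from typing import Any, Callable


class IdempotentExecutor:
    """Run an action at most once per idempotency key (generic pattern only)."""

    def __init__(self) -> None:
        # Maps each key to the result of its first run. A real service
        # would use durable storage; this dict is just for illustration.
        self._results: dict[str, Any] = {}

    def run(self, idempotency_key: str, action: Callable[[], Any]) -> Any:
        if idempotency_key in self._results:
            # Replay: return the earlier result, do not repeat side effects.
            return self._results[idempotency_key]
        result = action()
        self._results[idempotency_key] = result
        return result


# Usage: the second call with the same key does not start a second run.
runs_started: list[str] = []


def start_run() -> str:
    runs_started.append("run")
    return f"run-{len(runs_started)}"


executor = IdempotentExecutor()
first = executor.run("key-1", start_run)
second = executor.run("key-1", start_run)
print(first, second, len(runs_started))
```

If the tool follows this pattern, a description could say so in one sentence, which would tell an agent that retrying a paid run after a timeout is safe.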
fetch_return_package: Fetch return package (Grade D, Read-only)
Return package contract and hash.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| return_package_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotations indicate readOnlyHint=true, so the tool is read-only. The description adds no behavioral details beyond that, such as what gets fetched, any side effects, or limitations. Since annotations already declare the read-only nature, the description contributes little.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is very short, but brevity is not conciseness here: the single sentence is unclear and conveys little useful information. It should state the tool's purpose plainly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has two parameters with no descriptions, no output schema, and the description is insufficient, the tool definition is far from complete. The agent would not understand what the tool returns or how to use the parameters correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema has no descriptions for the two parameters (rail_key with enum and return_package_id string). The description does not explain what either parameter means or how they affect the operation. With 0% schema description coverage, the description fails to compensate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Return package contract and hash.' is vague and does not clearly state what the tool does. It could mean returning a package's contract and hash, but the phrasing is ambiguous. The name suggests fetching, but the description is not specific about the resource or action.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus any of the many sibling tools. There is no mention of context, prerequisites, or alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_active_rail_catalog: Get active rail catalog (Grade A, Read-only)
Return all active rails and required fields.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The readOnlyHint annotation already indicates a safe read operation. The description adds that it returns active rails and required fields, which provides context beyond the annotation but does not disclose additional behavioral traits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
A single, front-loaded sentence with no waste. Every word is necessary.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a parameterless read-only tool, the description adequately conveys the output. No output schema exists, but the description is sufficient for an agent to understand the returned data.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters exist, so schema coverage is 100%. The baseline score for a tool with zero parameters is 4, and the description does not need to add parameter details.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns 'all active rails and required fields', which is specific and informative. However, it does not distinguish from sibling 'list_rails', which may serve a similar role.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives like 'list_rails' is provided. The agent must infer usage from the name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_free_agent_tools_index: Get free agent tools index (Grade C, Read-only)
Return zero-cost tools and movement fee trigger.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The readOnlyHint annotation already indicates the tool is read-only. The description adds no behavioral context beyond the functional output, such as null return behavior, authentication requirements, or rate limits. No additional traits are disclosed.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single short sentence, which is concise and front-loaded with the main point. However, it lacks structure and could benefit from additional details about output format.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool has no output schema, so the description should convey what the return value looks like. It only says 'zero-cost tools and movement fee trigger' without specifying if it's a list, object, or other structure. This is incomplete for an agent to understand the result.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
There are no parameters in the schema, so the baseline is 4. The description does not need to add parameter meaning. It correctly does not attempt to describe parameters that don't exist.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description, 'Return zero-cost tools and movement fee trigger', gives a general idea of the output but is vague. It does not specify the format of the index or how the 'movement fee trigger' is represented, making the purpose only moderately clear.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus siblings like get_active_rail_catalog or get_movement_fee_schedule. The description lacks context for appropriate usage, which is critical given the large number of sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_movement_fee_schedule: Get movement fee schedule (grade B, read-only)
Return transaction and escrow movement fee schedule.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| movement_type | No | | |
| movement_amount_units | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true. The description says 'Return', consistent with read-only. No additional behavioral traits disclosed, but for a simple read operation, the description adds minimal value beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence, concise and front-loaded. However, it could be expanded with parameter info without losing conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has 3 parameters, no output schema, and only a basic annotation, the description lacks sufficient context. It doesn't explain return value structure, parameter dependencies, or typical usage scenarios.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must compensate but fails to explain the three parameters (rail_key, movement_type, movement_amount_units). The enums are self-explanatory, but the role of movement_amount_units is unclear from the description alone.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
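The coverage figures quoted in these reviews (0% here, 100% for parameterless tools) can be reproduced with a small check. This is a minimal sketch, assuming coverage means the share of input-schema properties that carry a non-empty `description`; the function name and the sample schema (including its types) are illustrative and are not taken from the server or from the scoring tool.

```python
def schema_description_coverage(input_schema: dict) -> float:
    """Return the fraction of parameters that have a non-empty description.

    Assumption: a schema with no parameters counts as fully covered (1.0),
    matching the "schema coverage 100%" wording used for parameterless tools.
    """
    properties = input_schema.get("properties", {})
    if not properties:
        return 1.0
    described = sum(
        1
        for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# Illustrative schema shaped like get_movement_fee_schedule: three optional
# parameters, none described. The types are guesses, not the real schema.
fee_schedule_schema = {
    "type": "object",
    "properties": {
        "rail_key": {"type": "string"},
        "movement_type": {"type": "string"},
        "movement_amount_units": {"type": "number"},
    },
}

print(schema_description_coverage(fee_schedule_schema))
print(schema_description_coverage({"type": "object", "properties": {}}))
```

Under this definition, adding a description to even one of the three parameters would lift coverage above zero, which is the gap the review is pointing at.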
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Return' and the resource 'transaction and escrow movement fee schedule'. It distinguishes the tool from siblings like 'quote_movement_fee' by indicating it returns the schedule, not a quote.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool vs alternatives such as 'quote_movement_fee'. No prerequisites or context provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_run_status: Get run status (grade D, read-only)
Return run state.
| Name | Required | Description | Default |
|---|---|---|---|
| production_run_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=true, and the description 'Return run state.' does not contradict this. However, it adds no additional behavioral context such as side effects, permissions, or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely short (3 words) but under-specifies the tool's behavior. It is concise but not informative, failing to earn its minimal length.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of output schema and the simple input schema, the description should provide context on the return format or possible states. It provides none, leaving the agent with insufficient information.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description fails to mention the single parameter 'production_run_id' at all. With 0% schema description coverage, the description should compensate but does not.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Return run state.' adds minimal information beyond the tool name 'get_run_status'. It uses a vague term 'state' without specifying what it includes, and does not distinguish from sibling tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. The description lacks context on prerequisites, typical use cases, or when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_tool_costs: Get tool costs (grade C, read-only)
Return zero tool access cost and movement fee policy.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| movement_type | No | | |
| movement_amount_units | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotations already declare readOnlyHint=true, indicating a safe read operation. The description adds that it returns cost and fee policy, which is consistent. However, it does not reveal any additional behavioral traits beyond what annotations imply.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single short sentence, which is concise but lacks clarity and structure. It does not front-load key information effectively and omits necessary details about the tool's purpose and parameters.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has three parameters, no output schema, and no explanations, the description is incomplete. It does not specify what the return value looks like, how the parameters affect the result, or what 'movement fee policy' entails. The agent lacks sufficient information to use the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, meaning the parameters have no descriptions in the schema. The tool description does not explain the meaning of 'rail_key', 'movement_type', or 'movement_amount_units', nor does it provide context on how to use them. This leaves the agent with no semantic understanding of the parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Return zero tool access cost and movement fee policy' is ambiguous. It is unclear whether it returns the cost (which is zero) or the policy about zero cost. The verb 'return' is clear but the object is confusing, and it does not differentiate from sibling tools like 'get_movement_fee_schedule' or 'quote_movement_fee'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. There is no mention of prerequisites, when not to use it, or how it relates to sibling tools such as 'get_movement_fee_schedule'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
issue_agent_credential: Issue agent credential (grade D)
Issue scoped credential envelope.
| Name | Required | Description | Default |
|---|---|---|---|
| scopes | No | | |
| agent_id | No | | |
| allowed_rails | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Beyond the annotation readOnlyHint=false indicating a write operation, the description adds no behavioral details. No side effects, security requirements, or response format are mentioned.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is only four words, which is excessively brief. It sacrifices clarity for conciseness, leaving the agent uninformed.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With no output schema, no parameter descriptions, and limited annotations, the description fails to provide a complete picture. The tool has three unannotated parameters and likely returns a credential, but none of this is addressed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% and the description does not explain any of the three parameters (scopes, agent_id, allowed_rails). The term 'scoped' loosely hints at scopes but adds no substantive meaning.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description is 'Issue scoped credential envelope.' This is vague; 'scoped credential envelope' is unclear and does not differentiate from sibling tools like 'enforce_agent_credential' or 'register_issued_credential'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives. No mention of prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_rails: List rails (grade B, read-only)
Return active rail contracts and movement fee schedule.
No parameters.
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, so the description adds no behavioral context beyond what is already known. It correctly implies a read-only operation but lacks details on data freshness or pagination.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence with no extraneous information. Front-loaded purpose and efficient phrasing make it highly concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With no output schema and simple input, the description is adequate but could be improved by explaining what 'active' means or indicating the output format. Given the tool's simplicity, it meets minimum expectations but leaves some ambiguity.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters, so baseline score of 4 applies. The description adds no additional parameter information, but none is needed due to zero parameters and 100% schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns active rail contracts and movement fee schedule. However, it does not distinguish the tool from similar siblings such as get_active_rail_catalog and get_movement_fee_schedule, which likely overlap in functionality.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus similar alternatives. Sibling tools include get_active_rail_catalog and get_movement_fee_schedule, but the description provides no context for choosing this specific combined view.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
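One way to catch this gap mechanically is to check whether a description names any of the siblings it could be confused with. The following is a rough sketch, not the scoring method used for these reviews; the helper name is hypothetical, and a plain substring match is assumed to be good enough for tool names.

```python
def missing_sibling_guidance(description: str, siblings: list[str]) -> list[str]:
    """Return the sibling tool names that the description never mentions."""
    text = description.lower()
    return [name for name in siblings if name.lower() not in text]


# Description and sibling names as quoted in the list_rails review.
list_rails_description = "Return active rail contracts and movement fee schedule."
siblings = ["get_active_rail_catalog", "get_movement_fee_schedule"]

print(missing_sibling_guidance(list_rails_description, siblings))
# ['get_active_rail_catalog', 'get_movement_fee_schedule']
```

Both siblings are reported because the description mentions the fee schedule as a concept but never names the competing tools, so an agent gets no "use X instead of Y when Z" signal.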
promote_benchmark_report: Promote benchmark report (grade C)
Persist multi-agent benchmark record as a release gate.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_count | No | | |
| passed_count | No | | |
| benchmark_hash | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description says 'persist', which aligns with readOnlyHint=false, but provides no additional behavioral detail. There is no mention of side effects (e.g., whether it overwrites existing release gates), required permissions, or error conditions. Since the annotations convey little beyond the write hint, the description should compensate but fails to.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
One short sentence, no verbosity, but it omits crucial information. Conciseness is good but at the expense of completeness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description is too minimal for a tool with 3 parameters, no output schema, and only a basic annotation. It lacks context such as what constitutes a valid benchmark record, the return value, or failure modes. An agent would struggle to use it correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Three parameters (rail_count, passed_count, benchmark_hash) with 0% schema description coverage. The description does not explain their purpose or constraints. For example, it does not clarify if rail_count and passed_count are counts of successful runs or something else.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states it persists a multi-agent benchmark record as a release gate. Verb 'persist' and resource are specific, and it distinguishes from sibling tools like run_multi_agent_rail_benchmark (runs benchmark) and store_regression_report (stores reports).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool vs alternatives. With 60+ siblings, the description should indicate prerequisites or context (e.g., after running a benchmark). It does not say when not to use or provide exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
publish_public_repo_ci_badge: Publish public repo CI badge (grade C)
Return GitHub Actions smoke workflow and persist CI badge status.
| Name | Required | Description | Default |
|---|---|---|---|
| repo_name | No | | |
| release_version | No | | |
| transcript_hash | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=false, so the description's mention of persisting aligns. However, it discloses no additional behavioral traits such as side effects, required permissions, or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The single sentence is terse but wastes the opportunity to convey clarity. It packs two verbs without helping the agent understand the tool's core purpose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With no output schema and 0% schema coverage, the description fails to specify return values, the meaning of 'persist', or the badge status format. Critical information is missing.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, and the description does not explain any parameter (repo_name, release_version, transcript_hash). The agent has no clue what these parameters represent.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description mentions two actions (return and persist) but does not clearly define a single, specific operation. The title 'Publish' and verb 'Return' create confusion. It is distinct from sibling tools but lacks specificity.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, nor does it mention prerequisites or exclusions. The agent is left without context for appropriate invocation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
quote_movement_fee: Quote movement fee (grade C, read-only)
Return fee for a movement amount and type.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| movement_type | No | | |
| movement_amount_units | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotation 'readOnlyHint: true' already indicates a safe read operation. The description adds no new behavioral context beyond what the annotation provides. It does not disclose any side effects, authentication needs, or response format, so transparency is minimal.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence of 8 words, front-loaded with the verb 'Return'. Every word is necessary, and there is no verbose or redundant phrasing.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having no output schema and non-trivial parameters (enums, rail_key), the description offers no explanation of return values, measurement units, or what the fee represents. The agent would need additional context to use this tool effectively.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description mentions 'movement amount and type' which weakly hints at two parameters (movement_type, movement_amount_units) but ignores 'rail_key'. With 0% schema description coverage, the description should compensate but fails to provide meaningful parameter semantics or examples.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns a fee for a movement amount and type. It uses a specific verb ('return') and identifies the resource. However, it does not explicitly distinguish between sibling tools like 'get_movement_fee_schedule', so clarity is high but not maximal.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. There is no mention of prerequisites, exclusions, or recommended scenarios, leaving the agent to infer usage from context alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
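To make the recurring criticisms concrete, here is a hypothetical rewrite of this tool's definition with described parameters and explicit usage guidance. It is only a sketch: the tool name, the three parameter names, and the read-only hint come from the listing above, while every description string and every parameter type is an assumption made for illustration, not the server's actual contract.

```python
import json

# Hypothetical rewrite of the quote_movement_fee definition. The wording of
# each description and the parameter types are illustrative assumptions.
quote_movement_fee_tool = {
    "name": "quote_movement_fee",
    "description": (
        "Return the fee for a single movement of a given amount and type. "
        "Use get_movement_fee_schedule instead when you need the full "
        "schedule rather than one quote."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "rail_key": {
                "type": "string",
                "description": "Rail the movement belongs to (assumed meaning).",
            },
            "movement_type": {
                "type": "string",
                "description": "Kind of movement being quoted (assumed meaning).",
            },
            "movement_amount_units": {
                "type": "number",
                "description": "Amount to be moved, in units (assumed meaning).",
            },
        },
    },
    "annotations": {"readOnlyHint": True},
}

# List any parameter that still lacks a description.
undocumented = [
    name
    for name, spec in quote_movement_fee_tool["inputSchema"]["properties"].items()
    if not spec.get("description")
]

print(json.dumps(quote_movement_fee_tool, indent=2))
print("undocumented parameters:", undocumented)
```

The second sentence of the description is the kind of "use X instead of Y when Z" guidance the rubric asks for, and the per-parameter descriptions are what would raise schema description coverage from 0%.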
quote_run: Quote rail run (grade C, read-only)
Return tool_access_units, movement_fee_units, total_units, currency, and return fields.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| movement_type | No | | |
| movement_amount_units | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint=true. Description adds context about return fields, confirming it is a read operation. However, no details about side effects, auth requirements, or rate limits beyond what annotations provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence, front-loaded with return fields. Concise and direct, but could be more informative without adding length.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Missing key context: what a 'rail run' is, how this tool fits into a workflow, and interpretation of return fields. With many siblings and no output schema, the description is too minimal for an agent to fully understand usage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, and the description adds no explanation of parameters such as rail_key or movement_type. It focuses on return fields and offers no guidance on how to set inputs.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description specifies the exact fields returned: tool_access_units, movement_fee_units, total_units, currency, and return fields. Clearly indicates a quoting operation for rail runs, but does not differentiate from sibling tools like quote_movement_fee.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when or why to use this tool over alternatives (e.g., quote_movement_fee, start_rail_run). There is no mention of prerequisites, typical scenarios, or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
read_callback_receipt_records: Read callback receipt records (grade B, read-only)
Return recent callback and receipt records.
No parameters.
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, and the description adds 'recent' but provides no further behavioral details (e.g., how recency is determined, pagination, limits). With annotations present, the bar is lower, yet the description adds minimal value.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence, no redundancy. Efficient but under-specified for a read operation. Could be improved with a second sentence clarifying scope or constraints.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, the description should compensate with details about return content, sorting, or limits. 'Recent' is undefined. For a simple read tool, this is incomplete, especially with many sibling tools that may overlap.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
There are zero parameters, and schema coverage is 100%. Description adds no parameter-specific meaning; baseline for no params is 4. The word 'recent' is vague but doesn't negatively impact this dimension.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies the verb 'Return' and the resource 'recent callback and receipt records', distinguishing it from sibling tools like store or sign operations. However, 'recent' is imprecise, and the scope is not fully explicit.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives (e.g., read_receipt_ledger). No context on prerequisites or exclusions, leaving the agent to infer usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
read_directory_status_console: Read directory status console (Grade B, Read-only)
Return recent directory status records.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description 'Return...' is consistent with the readOnlyHint annotation, but it adds no additional behavioral context. It does not clarify what 'recent' means (e.g., time window, ordering) or disclose any limitations or side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no wasted words. However, the term 'recent' could be more precise, and the sentence is slightly vague.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the absence of an output schema, the description should explain what the returned records contain, but it only states 'directory status records' without detailing fields or structure. The tool is simple, but completeness is lacking.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With zero parameters and 100% schema description coverage, the description does not need to explain parameters. The baseline score of 4 is appropriate as there are no parameters to document.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Return' and resource 'directory status records,' indicating a read operation. However, it does not differentiate from sibling read tools like 'read_callback_receipt_records' or 'read_receipt_ledger', as the term 'recent' is vague and no specifics are given.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. There is no mention of context, prerequisites, or exclusions, leaving the agent to infer usage solely from the name and description.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
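The gaps noted for this tool (undefined "recent", no return shape, no usage guidance) could be closed in the definition itself. Below is a minimal sketch of what a fuller definition might look like, written as a Python dict using the MCP tool fields `name`, `title`, `description`, `inputSchema`, `outputSchema`, and `annotations`. The 50-record limit, the newest-first ordering, and the output field names (`target`, `status`, `recorded_at`) are assumptions made up for illustration; the source does not document the server's actual behavior.

```python
# Hypothetical revision of the tool definition. The record limit, ordering,
# and output fields are illustrative assumptions, not documented behavior.
REVISED_TOOL = {
    "name": "read_directory_status_console",
    "title": "Read directory status console",
    "description": (
        "Return up to 50 directory status records, newest first. "
        "Takes no parameters and changes no state. "
        "For receipt entries, use read_receipt_ledger instead."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {},
        "additionalProperties": False,
    },
    "outputSchema": {
        "type": "object",
        "properties": {
            "records": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "target": {"type": "string"},       # hypothetical field
                        "status": {"type": "string"},       # hypothetical field
                        "recorded_at": {"type": "string"},  # hypothetical ISO 8601 timestamp
                    },
                },
            }
        },
    },
    "annotations": {"readOnlyHint": True},
}
```

The three description sentences each cover one reviewed dimension: scope of "recent", behavior, and when to pick a sibling tool instead.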
read_live_supabase_verification: Read live Supabase verification (Grade B, Read-only)
Return recent live Supabase verification records.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true. Description adds 'recent', implying time-based filtering but no further behavioral details (pagination, freshness, data source). Minimal addition beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence is concise and front-loaded with purpose. No wasted words, but lacks structural elements (e.g., bullet points) that could improve readability.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
No output schema provided. Description only mentions 'records' but does not specify return format, fields, or what 'recent' means. For a read tool, this is insufficient for understanding what the tool returns.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema is empty with additionalProperties:true, indicating no defined parameters. Schema coverage is 100%. Description does not clarify any optional parameters or configuration, baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Return' and the resource 'recent live Supabase verification records'. It distinguishes from sibling 'run_live_supabase_verification' which performs a verification run, while this tool reads existing records.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool vs alternatives. Does not specify that it is read-only (though annotation provides hint) or when to choose it over other read tools. Lacks any contextual usage instructions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
read_payment_settlement_records: Read payment settlement records (Grade A, Read-only)
Return recent Stripe movement fee payment event records.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description adds minimal behavioral context beyond the readOnlyHint annotation. It specifies the resource type ('Stripe movement fee payment event records') but does not disclose any additional traits such as pagination, rate limits, the definition of 'recent', or what happens if no records exist. Annotations already convey safety, so a score of 3 is appropriate.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence with no wasted words. It is optimally concise and front-loaded with the key action and resource.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only tool with no parameters and a simple purpose, the description is nearly complete. However, the term 'recent' is ambiguous and lacks a defined time range. The absence of an output schema means the agent must infer the return format, but given the simplicity, this is a minor gap.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The tool has no parameters and the schema is empty (100% coverage). With zero parameters, the baseline is 4. The description does not need to add parameter meaning since none exist.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it returns 'recent Stripe movement fee payment event records', using a specific verb+resource combination. It distinguishes itself from sibling tools like read_callback_receipt_records and read_receipt_ledger by specifying the exact type of records (Stripe movement fee payments).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus other read tools (e.g., read_callback_receipt_records, read_receipt_ledger). There is no explicit or implied context about appropriate usage scenarios or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
read_receipt_ledger: Read receipt ledger (Grade A, Read-only)
Return recent receipt ledger entries.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint=true, so the description correctly does not contradict. It adds 'recent' to indicate ordering, but lacks details on recency definition, limit, or pagination.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence with no fluff; every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple read operation with no parameters and no output schema, the description provides adequate context. Lacks pagination info but acceptable for 'recent' entries.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has zero parameters with 100% coverage, so description does not need to add parameter details. Baseline of 4 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it returns recent receipt ledger entries, using a specific verb and resource. It distinguishes from sibling tools like verify_receipt or sign_receipt.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives such as read_callback_receipt_records or read_payment_settlement_records. No explicit context or exclusions provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
read_recent_regressions: Read recent regressions (Grade C, Read-only)
Return stored or sandbox regression history.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotation readOnlyHint=true already indicates a safe read operation. The description adds minimal value by mentioning 'stored or sandbox' sources but does not explain the implications of these sources, nor does it disclose any other behavioral traits like result ordering or side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, concise sentence that immediately conveys the core function. There is no extraneous information, making it highly efficient for an agent to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
No output schema exists, and the description does not hint at the return format or fields of the regression history. Additionally, the tool accepts arbitrary parameters (additionalProperties: true) but the description fails to explain how to use them, leaving significant gaps for an agent to infer.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% since there are no defined parameters. The baseline score of 3 applies. The description does not address the additionalProperties: true in the schema, which allows arbitrary parameters; an agent would not know how to use those from the description alone.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
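The `additionalProperties: true` issue raised above can be made concrete. The sketch below is a hand-rolled check, not a full JSON Schema validator, that reports which arguments a schema would reject. The argument name `limit` in the usage note is hypothetical; the source does not say which extra arguments, if any, the tool accepts.

```python
from typing import Any, Dict, List


def unexpected_arguments(
    input_schema: Dict[str, Any], arguments: Dict[str, Any]
) -> List[str]:
    """Return argument names the schema neither declares nor allows."""
    # JSON Schema treats a missing additionalProperties as permissive.
    extra_policy = input_schema.get("additionalProperties", True)
    if extra_policy is not False:
        # True or a sub-schema: extras are accepted (sub-schemas are not checked here).
        return []
    declared = input_schema.get("properties", {})
    return sorted(name for name in arguments if name not in declared)


# An open schema like the one reviewed above, and a closed variant for contrast.
open_schema = {"type": "object", "properties": {}, "additionalProperties": True}
closed_schema = {"type": "object", "properties": {}, "additionalProperties": False}
```

With the open schema, `unexpected_arguments(open_schema, {"limit": 10})` returns an empty list, so an agent gets no signal that `limit` is meaningless or meaningful. With the closed schema the same call reports `limit` as unexpected.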
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses 'Return' as the verb and clearly specifies the resource as 'stored or sandbox regression history', which distinguishes it from other read tools like read_callback_receipt_records. However, it could be more explicit about what constitutes a regression history, potentially confusing agents unfamiliar with the context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives such as other read tools or regression-related tools. With a long list of sibling tools, this omission forces the agent to guess the appropriate context for invocation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
record_receipt: Record receipt (Grade D)
Bind receipt to run and package.
| Name | Required | Description | Default |
|---|---|---|---|
| total_units | No | | |
| production_run_id | No | | |
| return_package_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false (write operation). Description adds no behavioral context beyond 'bind receipt'. No mention of idempotency, preconditions, side effects, or return values.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Extremely short (6 words), but at the expense of clarity. Conciseness should not sacrifice informativeness. Lacks structure or front-loading of key details.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given multiple similar sibling tools and lack of output schema, description is entirely insufficient. Does not clarify what the tool does, what it returns, or how it differs from alternatives.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema has 3 parameters (total_units, production_run_id, return_package_id) with 0% schema description coverage. The description explains neither the meaning nor the format of any parameter. The baseline of 3 does not apply because the description does nothing to compensate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
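The "schema description coverage" figure cited in these reviews can be illustrated with a small function. The exact formula the scorer uses is not stated in the source. This sketch assumes coverage is the share of declared parameters that carry a non-empty `description`, with zero parameters counted as 100%, which matches how the reviews treat parameterless tools. Parameter types are left out because the source does not give them.

```python
from typing import Any, Dict


def description_coverage(input_schema: Dict[str, Any]) -> float:
    """Fraction of declared parameters that have a non-empty description."""
    properties = input_schema.get("properties", {})
    if not properties:
        # Assumption: nothing to document counts as fully covered.
        return 1.0
    described = sum(
        1
        for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# The three record_receipt parameters as listed, none with a description.
record_receipt_schema = {
    "type": "object",
    "properties": {
        "total_units": {},
        "production_run_id": {},
        "return_package_id": {},
    },
}
```

Under this assumed definition, `description_coverage(record_receipt_schema)` is 0.0, and adding a description to any one parameter raises it to one third.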
Does the description clearly state what the tool does and how it differs from similar tools?
Description 'Bind receipt to run and package' is vague. Does not specify what a receipt is or how it relates to run/package. Name suggests recording, but description says 'bind'. No differentiation from sibling tools like 'store_receipt_ledger_entry' or 'verify_receipt'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives. Sibling tools include many receipt-related operations (store, read, verify, sign) with no distinction provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
register_issued_credential: Register issued credential (Grade C)
Persist issued credential fingerprint, scopes, allowed rails, and expiration.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_id | No | | |
| key_fingerprint | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description confirms a write operation (consistent with readOnlyHint=false) but adds minimal behavioral context. It does not disclose side effects, authorization needs, or rate limits. With no annotation detail beyond readOnlyHint, the description should have provided more transparency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence, front-loaded with the verb, and includes key details. It is not verbose, though it could be expanded slightly for clarity without losing conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity and many sibling tools, the description is too sparse. It does not explain how this tool fits into the credential lifecycle, what the input parameters represent, or what outcome to expect. No output schema or reference to related operations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 2 parameters (agent_id, key_fingerprint) with 0% description coverage. The description mentions persisting scopes, allowed rails, and expiration, but these are not in the schema (though additionalProperties: true allows them). There is no explanation of what agent_id or key_fingerprint mean, leaving the parameters ambiguous.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses the verb 'persist' and identifies the resource 'issued credential', listing specific attributes (fingerprint, scopes, allowed rails, expiration). This clearly states the tool's purpose and distinguishes it from related siblings like 'issue_agent_credential' which likely creates credentials.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool vs alternatives. It does not mention prerequisites, exclusions, or when not to use. Given the large sibling list, explicit usage context is missing.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
rotate_signing_secret: Rotate signing secret (Grade C)
Return new active signing version.
| Name | Required | Description | Default |
|---|---|---|---|
| new_signing_version | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false, so mutation is expected. The description implies a state change ('new active'), but does not disclose side effects, such as invalidation of previous secrets, permission requirements, or impact on ongoing operations. It adds minimal behavioral context beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
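One consequence the description leaves open is whether signatures made under an older version still verify after rotation. The sketch below shows one possible policy, in which old versions are retained and stay verifiable, using Python's standard `hmac` module. The `SigningKeyring` class and its methods are hypothetical, and nothing in the source confirms that the server works this way. The sketch only shows the kind of behavior a description should spell out.

```python
import hashlib
import hmac
from typing import Dict, Optional, Tuple


class SigningKeyring:
    """Toy keyring: rotation activates a new version but keeps old ones verifiable."""

    def __init__(self) -> None:
        self._secrets: Dict[str, bytes] = {}
        self.active_version: Optional[str] = None

    def rotate(self, new_signing_version: str, secret: bytes) -> str:
        # Side effect: the active version changes for all later sign() calls.
        self._secrets[new_signing_version] = secret
        self.active_version = new_signing_version
        return new_signing_version

    def sign(self, payload: bytes) -> Tuple[str, str]:
        if self.active_version is None:
            raise RuntimeError("no active signing version; call rotate() first")
        secret = self._secrets[self.active_version]
        digest = hmac.new(secret, payload, hashlib.sha256).hexdigest()
        return self.active_version, digest

    def verify(self, version: str, payload: bytes, signature: str) -> bool:
        secret = self._secrets.get(version)
        if secret is None:
            return False  # unknown or retired version
        expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature)
```

A stricter policy would delete the previous secret inside `rotate`, which would make every earlier signature fail verification. Either choice is a side effect an agent needs to know about before calling the tool.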
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise (one sentence, five words), front-loading the key action. While it earns its place, it sacrifices completeness for brevity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple mutation tool with no output schema, the description lacks information about return value format, parameter optionality (schema shows no required params), and potential errors. It is not complete enough for an agent to confidently invoke.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has one parameter ('new_signing_version') with no description (0% coverage). The description does not explain this parameter's purpose, format, or constraints. The agent receives no semantic guidance beyond the parameter name.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Return new active signing version' clearly states the tool's action and resource. It uses a specific verb ('Return') and identifies the resource ('new active signing version'). However, it does not differentiate from sibling tools like 'verify_signature_by_version' or 'rotate' in other contexts.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. There is no mention of prerequisites, context, or situations where rotation is appropriate. Siblings include signature-related tools, but no explicit distinction is made.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_agent_to_agent_test: Run agent-to-agent test (Grade C)
Run one rail through MCP proof path.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| idempotency_key | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotations show readOnlyHint=false, indicating mutation, but the description adds no additional behavioral details such as side effects, authorization needs, or rate limits. No contradiction with annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence, brief but too sparse to be informative. While concise, it sacrifices clarity and completeness, making it minimally adequate.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With no output schema, 0% parameter coverage, and many sibling tools, the description fails to provide essential context. It does not explain 'rail', 'proof path', or idempotency, nor how this tool fits into the workflow.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description does not mention parameters, despite 0% schema coverage. Both 'rail_key' and 'idempotency_key' lack any explanation in the description, leaving the agent with no guidance on valid inputs or usage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
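Since the description never explains `idempotency_key`, an agent has to guess at its semantics. A common convention, shown in the sketch below, is that repeating a call with the same key returns the stored result instead of running again. This is an assumption about what the parameter might mean, not confirmed behavior of this server, and the `IdempotentRunner` class is hypothetical.

```python
from typing import Callable, Dict


class IdempotentRunner:
    """Runs a rail once per idempotency key and replays the stored result after that."""

    def __init__(self, run: Callable[[str], dict]) -> None:
        self._run = run
        self._results: Dict[str, dict] = {}

    def execute(self, rail_key: str, idempotency_key: str) -> dict:
        if idempotency_key in self._results:
            # Retry with a known key: return the earlier result, no second run.
            return self._results[idempotency_key]
        result = self._run(rail_key)
        self._results[idempotency_key] = result
        return result
```

If the tool follows this convention, a one-line note in the parameter description (for example, that retries with the same key are safe) would remove the guesswork for a mutating call.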
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Run one rail through MCP proof path' indicates the tool executes a single rail operation, but 'MCP proof path' is unclear and doesn't differentiate from sibling tools like 'start_rail_run' or 'get_run_status'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives, nor any prerequisites or conditions. The description lacks any context for appropriate usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_directory_submission_adapter: Run directory submission adapter (Grade C, Read-only)
Return target-specific discovery payload.
| Name | Required | Description | Default |
|---|---|---|---|
| submission_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotations already declare readOnlyHint=true, so the description's claim of 'Return' is consistent and does not add hidden behavioral traits. However, it does not disclose what 'target-specific' entails or any constraints on the parameter, missing an opportunity to add value beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence of 5 words, which is too brief to be useful. While brevity can be a virtue, here it results in under-specification rather than conciseness, as critical information is omitted.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simplicity of the tool (1 optional parameter, no output schema), the description still fails to explain what the discovery payload is, how it is used, or what 'target-specific' means. The lack of output schema information further reduces completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must compensate but completely ignores the 'submission_target' parameter. No information is provided about what value to pass, its format, or its effect on the payload. This leaves the agent guessing about parameter usage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Return target-specific discovery payload' indicates the tool produces a payload, but the verb 'Return' is generic and 'target-specific discovery payload' is vague. Given the sibling tool 'adapt_directory_submission' and other submission-related tools, this description does not differentiate the tool's distinct purpose clearly.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives like 'adapt_directory_submission' or 'execute_directory_submission'. The description lacks any context about prerequisites or suitable scenarios, leaving the agent without decision-making support.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_external_agent_client_runner: Run external agent client runner (Grade C)
Run an external-style client path and return transcript_hash.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| idempotency_key | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description mentions 'run' and returning a hash, but does not disclose side effects, auth needs, or other behavioral traits. Annotations indicate it is not read-only, but the description adds minimal value beyond that.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single short sentence, but it is too terse and omits essential information, sacrificing clarity for brevity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (two parameters, no output schema, side effects), the description fails to explain concepts like 'external-style client path' or 'transcript_hash', leaving the agent poorly informed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must explain parameters, but it does not mention 'rail_key' or 'idempotency_key' at all, leaving the agent to infer their meaning from the schema alone.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states that the tool runs an external-style client path and returns a transcript_hash, but it does not differentiate the tool from siblings such as 'run_external_agent_invocation_test' or 'build_external_agent_client_runner'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives, nor any exclusion conditions or prerequisites.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_external_agent_invocation_test
Run external agent invocation test (grade C)
Run external MCP client path and return transcript_hash.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| external_agent_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false, implying side effects, but the description does not clarify what behaviors occur (e.g., state changes, authorization needs, failure modes). It only mentions running a path and returning a hash, which is insufficient disclosure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
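Besides `readOnlyHint`, the MCP tool annotation set also defines `destructiveHint`, `idempotentHint`, and `openWorldHint`. As a rough illustration of the gap the review describes, the sketch below lists which of those hints a tool leaves unset; the `missing_hints` helper is hypothetical, and the `current` dictionary mirrors only what the review reports (`readOnlyHint=false`).

```python
# Behavioral hints defined for tool annotations in the MCP specification.
BEHAVIOR_HINTS = ("readOnlyHint", "destructiveHint", "idempotentHint", "openWorldHint")


def missing_hints(annotations: dict[str, bool]) -> list[str]:
    """Return the behavioral hints that the annotations do not set."""
    return [hint for hint in BEHAVIOR_HINTS if hint not in annotations]


# Annotations as reported in the review: only readOnlyHint is present.
current = {"readOnlyHint": False}

print(missing_hints(current))
# ['destructiveHint', 'idempotentHint', 'openWorldHint']
```

Hints are advisory, so a description still needs to spell out consequences in prose even when every hint is set.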
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is short and to the point, but it lacks essential details to be useful. It is concise but at the expense of clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With no output schema, minimal annotations, and a vague one-sentence description, the tool is severely underdocumented. It fails to explain inputs, outputs, behavior, or how to interpret results.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 2 parameters with 0% description coverage, and the tool description adds no information about their meaning, allowed values, or behavior. The enum for rail_key is not explained, and external_agent_id is opaque.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the tool runs an external MCP client path and returns a transcript_hash, giving a general idea of its function. However, it is vague about what 'external MCP client path' means and does not distinguish it from similar siblings like 'run_external_agent_client_runner'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, when not to use it, or any prerequisites. The usage context is entirely implicit from the name.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_live_supabase_verification
Run live Supabase verification (grade D)
Write and read every durable table as release proof.
| Name | Required | Description | Default |
|---|---|---|---|
| tables | No | | |
| rail_key | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations only set readOnlyHint=false, which is consistent with the description's 'write and read' claim. However, the description does not disclose any behavioral traits such as side effects, required permissions, rate limits, or whether the write is destructive. With limited annotation coverage, the description should provide more transparency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise (one sentence) but lacks essential content. It is not front-loaded with key information and does not earn its place as it omits crucial details about purpose and usage.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (write/read operations, two parameters with no descriptions, no output schema), the description is severely incomplete. It provides no context on how to use the tool, what the response looks like, or what 'release proof' entails. A much richer description is needed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, yet the description does not explain the parameters (tables, rail_key) at all. The agent has no information on what tables means or how to use the rail_key enum. The description entirely fails to add meaning beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states 'Write and read every durable table as release proof,' which is vague and does not clearly explain the tool's purpose or outcome. It mentions actions (write/read) on a resource (durable tables), but the term 'release proof' is ambiguous, and it fails to distinguish from sibling tools like read_live_supabase_verification.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. There is no mention of context, prerequisites, or exclusions, leaving the agent with no decision-making support.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_multi_agent_rail_benchmark
Run multi-agent rail benchmark (grade C)
Run active rails across multiple agent roles.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_ids | No | | |
| idempotency_key | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false but no further behavioral traits are disclosed. The description does not mention side effects, authentication, or rate limits, leaving the agent uninformed about the tool's impact.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is short and front-loaded, but it sacrifices necessary detail. It is concise to a fault, lacking essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With no output schema, low schema coverage, and no behavioral disclosure, the description is severely incomplete. The agent cannot determine return values, prerequisites, or side effects.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0% and the description does not explain any parameters. The agent has no semantic information about 'agent_ids' or 'idempotency_key' beyond their names and types.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the verb 'run' and the resource 'active rails across multiple agent roles', which is specific. However, it does not differentiate this from sibling tools like 'run_agent_to_agent_test' or 'start_rail_run'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool instead of alternatives. The description lacks any context about prerequisites or scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_release_gate_automation
Run release gate automation (grade C)
Evaluate benchmark, Supabase, receipt ledger, directory, Stripe, and boundary checks.
| Name | Required | Description | Default |
|---|---|---|---|
| benchmark_state | No | | |
| release_version | No | | |
| expected_tool_count | No | | |
| allow_stripe_env_missing | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations only provide readOnlyHint=false, but the description does not clarify whether the tool has side effects, what happens on failure, or how checks are aggregated. Minimal transparency beyond the default.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is very short (one sentence, 10 words), which is concise but sacrifices necessary detail. It is front-loaded but insufficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 4 parameters with no schema descriptions, no output schema, and no required fields, the description is extremely incomplete. It does not explain return values, parameter roles, or tool behavior.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, and the description does not explain any of the four parameters (benchmark_state, release_version, expected_tool_count, allow_stripe_env_missing). It adds no meaning beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description lists specific domains (benchmark, Supabase, receipt ledger, directory, Stripe, boundary checks) but does not clearly state the overall purpose or how it differs from siblings that run individual checks. The verb 'evaluate' is vague.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives like run_live_supabase_verification or validate_directory_submission_credentials. The description lacks any contextual advice.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
run_synthetic_agent_regression
Run synthetic agent regression (grade C)
Run active rails and return one release report.
| Name | Required | Description | Default |
|---|---|---|---|
| rails | No | | |
| idempotency_key | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotation 'readOnlyHint: false' indicates mutation, and the description implies side effects ('run' and 'return report'), but no further behavioral details are given—no mention of permissions, destructiveness, rate limits, or output structure. The description adds minimal value beyond the annotation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely short (one sentence) and sacrifices clarity for brevity. No word is wasted, but the sentence does not communicate the essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of the tool (many siblings, two undocumented parameters, no output schema), the description is severely incomplete. It leaves critical gaps about what 'rails' are, what a 'release report' contains, and how to invoke the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has two parameters ('rails' and 'idempotency_key') with 0% schema description coverage. The tool description does not explain what these parameters represent or how to use them, failing to compensate for the lack of schema documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states 'Run active rails and return one release report', which gives a general sense of the tool's function but is vague about what 'active rails' and 'release report' mean. It does not differentiate the tool from siblings such as 'run_multi_agent_rail_benchmark' or 'run_release_gate_automation'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. There is no explicit context, preconditions, or exclusions mentioned, leaving the agent without direction for proper selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
sign_directory_submission_receipt
Sign directory submission receipt (grade C)
Bind submission hash, verification URL, and status to a signed directory receipt.
| Name | Required | Description | Default |
|---|---|---|---|
| submission_hash | No | | |
| verification_url | No | | |
| submission_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations only provide readOnlyHint=false, so the description should disclose side effects, permissions, or idempotency. It does not mention what happens to the inputs, whether a record is created, or any error conditions. The additionalProperties: true in the schema is not addressed.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence but is too vague and fails to convey essential information. It is under-specified rather than concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of output schema, zero schema coverage, and many siblings, the description is completely insufficient. It does not explain the output, how to interpret results, or how this tool fits into the workflow of directory submission receipts.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%. The description lists 'submission hash, verification URL, and status', while the schema defines 'submission_hash', 'verification_url', and 'submission_target'; 'status' is not a schema parameter and 'submission_target' is never mentioned. This misalignment is misleading. The description also gives no details on the format, constraints, or role of each parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
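The mismatch noted above between the description and the schema can be checked mechanically. The sketch below is a minimal illustration, assuming a simple case-insensitive match of each parameter name (underscores read as spaces) against the description text; the `unmentioned_params` helper is hypothetical and is not part of any scoring tool.

```python
def unmentioned_params(description: str, params: list[str]) -> list[str]:
    """Return parameters whose names never appear in the description."""
    text = description.lower()
    return [p for p in params if p.replace("_", " ").lower() not in text]


# Description and parameter names as listed for sign_directory_submission_receipt.
description = (
    "Bind submission hash, verification URL, and status "
    "to a signed directory receipt."
)
params = ["submission_hash", "verification_url", "submission_target"]

print(unmentioned_params(description, params))  # ['submission_target']
```

A check like this only catches naming gaps; it cannot tell whether a mentioned term such as 'status' maps to a real input.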
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses the verb 'bind' which is non-standard and ambiguous. It lists inputs (submission hash, verification URL, status) but does not clearly state that the tool creates a signed receipt. It does not distinguish from sibling tools like 'sign_receipt' or 'record_receipt'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool vs alternatives like 'sign_receipt'. No context about prerequisites, expected input format, or conditions for use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
sign_receipt
Sign receipt (grade C)
Sign receipt with managed signing metadata.
| Name | Required | Description | Default |
|---|---|---|---|
| receipt_id | No | | |
| production_run_id | No | | |
| return_package_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false, confirming a write operation. The description adds no further behavioral traits (e.g., side effects, permissions, idempotency). With minimal annotation coverage, the description fails to disclose important context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely short at six words, which is concise but at the cost of omitting necessary details. It is front-loaded but sacrifices completeness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 3 parameters with no descriptions, no output schema, and no annotations beyond readOnlyHint, the description is grossly incomplete. It does not specify required parameters (none are required, which is unusual) nor what the tool returns.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, and the description does not explain any of the three parameters (receipt_id, production_run_id, return_package_id). It provides no guidance on their purpose or format, leaving the agent to guess.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Sign' and the resource 'receipt', indicating a write operation. However, it does not differentiate from sibling tools like 'verify_receipt' or 'record_receipt' beyond the action. The phrase 'with managed signing metadata' adds some context but is vague.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives such as 'sign_directory_submission_receipt' or 'verify_managed_receipt_signature'. The description offers no context about prerequisites or scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
start_rail_run
Start rail run (grade C)
Start a rail run with idempotency and payment reference.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| idempotency_key | No | | |
| payment_reference | No | | |
| movement_amount_units | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations only indicate readOnlyHint=false; the description adds minimal behavioral info (mentions idempotency and payment reference) but does not disclose side effects, failure modes, or results. For a mutation tool, more transparency is needed.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
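For context, an idempotency key conventionally means that repeating a call with the same key returns the original result instead of performing the work twice. The sketch below illustrates that convention only. The source does not say how `start_rail_run` treats `idempotency_key`, and `RunStore`, `start_run`, the rail name, and the returned fields are all hypothetical.

```python
import itertools


class RunStore:
    """In-memory stand-in for a service that honors idempotency keys."""

    def __init__(self) -> None:
        self._runs: dict[str, dict] = {}
        self._ids = itertools.count(1)

    def start_run(self, rail_key: str, idempotency_key: str) -> dict:
        # A repeated key returns the stored run rather than creating a new one.
        if idempotency_key in self._runs:
            return self._runs[idempotency_key]
        run = {"run_id": next(self._ids), "rail_key": rail_key}
        self._runs[idempotency_key] = run
        return run


store = RunStore()
first = store.start_run("example-rail", "key-123")
retry = store.start_run("example-rail", "key-123")  # same key: no new run
other = store.start_run("example-rail", "key-456")  # new key: new run

print(first == retry, first["run_id"], other["run_id"])  # True 1 2
```

Whether a retry with the same key but different arguments is rejected or silently deduplicated is exactly the kind of behavior a description would need to state.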
Is the description appropriately sized, front-loaded, and free of redundancy?
A single sentence is concise but lacks structure. It is front-loaded with the action and key concepts but does not elaborate further. It could be expanded without losing conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With 4 parameters, no output schema, and low schema coverage, the description is too brief. It does not explain return behavior, which parameters are needed in practice (none is marked required), or how the tool fits into a workflow. It is incomplete for effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%. The description names two parameters (idempotency_key, payment_reference) but provides no semantics or constraints, and it omits rail_key (an enum) and movement_amount_units entirely. It adds too little to compensate for the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'start' and the resource 'rail run', and mentions key parameters (idempotency, payment reference). It does not explicitly differentiate the tool from its siblings, but because no sibling has 'start' in its name, the tool remains somewhat distinct.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool, prerequisites, or alternatives. The description is purely declarative without context.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
store_callback_receipt_record
Store callback receipt record (grade C)
Store callback, receipt, verification, and transcript references.
| Name | Required | Description | Default |
|---|---|---|---|
| event_type | No | | |
| receipt_id | No | | |
| verification_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false, aligning with 'store' as a write operation. However, the description adds no further behavioral details, such as idempotency, persistence guarantees, side effects, or required permissions. Given the absence of annotation coverage beyond readOnlyHint, the description should disclose more about the tool's behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single short sentence, which is concise. However, it sacrifices necessary detail for brevity. It is front-loaded but insufficiently informative.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of 3 parameters with no required fields, no enums, no output schema, and multiple similar sibling tools, the description is highly incomplete. It fails to explain the tool's purpose, parameter usage, return value, or relationship to other tools. The context signals indicate a rich ecosystem where this tool needs better contextualization.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 3 parameters (event_type, receipt_id, verification_id) with no descriptions and 0% schema description coverage. The description mentions 'callback, receipt, verification, and transcript references' but only receipt_id and verification_id are explicitly named; event_type is omitted and 'transcript' is not represented in the schema. No parameter format or constraints are explained.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Store callback, receipt, verification, and transcript references' is vague and does not clearly state what the tool does. It lists multiple reference types without specifying the primary action. The tool name suggests storing a 'callback receipt record', but the description broadens to include verification and transcript, causing confusion. It does not differentiate from sibling tools like 'record_receipt' or 'store_receipt_ledger_entry'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives such as 'record_receipt' or 'store_receipt_ledger_entry'. There is no mention of prerequisites, required inputs, or context in which this tool is appropriate. The description lacks exclusions or conditional usage hints.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
store_receipt_ledger_entry
Store receipt ledger entry (grade C)
Persist receipt, movement fee, verification, callback, and transcript references.
| Name | Required | Description | Default |
|---|---|---|---|
| receipt_id | No | | |
| production_run_id | No | | |
| return_package_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With only readOnlyHint=false in annotations, the description confirms a write operation ('persist') but discloses no behavioral traits such as side effects, authorization needs, or data handling. The description adds minimal value beyond the annotation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence that is front-loaded and concise, containing no unnecessary words. It is appropriately short for the context.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of 3 parameters, no output schema, and many siblings, the description is insufficient. It does not explain return values, side effects, or how this tool fits into the larger workflow, leaving the agent underinformed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema has 3 parameters with 0% description coverage. The description does not explain the parameters or their roles, failing to compensate for the low coverage. A score lower than 3 is warranted as the description neglects parameter clarity.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the verb 'persist' and lists types of references (receipt, movement fee, etc.), but it is vague about the specific resource and does not clearly differentiate from sibling tools like 'store_callback_receipt_record' or 'record_receipt'. The purpose is partially clear but lacks precision.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. There are no exclusions, prerequisites, or context for decision-making, which is critical given the many sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
store_regression_reportStore regression reportCInspect
Store release report and transcript hashes.
| Name | Required | Description | Default |
|---|---|---|---|
| release_state | No | | |
| regression_report_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false, consistent with 'store' implying a write operation. However, no additional behavioral details (e.g., idempotency, overwrite behavior, required permissions) are provided.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single short sentence, which is concise but at the expense of necessary detail. It front-loads the verb but lacks specificity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With no output schema, minimal annotations, and two undescribed parameters, the description is far too incomplete for an agent to use this tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, and the description does not explain the meaning of 'release_state' or 'regression_report_id'. The agent has no information about what values to provide.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
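The 0% figure can be reproduced with a simple count. The sketch below assumes that coverage means the share of top-level schema properties carrying a non-empty `description`. The function name `description_coverage` and the sample schema are illustrative, and the page does not state how the scorer actually computes this number.

```python
from typing import Any


def description_coverage(schema: dict[str, Any]) -> float:
    """Return the fraction of top-level properties with a non-empty description."""
    props = schema.get("properties", {})
    if not props:
        # Assumption: a schema with no properties has nothing to describe,
        # so it is treated as fully covered.
        return 1.0
    described = sum(
        1
        for prop in props.values()
        if isinstance(prop, dict) and str(prop.get("description", "")).strip()
    )
    return described / len(props)


# Hypothetical schema shaped like the two undocumented parameters listed above.
# Parameter types are omitted because the page does not state them.
store_regression_report_schema = {
    "type": "object",
    "properties": {"release_state": {}, "regression_report_id": {}},
}

print(description_coverage(store_regression_report_schema))
```

Under this definition, adding a one-line `description` to each property would be enough to raise the coverage figure, although it would not by itself explain valid values or parameter relationships.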
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Store release report and transcript hashes' provides a verb and resource, but is vague about what 'store' means (save, upload, record?). It does not distinguish from sibling tools like store_callback_receipt_record.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool vs alternatives. No mention of prerequisites, conditions, or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
update_directory_status_record: Update directory status record (Grade C)
Persist target, credential state, submission hash, external status, and verification URL.
| Name | Required | Description | Default |
|---|---|---|---|
| submission_state | No | | |
| verification_url | No | | |
| submission_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint: false, indicating a write operation. The description adds minimal value beyond this by listing fields, but does not disclose side effects, permissions, or behavioral constraints.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence that is too terse for a tool with 3 parameters. Lacks structured explanation of each field's role or constraints, making it less informative than necessary.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema and simple parameters, the description should still explain the record's context and update behavior. It fails to mention that it updates an existing record or any prerequisites, leaving gaps in the agent's understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage, so the description must compensate. It lists fields like 'credential state' and 'submission hash' that are not in the schema, and the mapping to schema properties is unclear, causing confusion rather than clarity.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
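The mismatch between the fields named in the description and the schema properties can be checked mechanically. The sketch below uses a naive token match (a described field counts as matched when all of its words appear in one parameter name). The helper `unmatched_fields` and the matching rule are assumptions for illustration, not part of the scoring method.

```python
def unmatched_fields(described: list[str], params: list[str]) -> list[str]:
    """Return described fields whose words are not all found in any one parameter name."""
    param_tokens = [set(name.lower().split("_")) for name in params]
    missing = []
    for phrase in described:
        tokens = set(phrase.lower().split())
        if not any(tokens <= candidate for candidate in param_tokens):
            missing.append(phrase)
    return missing


# Fields taken from the tool description and parameters taken from its schema table.
described = [
    "target",
    "credential state",
    "submission hash",
    "external status",
    "verification URL",
]
params = ["submission_state", "verification_url", "submission_target"]

print(unmatched_fields(described, params))
# ['credential state', 'submission hash', 'external status']
```

Three of the five described fields have no obvious schema counterpart, which is the gap the review points to.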
Does the description clearly state what the tool does and how it differs from similar tools?
The description states it 'persists' certain fields, aligning with the update verb in the name. However, it does not differentiate from sibling tools like 'adapt_directory_submission' or 'execute_directory_submission', and the listed fields are inconsistently mapped to the schema.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool versus alternatives. The description neither specifies prerequisites nor excludes scenarios, leaving the agent to infer usage context from the tool name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
validate_a2a_agent_card: Validate A2A agent card (Grade C)
Validate and optionally persist the hardened A2A Agent Card.
| Name | Required | Description | Default |
|---|---|---|---|
| agent_card | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=false, so the description's mention of 'optionally persist' adds minimal context. No disclosure of validation behavior, error handling, or persistence semantics. More detail is needed for a mutation-capable tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence, which is concise but lacks essential details. While it earns points for brevity, the omission of parameter and behavioral information reduces its effectiveness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (nested object parameter, no output schema, potential mutation via persistence), the description is severely incomplete. It fails to explain validation criteria, persistence conditions, return values, or typical usage patterns.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, and the description offers no explanation of the 'agent_card' parameter structure, format, or constraints. The parameter is a nested object with no required fields, yet the description provides no guidance.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool validates and optionally persists the 'hardened A2A Agent Card', which distinguishes it from sibling validation tools that target different resources. However, it does not explicitly differentiate from similar-named validation tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, no prerequisites, and no exclusions. Siblings include many other validation tools, but no comparative information is given.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
validate_directory_submission_credentials: Validate directory submission credentials (Grade C, read-only)
Return target-specific credential readiness without exposing secrets.
| Name | Required | Description | Default |
|---|---|---|---|
| submission_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true; description adds that it does not expose secrets, but does not disclose what 'readiness' means or how it handles invalid targets.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
One sentence, front-loaded with key action and constraint. Efficient but lacking depth.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
No output schema, but description does not explain return value or error conditions. Incomplete for a validation tool where outcomes matter.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Single parameter 'submission_target' has no description in schema (0% coverage) and the tool description fails to clarify its format or purpose (e.g., URL, identifier, or name).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool validates directory submission credentials and returns readiness without exposing secrets, distinguishing it from credential activation or issuance tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance on when to use this tool vs alternatives like activate_external_directory_credentials or issue_agent_credential. Lacks context for when credential readiness checks are appropriate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
validate_persistent_store: Validate persistent store (Grade B, read-only)
Return Supabase env and required table manifest.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description 'Return...' aligns with the readOnlyHint=true annotation, indicating a safe, read-only operation. It adds value by specifying what data is returned (environment and manifest), but does not disclose any other behavioral traits like error conditions or permission requirements.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise—two noun phrases with a clear verb—and contains no superfluous information, making it efficient for agent consumption.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple read-only tool with no parameters and existing annotations, the description is minimally adequate. However, given the large sibling set, a bit more context about the validation meaning or what triggers its use would improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The tool has no parameters (schema is empty with additionalProperties true), and schema coverage is 100%. No param clarification is needed; the description appropriately focuses on output.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Return') and the resources ('Supabase env and required table manifest'), making the tool's output explicit. However, it does not differentiate from sibling tools like 'validate_supabase_rls_policy_pack' or 'read_live_supabase_verification', which may have overlapping purposes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. There is no mention of prerequisites, context, or when not to use it, leaving the agent without decision-making support among many sibling tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
validate_production_write: Validate production write (Grade C)
Validate rail, idempotency, and target surface.
| Name | Required | Description | Default |
|---|---|---|---|
| rail_key | No | | |
| target_surface | No | | |
| idempotency_key | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations indicate readOnlyHint=false, implying the tool may perform writes, but the description does not clarify this. It does not disclose whether validation mutates state, what side effects occur, or what happens on validation failure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely short (one phrase) and front-loaded, but it is too terse to be effective. No word is wasted, but the phrase adds no substantive information beyond echoing the tool's purpose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With no output schema, no parameter descriptions, and a vague description, the tool lacks sufficient context for an AI agent to understand its behavior, return values, or how to invoke it correctly among 60+ sibling tools.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% with no parameter descriptions. The description loosely maps 'rail, idempotency, and target surface' to the parameters rail_key, idempotency_key, and target_surface but provides no explanation of their meanings, allowed values (e.g., rail_key enum), or usage requirements.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Validate rail, idempotency, and target surface' indicates the tool performs validation on three aspects, which aligns with the tool name. However, it does not clarify what 'validate' entails (e.g., returns a boolean, writes a result) or differentiate from sibling validation tools like validate_a2a_agent_card.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives (e.g., validate_a2a_agent_card, validate_persistent_store). There is no mention of prerequisites, success conditions, or when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
validate_supabase_rls_policy_pack: Validate Supabase RLS policy pack (Grade A, read-only)
Return persistent table manifest, RLS policy names, and schema cache reload instruction.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint as true, and the description confirms a read-only operation by stating it 'returns' data. It adds behavioral context beyond annotations by specifying the exact return items (manifest, policy names, cache instruction). No contradiction.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence that efficiently lists three return items. No unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema and no parameters, the description explains what is returned but does not elaborate on the validation purpose or how the output is used. Sibling tools context suggests more detail could be helpful.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema defines no parameters but allows additionalProperties, and the context confirms a parameter count of 0. The baseline score for a tool with 0 parameters is 4, but the description does not clarify input expectations or the meaning of additionalProperties, and that ambiguity about allowed input reduces the score to 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
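The ambiguity comes from JSON Schema's default: when `additionalProperties` is absent or `true`, an object with no declared properties still accepts arbitrary keys. The sketch below is a deliberately minimal check, not a full JSON Schema validator. The helper `unexpected_keys` and both sample schemas are hypothetical, and the case where `additionalProperties` is itself a schema is treated as open.

```python
from typing import Any


def unexpected_keys(schema: dict[str, Any], arguments: dict[str, Any]) -> set[str]:
    """Return argument keys rejected by a closed schema (additionalProperties: false)."""
    declared = set(schema.get("properties", {}))
    if schema.get("additionalProperties", True) is not False:
        # Open schema: undeclared keys are allowed, so nothing is rejected.
        return set()
    return set(arguments) - declared


# Shape reported for the tool: no properties, extra keys allowed.
open_schema = {"type": "object", "properties": {}, "additionalProperties": True}
# Hypothetical tightened variant that makes "no parameters" explicit.
closed_schema = {"type": "object", "properties": {}, "additionalProperties": False}

call_arguments = {"table": "receipts"}  # illustrative stray argument
print(unexpected_keys(open_schema, call_arguments))    # set()
print(unexpected_keys(closed_schema, call_arguments))  # {'table'}
```

Closing the schema is one way to remove the ambiguity. Stating in the description that the tool takes no input would be another.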
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool returns a persistent table manifest, RLS policy names, and schema cache reload instruction, which aligns with the tool name 'validate'. However, it does not distinguish this tool from sibling validation tools like 'validate_persistent_store' or 'validate_production_write'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites, contexts, or when not to use it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
verify_live_registry_listing: Verify live registry listing (Grade C)
Fetch descriptor and listing URLs, then persist verification status.
| Name | Required | Description | Default |
|---|---|---|---|
| urls | No | | |
| listing_url | No | | |
| submission_target | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The annotation readOnlyHint=false already indicates a write operation. The description confirms it persists something, but adds little beyond that—no mention of side effects, idempotency, authorization needs, or what occurs with existing data.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is very short (one sentence), which is concise but lacking structure and front-loading of key details. It is not wasteful but sacrifices completeness for brevity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, 0% parameter description coverage, and 3 parameters of which none is marked required, the description fails to provide sufficient context for correct invocation. It does not explain the return value, persistence mechanism, or how to interpret 'verification status'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 3 parameters (urls, listing_url, submission_target) with 0% description coverage. The description does not explain any parameter's purpose, relationship to 'descriptor' or 'listing URLs', or how they should be used together.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states it fetches descriptor and listing URLs and persists verification status, clearly indicating the tool's action and resource. However, it lacks differentiation from sibling verification tools like 'run_live_supabase_verification' and 'verify_receipt'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool vs alternatives, prerequisites, or exclusions. The description gives no context for appropriate use cases.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
verify_managed_receipt_signature: Verify managed receipt signature (Grade D)
Verify versioned receipt signature.
| Name | Required | Description | Default |
|---|---|---|---|
| receipt_id | No | | |
| receipt_signature | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description claims 'verify', implying a read-only operation, but the annotation readOnlyHint is false, suggesting possible side effects. This contradiction is not resolved, and no behavioral traits are disclosed beyond the name.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
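This kind of name-versus-annotation mismatch can be surfaced with a quick heuristic over the tool list. The sketch below assumes each tool is a dict with a `name` and an optional `annotations.readOnlyHint`, treated as false when absent. The prefix list and the function `flag_read_only_mismatch` are hypothetical, and a flagged tool is only a candidate for manual review.

```python
from typing import Any

# Assumed set of prefixes that read like non-mutating operations.
READ_STYLE_PREFIXES = ("verify_", "validate_", "read_", "list_")


def flag_read_only_mismatch(tools: list[dict[str, Any]]) -> list[str]:
    """Return names that sound read-only but are not annotated readOnlyHint=true."""
    flagged = []
    for tool in tools:
        read_only = tool.get("annotations", {}).get("readOnlyHint", False)
        if tool["name"].startswith(READ_STYLE_PREFIXES) and not read_only:
            flagged.append(tool["name"])
    return flagged


# Annotation values as reported in the reviews on this page.
tools = [
    {"name": "verify_managed_receipt_signature", "annotations": {"readOnlyHint": False}},
    {"name": "validate_persistent_store", "annotations": {"readOnlyHint": True}},
    {"name": "verify_stripe_movement_fee_payment", "annotations": {"readOnlyHint": False}},
    {"name": "store_regression_report", "annotations": {"readOnlyHint": False}},
]

print(flag_read_only_mismatch(tools))
# ['verify_managed_receipt_signature', 'verify_stripe_movement_fee_payment']
```

A flag is not proof of an error: verify_stripe_movement_fee_payment is described as persisting payment state, so its annotation is consistent with its description. The flag only marks where a description should state whether anything is written.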
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence, but it is under-specified. It lacks necessary detail and does not earn its place due to missing critical information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (many siblings, no output schema, no param descriptions), the description is entirely inadequate. It fails to explain the versioning concept, return values, or how this tool differs from similar ones.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, and the description provides no meaning for 'receipt_id' or 'receipt_signature'. It fails to add any semantic value beyond the schema field names.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the tool verifies a versioned receipt signature, which is clear but does not distinguish it from sibling tools like 'verify_receipt' or 'verify_signature_by_version'. The term 'managed' is unexplained, leaving ambiguity about its specific role.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
There is no guidance on when to use this tool versus alternatives. Given many sibling verification tools, the description should specify context or conditions for choosing this one.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
verify_receipt: Verify receipt (Grade C)
Verify receipt hash and signature linkage.
| Name | Required | Description | Default |
|---|---|---|---|
| receipt_id | No | | |
| receipt_hash | No | | |
| production_run_id | No | | |
| receipt_signature | No | | |
| return_package_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description implies a read-only verification operation, but annotations set readOnlyHint=false, indicating the tool may have side effects. This contradiction undermines transparency. No additional behavioral traits are disclosed.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is very short (one sentence), which is concise but at the expense of necessary detail. It is front-loaded but incomplete.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 5 parameters and no output schema, the description is far from sufficient. It does not explain the return value, parameter semantics, or any required state for the tool to function.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 5 parameters with 0% description coverage, and the tool description fails to elaborate on any parameter meaning. The schema itself lacks descriptions, so the agent has no guidance on what values to provide.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Verify receipt hash and signature linkage' clearly states the action (verify) and the resource (receipt hash and signature linkage), but it does not differentiate from sibling tools like 'verify_managed_receipt_signature' or 'verify_signature_by_version'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives such as 'sign_receipt' or 'record_receipt'. There is no mention of prerequisites or context for invocation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
verify_signature_by_version: Verify signature by version (Grade C)
Verify receipt signature by declared signing_version.
| Name | Required | Description | Default |
|---|---|---|---|
| signing_version | No | | |
| receipt_signature | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description only says 'Verify', which implies a read operation, but annotations indicate readOnlyHint=false. The description does not clarify whether the tool modifies state, requires special permissions, or any side effects. It adds minimal behavioral context beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence that is front-loaded with the main idea. However, it lacks necessary details, making it too concise to be fully informative. It is efficient but incomplete.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple tool with 2 parameters and no output schema, the description should explain what constitutes a receipt signature and signing version, and how they relate. The current one-liner leaves these critical details unaddressed, making it incomplete for an AI agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage, and the description only mentions 'signing_version' without defining what it is or specifying constraints. 'receipt_signature' is not even referenced. The description adds no semantic value to the schema's parameter definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Verify', the resource 'receipt signature', and the distinguishing method 'by declared signing_version'. This differentiates it from sibling tools like 'verify_managed_receipt_signature' and 'verify_receipt' by specifying a version-based approach.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, no preconditions, and no context about when not to use it. It simply states the action without any decision support.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
verify_stripe_movement_fee_payment
Verify Stripe movement fee payment
Grade: C
Read Checkout Session status and persist payment state.
| Name | Required | Description | Default |
|---|---|---|---|
| payment_reference | No | | |
| stripe_session_id | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The description indicates a mutating operation ('persist'), which matches readOnlyHint=false. However, it does not disclose side effects (e.g., state changes, required permissions, or impact on related records). Given the absence of annotations beyond readOnlyHint, more behavioral context would be beneficial.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is very short (one phrase), which is concise, but it omits critical information that could be added without significant verbosity. It is front-loaded with the main action, but the lack of structure reduces clarity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (read + write, multiple parameters), the description is insufficient. There is no output schema, so the return value is unknown. Parameter semantics are absent. The tool seems important for payment verification, yet the description provides minimal context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must explain the two parameters (payment_reference, stripe_session_id). It fails to do so, leaving their meaning, format, and relationship to the operation entirely unspecified.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
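To make the "0% schema description coverage" figure concrete, here is a minimal sketch of how such a number could be computed from a tool's input schema. It assumes coverage means the share of top-level properties that carry a non-empty `description`; the scoring method actually used by this listing is not stated. The `description_coverage` helper, the `"string"` types, and the example description texts are all hypothetical.

```python
from typing import Any


def description_coverage(input_schema: dict[str, Any]) -> float:
    """Return the fraction of top-level properties with a non-empty description."""
    properties = input_schema.get("properties", {})
    if not properties:
        return 1.0  # nothing to describe, so nothing is missing
    described = sum(
        1
        for spec in properties.values()
        if isinstance(spec, dict) and str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# Hypothetical schema mirroring the two undocumented parameters in the table above.
# The "string" types are an assumption; the listing does not show parameter types.
undocumented = {
    "type": "object",
    "properties": {
        "payment_reference": {"type": "string"},
        "stripe_session_id": {"type": "string"},
    },
}

# Same schema with placeholder descriptions (the wording is illustrative only).
documented = {
    "type": "object",
    "properties": {
        "payment_reference": {
            "type": "string",
            "description": "Placeholder: what this reference identifies and its format.",
        },
        "stripe_session_id": {
            "type": "string",
            "description": "Placeholder: which Checkout Session to read.",
        },
    },
}

print(description_coverage(undocumented))  # 0.0
print(description_coverage(documented))    # 1.0
```

Under this assumed definition, adding even one-sentence descriptions to both parameters would move coverage from 0% to 100%.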
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states actions ('Read Checkout Session status and persist payment state') and identifies the resource (Checkout Session status, payment state). The tool name also reinforces its specific purpose, distinguishing it from generic verification tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description gives no guidance on when to use this tool versus alternatives like verify_receipt or verify_signature. It does not mention context, prerequisites, or exclusions, leaving the agent to infer usage from the name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
write_persistence_test_record
Write persistence test record
Grade: C
Attempt to write test record to Supabase.
| Name | Required | Description | Default |
|---|---|---|---|
| release_state | No | | |
| transcript_hashes | No | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
The word 'Attempt' suggests the write may fail, but the description gives no details on error handling, idempotency, or side effects. Annotations already indicate the tool is not read-only, so the description adds little beyond that.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely short (7 words) and front-loaded with the key action and resource. However, it is too brief to be fully useful and fails to add value in the limited space.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, 0% parameter coverage, and many sibling tools, the description is severely under-specified. It does not convey what happens after writing, expected inputs/outputs, or how it fits into the broader system.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% and the description does not explain the parameters 'release_state' or 'transcript_hashes'. An agent cannot infer their purpose without additional context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies the action (write), the resource (test record), and the target (Supabase). However, it does not distinguish this from sibling tools like 'validate_persistent_store' or 'store_receipt_ledger_entry' that might also write to databases.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description gives no guidance on when to use this tool versus alternatives. It lacks context about prerequisites, such as needing a connection to Supabase, and about the intended testing scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
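Before publishing the file, it can help to check it locally. The following is a minimal sketch, not Glama's actual verification logic: it only checks the two fields shown in the structure above and that the account email appears among the maintainers. The `check_claim_file` helper is hypothetical, and the case-insensitive email comparison and the strict `$schema` match are assumptions.

```python
import json

# Schema URL taken from the example structure above.
EXPECTED_SCHEMA = "https://glama.ai/mcp/schemas/connector.json"


def check_claim_file(raw: str, account_email: str) -> list[str]:
    """Return a list of problems found in a glama.json document (empty if none)."""
    try:
        doc = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(doc, dict):
        return ["top-level value must be a JSON object"]

    problems: list[str] = []
    if doc.get("$schema") != EXPECTED_SCHEMA:
        problems.append("unexpected or missing $schema")

    maintainers = doc.get("maintainers")
    if not isinstance(maintainers, list) or not maintainers:
        problems.append("maintainers must be a non-empty list")
    else:
        emails = [
            m["email"].lower()
            for m in maintainers
            if isinstance(m, dict) and isinstance(m.get("email"), str)
        ]
        # Assumption: emails are compared case-insensitively.
        if account_email.lower() not in emails:
            problems.append("account email not listed in maintainers")
    return problems


example = """{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}"""

print(check_claim_file(example, "your-email@example.com"))  # []
```

An empty list means the file matches the structure shown above; it does not guarantee that Glama will accept the claim.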
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.