VaultCrux Platform

by io.github.CueCrux

Server Details

VaultCrux Platform — 60 tools: retrieval, proof, intel, economy, watch, org

Status: Healthy
Last Tested: 2026-05-27 01:31
Transport: Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client

Glama

MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

B3.1/5.0

Tool DescriptionsB

Average 3.2/5 across 76 of 76 tools scored. Lowest: 2.6/5.

Server CoherenceB

Disambiguation3/5

Tools have distinct names and purposes (e.g., get_credit_balance vs. query_vault), but the sheer number (76) makes it difficult for an agent to choose, and the constant recommendation to use cuecrux_session suggests the direct tools are not the intended surface, causing confusion.

Naming Consistency4/5

All tools follow a snake_case verb_noun pattern (e.g., get_proof_receipt, create_work). A few names like github_comments_since are less standard but still consistent. Minor deviations exist, but overall predictable.

Tool Count2/5

76 tools far exceeds typical MCP server scope (3-15). While the server appears to be a comprehensive platform, the excessive count suggests poor scoping and potential redundancy, especially given the emphasis on a single session tool.

Completeness4/5

The tool set covers a wide range of capabilities: session management, memory, proof, economy, GitHub integration, project management, etc. Minor gaps exist (e.g., no tool to delete work items), but overall it's quite comprehensive for its domain.

Available Tools

76 tools

accept_handoff_packageAccept Handoff PackageCInspect

Accept an incoming handoff package.

Prefer cuecrux_session as your first and only direct MCP call. It returns a typed capability plan that routes this tool (and every other) to its preferred channel, tier, and cost class. One call per session is enough; the plan is the source of routing truth for all subsequent work. This tool remains directly callable for backward compatibility; the collapsed surface is the intended surface.

ParametersJSON Schema

Name	Required	Description	Default
`package_id`	Yes	The handoff package ID to accept.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden but provides minimal behavioral insight. It implies a mutation (accepting changes state) but doesn't disclose permissions needed, side effects, whether it's reversible, or what happens to the package post-acceptance. This leaves critical gaps for an agent to understand the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero wasted words. It's front-loaded with the core action and resource, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is inadequate. It doesn't explain what 'accept' means operationally, what the expected outcome is, or any error conditions. Given the complexity implied by handoff workflows, more context is needed for the agent to use this effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 100% description coverage, with the single parameter 'package_id' clearly documented. The description adds no additional parameter context beyond what the schema provides, so it meets the baseline of 3 for high schema coverage without compensating value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('accept') and resource ('incoming handoff package'), making the tool's purpose immediately understandable. However, it doesn't differentiate from sibling tools like 'create_handoff_package' or explain what acceptance entails versus other handoff-related operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. There's no mention of prerequisites (e.g., needing an existing handoff package), what happens after acceptance, or how it differs from other handoff-related tools like 'create_handoff_package' in the sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

action_journal_queryQuery Action Journal ReceiptsAInspect

Search action journal receipts with filters for time range, outcome, and tool name.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum number of receipts to return.
`since`	No	ISO 8601 datetime for the start of the query window.
`until`	No	ISO 8601 datetime for the end of the query window.
`offset`	No	Number of receipts to skip for pagination.
`outcome`	No	Filter by outcome status.
`tool_name`	No	Filter by the name of the tool that produced the receipt.

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description does not disclose behavioral traits such as side effects, authentication requirements, rate limits, or any other important behavioral details. It only describes the query function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise, with the first sentence stating the core functionality. The second paragraph about cuecrux_session is somewhat lengthy but relevant for guidance. No superfluous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lacks details about return values, pagination behavior, or what a receipt looks like. Without an output schema, the agent is left uninformed about what the tool returns, making the tool definition incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers 100% of parameters with descriptions, so the description adds no extra meaning. The description merely mentions filters without additional semantic value, resulting in a baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches action journal receipts with specific filters (time range, outcome, tool name). It is specific enough to convey the purpose, though it doesn't explicitly differentiate from sibling query tools like query_vault or get_journal.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance: prefer cuecrux_session for routing, and this tool is for backward compatibility. This clearly tells when to use this tool vs. alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

annotate_sessionAnnotate SessionCInspect

Add an annotation to the current session.

ParametersJSON Schema

Name	Required	Description
`content`	Yes	Annotation content.
`session_id`	No	Session ID (defaults to 'default').
`annotation_type`	No	Type of annotation.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. While 'Add an annotation' implies a write/mutation operation, it doesn't specify permissions needed, whether annotations are editable/deletable, rate limits, or what happens on success/failure. The description is minimal and lacks important operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that communicates the core purpose without any wasted words. It's appropriately sized for a simple annotation tool and front-loads the essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is insufficient. It doesn't explain what an 'annotation' means in this context, how annotations are stored/retrieved, or what the tool returns. The minimal description leaves too many operational questions unanswered.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters (content, session_id, annotation_type). The description adds no parameter-specific information beyond what's in the schema, meeting the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Add an annotation') and target resource ('to the current session'), making the purpose immediately understandable. However, it doesn't differentiate this tool from potential annotation-related siblings (none are listed, but the description doesn't explicitly address uniqueness).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, prerequisites, or contextual constraints. It simply states what the tool does without indicating appropriate scenarios or limitations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_bundlesBrowse BundlesCInspect

List available credit bundles for purchase.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Maximum number of bundles to return.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions listing bundles but does not cover key traits like whether this is a read-only operation, if it requires authentication, rate limits, pagination behavior, or what the return format looks like. This leaves significant gaps in understanding how the tool behaves beyond its basic function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is front-loaded and wastes no space, making it easy to parse quickly while conveying the essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of annotations and output schema, the description is incomplete for a tool that likely returns a list of bundles. It does not explain the return format, error conditions, or behavioral aspects like authentication needs. For a tool with no structured metadata, the description should provide more context to be fully actionable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for the single parameter 'limit', so the schema already documents it fully. The description does not add any parameter-specific details beyond what the schema provides, such as default values or usage context. With high schema coverage, a baseline score of 3 is appropriate, as the description adds no extra semantic value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List' and resource 'available credit bundles for purchase', making the purpose specific and understandable. However, it does not explicitly differentiate from sibling tools like 'get_credit_balance' or 'purchase_bundle', which might handle related credit operations, leaving some ambiguity in distinguishing its exact scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as 'get_credit_balance' for checking current balances or 'purchase_bundle' for buying bundles. It lacks context on prerequisites, exclusions, or specific scenarios, offering only a basic functional statement without usage instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

change_seat_roleChange Seat RoleCInspect

Change the role assigned to an existing organisation seat.

ParametersJSON Schema

Name	Required	Description	Default
`role`	Yes	The new role to assign.
`seat_id`	Yes	The ID of the seat to update.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It states this is a mutation operation ('Change'), but doesn't mention required permissions, whether changes are reversible, potential side effects, or what happens on success/failure. For a mutation tool with zero annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that gets straight to the point with zero wasted words. It's appropriately sized for a simple mutation operation and front-loads the essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is incomplete. It doesn't address behavioral aspects like permissions, side effects, or response format. While concise, it lacks the contextual information needed for safe and effective tool invocation given the complexity of role changes in organizational systems.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents both parameters (seat_id and role). The description doesn't add any parameter-specific information beyond what's in the schema, such as role format constraints or seat_id validation rules. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Change the role') and target resource ('an existing organisation seat'), making the purpose immediately understandable. It doesn't specifically differentiate from sibling tools like 'invite_seat' or 'revoke_seat', but the verb 'change' versus 'invite' or 'revoke' provides some implicit distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'invite_seat' (for new seats) or 'revoke_seat' (for removal). It mentions 'existing organisation seat' which implies a prerequisite but doesn't explicitly state when this tool is appropriate or inappropriate compared to siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

comment_on_workComment on Crux Work ItemAInspect

Post a comment on a work item — leave context for the next agent or human.

ParametersJSON Schema

Name	Required	Description	Default
`body`	Yes
`work_id`	Yes
`author_passport`	Yes

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description carries the full burden. It only states the action without describing side effects, permissions, rate limits, or response structure. Insufficient for an agent to understand behavioral implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences front-loading purpose and usage guidance. No filler or repetition; every sentence adds value relative to the tool's role.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Missing details about return value (no output schema), required authentication, and parameter semantics. With no annotations and sparse description, the definition is incomplete for an agent to invoke the tool correctly without external knowledge.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage and the description offers no additional meaning for any of the three required parameters (body, work_id, author_passport). The agent relies solely on parameter names, which is inadequate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb ('Post a comment'), resource ('on a work item'), and intent ('leave context for the next agent or human'). Distinct from siblings which handle creation, state updates, or queries.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly recommends cuecrux_session as preferred routing and notes this tool remains for backward compatibility. Provides clear when-to-use vs. alternative guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_coalitionCreate CoalitionCInspect

Create a multi-agent coalition to address a knowledge gap.

ParametersJSON Schema

Name	Required	Description
`expires_at`	No	ISO 8601 expiry timestamp.
`budget_cap_crux`	No	Budget cap in crux credits (defaults to 10).
`gap_description`	Yes	Description of the knowledge gap.
`initial_pledge_crux`	No	Initial pledge in crux credits (defaults to 1).

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but offers minimal behavioral insight. 'Create' implies a write/mutation operation, but the description doesn't disclose permission requirements, what happens after creation, whether coalitions are persistent, or any side effects. It mentions the purpose but not the operational behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that states the core purpose without unnecessary words. It's appropriately sized for a creation tool and front-loads the essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a creation tool with 4 parameters, no annotations, and no output schema, the description is insufficient. It doesn't explain what a 'coalition' entails operationally, what happens after creation, success/failure conditions, or return values. The context signals indicate this is a non-trivial tool that needs more complete documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so all parameters are documented in the schema. The description adds no additional parameter context beyond what's in the schema - it doesn't explain relationships between parameters (e.g., how budget_cap_crux relates to initial_pledge_crux) or provide usage examples. Baseline 3 is appropriate when schema does the documentation work.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Create') and resource ('multi-agent coalition') with a specific purpose ('to address a knowledge gap'). It distinguishes from obvious siblings like 'join_coalition' but doesn't explicitly differentiate from other creation tools like 'create_handoff_package'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides minimal guidance - it implies this tool should be used when there's a knowledge gap to address, but offers no explicit when/when-not criteria, prerequisites, or alternatives. No comparison to sibling tools like 'get_knowledge_gaps' or 'join_coalition' is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_handoff_packageCreate Handoff PackageCInspect

Create a handoff package for multi-agent session transfer.

ParametersJSON Schema

Name	Required	Description	Default
`scope`	No	Scope object for the handoff.
`session_id`	No	Session ID (defaults to agent ID).

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions creation but doesn't specify whether this is a write operation, what permissions are needed, if it's idempotent, what happens on failure, or what the output looks like. For a creation tool with zero annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without any unnecessary words or structural complexity. It's appropriately sized and front-loaded with the essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a creation tool with no annotations and no output schema, the description is insufficient. It doesn't explain what a handoff package contains, how it's used, what the creation process entails, or what happens after creation. The context signals indicate complexity (nested objects) that isn't addressed in the minimal description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description adds no additional parameter information beyond what's in the schema, maintaining the baseline score of 3 for adequate but not enhanced parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'create' and the resource 'handoff package', specifying it's for 'multi-agent session transfer'. This is specific enough to understand the tool's function, though it doesn't explicitly differentiate from its sibling 'accept_handoff_package' beyond the action direction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, nor does it mention prerequisites, timing considerations, or exclusions. It simply states what the tool does without contextual usage information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_workCreate Crux Work ItemBInspect

Create a new work item under a project (defaults: state=planned).

ParametersJSON Schema

Name	Required	Description	Default
`body`	No
`state`	No
`title`	Yes
`linked_pr`	No
`tenant_id`	No
`project_id`	Yes
`linked_issue`	No
`assignee_passport`	No
`created_by_passport`	Yes

Tool Definition Quality

B3.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the full burden. It only discloses the default state and the recommendation to use `cuecrux_session`. It does not mention creation side effects, idempotency, required permissions, or error handling, leaving major behavioral traits undisclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the main purpose. The first sentence is concise; the second is longer but provides important usage context. It is not overly verbose, but the second sentence could be slightly more terse. Overall, it is well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 9 parameters, no output schema, and no annotations, the description should offer comprehensive context. However, it only covers the default state and the session preference. It lacks information on return values, error cases, parameter semantics, and preconditions, making it incomplete for an AI to use confidently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 9 parameters with 0% schema description coverage. The description only mentions that 'state' defaults to 'planned'. It does not explain the meaning or purpose of 'body', 'title', 'linked_pr', 'tenant_id', 'project_id', 'linked_issue', 'assignee_passport', or 'created_by_passport', leaving the agent without sufficient parameter-level guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a new work item under a project, with a default state of 'planned'. The verb 'create' and resource 'work item' are unambiguous. While there are many sibling tools, the description implicitly distinguishes by describing a core creation action, which is different from update or query tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises preferring `cuecrux_session` for routing, providing a clear usage alternative. It mentions this tool remains directly callable for backward compatibility, which implies when to use it directly. However, it does not contrast with other sibling tools like 'update_work_state' or 'comment_on_work', so some guidance is missing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cuecrux_sessionCueCrux SessionAInspect

Opens a CueCrux session and returns a typed capability plan (retrieval, proofing, memory, journaling, audit) across VaultCrux and MemoryCrux. Call this first, once. Every subsequent action routes through the channels the plan returns — do not browse the legacy per-service tool list when a plan channel already covers the capability. Identical behaviour for local Crux CE and hosted CueCrux. Hosted deployments stage v1 flat-list or v2 typed-graph plan shapes behind feature flags; callers treat the returned plan as the single source of routing truth. Bulk-capable agents transparently use the HTTP/2 binary channel; MCP-only agents use the MCP fallback URLs the plan provides. Implements RCX-Protocol v1.0.

ParametersJSON Schema

Name	Required	Description
`hints`	No	Optional routing / shaping hints.
`model`	No	Optional model declaration used for capability-model policy gating.
`intent`	No	Optional intent hint (e.g., 'audit_review', 'document_ingest'). Lets the capability graph be reordered to put intent-relevant capabilities first.

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behavioral traits: the tool returns a capability plan with typed routing hints, works identically for local and hosted deployments, and specifies protocol implementation ('Implements RCX-Protocol v1.0'). It also explains how different agents (bulk-capable vs. MCP-only) interact with the plan. However, it lacks details on error handling or performance characteristics, preventing a perfect score.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose and usage guidelines. Every sentence adds value, such as explaining deployment compatibility and agent interactions. However, some sentences are complex and could be slightly streamlined (e.g., the one about hosted deployments and feature flags), making it very good but not perfectly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (session initialization with routing plans) and no annotations or output schema, the description does a strong job of covering context. It explains the tool's role, return value (capability plan), deployment scenarios, and agent interactions. However, it lacks details on the plan's structure or example outputs, which would be helpful for a tool with no output schema, keeping it from a perfect score.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the baseline is 3. The description adds value by contextualizing the parameters: it mentions 'hints' as 'routing / shaping hints' and explains that 'hosted deployments can stage a legacy-compatible handshake shape or the newer v2 graph shape behind feature flags,' which relates to the 'hints' parameter's effects. However, it doesn't elaborate on 'model' or 'intent' parameters beyond what the schema provides, keeping it from a score of 5.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Opens a CueCrux session' and specifies it 'Returns a capability plan covering retrieval, proofing, memory, journaling, and audit across VaultCrux and MemoryCrux.' It distinguishes itself from siblings by being the foundational tool that must be 'Call[ed] this first' before using other tools, making the purpose specific and well-differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidelines: 'Call this first. All subsequent work flows through the plan's channels.' It also distinguishes when to use this tool versus alternatives by noting that 'hosted deployments can stage a legacy-compatible handshake shape or the newer v2 graph shape behind feature flags, but callers should always treat the returned plan as the source of routing truth,' guiding users on how to handle different deployment scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

declare_revenue_willingnessDeclare Revenue WillingnessBInspect

Declare willingness to pay for a feature or category, helping prioritize the product roadmap.

ParametersJSON Schema

Name	Required	Description
`notes`	No	Free-form notes about the declaration.
`category`	No	Category of the declaration (default: other).
`metadata`	No	Additional metadata to attach.
`confidence`	No	Confidence level in the willingness declaration (default: medium).
`request_id`	No	The ID of a specific feature request this declaration relates to.
`billing_cycle`	No	Preferred billing cycle (default: monthly).
`willingness_band`	No	Price band the agent is willing to pay (default: lt_100).

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool helps 'prioritize the product roadmap,' suggesting it's a write operation that may influence product decisions, but it doesn't disclose critical behavioral traits such as authentication requirements, rate limits, whether the declaration is reversible, or what happens after submission (e.g., confirmation, impact). For a tool with potential product impact and no annotations, this is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence: 'Declare willingness to pay for a feature or category, helping prioritize the product roadmap.' It is front-loaded with the core action and outcome, with no wasted words. Every part of the sentence contributes to understanding the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, no output schema, no annotations), the description is adequate but incomplete. It explains the high-level purpose but lacks details on behavioral aspects (e.g., mutability, side effects) and doesn't leverage the rich parameter schema to guide usage. For a tool that could influence product decisions, more context on outcomes and constraints would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with detailed parameter descriptions (e.g., 'Free-form notes about the declaration,' 'Category of the declaration (default: other)'). The description doesn't add any parameter-specific information beyond what's in the schema, such as examples or usage tips. With high schema coverage, the baseline score is 3, as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Declare willingness to pay for a feature or category, helping prioritize the product roadmap.' It specifies the verb ('declare') and resource ('willingness to pay'), and mentions the outcome ('prioritize the product roadmap'). However, it doesn't explicitly differentiate from sibling tools like 'submit_feature_request' or 'vote_feature_request', which might have overlapping purposes in product feedback.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides minimal usage guidance. It implies this tool is for declaring payment willingness to influence roadmap prioritization, but it doesn't specify when to use it versus alternatives like 'submit_feature_request' or 'vote_feature_request' (which are sibling tools). No exclusions, prerequisites, or explicit alternatives are mentioned, leaving the agent with little context for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

diff_receiptsDiff ReceiptsCInspect

Compare two provenance receipts and highlight differences.

ParametersJSON Schema

Name	Required	Description	Default
`receipt_id_a`	Yes	First receipt ID.
`receipt_id_b`	Yes	Second receipt ID.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the tool compares and highlights differences, but doesn't disclose behavioral traits such as whether it's read-only, what format the output takes, if there are rate limits, or if it requires specific permissions. This is inadequate for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core functionality. There is no wasted verbiage, and it directly communicates the tool's purpose without unnecessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no annotations and no output schema, the description is insufficient. It doesn't explain what a 'provenance receipt' is, what 'highlight differences' entails (e.g., output format), or any behavioral constraints. This leaves significant gaps for an agent to understand and use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with both parameters ('receipt_id_a' and 'receipt_id_b') documented as receipt IDs. The description adds no additional semantic meaning beyond this, such as explaining what a 'provenance receipt' is or how IDs should be formatted. Baseline 3 is appropriate given the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Compare two provenance receipts and highlight differences.' It specifies the verb ('compare'), resource ('provenance receipts'), and outcome ('highlight differences'). However, it doesn't explicitly differentiate from sibling tools like 'find_contradictions' or 'get_proof_receipt', which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, context, or exclusions. Given sibling tools like 'find_contradictions' and 'get_proof_receipt', the lack of differentiation leaves the agent without clear usage criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

explain_last_answerExplain Last AnswerCInspect

Explain how the last answer was derived.

ParametersJSON Schema

Name	Required	Description	Default
`answer_id`	No	The answer ID to explain (defaults to last answer).

Tool Definition Quality

C2.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the tool explains derivations but doesn't disclose behavioral traits such as whether it's read-only, requires specific permissions, has rate limits, or what happens if no last answer exists. For a tool with zero annotation coverage, this leaves significant gaps in understanding its operation and constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's function. It's front-loaded with the core purpose and avoids unnecessary details, making it easy to parse. Every word earns its place, with no waste or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of explaining answer derivations, no annotations, and no output schema, the description is incomplete. It doesn't cover what the explanation includes (e.g., steps, sources, confidence), error conditions, or output format. For a tool that likely returns detailed reasoning, this lack of context makes it inadequate for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with one parameter 'answer_id' documented as defaulting to the last answer. The description adds no additional meaning beyond this, as it only mentions 'last answer' without elaborating on parameter usage or semantics. Baseline 3 is appropriate since the schema adequately covers the parameter, but no extra value is provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool's purpose as explaining how an answer was derived, which is clear but vague. It specifies 'last answer' as the default target, but doesn't detail what constitutes an 'answer' or 'explanation' in this context. Compared to siblings like 'get_counterfactual_summary' or 'get_proof_status', it's distinguishable but lacks specificity about the explanation format or content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description doesn't mention prerequisites (e.g., requiring a prior answer), exclusions, or related tools like 'get_reasoning_profile' or 'get_proof_status' that might offer similar insights. Usage is implied only by the tool name and description, with no explicit context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

find_contradictionsFind ContradictionsCInspect

Scan for contradictions across knowledge sources.

ParametersJSON Schema

Name	Required	Description	Default
`depth`	No	Scan depth (e.g. 'shallow', 'deep').
`scope`	No	Scope object to narrow the scan.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but offers minimal behavioral insight. It mentions 'scan' but doesn't clarify what 'contradictions' mean operationally, potential side effects, performance characteristics, or output format. For a tool with no annotation coverage, this is inadequate disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero wasted words. It's appropriately sized and front-loaded, directly stating the tool's function without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and a tool that presumably performs non-trivial analysis ('scan for contradictions'), the description is incomplete. It lacks details on what constitutes a contradiction, how results are returned, or any behavioral context, making it insufficient for an agent to use effectively without additional information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('depth' and 'scope'). The description adds no additional parameter semantics beyond what's in the schema, such as examples of 'knowledge sources' or how parameters affect the scan. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('scan') and target ('contradictions across knowledge sources'), making the purpose understandable. It doesn't explicitly differentiate from sibling tools, but given the unique nature of 'find_contradictions' among the listed siblings, the purpose is sufficiently clear without direct comparison.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, prerequisites, or specific contexts. It lacks any mention of related tools or scenarios where this scanning operation is appropriate, leaving usage entirely implicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

forecast_obsolescenceForecast ObsolescenceCInspect

Forecast which artefacts are likely to become obsolete.

ParametersJSON Schema

Name	Required	Description	Default
`domain`	No	Domain to scope the forecast.
`artefacts`	No	Artefacts to evaluate.

Tool Definition Quality

C2.7/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden for behavioral disclosure. It fails to describe what the forecast returns (e.g., scores, rankings, explanations), how it's computed, any limitations (e.g., data recency, confidence intervals), or side effects. This is inadequate for a forecasting tool with no structured behavioral hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's appropriately sized and front-loaded, with zero wasted content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a forecasting tool. It lacks details on return values, behavioral traits, and usage context, leaving significant gaps for an agent to understand how to effectively invoke and interpret results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are fully documented in the schema. The description adds no additional meaning about 'domain' or 'artefacts' beyond what the schema provides, such as domain examples or artefact types. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('forecast') and resource ('artefacts'), and specifies the forecast target ('likely to become obsolete'). It doesn't distinguish from sibling tools, but none appear to be direct alternatives for obsolescence forecasting, making the purpose adequately clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, prerequisites, or exclusions. It simply states what the tool does without contextual usage information, leaving the agent to infer appropriate scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_active_policyGet Active PolicyCInspect

Get the currently active policy.

ParametersJSON Schema

Name	Required	Description	Default
`policy_name`	No	Policy name to retrieve.

Tool Definition Quality

C2.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions 'currently active policy,' which hints at real-time or latest data, but doesn't disclose behavioral traits like permissions needed, rate limits, error conditions, or what 'active' means (e.g., effective date, user-specific). The description is minimal and lacks operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no wasted words. It's front-loaded with the core action and resource, making it easy to parse. Every word earns its place, though it could benefit from more detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and a simple input schema, the description is incomplete. It doesn't explain what a 'policy' is in this context, what data is returned, or how 'active' is determined. For a tool with potential complexity (e.g., policy management), more context is needed to guide effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the parameter 'policy_name' documented in the schema. The description adds no parameter-specific information beyond implying retrieval of an active policy. Since the schema does the heavy lifting, the baseline score of 3 is appropriate, as the description doesn't compensate but also doesn't detract.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the action ('Get') and resource ('currently active policy'), which is clear but vague. It doesn't specify what a 'policy' entails in this context or differentiate from sibling tools like 'set_policy' beyond the verb difference. The purpose is understandable but lacks specificity about the policy domain or content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. With sibling tools like 'set_policy' and 'get_policy_status' (implied by context), the description doesn't indicate if this retrieves a default policy, the latest policy, or how it relates to other policy-related tools. Usage is implied only by the verb 'Get'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_beliefsGet BeliefsBInspect

List registered beliefs with optional filters.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum number of results.
`since`	No	ISO 8601 timestamp to filter beliefs after.
`offset`	No	Offset for pagination.
`answer_ids`	No	Filter by answer IDs.

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but offers minimal behavioral insight. It implies a read-only operation ('List') but doesn't disclose pagination behavior (beyond the 'offset' parameter in schema), rate limits, authentication needs, or what 'registered beliefs' entails. The agent must infer behavior from the tool name and sparse description.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero wasted words. It front-loads the core purpose ('List registered beliefs') and adds only essential qualification ('with optional filters'). This is appropriately sized for a list tool with well-documented parameters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a list tool with no annotations and no output schema, the description is minimally adequate. It states what the tool does but lacks context about the belief system, return format, or error conditions. Given the 4 parameters and 100% schema coverage, it's complete enough for basic use but leaves behavioral aspects to inference.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all four parameters. The description adds no additional meaning beyond mentioning 'optional filters', which is already implied by the parameter names. This meets the baseline of 3 where the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List') and resource ('registered beliefs'), and mentions optional filtering. It distinguishes from sibling tools like 'register_belief' by focusing on retrieval rather than creation. However, it doesn't explicitly differentiate from other list-style tools like 'get_watches' or 'get_watch_alerts' beyond the resource type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, context for filtering, or compare it to other list/query tools in the sibling set. The phrase 'with optional filters' is generic and doesn't help the agent choose between this and tools like 'query_vault' or 'get_journal'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_blast_radiusGet Blast RadiusBInspect

Estimate the impact radius if an artefact or receipt changes.

ParametersJSON Schema

Name	Required	Description
`domain`	No	Domain scope for the analysis.
`receipt_id`	No	The receipt ID to analyze.
`artefact_id`	No	The artefact ID to analyze.

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'Estimate' which implies a read-only calculation, but doesn't specify whether this requires specific permissions, what format the estimation returns, whether it's a real-time or cached analysis, or any rate limits. For an analysis tool with zero annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that gets straight to the point with zero wasted words. It's appropriately sized for a tool with three parameters and no complex behavioral nuances to explain, making it perfectly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with three parameters, 100% schema coverage, but no annotations and no output schema, the description is minimally adequate. It explains what the tool does but lacks crucial context about the estimation methodology, output format, and when to use it versus similar analysis tools. The absence of output schema means the description should ideally explain what 'impact radius' estimation returns, which it doesn't.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so all three parameters are documented in the schema. The description mentions 'artefact or receipt changes' which aligns with the artefact_id and receipt_id parameters, but doesn't add meaningful semantic context beyond what the schema already provides. The baseline score of 3 reflects adequate but minimal value addition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Estimate') and resource ('impact radius'), and specifies the trigger condition ('if an artefact or receipt changes'). However, it doesn't distinguish this tool from potential siblings like 'get_break_analysis' or 'forecast_obsolescence' that might also analyze impacts, which prevents a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With many sibling tools like 'get_break_analysis' and 'forecast_obsolescence' that might serve similar analytical purposes, there's no indication of when this specific impact estimation is appropriate or what distinguishes it from other analysis tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_break_analysisGet Break AnalysisCInspect

Analyze what would break if a given answer or receipt is invalidated.

ParametersJSON Schema

Name	Required	Description
`domain`	No	Domain scope for the analysis.
`answer_id`	No	The answer ID to analyze.
`receipt_id`	No	The receipt ID to analyze.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool performs analysis but doesn't describe what 'break' means, whether this is a read-only operation, what the output format is, or any side effects. This is inadequate for a tool with potential complexity.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with zero waste. It's appropriately sized and front-loaded, efficiently conveying the core purpose without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and a potentially complex analysis tool, the description is incomplete. It doesn't explain what 'break' entails, the scope of analysis, or the return format, leaving significant gaps for the agent to understand the tool's behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters (domain, answer_id, receipt_id). The description doesn't add any meaning beyond the schema, such as explaining relationships between parameters or usage constraints. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Analyze what would break if a given answer or receipt is invalidated.' It specifies the action ('analyze') and the resource ('what would break'), but doesn't explicitly differentiate from siblings like 'get_blast_radius' or 'forecast_obsolescence' which might have related functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, context, or exclusions, leaving the agent to infer usage based on the purpose alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_counterfactual_summaryGet Counterfactual SummaryCInspect

Generate a counterfactual summary for an answer.

ParametersJSON Schema

Name	Required	Description	Default
`answer_id`	No	The answer ID to summarize.
`receipt_id`	No	The receipt ID to summarize.

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'Generate a counterfactual summary,' implying a read-only operation, but does not specify if it requires permissions, affects data, has rate limits, or what the output format might be. For a tool with no annotations, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no wasted words, making it easy to parse. However, it lacks front-loaded critical details like context or differentiation, which slightly reduces its effectiveness despite its brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of generating a 'counterfactual summary' and the absence of annotations and output schema, the description is incomplete. It does not explain what a counterfactual summary is, how it differs from other summary tools, or what the output entails. This leaves significant gaps for an AI agent to understand and use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with parameters 'answer_id' and 'receipt_id' clearly documented. The description does not add any meaning beyond the schema, such as explaining the relationship between these IDs or which is prioritized. Given the high schema coverage, a baseline score of 3 is appropriate, as the schema handles the parameter documentation adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool's purpose as 'Generate a counterfactual summary for an answer,' which is clear but vague. It specifies the action (generate) and target (counterfactual summary for an answer), but does not define what a 'counterfactual summary' entails or how it differs from similar tools like 'explain_last_answer' or 'get_proof_receipt.' This lack of differentiation from siblings reduces its effectiveness.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites, context, or exclusions, such as whether it requires specific answer states or how it relates to other tools like 'explain_last_answer' or 'get_reasoning_profile.' This absence of usage context leaves the agent without clear direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_credit_balanceGet Credit BalanceBInspect

Retrieve the current credit balance for the agent, including receipt verification and passport data.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions 'retrieve' which implies a read-only operation, but doesn't disclose behavioral traits like authentication needs, rate limits, or what 'including receipt verification and passport data' entails in terms of output or side effects. This leaves gaps in understanding how the tool behaves beyond its basic function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. However, the phrase 'including receipt verification and passport data' could be more precise or structured to avoid ambiguity, slightly reducing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (0 parameters, no output schema, no annotations), the description is minimally adequate but incomplete. It lacks details on output format, error handling, or how 'receipt verification and passport data' integrate, leaving the agent with unanswered questions about the tool's full context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0 parameters and 100% schema description coverage, the baseline is high. The description adds value by specifying that the retrieval includes 'receipt verification and passport data', which provides context beyond the empty schema, though it doesn't fully explain what these inclusions mean semantically.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'retrieve' and resource 'current credit balance for the agent', making the purpose understandable. It distinguishes from siblings like 'get_credit_escrow' by focusing on balance rather than escrow, though it doesn't explicitly contrast with all siblings. The inclusion of 'including receipt verification and passport data' adds specificity but slightly muddles the primary purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'get_credit_escrow' or 'get_spend_receipt'. It lacks context such as prerequisites, frequency of use, or scenarios where it's most applicable, leaving the agent without clear usage instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_credit_escrowGet Credit EscrowBInspect

List active escrow holds for the tenant.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states 'List active escrow holds', which implies a read-only operation, but does not specify details like pagination, error handling, authentication needs, or rate limits. This leaves significant gaps for a tool with zero annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is front-loaded and appropriately sized for a simple tool, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (0 parameters, no output schema, no annotations), the description is minimally adequate but lacks depth. It does not explain what 'active escrow holds' entail or the return format, which could be important for an AI agent to understand the output fully. With no annotations, more context would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters, and the schema description coverage is 100%, so there is no need for parameter details in the description. The description does not add or detract from parameter semantics, aligning with the baseline for zero parameters, but it could have mentioned any implicit filters (e.g., 'active' status) for a higher score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('active escrow holds for the tenant'), making the purpose specific and understandable. However, it does not explicitly differentiate from sibling tools like 'get_credit_balance', which might be related, so it lacks sibling distinction for a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as other 'get_' tools in the sibling list. It implies usage for listing escrow holds but offers no context on prerequisites, exclusions, or comparisons to similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_daily_briefingGet Daily BriefingCInspect

Get the daily knowledge briefing.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure but offers minimal insight. It implies a read-only operation ('Get') but doesn't specify if it requires authentication, has rate limits, returns structured or unstructured data, or involves any side effects. This leaves significant behavioral gaps for a tool with zero annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no wasted words. It's appropriately sized for a simple tool and front-loaded with the core action, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's apparent simplicity (0 parameters) but lack of annotations and output schema, the description is incomplete. It doesn't explain what the briefing contains, its format, or how it's generated, leaving the agent unsure of the tool's utility or return value in a context with many sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters, and the input schema has 100% description coverage (though empty). The description doesn't need to compensate for missing parameter info, so it meets the baseline for a parameterless tool. No additional semantic value is required or provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Get the daily knowledge briefing' is essentially a tautology that restates the tool name 'get_daily_briefing' without adding meaningful specificity. It doesn't explain what constitutes a 'knowledge briefing' or what content it contains, making the purpose vague despite the clear verb+resource structure.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description doesn't mention frequency (e.g., once per day), prerequisites, or how it differs from sibling tools like 'get_journal' or 'get_session_context', leaving the agent without contextual usage cues.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_domain_affinityGet Domain AffinityBInspect

Get the agent's domain affinity scores.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool retrieves scores but doesn't explain what 'domain affinity' means, how scores are calculated, whether this is a read-only operation, or what the output format might be. For a tool with zero annotation coverage, this leaves critical behavioral traits unspecified.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is front-loaded and wastes no space, making it easy for an AI agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 0 parameters and no output schema, the description is minimally adequate but lacks depth. It doesn't explain the concept of 'domain affinity' or provide context on usage, which could hinder an AI agent's ability to invoke it correctly in complex scenarios. The absence of annotations exacerbates this gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0 parameters with 100% coverage, so no parameter documentation is needed. The description doesn't add any parameter information, which is appropriate. Baseline 4 is applied as per the rules for 0 parameters, indicating the description doesn't need to compensate for schema gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Get') and resource ('domain affinity scores'), specifying what the tool does. It distinguishes from siblings by focusing on domain affinity scores, which is a unique concept among the listed tools. However, it doesn't explicitly differentiate from similar-sounding tools like 'get_trust_level' or 'get_reasoning_profile', which might also involve scoring or assessment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, context, or exclusions, such as whether it requires specific permissions or is only applicable in certain scenarios. Given the many sibling tools, this lack of differentiation is a significant gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_economy_dashboardGet Economy DashboardBInspect

Retrieve the economy dashboard for the agent, showing balances, recent transactions, and spending summaries.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states 'retrieve' which implies a read-only operation, but doesn't disclose behavioral traits such as authentication needs, rate limits, data freshness, or whether it's cached. For a tool with zero annotation coverage, this is insufficient, as it misses key operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the purpose and details the data shown. Every word earns its place with no redundancy or fluff, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity is low (0 parameters, no output schema), the description is complete enough to understand the basic purpose. However, without annotations or output schema, it lacks details on behavioral aspects and return format, leaving gaps for the agent to operate effectively in a real context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters with 100% schema description coverage, so the baseline is 4. The description doesn't need to add parameter details, and it doesn't introduce any confusion about inputs, making this adequate for a parameterless tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'retrieve' and the resource 'economy dashboard', specifying what data it shows: balances, recent transactions, and spending summaries. It distinguishes from siblings like get_credit_balance by providing a broader dashboard view rather than a single metric. However, it doesn't explicitly differentiate from all potential dashboard-related tools, keeping it at 4.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, timing, or compare to siblings like get_credit_balance or get_spend_receipt, leaving the agent to infer usage based on the name alone. This lack of explicit when/when-not statements results in a score of 2.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_feature_requestsGet Feature RequestsBInspect

List feature requests with optional filtering by category and status.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum number of requests to return.
`cursor`	No	Pagination cursor from a previous response.
`status`	No	Filter by feature request status.
`category`	No	Filter by feature request category.

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. While 'List' implies a read-only operation, it doesn't explicitly state whether this is safe, whether it requires authentication, what the return format looks like, or if there are rate limits. For a tool with 4 parameters and no annotation coverage, this leaves significant behavioral questions unanswered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that states the core action and key capabilities without any wasted words. It's appropriately sized for a straightforward list operation and gets directly to the point.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a list tool with no annotations and no output schema, the description is minimally adequate. It covers the basic purpose and hints at filtering, but doesn't address behavioral aspects like pagination (implied by 'cursor' parameter), return format, or error conditions. Given the 4 parameters and lack of structured metadata, it should provide more context about what the tool actually returns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, meaning all parameters are documented in the schema itself. The description mentions 'optional filtering by category and status,' which aligns with two of the four parameters but doesn't add meaningful semantic context beyond what the schema already provides. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('feature requests'), making the purpose immediately understandable. It also mentions optional filtering capabilities, which adds specificity. However, it doesn't distinguish this tool from potential sibling list operations (like 'get_watches' or 'get_watch_alerts'), so it doesn't reach the highest score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention any prerequisites, context for filtering, or relationship to other tools like 'submit_feature_request' or 'vote_feature_request' that appear in the sibling list. The agent receives no help in choosing between this and other list operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_journalGet JournalCInspect

Fetch journal entries for the active agent, with optional filtering by time range, entry type, and pagination.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum number of entries to return.
`since`	No	ISO 8601 datetime to fetch entries after.
`types`	No	Entry types to filter by.
`cursor`	No	Pagination cursor from a previous response.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool fetches entries with filtering and pagination, but doesn't cover critical aspects like whether this is a read-only operation, potential rate limits, authentication requirements, or what the return format looks like. This leaves significant gaps for an agent to understand the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('fetch journal entries for the active agent') followed by key capabilities. There's no wasted language or redundancy, making it appropriately concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 4 parameters, no annotations, and no output schema, the description is insufficient. It doesn't explain what journal entries contain, how results are structured, whether there are default limits, or error conditions. The agent lacks critical context to use this tool effectively despite the good parameter documentation in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description mentions filtering by time range, entry type, and pagination, which maps to the 'since', 'types', and 'cursor' parameters. However, with 100% schema description coverage, the schema already documents all four parameters thoroughly. The description adds minimal value beyond what's in the schema, meeting the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('fetch journal entries') and target resource ('for the active agent'), making the purpose unambiguous. However, it doesn't differentiate from sibling tools like 'action_journal.query' or 'get_session_context' which might have overlapping functionality, preventing a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions optional filtering by time range, entry type, and pagination, which implies some usage context, but provides no explicit guidance on when to use this tool versus alternatives like 'action_journal.query' or 'get_session_context'. There's no mention of prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_knowledge_gapsGet Knowledge GapsCInspect

Identify knowledge gaps across domains.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum number of results.
`domain`	No	Filter by domain.
`offset`	No	Offset for pagination.

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. 'Identify knowledge gaps' suggests a read-only analysis operation, but doesn't disclose whether this requires specific permissions, what format the results take, whether it's computationally expensive, or how 'knowledge gaps' are determined. The description lacks behavioral details about the tool's operation beyond its basic purpose.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at just one sentence with 5 words. While this is efficient, it may be too brief given the complexity of identifying 'knowledge gaps' - a concept that likely requires more explanation. Every word earns its place, but the description might benefit from slightly more elaboration to be truly helpful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool that presumably analyzes complex knowledge structures across domains, the description is insufficient. With no annotations, no output schema, and a vague purpose statement, an agent would struggle to understand what this tool actually returns or how to interpret its results. The description doesn't compensate for the lack of structured metadata about the tool's behavior and outputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with all three parameters (limit, domain, offset) having clear descriptions in the schema. The tool description adds no parameter information beyond what's already documented in the structured schema. According to scoring rules, when schema coverage is high (>80%), the baseline is 3 even with no param info in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Identify knowledge gaps across domains' states a general purpose but lacks specificity about what constitutes a 'knowledge gap' or how they are identified. It mentions 'across domains' which provides some scope, but doesn't distinguish this tool from potential siblings that might analyze knowledge in other ways. The verb 'identify' is clear but the object 'knowledge gaps' is vague without operational definition.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With many sibling tools like 'get_beliefs', 'get_domain_affinity', 'get_reasoning_profile', and 'find_contradictions' that might relate to knowledge analysis, there's no indication of when this specific gap identification tool is appropriate versus those other tools. No context, prerequisites, or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_passportGet PassportBInspect

Retrieve the agent's trust passport.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the action ('Retrieve') but doesn't describe what a 'trust passport' entails, how it's formatted, any authentication requirements, rate limits, or error conditions. This leaves significant gaps for a tool that likely returns sensitive or structured data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without any fluff or redundancy. It's front-loaded and wastes no words, making it highly concise and well-structured for quick understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of annotations and output schema, the description is incomplete. It doesn't explain what a 'trust passport' is, what data it returns, or how to interpret the result. For a tool that likely provides critical agent information, more context is needed to guide effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters, and the schema description coverage is 100%, so there are no parameters to document. The description doesn't need to add parameter semantics, and it appropriately avoids unnecessary details, earning a baseline score of 4 for this dimension.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Retrieve') and resource ('the agent's trust passport'), making the purpose specific and understandable. It distinguishes this from sibling tools like 'verify_passport' by focusing on retrieval rather than verification, though it doesn't explicitly contrast with all siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'get_trust_level' or 'verify_passport'. It lacks context about prerequisites, timing, or scenarios where this tool is appropriate, leaving usage entirely implicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_pricingGet PricingBInspect

Retrieve current pricing information for the tenant.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It states 'Retrieve' which implies a read operation, but doesn't disclose behavioral traits like whether this requires authentication, returns real-time or cached data, has rate limits, or what format the pricing information comes in. This leaves significant gaps for a tool that likely involves sensitive financial data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that gets straight to the point with no wasted words. It's appropriately sized for a simple retrieval tool and front-loads the essential information without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no annotations, no output schema, and potentially complex financial data, the description is insufficient. It doesn't explain what 'pricing information' includes (e.g., plans, rates, tiers), how current the data is, or what format it returns. The agent would need to guess about the tool's behavior and output structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters with 100% schema description coverage, so the schema already fully documents the input requirements. The description doesn't need to add parameter information, and it appropriately doesn't mention any parameters. A baseline of 4 is appropriate for parameterless tools.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Retrieve') and resource ('current pricing information for the tenant'), making the purpose understandable. However, it doesn't differentiate from potential sibling tools like 'get_credit_balance' or 'get_spend_receipt' that might also involve financial information, leaving some ambiguity about scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With sibling tools like 'get_credit_balance' and 'get_spend_receipt' present, there's no indication of whether this tool is for general pricing, subscription plans, or specific cost calculations, leaving the agent to guess about appropriate contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_project_contextGet Crux Project ContextAInspect

Detail for a single project — planning target, allowed passports, working tenants.

ParametersJSON Schema

Name	Required	Description	Default
`project_id`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the behavioral burden. It discloses the tool returns specific project details and implies a read-only operation. It also transparently flags backward-compatibility status and the preferred alternative path, adding context beyond raw purpose.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief: a one-line summary followed by a necessary usage note. Every sentence adds value, and the critical guidance (prefer cuecrux_session) is front-loaded. No extraneous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers the essence: what it returns and how to use it properly. It lacks details on errors or output format, but the core context is sufficient for a straightforward fetch operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, and the description does not elaborate on the single parameter 'project_id' beyond implying it identifies the project. The format, source, or example are missing, leaving the agent with minimal guidance for constructing a valid input.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides 'detail for a single project' including specific fields like 'planning target, allowed passports, working tenants.' This distinguishes it from sibling tools like 'list_projects' that list all projects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises to prefer 'cuecrux_session' as the primary call, explaining that it returns a routing plan for optimal channel, tier, and cost class. It notes this tool is directly callable for backward compatibility but the 'collapsed surface is the intended surface,' giving clear when-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_proof_chunksGet Proof ChunksAInspect

Retrieve the chunk-level hashes for a completed proof job. Supports cursor-based pagination.

ParametersJSON Schema

Name	Required	Description	Default
`cursor`	No	Pagination cursor for the next page of chunks
`proof_job_id`	Yes	The proof job ID

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds useful context by specifying that it 'supports cursor-based pagination,' which clarifies how results are handled. However, it lacks details on permissions, rate limits, error conditions, or the format of returned hashes, leaving gaps in behavioral understanding for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first clause and adds a useful behavioral detail in the second sentence. It is appropriately sized with zero waste, making it efficient and easy to parse for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (retrieving hashes with pagination), no annotations, and no output schema, the description provides basic context but lacks completeness. It covers the purpose and pagination behavior but misses details on output format, error handling, or prerequisites, which could hinder effective tool invocation without further information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with clear documentation for both parameters ('proof_job_id' and 'cursor'). The description does not add any additional meaning beyond what the schema provides, such as explaining parameter interactions or constraints. According to the rules, with high schema coverage, the baseline score is 3, as the schema adequately handles parameter semantics without extra description input.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('retrieve') and resource ('chunk-level hashes for a completed proof job'), making the purpose specific and understandable. However, it does not explicitly differentiate this tool from sibling tools like 'get_proof_receipt' or 'get_proof_status', which might also relate to proof jobs, leaving some ambiguity in sibling context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by mentioning 'completed proof job,' suggesting it should be used after a proof job is done, but it does not provide explicit guidance on when to use this tool versus alternatives like 'get_proof_receipt' or 'get_proof_status.' No exclusions or clear alternatives are stated, leaving usage context somewhat vague.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_proofpackGet ProofpackCInspect

Download the full proofpack bundle for a receipt. Includes all chunk hashes, Merkle tree, signature, and verification instructions.

ParametersJSON Schema

Name	Required	Description	Default
`receipt_id`	Yes	The receipt ID to get the proofpack for

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the bundle contents (chunk hashes, Merkle tree, etc.) and that it's for verification, but doesn't cover critical aspects like whether this is a read-only operation, authentication requirements, rate limits, error conditions, or what the download format/response looks like. For a tool with zero annotation coverage, this leaves significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise - a single sentence that efficiently communicates the core functionality and bundle contents without any wasted words. It's front-loaded with the main action and resource, making it immediately understandable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no annotations and no output schema, the description is insufficiently complete. While it explains what a proofpack contains, it doesn't describe the return format, error handling, authentication needs, or operational constraints. Given the complexity implied by terms like 'Merkle tree' and 'verification instructions', more context would be helpful for an AI agent to use this tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, with the single parameter 'receipt_id' fully documented in the schema. The description doesn't add any additional parameter semantics beyond what the schema already provides (e.g., format examples, validation rules, or context about what constitutes a valid receipt_id). The baseline score of 3 reflects adequate coverage through the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Download') and resource ('full proofpack bundle for a receipt'), specifying what the tool does. It distinguishes from siblings like 'get_proof_chunks' or 'get_proof_receipt' by emphasizing the comprehensive bundle nature. However, it doesn't explicitly contrast with these siblings in the description text itself.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'get_proof_chunks' or 'get_proof_receipt'. The description implies it's for obtaining a complete verification package, but lacks explicit context about prerequisites, timing, or comparisons to sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_proof_receiptGet Proof ReceiptCInspect

Retrieve the cryptographic proof receipt for a specific answer. Contains the Merkle root, signature, and verification metadata.

ParametersJSON Schema

Name	Required	Description	Default
`answer_id`	Yes	The answer ID to get the proof receipt for

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the content of the receipt but does not cover critical aspects like whether this is a read-only operation, authentication requirements, rate limits, error handling, or response format. This leaves significant gaps for a tool that likely involves cryptographic data retrieval.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that efficiently conveys the purpose and key components without any wasted words. It is front-loaded with the main action and resource, making it highly concise and effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of cryptographic proof receipts and the lack of annotations and output schema, the description is incomplete. It does not explain the return values, error conditions, or behavioral traits, leaving the agent with insufficient context for reliable invocation in a potentially sensitive domain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the single parameter 'answer_id' clearly documented. The description does not add any additional meaning beyond the schema, such as format examples or constraints, but the schema is sufficient, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Retrieve') and resource ('cryptographic proof receipt for a specific answer'), with specific components listed (Merkle root, signature, verification metadata). However, it does not explicitly differentiate from sibling tools like 'get_proof_chunks' or 'get_proof_status', which may have overlapping domains, so it falls short of a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., 'get_proof_chunks' or 'get_proof_status'), nor does it mention any prerequisites or exclusions. Usage is implied by the purpose but lacks explicit context for selection among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_proof_statusGet Proof StatusAInspect

Poll the status of a proof job. Returns the current state (queued, processing, complete, failed) and progress details.

ParametersJSON Schema

Name	Required	Description	Default
`proof_job_id`	Yes	The proof job ID to check

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the tool 'polls' status and returns state/progress details, which implies a read-only operation, but it does not disclose critical behaviors such as rate limits, authentication requirements, or whether it's idempotent. For a status-checking tool with zero annotation coverage, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that efficiently conveys the tool's action ('poll'), target ('proof job'), and return information ('state and progress details'). It is front-loaded with the core purpose and avoids unnecessary details, making it highly concise and effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one parameter, no output schema, no annotations), the description is minimally complete but lacks depth. It covers the basic purpose and return types, but without annotations or output schema, it misses details like error handling or response structure. This is adequate for a simple status tool but could be more informative.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the single parameter 'proof_job_id' fully documented in the schema. The description does not add any additional meaning beyond what the schema provides (e.g., format examples or constraints), so it meets the baseline of 3 for adequate but not enhanced parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('poll') and resource ('proof job'), distinguishing it from siblings like 'get_proof_chunks' or 'get_proof_receipt' by focusing on status monitoring rather than content retrieval. It explicitly mentions what information is returned (state and progress details), making the function unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for checking the status of a proof job, but it does not explicitly state when to use this tool versus alternatives (e.g., 'get_proof_receipt' for results or 'get_proof_chunks' for content). No exclusions or prerequisites are mentioned, leaving the agent to infer context from the tool name and description alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_reasoning_profileGet Reasoning ProfileBInspect

Get the agent's current reasoning profile.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It states this is a retrieval operation but provides no information about authentication requirements, rate limits, error conditions, or what format the reasoning profile returns. For a tool with zero annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that communicates the core purpose without any wasted words. It's appropriately sized for a simple retrieval tool and front-loads the essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and a simple zero-parameter design, the description is insufficient. It doesn't explain what a 'reasoning profile' contains, how it's structured, or what the agent should expect as output. For a tool that presumably returns configuration data, more context about the return value would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has zero parameters, and schema description coverage is 100%, so there are no parameters to document. The description appropriately doesn't mention parameters, which is correct for a parameterless tool. A baseline of 4 reflects that the description doesn't need to compensate for any parameter documentation gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get') and resource ('agent's current reasoning profile'), making the purpose immediately understandable. However, it doesn't differentiate this tool from its sibling 'set_reasoning_profile' beyond the verb difference, missing an opportunity to clarify the read-only vs. write distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided about when to use this tool versus alternatives. While the name implies retrieval, there's no mention of prerequisites, typical use cases, or how it relates to sibling tools like 'set_reasoning_profile' or 'get_session_context'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_session_contextGet Session ContextBInspect

Retrieve the current session context for the active agent, including recent interactions and state.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool retrieves data, implying a read-only operation, but doesn't specify permissions, rate limits, data format, or whether it's real-time vs. cached. For a tool with zero annotation coverage, this leaves significant gaps in understanding its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action ('retrieve') and resource. It wastes no words, making it easy to parse and understand quickly without unnecessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (0 parameters, no output schema, no annotations), the description is minimally adequate. It explains what the tool does but lacks details on behavior, output format, or usage context, which could be helpful for an agent despite the simple schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0 parameters with 100% coverage, so no parameter documentation is needed. The description appropriately doesn't discuss parameters, focusing on the tool's purpose instead, which aligns with the schema's completeness.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('retrieve') and resource ('current session context for the active agent'), specifying what the tool does. It distinguishes from siblings by focusing on session context rather than other data types like beliefs, receipts, or policies, though it doesn't explicitly contrast with similar tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, timing, or compare it to siblings like 'get_journal' or 'annotate_session', leaving the agent to infer usage context without explicit direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_spend_receiptGet Spend ReceiptCInspect

Retrieve a specific spend receipt by ID.

ParametersJSON Schema

Name	Required	Description	Default
`receipt_id`	Yes	The receipt ID to look up.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It only states the retrieval action without mentioning permissions, rate limits, error handling, or what the output looks like (e.g., receipt details format). This is inadequate for a tool that likely involves sensitive financial data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without any fluff. It's front-loaded and appropriately sized, making it easy for an agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete. It lacks details on behavioral traits (e.g., authentication needs, response format) and doesn't provide enough context for a retrieval tool in a complex system with many sibling tools, leaving significant gaps for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema already documents the 'receipt_id' parameter fully. The description adds no additional meaning beyond implying it's for lookup, which aligns with the schema but doesn't compensate or enhance it, meeting the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Retrieve') and resource ('spend receipt'), specifying it's for a specific receipt by ID. However, it doesn't distinguish this from sibling tools like 'get_proof_receipt' or 'diff_receipts', which might handle similar receipt-related operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. For example, it doesn't mention if this is for viewing details after a purchase or how it differs from 'get_proof_receipt' or 'diff_receipts' in the sibling list, leaving the agent without context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_stale_pinsGet Stale PinsBInspect

List pinned items for the active agent that may be outdated and need refresh or removal.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Maximum number of stale pins to return.
`cursor`	No	Pagination cursor from a previous response.

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the tool lists items 'that may be outdated and need refresh or removal,' hinting at a read-only diagnostic function, but doesn't clarify permissions, rate limits, response format, or whether the tool itself performs any actions on the pins. This leaves significant gaps for a tool with potential mutation implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. Every part of the sentence contributes meaning, making it appropriately sized and well-structured for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of annotations and output schema, the description is minimally complete. It explains what the tool does but omits critical details like return format, error conditions, or how 'stale' is determined. For a tool with potential read/write implications and no structured safety hints, this leaves the agent under-informed about behavioral expectations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with clear documentation for 'limit' and 'cursor' parameters. The description adds no additional parameter semantics beyond what the schema provides, such as explaining what constitutes 'stale' or default behaviors. This meets the baseline score of 3 for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'List pinned items for the active agent that may be outdated and need refresh or removal.' It specifies the verb ('List'), resource ('pinned items'), and scope ('for the active agent'), but doesn't explicitly differentiate from sibling tools like 'get_watches' or 'get_watch_alerts' which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, exclusions, or compare it to sibling tools like 'get_watches' or 'get_watch_alerts', leaving the agent to infer usage context solely from the tool name and description.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_trust_levelGet Trust LevelBInspect

Get the current agent's trust escalation level.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the tool retrieves the 'current' trust level, implying it's a read operation, but does not disclose behavioral traits like authentication needs, rate limits, or what 'current' entails (e.g., real-time vs. cached). This leaves significant gaps for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is front-loaded and wastes no space, making it highly concise and well-structured for its simplicity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no parameters, no annotations, and no output schema, the description is minimally adequate but incomplete. It explains what the tool does but lacks details on return values, error conditions, or behavioral context, which are important for a tool that might involve sensitive trust data.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has 0 parameters, and the input schema has 100% description coverage (though empty). The description does not need to add parameter semantics, as there are none to explain. A baseline of 4 is appropriate since no parameters exist, and the description does not mislead about inputs.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get') and the resource ('current agent's trust escalation level'), making the purpose specific and understandable. However, it does not differentiate from sibling tools like 'get_active_policy' or 'get_credit_balance', which follow a similar 'get X' pattern, so it lacks explicit distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as other 'get_' tools like 'get_active_policy' or 'get_credit_balance'. It implies usage for retrieving trust level but offers no context on prerequisites, timing, or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_watch_alertsGet Watch AlertsCInspect

Retrieve alerts triggered by a specific watch, with optional filtering by time and pagination.

ParametersJSON Schema

Name	Required	Description
`limit`	No	Maximum number of alerts to return.
`since`	No	ISO 8601 datetime to fetch alerts after.
`cursor`	No	Pagination cursor from a previous response.
`watch_id`	Yes	The ID of the watch to retrieve alerts for.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'optional filtering by time and pagination,' which hints at some behavior, but lacks critical details like whether this is a read-only operation, if it requires specific permissions, rate limits, or what the response format looks like. For a tool with no annotation coverage, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose and includes key optional features. Every word contributes to understanding without redundancy, making it appropriately concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of retrieving alerts with filtering and pagination, no annotations, and no output schema, the description is incomplete. It doesn't explain the return format, error conditions, or behavioral nuances like ordering of results, which are essential for effective tool use in this context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds minimal value beyond the input schema, which has 100% coverage. It implies filtering by time ('since') and pagination ('cursor', 'limit'), but doesn't provide additional context like default values, typical usage patterns, or how parameters interact. Since the schema already documents all parameters well, the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Retrieve') and resource ('alerts triggered by a specific watch'), making the purpose immediately understandable. However, it doesn't explicitly differentiate from sibling tools like 'get_watches' or 'get_blast_radius', which might also involve alert-related operations, so it doesn't reach the highest score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as whether it's for real-time monitoring or historical analysis, or if other tools like 'query_vault' might be better for broader searches. It mentions optional filtering but doesn't specify use cases or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_watchesGet WatchesBInspect

List all active watches for the current agent.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Maximum number of watches to return.
`cursor`	No	Pagination cursor from a previous response.

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool lists active watches but doesn't clarify if this is a read-only operation, whether it requires authentication, or details about rate limits or pagination behavior. This leaves significant gaps in understanding how the tool behaves beyond its basic function.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that efficiently conveys the tool's purpose without unnecessary words. It is front-loaded with the core action and resource, making it easy to parse and understand quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (listing with pagination), no annotations, and no output schema, the description is minimally adequate but incomplete. It covers the basic purpose but lacks details on behavior, output format, or error handling, which are important for effective tool use in this context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the schema fully documents the 'limit' and 'cursor' parameters. The description adds no additional meaning or context about these parameters, such as default values or usage examples. This meets the baseline score when schema coverage is high.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List') and resource ('all active watches for the current agent'), making the purpose specific and understandable. However, it doesn't explicitly differentiate from sibling tools like 'get_watch_alerts', which might serve a related but distinct purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as 'get_watch_alerts' or other sibling tools. It lacks context about prerequisites, exclusions, or typical scenarios for invocation, leaving the agent without usage direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

github_comments_sinceRecent GitHub Comments (Crux-Indexed)AInspect

Recent comments across selected repos — the 'what happened since I last looked' surface.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No

Tool Definition Quality

A3.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, and the description does not disclose any behavioral traits such as side effects, authentication requirements, rate limits, or response format. This is a significant gap for a tool that is directly callable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short, using three sentences. The first sentence is slightly figurative but still informative. The structure is front-loaded with the core purpose, then guidance. It is largely efficient but could be more direct.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple query tool, the description provides the essential purpose and usage guidance. However, it does not explain how the 'selected repos' are determined, nor does it describe the return format. Given no output schema, this leaves some ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter, 'limit', has 0% schema description coverage. The description does not explain its purpose or behavior beyond what is obvious from the schema constraints (min 1, max 500). It adds no value over the structured definition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as retrieving recent GitHub comments across selected repos. It uses a specific verb ('recent comments') and resource ('GitHub repos'), and distinguishes it from sibling tools like github_open_issues or github_recent_commits.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises preferring `cuecrux_session` as the primary MCP call, explaining that it returns a capability plan for routing. It also notes backward compatibility but recommends the collapsed surface, providing clear guidance on when to use this tool vs. the alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

github_open_issuesOpen GitHub Issues (Crux-Indexed)BInspect

Indexed issues in a selected repo. Optional label for client-side filtering.

ParametersJSON Schema

Name	Required	Description	Default
`repo`	Yes
`label`	No
`limit`	No

Tool Definition Quality

B3.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must bear full transparency. It only mentions 'indexed' and 'client-side filtering' but omits details on read-only nature, pagination, or return format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two short paragraphs. The first sentence is front-loaded with purpose. The second paragraph, while relevant to usage, detracts slightly from focus on the tool itself.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given three parameters and no output schema or annotations, the description is incomplete. It lacks information on return values, pagination behavior, and the meaning of 'Crux-Indexed', leaving gaps for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, yet the description only explains the 'label' parameter as client-side filtering. The required 'repo' and optional 'limit' (with min/max) are left undocumented, failing to compensate for the schema gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The first sentence clearly states the tool retrieves indexed issues from a selected repo, with an optional label filter. This distinguishes it from siblings like github_open_prs (pull requests) and github_comments_since (comments).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises preferring cuecrux_session over direct use, establishing a usage hierarchy. However, it does not specify when direct use is appropriate or contrast with other GitHub tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

github_open_prsOpen GitHub PRs (Crux-Indexed)AInspect

Indexed PRs in a selected repo (filter open client-side via the returned state field).

ParametersJSON Schema

Name	Required	Description	Default
`repo`	Yes
`limit`	No

Tool Definition Quality

A3.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully convey behavioral traits. It only mentions that results can be filtered client-side via the state field, but lacks details on side effects, authentication requirements, rate limits, or mutation behavior. Minimal disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: one sentence for purpose, one paragraph for usage guidance. Every sentence adds value without redundancy. Front-loaded with the core functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 2 parameters and no output schema, the description covers the repo parameter and the client-side filter, but omits the limit parameter and any details on output format, ordering, or pagination. Adequate but has clear gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description should explain both parameters. It only implicitly covers 'repo' by mentioning 'selected repo', but does not describe the 'limit' parameter at all. Adds some meaning but insufficient coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves indexed PRs from a selected repo, with client-side filtering by state. This distinguishes it from sibling tools like github_open_issues and github_search, making the purpose specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises using cuecrux_session instead of this tool directly, noting that this tool exists for backward compatibility. This provides clear when-to-use and when-not-to-use guidance with a named alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

github_recent_commitsRecent GitHub Commits (Crux-Indexed)BInspect

Recent indexed commits for a selected repo.

ParametersJSON Schema

Name	Required	Description	Default
`repo`	Yes
`limit`	No

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It mentions 'indexed' but does not describe what that means for the agent (e.g., caching, staleness). It lacks details on read-only nature, rate limits, or error handling for missing repos.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a substantial paragraph about routing. The purpose is front-loaded, and the additional text is relevant. It could be slightly more concise by merging the routing guidance into fewer sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has a simple input schema (2 parameters) and no output schema. The description does not explain what 'indexed' means, the output format, or any side effects. The heavy focus on routing leaves the core functionality under-described.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description should explain the parameters. It does not mention `repo` or `limit` at all, leaving the agent to infer meaning from the schema alone. This is insufficient for effective tool use.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns 'recent indexed commits for a selected repo,' which is a specific verb+resource. However, it does not differentiate from sibling tools like github_comments_since or github_open_issues, which also retrieve GitHub data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly directs agents to prefer `cuecrux_session` as the primary call and limits this tool to backward compatibility. It states 'one call per session is enough,' providing clear when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

github_searchSearch Crux-Indexed GitHub CorpusAInspect

Search the indexed GitHub corpus (commits, PRs, issues, comments) under repos selected via /v1/integrations/github/repos.

ParametersJSON Schema

Name	Required	Description	Default
`repo`	No
`query`	Yes
`top_k`	No

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It omits key behavioral details like read-only nature, rate limits, authentication requirements, result format, or pagination. The only behavioral hint is that repos are pre-selected via another endpoint, but this is insufficient for safe tool invocation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise with two main ideas: purpose and usage guidance. The first sentence is efficient; the second is longer but necessary. Some redundancy exists with the repeated mention of cuecrux_session, but overall structure is acceptable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 3 parameters, no output schema, and no annotations, the description only covers purpose and basic usage pattern. It lacks parameter details, output description, error handling, and behavioral info, leaving the agent underinformed for a search tool expected to work with varied inputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not explain any parameter specifics. Query is implied as search term, repo and top_k are not defined. This leaves agents guessing about parameter meanings and constraints, especially top_k's maximum of 200.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches an indexed GitHub corpus of commits, PRs, issues, and comments under selected repos. It distinguishes itself from sibling tools like github_open_issues by offering broad search across multiple types, making the purpose specific and distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly recommends using cuecrux_session first for routing and states this tool remains directly callable for backward compatibility. It provides clear when-to-use and when-not-to-use guidance, with a reference to alternative setup via the session.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

invite_seatInvite SeatCInspect

Invite a new member to the organisation by email address.

ParametersJSON Schema

Name	Required	Description	Default
`role`	No	Role to assign to the new member (default: member).
`email`	Yes	Email address of the person to invite.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but only states the basic action. It doesn't disclose behavioral traits like whether this sends an email invitation, requires specific permissions, has rate limits, or what happens on success/failure. For a mutation tool with zero annotation coverage, this is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero waste. It's front-loaded with the core purpose and appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is incomplete. It doesn't cover behavioral aspects like permissions, side effects, or response format, leaving gaps that could hinder an AI agent's correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('email' and 'role'). The description adds no additional meaning beyond implying the 'email' parameter is used for invitation, which is already clear from the schema. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('invite') and resource ('new member to the organisation') with the mechanism ('by email address'). It's specific but doesn't explicitly distinguish from sibling tools like 'change_seat_role' or 'revoke_seat', which would require mentioning it's for adding new members rather than modifying existing ones.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'change_seat_role' for existing members or 'list_seats' for viewing members. It lacks context such as prerequisites (e.g., admin permissions) or exclusions (e.g., not for updating roles).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

join_coalitionJoin CoalitionCInspect

Join an existing coalition with a credit pledge.

ParametersJSON Schema

Name	Required	Description	Default
`pledge_crux`	No	Pledge amount in crux credits (defaults to 1).
`coalition_id`	Yes	The coalition ID to join.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but lacks critical behavioral details. It mentions a 'credit pledge' but doesn't specify if this is irreversible, requires sufficient credit balance, triggers notifications, or has rate limits. The description implies a write operation ('join') but doesn't disclose permissions, side effects, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action ('join an existing coalition') and includes the key constraint ('with a credit pledge'). There is no wasted verbiage or redundant information, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is incomplete. It doesn't explain what happens after joining (e.g., confirmation message, updated membership list), error scenarios (e.g., invalid ID, insufficient credits), or dependencies on other tools. Given the complexity of joining a coalition with financial implications, more context is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description adds no additional semantic context beyond implying 'pledge_crux' relates to the 'credit pledge' mentioned. This meets the baseline of 3 when schema coverage is high, but doesn't enhance understanding of parameter interactions or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('join') and resource ('existing coalition'), specifying it involves a 'credit pledge'. It distinguishes from sibling 'create_coalition' by focusing on joining rather than creating. However, it doesn't explicitly contrast with other potential sibling tools like 'invite_seat' or 'list_seats' that might involve coalition membership.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing a coalition ID from another tool), exclusions (e.g., cannot join if already a member), or comparisons to related tools like 'invite_seat' or 'get_credit_balance' for checking eligibility.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_projectsList Crux ProjectsAInspect

List all projects on the configured Crux Daemon. Mirrors corecruxd /v1/projects.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description must carry full burden. It describes the tool as listing projects and mirroring an API, but does not detail output format, authentication needs, or side effects. Given the simple read-only nature, the description is adequate but lacks explicit behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The purpose is stated concisely in the first sentence. The subsequent usage guidance is valuable but slightly repetitive. Overall efficient, though could be tightened.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple, parameterless listing tool with no output schema, the description covers purpose and usage guidance adequately. Missing details about return format, but it references the mirrored API endpoint, which provides some context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, and schematic coverage is 100%. Baseline for zero params is 4. The description adds no unnecessary parameter information, but is sufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists all projects on the Crux Daemon and mirrors an API endpoint. It distinguishes its purpose from siblings by mentioning the preferred alternative tool, cuecrux_session, thus providing clear differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises preferring cuecrux_session for routing and capability planning, and states this tool is for backward compatibility. This provides clear when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_seatsList SeatsBInspect

List all seats (members) in the current organisation.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Maximum number of seats to return.
`cursor`	No	Pagination cursor from a previous response.

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states 'List all seats' but doesn't mention pagination behavior (implied by 'limit' and 'cursor' parameters), permissions required, rate limits, or what 'all' entails (e.g., active vs. inactive seats). For a listing tool with zero annotation coverage, this leaves significant gaps in understanding how it behaves.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero waste. It's front-loaded with the core purpose and uses parentheses to clarify 'seats' as 'members'. Every word earns its place without redundancy or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (listing operation), 100% schema coverage, and no output schema, the description is minimally adequate. However, it lacks context on pagination behavior, permissions, or return format, which would be helpful for an agent to use it correctly. Without annotations, it should do more to compensate for missing behavioral details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both 'limit' and 'cursor' parameters fully documented in the schema. The description adds no additional parameter semantics beyond what's in the schema (e.g., it doesn't explain default values, pagination flow, or typical usage patterns). Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List all seats') and resource ('seats (members) in the current organisation'), providing a specific verb+resource combination. However, it doesn't explicitly distinguish this tool from sibling tools like 'change_seat_role' or 'invite_seat', which would require mentioning that this is a read-only listing operation versus mutation tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing organisation context), exclusions, or comparisons to similar tools like 'get_passport' or 'get_trust_level' that might retrieve related member information. Usage is implied but not explicitly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_workList Crux Work ItemsBInspect

List work items, optionally filtered by project_id, state, tenant_id, assignee_passport. The kanban surface for cross-agent coordination.

ParametersJSON Schema

Name	Required	Description	Default
`state`	No
`tenant_id`	No
`project_id`	No
`assignee_passport`	No

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry full burden for behavioral disclosure. It does not state whether the tool is read-only, what happens when no filters are applied, or any side effects. The metaphor 'kanban surface' is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The first sentence is clear and direct, but the following paragraph about cuecrux_session, while informative, is lengthy for this tool's definition. It could be more concise while still conveying the usage guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 optional parameters, no output schema, and no annotations, the description should cover return values, pagination, or default behavior. It lacks any such details, leaving significant gaps for an agent to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must add meaning to parameters. It merely lists the parameter names from the schema without any additional semantics like format, constraints, or behavior. No value added beyond restating names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists work items with optional filters (project_id, state, tenant_id, assignee_passport), and calls it a 'kanban surface for cross-agent coordination.' This provides a specific verb+resource and a hint of its role, but does not explicitly differentiate from similar list tools like list_projects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises using cuecrux_session first for routing and only using list_work directly for backward compatibility. This provides clear when-to-use and when-not-to-use guidance with a named alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_engrams_resolveMemory Engrams ResolveAInspect

Resolve body content for engrams listed in memory_session_init.available_on_demand. Use when a question matches an on-demand engram's applies_when or trigger_features.

ParametersJSON Schema

Name	Required	Description
`names`	Yes	Engram names in name@version form.
`modelId`	No	The LLM model ID making this call.
`manifestHash`	No	Manifest hash from memory_session_init.

Tool Definition Quality

A3.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavior. It lacks details on side effects, permissions, error handling, or return format. The verb 'resolve' suggests read-only but is not explicitly stated, leaving ambiguity about safety.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences and an additional paragraph. Every sentence provides value, including the routing advice about cuecrux_session, which is essential for usage. It is well-structured and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema, the description should explain what the tool returns, but it does not. It also does not cover error scenarios or the exact relationship with memory_session_init. Important details for an agent to correctly use the tool are missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage, so parameters are documented. The description does not add additional context beyond the schema, meeting the baseline but not exceeding it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool resolves body content for on-demand engrams from memory_session_init, with a specific verb and resource. It distinguishes itself from sibling tools by referencing cuecrux_session as the preferred routing, making its role unique.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'Use when a question matches an on-demand engram's applies_when or trigger_features.' Also provides guidance to prefer cuecrux_session as an alternative, giving clear usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_reason_aboutMemory Reason AboutAInspect

Reason over previously retrieved memory chunks and optional curated facts using the cached Pattern B prompt.

ParametersJSON Schema

Name	Required	Description
`facts`	No	Optional ESI facts returned by memory_retrieve.
`chunks`	Yes	Chunks returned by one or more memory_retrieve calls.
`intent`	Yes	Intent returned by memory_retrieve.
`question`	Yes	The user question to answer.
`retrievalReceiptIds`	Yes	Receipt IDs from prior memory_retrieve calls.

Tool Definition Quality

A3.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description only mentions 'using the cached Pattern B prompt' without explaining behavioral traits like state changes, permissions, or side effects. It does not add transparency beyond the basic purpose, leaving critical behavioral details unspecified.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, front-loading the primary purpose, then providing usage guidance. Each sentence adds value with no redundancy, making it highly efficient and structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lacks specification of return value or output format, which is problematic given the absence of an output schema. It implies prerequisites (prior memory retrieval) but does not make them explicit. The tool's complexity with 5 parameters and reasoning task warrants more detail about what the tool produces.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already provides clear parameter definitions. The description adds no additional parameter-level meaning beyond noting the source of chunks and facts. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The tool's description clearly states its purpose: 'Reason over previously retrieved memory chunks and optional curated facts using the cached Pattern B prompt.' The verb 'reason' and resource 'memory chunks and facts' are specific. It distinguishes from related tools like memory_retrieve and prefers cuecrux_session, providing context for selection among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises the agent to prefer cuecrux_session and notes that this tool is directly callable for backward compatibility. This provides clear guidance on when to use this tool versus the alternative, making it highly actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_retrieveMemory RetrieveAInspect

Retrieve memory chunks, optional curated ESI facts, and passport-driven engrams for Pattern B memory reasoning. The pre_logic field in the response is a ready-to-inject system prompt preamble containing structural data-shape facts calibrated to the calling model's capability class — insert it before reasoning. When deterministic engram pre-execution is enabled, the server may also inline enumerated_facts directly in the response.

ParametersJSON Schema

Name	Required	Description
`query`	Yes	The memory question or retrieval query.
`groupId`	No	Optional enrichment config group.
`modelId`	No	The LLM model ID making this call (e.g. 'claude-sonnet-4-6'). Used to calibrate which engrams are dispatched and how pre_logic is formatted. Omit if unknown.
`iteration`	No	1-based retrieval iteration number.
`sessionId`	No	Optional session identifier for receipt grouping.
`topicHints`	No	Optional topic hints.
`sessionProcedureHash`	No	Hash of a session_procedure already seen by this client. When current, the server omits the procedure body.

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes pre_logic field behavior and calibration, but lacks details on side effects, auth, or rate limits. Without annotations, description carries burden but is incomplete.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two paragraphs, first sets purpose, second adds usage guidance. Efficient but the second paragraph is lengthy; still no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description explains pre_logic and high-level return contents (chunks, facts, engrams). Adequate for understanding return value.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description does not add extra meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it retrieves memory chunks, ESI facts, and engrams for Pattern B reasoning. Specific verb+resource that distinguishes it from many sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly recommends using cuecrux_session first, explains why, and notes that this tool is for backward compatibility. Provides clear when-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_session_initMemory Session InitAInspect

Boot a memory reasoning session and return the server-controlled session procedure, deterministic passport identity, capability class, and engram manifest. Call once before memory_retrieve when the session-init endpoint is enabled.

ParametersJSON Schema

Name	Required	Description	Default
`modelId`	No	The LLM model ID making this call (e.g. 'claude-sonnet-4-6'). Used to calibrate the manifest.

Tool Definition Quality

A4.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It explains the tool returns a plan and is for backward compatibility, but lacks details on side effects, authorization needs, or behavior if called multiple times. The mention of 'backward compatibility' hints at deprecation but doesn't fully disclose implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short, focused paragraphs. The first paragraph states purpose and usage trigger; the second provides strategic context about tool preference. Every sentence adds value, no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description lists exactly what the tool returns and positions it within a workflow (before memory_retrieve, prefer cuecrux_session). For a complex system with many siblings, this is sufficient for an agent to decide correct usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers the sole parameter 'modelId' with a clear description. The tool description adds no further meaning beyond what the schema already provides. Baseline 3 is appropriate given 100% schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Boot a memory reasoning session' and enumerates the returned items (session procedure, passport identity, etc.). It distinguishes itself from siblings by referencing 'memory_retrieve' and 'cuecrux_session', ensuring the agent knows its specific role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to call ('once before memory_retrieve when the session-init endpoint is enabled') and provides a strong alternative: 'Prefer cuecrux_session as your first and only direct MCP call'. Explains why the alternative is better, offering a clear decision rule.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pin_receiptPin ReceiptCInspect

Pin a receipt to prevent it from being garbage collected.

ParametersJSON Schema

Name	Required	Description
`reason`	No	Reason for pinning.
`expires_at`	No	ISO 8601 expiry timestamp.
`receipt_id`	Yes	The receipt ID to pin.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It implies a mutation ('Pin') but doesn't disclose behavioral traits such as required permissions, whether pinning is reversible, effects on system resources, or error conditions. This is inadequate for a tool that likely modifies state.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's front-loaded and appropriately sized for its function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is insufficient. It lacks details on behavioral context, error handling, and what happens post-pinning, leaving gaps in understanding the tool's full impact.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description adds no meaning beyond the schema, as it doesn't explain parameter interactions or provide examples. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Pin') and resource ('a receipt') with the specific purpose 'to prevent it from being garbage collected.' It distinguishes the tool's function but doesn't explicitly differentiate it from sibling tools like 'get_stale_pins' or 'get_spend_receipt'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description doesn't mention prerequisites, conditions for pinning, or related tools like 'get_stale_pins' for checking pinned receipts, leaving usage context unclear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

proof_documentProof DocumentAInspect

Submit a document artefact for cryptographic proof. Creates an async proof job that retrieves the artefact, chunks it, hashes each chunk, and produces a Merkle receipt. Returns the job ID for status polling.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Proof mode (default: light)
`metadata`	No	Optional metadata to attach to the proof job
`artefact_id`	Yes	The artefact ID to proof

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing key behavioral traits: it's an async operation ('creates an async proof job'), describes the processing steps ('retrieves the artefact, chunks it, hashes each chunk'), and specifies the return value ('Returns the job ID for status polling'). It doesn't mention error conditions, rate limits, or authentication requirements, but provides substantial operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with zero waste. First sentence states purpose and outcome, second sentence describes the async nature and return value. Every word earns its place, and the information is front-loaded with the core functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 parameters, 100% schema coverage, and no output schema, the description provides good context about the async nature and processing steps. It doesn't explain what happens to the proof job after creation or error scenarios, but covers the essential operational behavior well given the available structured data.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema (like explaining 'artefact_id' format or 'mode' implications). Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Submit a document artefact for cryptographic proof'), the resource ('document artefact'), and the outcome ('creates an async proof job...produces a Merkle receipt'). It distinguishes from sibling tools like 'get_proof_status' or 'get_proof_receipt' by focusing on the submission/creation action rather than retrieval operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by mentioning 'async proof job' and 'status polling', suggesting this initiates a process that needs follow-up. However, it doesn't explicitly state when to use this versus alternatives like 'get_proofpack' or 'diff_receipts', nor does it mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

purchase_bundlePurchase BundleCInspect

Purchase a credit bundle by ID.

ParametersJSON Schema

Name	Required	Description	Default
`metadata`	No	Optional metadata for the purchase.
`bundle_id`	Yes	The bundle ID to purchase.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states this is a purchase operation (implying a financial transaction and system mutation), but doesn't mention critical aspects like whether this deducts from a credit balance, requires payment authorization, has side effects on user accounts, or what happens on success/failure. This leaves significant gaps for an agent to understand the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at just one sentence with no wasted words. It's front-loaded with the core action and immediately specifies the required parameter. Every word earns its place in this minimal but complete statement of purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a financial transaction tool with no annotations and no output schema, the description is insufficient. It doesn't explain what happens after purchase (does it return a receipt? update balances? trigger notifications?), doesn't mention authentication requirements, and provides no error handling context. Given the complexity of a purchase operation, this leaves too many unknowns for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description adds no additional semantic context about parameters beyond what's in the schema (e.g., what format bundle_id should be, what metadata is used for, or examples of valid values). This meets the baseline expectation when schema coverage is complete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Purchase') and resource ('a credit bundle by ID'), making the purpose immediately understandable. However, it doesn't differentiate from sibling tools like 'browse_bundles' or 'get_pricing' which might be related to bundles, leaving room for confusion about when to use this specific purchase function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided about when to use this tool versus alternatives like 'browse_bundles' (which might list available bundles) or 'get_pricing' (which might show bundle costs). The description doesn't mention prerequisites such as authentication, payment methods, or whether the user needs sufficient credits/balance to make the purchase.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

query_vaultQuery VaultCInspect

Retrieve relevant documents from the vault using semantic search across one or more corpora.

ParametersJSON Schema

Name	Required	Description
`lane`	No	Retrieval lane controlling depth and cost. Accepts `light\|verified\|audit` — map informal terms (e.g. 'quick'→`light`, 'strict'→`audit`) to the nearest enum value. Defaults to `light`.
`limit`	No	Maximum number of results to return (1-50, default 8).
`query`	Yes	The search query to retrieve documents for.
`corpusIds`	No	Corpus IDs to search within.
`includeCommons`	No	Whether to include common/shared corpora in the search.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool performs retrieval via semantic search but omits critical details such as authentication requirements, rate limits, error handling, or the format of returned documents. For a tool with 5 parameters and no output schema, this is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. Every part of the sentence contributes to understanding the tool's function, making it appropriately concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, semantic search functionality) and lack of annotations and output schema, the description is incomplete. It doesn't explain return values, error conditions, or behavioral traits like performance or limitations, leaving significant gaps for an AI agent to use it effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all 5 parameters. The description adds minimal value beyond the schema by implying semantic search functionality and corpus scope, but doesn't provide additional syntax, format details, or examples. Baseline 3 is appropriate when the schema handles parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Retrieve') and resource ('documents from the vault'), and specifies the method ('semantic search across one or more corpora'). It distinguishes itself from siblings like 'query_with_threshold' by focusing on semantic retrieval rather than threshold-based filtering, though it doesn't explicitly name alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'query_with_threshold' or other search-related tools. It mentions the scope ('across one or more corpora') but lacks explicit when/when-not instructions, prerequisites, or named alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

query_with_thresholdQuery with Trust ThresholdCInspect

Execute a trust-routed query that filters results by minimum confidence and respects budget constraints.

ParametersJSON Schema

Name	Required	Description
`query`	Yes	The search query.
`budget_cap`	No	Maximum budget units to spend on this query.
`min_confidence`	No	Minimum confidence threshold (0-1, default 0.8).
`requested_mode`	No	Requested routing mode override.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'trust-routed query', 'filters results by minimum confidence', and 'respects budget constraints', which imply some behavioral traits like routing based on trust and cost management. However, it lacks details on permissions, rate limits, error handling, or what happens if budget is exceeded, leaving significant gaps for a tool with potential financial or trust implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads key information ('Execute a trust-routed query') and avoids redundancy. Every word contributes to understanding the tool's purpose, making it appropriately concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity implied by terms like 'trust-routed' and budget constraints, along with no annotations and no output schema, the description is incomplete. It doesn't explain the return format, error conditions, or deeper behavioral aspects, leaving the agent with insufficient context to use the tool effectively in varied scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds minimal value by implying that 'min_confidence' and 'budget_cap' are used for filtering and constraints, but doesn't provide additional semantics beyond what's in the schema descriptions. This meets the baseline of 3 for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('execute', 'filters') and resources ('trust-routed query', 'results'), specifying it filters by minimum confidence and respects budget constraints. However, it doesn't explicitly distinguish this tool from sibling tools like 'query_vault' or 'get_beliefs', which might also involve querying operations, leaving some ambiguity about its unique role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as 'query_vault' or other query-related siblings. It mentions filtering by confidence and budget constraints but doesn't specify scenarios, prerequisites, or exclusions, offering minimal usage context beyond the basic purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_agentRegister AgentBInspect

Self-register a new agent with the VaultCrux platform. No API key or tenant ID required.

ParametersJSON Schema

Name	Required	Description
`callback_url`	No	URL for the platform to send callbacks to.
`agent_framework`	No	The agent framework being used (default: unknown).
`agent_display_name`	No	A human-readable display name for the agent.
`framework_fingerprint`	No	Unique fingerprint of the agent framework instance.

Tool Definition Quality

B3.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions that no API key or tenant ID is required, which is useful context about authentication needs. However, it lacks details on what the registration entails (e.g., whether it creates persistent resources, requires confirmation, has rate limits, or returns specific data like an agent ID), leaving significant gaps for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the key information ('self-register a new agent') and adds necessary context ('No API key or tenant ID required') without any wasted words. Every part earns its place, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that this is a mutation tool with no annotations and no output schema, the description is incomplete. It covers the purpose and authentication context but lacks details on behavioral outcomes (e.g., what happens after registration, error conditions, or return values), which are critical for an agent to use this tool effectively in a complex environment with many sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the schema already documents all four parameters thoroughly. The description doesn't add any meaning beyond what the schema provides (e.g., it doesn't explain parameter relationships or usage examples), resulting in a baseline score of 3 where the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('self-register') and resource ('new agent with the VaultCrux platform'), making the purpose unambiguous. However, it doesn't explicitly differentiate this from sibling tools like 'invite_seat' or 'create_coalition', which might also involve agent/entity creation in different contexts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('No API key or tenant ID required'), indicating it's for initial registration without existing credentials. However, it doesn't specify when NOT to use it or name explicit alternatives among the many sibling tools, such as 'invite_seat' for adding users to an existing setup.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_beliefRegister BeliefCInspect

ParametersJSON Schema

Name	Required	Description
`answer_id`	No	The answer ID this belief relates to.
`cost_crux`	No	Credit cost of the belief.
`receipt_id`	No	The receipt ID backing this belief.
`confidence_band`	No	Confidence band object.
`decision_context`	No	Context for the decision.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden but only hints at behavior ('trust tracking'). It doesn't disclose whether this is a mutation, requires authentication, has side effects, or how it interacts with the system. More behavioral context is needed for a tool with 5 parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no wasted words. It's appropriately sized and front-loaded with the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 5-parameter mutation tool with no annotations and no output schema, the description is insufficient. It lacks details on behavior, return values, error conditions, and integration with sibling tools, leaving significant gaps in understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all 5 parameters. The description adds no additional parameter semantics beyond what's in the schema, meeting the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('register') and resource ('belief about an answer'), specifying it's for 'trust tracking'. It distinguishes from obvious siblings like 'get_beliefs' (read vs. write) but doesn't explicitly differentiate from all other tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'get_beliefs' or 'watch_answer', nor any prerequisites or contextual triggers for belief registration. The description lacks explicit usage instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

request_sponsorRequest SponsorBInspect

Request a sponsor for the current agent session. Requires a session token (vcrx_self_ prefixed).

ParametersJSON Schema

Name	Required	Description	Default
`session_token`	Yes	Session token obtained from agent registration.

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions the session token requirement, which is a behavioral trait, but doesn't disclose other critical aspects like what 'sponsor' means, whether this is a read or write operation, potential side effects, or response format. For a tool with no annotation coverage, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded, consisting of two clear sentences: one stating the purpose and another specifying the requirement. There is no wasted text, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete. It lacks details on what 'sponsor' entails, the tool's behavioral impact (e.g., read vs. write), and expected outcomes. For a tool with such minimal structured data, the description should provide more context to be fully helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, with the parameter 'session_token' fully documented in the schema. The description adds minimal value by specifying the token prefix ('vcrx_self_'), but doesn't provide additional semantics beyond what the schema already covers. Baseline 3 is appropriate given high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Request a sponsor') and the resource ('for the current agent session'), making the purpose understandable. However, it doesn't differentiate this tool from sibling tools like 'register_agent' or 'get_session_context', which might involve similar session-related operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by specifying 'Requires a session token (vcrx_self_ prefixed)', which suggests this tool is used after agent registration. However, it doesn't explicitly state when to use it versus alternatives (e.g., 'register_agent' for initial setup or 'get_session_context' for session info), leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

revoke_seatRevoke SeatCInspect

Remove a member from the organisation by revoking their seat.

ParametersJSON Schema

Name	Required	Description	Default
`seat_id`	Yes	The ID of the seat to revoke.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states this is a removal action ('remove a member'), implying a destructive mutation, but doesn't specify whether this is reversible, what permissions are required, if it triggers notifications, or what happens to associated data. For a mutation tool with zero annotation coverage, this leaves critical behavioral aspects undocumented.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's front-loaded with the core action ('Remove a member') and avoids redundancy with the name or title.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (a destructive mutation tool), lack of annotations, and no output schema, the description is incomplete. It doesn't cover behavioral aspects like permissions, reversibility, or side effects, nor does it explain the result (e.g., what confirmation is returned). For a tool that removes organizational members, this leaves significant gaps for an AI agent to use it safely and effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage (the 'seat_id' parameter is fully documented in the schema), so the baseline is 3. The description doesn't add any parameter-specific details beyond what the schema provides (e.g., it doesn't explain how to obtain the seat_id or format constraints), but it doesn't need to since the schema is comprehensive.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('remove a member') and the resource ('from the organisation') with the specific mechanism ('by revoking their seat'). It distinguishes from sibling tools like 'change_seat_role' or 'list_seats' by focusing on removal rather than modification or listing. However, it doesn't explicitly differentiate from other potential removal tools (none present in siblings), so it's not a perfect 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., seat must exist, user must have permissions), exclusions (e.g., cannot revoke own seat), or related tools like 'invite_seat' or 'change_seat_role'. The agent must infer usage from the name and context alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

schedule_recheckSchedule RecheckCInspect

Schedule a periodic re-check of knowledge freshness.

ParametersJSON Schema

Name	Required	Description
`scope`	No	Scope object for the recheck.
`cron_expr`	No	Cron expression (defaults to '0 0 * * *').
`next_run_at`	No	ISO 8601 timestamp for the next run.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the action ('Schedule a periodic re-check') but doesn't describe what happens after scheduling (e.g., how re-checks are triggered, if they require permissions, or if there are rate limits). This is a significant gap for a scheduling tool with no structured safety hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's appropriately sized and front-loaded, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of scheduling periodic tasks and the lack of annotations and output schema, the description is insufficient. It doesn't explain what 'knowledge freshness' entails, how the re-check operates, or what the tool returns, leaving critical behavioral aspects undocumented.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters (scope, cron_expr, next_run_at). The description doesn't add any parameter-specific details beyond what's in the schema, such as examples or constraints, resulting in the baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Schedule') and resource ('periodic re-check of knowledge freshness'), making the purpose understandable. However, it doesn't differentiate this tool from potential siblings like 'get_stale_pins' or 'forecast_obsolescence' that might also relate to knowledge freshness, so it doesn't reach the highest score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, exclusions, or related tools, leaving the agent to infer usage context solely from the tool name and purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_policySet PolicyCInspect

Set or update an active policy for the agent.

ParametersJSON Schema

Name	Required	Description
`rules`	No	Policy rules object.
`policy_name`	No	Policy name (defaults to 'default').
`principal_id`	No	Principal ID to apply the policy to.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It implies a write operation ('set or update') but doesn't specify if this requires special permissions, whether changes are permanent or reversible, or what happens on success/failure. This is inadequate for a mutation tool with zero annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no wasted words. It's front-loaded with the core action and resource, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is incomplete. It lacks details on behavioral traits (e.g., side effects, error handling), usage context, and return values, leaving significant gaps for an AI agent to understand how to invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents the three parameters (rules, policy_name, principal_id). The description adds no additional meaning beyond what's in the schema, such as explaining the structure of 'rules' or how 'principal_id' relates to the agent. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('set or update') and the resource ('active policy for the agent'), making the purpose understandable. However, it doesn't differentiate from sibling tools like 'get_active_policy' or 'set_reasoning_profile', which could cause confusion about when to use each.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'get_active_policy' (for viewing) or 'set_reasoning_profile' (for a different configuration). There's no mention of prerequisites, such as whether a policy must exist to update it, or context for when setting vs. updating applies.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_reasoning_profileSet Reasoning ProfileCInspect

Set the agent's reasoning methodology profile.

ParametersJSON Schema

Name	Required	Description	Default
`constraints`	No	Reasoning constraints array.
`methodology`	No	Reasoning methodology object.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. 'Set' implies a mutation operation, but the description doesn't specify whether this is reversible, requires specific permissions, has side effects (e.g., affecting other agent functions), or what the expected outcome is. For a mutation tool with zero annotation coverage, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with zero wasted words. It's front-loaded with the core action and target, making it easy to parse. Every word earns its place, and there's no redundancy or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (a mutation tool with nested objects and no output schema) and lack of annotations, the description is incomplete. It doesn't explain what a 'reasoning methodology profile' entails, how changes affect the agent's behavior, or what the tool returns (if anything). For a tool that likely impacts core agent functionality, more context is needed to guide proper use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, with both parameters ('constraints' and 'methodology') documented in the schema. The description adds no additional parameter semantics beyond what's in the schema (e.g., no examples of valid constraints or methodology structures). Given the high schema coverage, the baseline score of 3 is appropriate, as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Set') and the target ('the agent's reasoning methodology profile'), which is specific and unambiguous. It distinguishes from the sibling tool 'get_reasoning_profile' by indicating a write operation rather than a read. However, it doesn't fully differentiate from other configuration tools like 'set_policy', leaving some ambiguity about the exact scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., whether the agent must have certain permissions), when it's appropriate (e.g., during setup or dynamic adjustment), or what happens if used incorrectly. With no usage context, the agent must infer from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

submit_feature_requestSubmit Feature RequestBInspect

Submit a new feature request or suggestion to the VaultCrux product team.

ParametersJSON Schema

Name	Required	Description
`title`	Yes	Short title for the feature request.
`category`	No	Category for the request (default: other).
`metadata`	No	Additional metadata to attach to the request.
`description`	Yes	Detailed description of the requested feature.

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions the action ('Submit') but lacks details on permissions required, whether submissions are public or private, confirmation mechanisms, rate limits, or what happens after submission (e.g., ticket creation, email notification). This leaves significant gaps for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words or fluff. It is appropriately sized and front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is insufficient. It lacks behavioral context (e.g., side effects, response format) and does not compensate for the absence of structured data, leaving the agent with incomplete information for proper invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters (title, category, metadata, description). The description does not add any additional meaning, syntax, or examples beyond what the schema provides, meeting the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Submit a new feature request or suggestion') and identifies the target resource ('VaultCrux product team'), distinguishing it from sibling tools like 'vote_feature_request' or 'get_feature_requests' which involve different operations on similar resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like 'vote_feature_request' or 'get_feature_requests', nor any context about prerequisites, timing, or exclusions. The description only states what it does, not when it should be used.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tip_agentTip AgentCInspect

Send a credit tip to another agent.

ParametersJSON Schema

Name	Required	Description
`reason`	No	Reason for the tip.
`amount_crux`	Yes	Tip amount in crux credits.
`recipient_principal_id`	Yes	The recipient agent's principal ID.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the action ('Send a credit tip') but doesn't describe whether this is a transactional operation, if it requires authentication, what happens on success/failure, or if there are rate limits. For a financial tool with zero annotation coverage, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero waste. It's front-loaded with the core action and appropriately sized for the tool's complexity. Every word earns its place without being overly brief or verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given this is a financial transaction tool with no annotations and no output schema, the description is incomplete. It doesn't cover behavioral aspects like authorization needs, transaction outcomes, or error handling. While the schema covers parameters well, the overall context for safe and effective use is insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description doesn't add any meaning beyond what's in the schema (e.g., it doesn't explain what 'crux credits' are or provide context for 'recipient_principal_id'). With high schema coverage, the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Send a credit tip') and the target ('to another agent'), which provides a specific verb+resource combination. However, it doesn't differentiate this tool from sibling tools like 'tip_platform' or explain how tipping an agent differs from tipping the platform, leaving room for improvement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'tip_platform', nor does it mention prerequisites such as having sufficient credits (implied by 'get_credit_balance' sibling) or appropriate permissions. It lacks explicit when/when-not instructions or named alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tip_platformTip PlatformCInspect

Send a credit tip to the platform. Amount must be a positive number.

ParametersJSON Schema

Name	Required	Description	Default
`amount`	Yes	Tip amount (must be > 0).
`reason`	No	Optional reason for the tip.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions that the amount must be positive, which is useful, but fails to describe critical aspects like whether this is a write operation (implied by 'Send'), what happens upon success (e.g., credit deduction, confirmation), error conditions, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two sentences that directly convey the core purpose and a key constraint. There is no wasted language, and it is front-loaded with the main action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool that performs a financial transaction (tipping) with no annotations and no output schema, the description is insufficient. It lacks details on behavioral outcomes (e.g., what is returned, error handling), usage context (e.g., authentication needs, credit balance implications), and differentiation from similar tools like 'tip_agent'.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('amount' and 'reason') adequately. The description adds minimal value by reiterating that the amount must be positive, but does not provide additional context beyond what the schema specifies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Send a credit tip') and target ('to the platform'), which is specific and unambiguous. However, it does not explicitly differentiate this tool from sibling tools like 'tip_agent', leaving room for potential confusion about when to use each.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as 'tip_agent' (which likely tips an agent instead of the platform). It also lacks information about prerequisites, like whether the user must have sufficient credits or be authenticated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

unwatch_answerUnwatch AnswerAInspect

Remove an existing watch by its watch ID.

ParametersJSON Schema

Name	Required	Description	Default
`watch_id`	Yes	The ID of the watch to remove.

Tool Definition Quality

A3.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool removes a watch, implying a destructive mutation, but does not mention permissions required, whether the action is reversible, error handling (e.g., invalid watch IDs), or side effects. For a mutation tool with zero annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with zero wasted words, front-loading the core action ('Remove an existing watch') and specifying the key input ('by its watch ID'). It is appropriately sized for a simple tool with one parameter.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema, no annotations), the description is minimally adequate but lacks completeness for a mutation tool. It does not cover behavioral aspects like permissions, reversibility, or error responses, which are important for safe invocation by an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaning by specifying that the watch_id parameter corresponds to 'an existing watch' to remove, which clarifies the context beyond the schema's description ('The ID of the watch to remove'). With 100% schema description coverage and only one parameter, the description compensates adequately, earning a baseline above 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Remove') and the resource ('an existing watch by its watch ID'), making the purpose specific and unambiguous. It distinguishes from sibling tools like 'watch_answer' (which creates watches) and 'get_watches' (which lists them), establishing a clear functional boundary.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when a watch needs to be removed, but provides no explicit guidance on when to use this tool versus alternatives (e.g., if there are other ways to manage watches) or any prerequisites (e.g., needing an existing watch ID). It lacks context about when-not to use it or named alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

update_work_stateUpdate Crux Work StateAInspect

Move a work item to a new state. If the calling passport has agent_work_gate=true the request queues for human approval (returns applied:false).

ParametersJSON Schema

Name	Required	Description	Default
`state`	Yes
`work_id`	Yes
`by_passport`	Yes
`blocker_reason`	No

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses the key behavior that if agent_work_gate=true, the request queues for human approval and returns applied:false. Lacks details on idempotency, invalid state transitions, and exact responses for non-queued cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences front-loaded with core purpose, followed by key condition and usage guidance. Some extra info about cuecrux_session is beneficial but slightly tangential. Overall well-structured and concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks explanation of parameter semantics, error handling, and detailed outputs beyond the queuing case. Given no annotations and no output schema, the description is insufficient for a state update tool with conditional behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite no schema descriptions, the description does not explain parameters beyond implicating 'state' and 'by_passport'. The meanings of work_id, by_passport, and blocker_reason are not clarified, leaving significant gaps for a tool with 4 parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool's action ('Move a work item to a new state'), specifying the verb and resource. It distinguishes itself from sibling tools like create_work or comment_on_work by focusing on state transitions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises preferring cuecrux_session for routing, and notes this tool is for backward compatibility. Also mentions the condition about agent_work_gate leading to human approval. However, it does not compare with other work state-related siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

verify_passportVerify PassportCInspect

Verify another agent's trust passport.

ParametersJSON Schema

Name	Required	Description	Default
`principal_id`	No	Principal ID to verify (defaults to own agent ID).

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the action is to 'verify,' which implies a read operation, but doesn't disclose behavioral traits such as authentication needs, rate limits, what 'verify' entails (e.g., checks validity, returns status), or potential side effects. This is a significant gap for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with zero waste. It's front-loaded and efficiently conveys the core purpose without unnecessary elaboration, making it easy for an agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of verification operations and the lack of annotations and output schema, the description is incomplete. It doesn't explain what 'verify' means in practice, what the expected output is, or any error conditions, leaving the agent with insufficient context to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the single parameter 'principal_id' documented as 'Principal ID to verify (defaults to own agent ID).' The description adds no additional meaning beyond this, as it doesn't elaborate on parameter usage or implications. Baseline 3 is appropriate since the schema fully describes the parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Verify another agent's trust passport' clearly states the action (verify) and the resource (trust passport), with the qualifier 'another agent's' indicating it's for external verification. However, it doesn't explicitly differentiate from sibling tools like 'get_passport' or 'get_trust_level', which might retrieve rather than verify.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description implies verification of another agent's passport, but it doesn't specify use cases, prerequisites, or exclusions, leaving the agent to infer context from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vote_feature_requestVote on Feature RequestCInspect

Cast an upvote on an existing feature request to signal interest.

ParametersJSON Schema

Name	Required	Description	Default
`metadata`	No	Additional metadata to attach to the vote.
`request_id`	Yes	The ID of the feature request to vote on.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the action is to 'cast an upvote', implying a write operation, but doesn't disclose behavioral traits such as whether this requires authentication, if votes are reversible, rate limits, or what happens on success/failure. The description is minimal and lacks critical operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action and purpose. There's no wasted wording, and it's appropriately sized for a simple tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a mutation tool. It lacks details on permissions, side effects, response format, or error handling. While concise, it doesn't provide enough context for safe and effective use by an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('request_id' and 'metadata'). The description doesn't add any meaning beyond the schema—it doesn't explain parameter usage, constraints, or examples. The baseline score of 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Cast an upvote') and the target resource ('an existing feature request'), with the goal 'to signal interest'. It's specific about the verb and resource, but doesn't explicitly differentiate from sibling tools like 'submit_feature_request' or 'get_feature_requests' beyond the action itself.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an existing feature request ID), exclusions, or comparisons to sibling tools like 'get_feature_requests' for viewing requests or 'submit_feature_request' for creating them.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

watch_answerWatch AnswerBInspect

Create a watch on an answer to receive alerts when it changes or becomes stale.

ParametersJSON Schema

Name	Required	Description	Default
`answer_id`	Yes	The ID of the answer to watch.
`frequency`	No	How often to check for changes (default: daily).

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but only states the outcome ('receive alerts') without detailing alert mechanisms (e.g., notifications, webhooks), permissions required, rate limits, or whether watches are user-specific. It mentions 'changes or becomes stale' but doesn't define 'stale' criteria, leaving behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the purpose and outcome with zero wasted words. It directly communicates the tool's function without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a mutation tool ('Create'). It lacks details on return values (e.g., watch ID, confirmation), error conditions, or side effects, leaving significant gaps for an agent to understand full behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('answer_id' and 'frequency'). The description adds no additional meaning beyond implying 'answer_id' is needed for watching, but doesn't clarify parameter interactions or usage beyond the schema's details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Create a watch'), target resource ('on an answer'), and outcome ('to receive alerts when it changes or becomes stale'). It distinguishes itself from sibling tools like 'unwatch_answer' and 'get_watch_alerts' by focusing on creation rather than removal or retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'unwatch_answer' or 'get_watches', nor does it mention prerequisites (e.g., needing answer access) or exclusions. Usage is implied through the action but lacks explicit context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.

Discussions

No comments yet. Be the first to start the discussion!

Try in Browser

Your Connectors

Resources

Need Help?