Ace Memory
Server Details
Persistent memory for AI agents. Semantic search, memory graph, W3C DID identity.
- Status: Healthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 3.3/5 across 82 of 82 tools scored. Lowest: 2.6/5.
The tool set has significant overlap and ambiguity, particularly in memory and context management. For example, 'agent_memory_store', 'remember', and 'observe' all handle memory storage with unclear distinctions, while 'assemble_context' and 'session_start' both assemble context but with different approaches. This overlap makes it difficult for an agent to reliably choose the correct tool without deep domain knowledge.
Most tools follow a consistent verb_noun or noun_verb pattern (e.g., 'agent_create', 'memory_export', 'privacy_check_access'), with clear prefixes like 'agent_', 'memory_', 'skill_' for grouping. However, there are minor deviations such as 'link', 'observe', 'recall', and 'whoami' that break this pattern, slightly reducing overall consistency.
With 82 tools, the count is excessive for a memory management server, leading to bloat and potential confusion. Many tools could be consolidated (e.g., multiple memory storage and context assembly tools) or omitted without losing functionality. This large number overwhelms the core purpose and suggests poor scoping.
The tool set provides comprehensive coverage for agent memory management, including creation, updating, deletion, benchmarking, dreaming, privacy, skills, and context assembly. It supports full CRUD operations across agents, memories, goals, and other entities, with no apparent gaps in the domain's lifecycle or workflows.
Available Tools
82 tools

agent_benchmark_history (Grade: B)
Get benchmark run history for the tenant. Optionally filter by run type.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum results to return. Default: 50. | |
| runType | No | Filter by benchmark type. Omit to get all. | |
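As a sketch of what an invocation might look like: the tool name and parameters come from the table above, the JSON-RPC envelope is the standard MCP `tools/call` shape, and the `id` and argument values are illustrative only.

```python
import json

# Illustrative MCP tools/call request for agent_benchmark_history.
# Both parameters are optional; omitting runType returns all run types.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "agent_benchmark_history",
        "arguments": {"limit": 10},  # runType omitted: all benchmark types
    },
}
print(json.dumps(request, indent=2))
```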
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions optional filtering but fails to describe key traits such as pagination behavior (implied by 'limit' parameter), rate limits, authentication requirements, or the format of returned history data. This leaves significant gaps for an agent to understand operational constraints.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose ('Get benchmark run history for the tenant') and adds a concise optional feature ('Optionally filter by run type'). There is no wasted wording, making it easy for an agent to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (2 parameters, no output schema, no annotations), the description is adequate but incomplete. It covers the basic purpose and optional filtering, but without annotations or output schema, it misses details on behavioral traits (e.g., data format, pagination) that would help an agent invoke it correctly. This results in a minimal viable description with clear gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear documentation for 'limit' (maximum results, default 50) and 'runType' (filter by benchmark type, enum values). The description adds minimal value beyond the schema by noting the optional filtering, but it doesn't provide additional context like typical use cases or parameter interactions. Baseline 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get benchmark run history for the tenant' specifies the verb ('Get') and resource ('benchmark run history'), with an optional filtering capability. It distinguishes from most siblings (e.g., agent_benchmark_run, agent_benchmark_trend) by focusing on historical data retrieval, though it doesn't explicitly differentiate from agent_dream_history or similar history tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides implied usage through 'Optionally filter by run type,' suggesting when to apply the runType parameter. However, it lacks explicit guidance on when to use this tool versus alternatives like agent_benchmark_trend (which might analyze trends) or agent_list (which could list other entities), and no exclusions or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_benchmark_run (Grade: A)
Run a memory benchmark. Types: "external" (MemoryBench adapter), "internal" (curated test corpus), "production" (telemetry snapshot). Returns MemScore triple (accuracy, latencyMs, contextTokens).
| Name | Required | Description | Default |
|---|---|---|---|
| type | Yes | Benchmark type to run | |
| agentId | Yes | Agent ID to benchmark | |
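A minimal sketch of building the argument payload for this tool: the three type values come from the description above, while the agentId value and the client-side validation are illustrative assumptions.

```python
# Valid benchmark types, per the tool description.
VALID_TYPES = {"external", "internal", "production"}

def build_benchmark_args(benchmark_type, agent_id):
    # Both parameters are required; reject unknown benchmark types early
    # rather than letting the server round-trip an error.
    if benchmark_type not in VALID_TYPES:
        raise ValueError(f"type must be one of {sorted(VALID_TYPES)}")
    return {"type": benchmark_type, "agentId": agent_id}

args = build_benchmark_args("internal", "agent-123")  # agent-123 is made up
```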
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses the return format (MemScore triple) and benchmark types, which is useful behavioral context. However, it omits details like execution time, resource consumption, permissions needed, or whether it's a read-only or mutating operation, leaving gaps for a tool that performs active benchmarking.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured in two sentences: the first states the purpose and types, the second specifies the return value. Every phrase adds value without redundancy, and it's front-loaded with the core action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations and no output schema, the description partially compensates by explaining the return format (MemScore triple) and benchmark types. However, for a tool that likely involves significant computation or system impact, it lacks details on error conditions, side effects, or output structure beyond the triple names, leaving room for improvement.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already fully documents both parameters (type with enum values, agentId). The description adds no additional parameter semantics beyond what's in the schema, such as explaining what 'external' vs 'internal' entails or agentId format expectations, meeting the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Run a memory benchmark') and resource ('memory benchmark'), with explicit differentiation from siblings by specifying the unique benchmark types and return format. It distinguishes from tools like agent_benchmark_history or agent_benchmark_trend by focusing on execution rather than historical analysis.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context by listing the three benchmark types and their meanings, which implicitly guides when to use each. However, it lacks explicit when-not-to-use guidance or named alternatives among sibling tools (e.g., when to use agent_benchmark_history instead for past results).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_benchmark_trend (Grade: B)
Get score trend for a specific benchmark over time. Shows how MemScore has changed across runs.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Number of recent runs to include. Default: 20. | |
| agentId | Yes | Agent ID to scope the trend to | |
| benchmarkName | Yes | Benchmark name (e.g., "internal-eval", "memorybench-basic", "production-telemetry") | |
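A sketch of assembling arguments for this tool, assuming the defaults in the table apply server-side: `limit` defaults to 20, so a client only needs to send it when overriding; the benchmark names are the examples listed above and the agent ID is made up.

```python
def trend_args(agent_id, benchmark_name, limit=None):
    # agentId and benchmarkName are required; limit is optional and
    # defaults to 20 on the server, so send it only when overriding.
    args = {"agentId": agent_id, "benchmarkName": benchmark_name}
    if limit is not None:
        args["limit"] = limit
    return args

default_call = trend_args("agent-123", "internal-eval")
wide_call = trend_args("agent-123", "memorybench-basic", limit=100)
```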
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden for behavioral disclosure. It mentions that the tool shows 'how MemScore has changed across runs,' which implies read-only behavior and time-series data, but doesn't address permissions, rate limits, data freshness, or error conditions. For a tool with no annotations, this leaves significant behavioral gaps unaddressed.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise: just two sentences that directly state the tool's purpose and what it shows. Every word earns its place with zero redundancy or unnecessary elaboration. It's front-loaded with the core functionality.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 3 parameters, 100% schema coverage, no output schema, and no annotations, the description provides adequate basic purpose but lacks important context. It doesn't explain what format the trend data returns, how time periods are determined, or what 'MemScore' represents. For a trend analysis tool, more output context would be helpful despite the good schema coverage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description adds no additional parameter information beyond what's in the schema; it doesn't explain relationships between parameters or provide usage examples. This meets the baseline 3 when schema coverage is complete.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get score trend for a specific benchmark over time' specifies the action (get trend) and resource (benchmark scores). It distinguishes from siblings like agent_benchmark_history or agent_benchmark_run by focusing on trend analysis rather than raw history or individual runs. However, it doesn't explicitly differentiate from all possible alternatives, keeping it at 4 rather than 5.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention when to choose agent_benchmark_trend over agent_benchmark_history or other benchmarking tools, nor does it specify prerequisites or exclusions. The lack of usage context leaves the agent without clear decision criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_context_clear (Grade: A)
Clear the active context for an agent. Use when a task is complete or the agent needs a fresh start.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool 'clears' context, implying a destructive mutation, but doesn't specify whether this action is reversible, what exactly gets cleared (e.g., conversation history, temporary variables), or what permissions are required. For a mutation tool with zero annotation coverage, this leaves significant behavioral gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two sentences that each earn their place: the first states the purpose, the second provides usage guidance. It's front-loaded with the core action and wastes no words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (a destructive operation with one parameter) and the absence of both annotations and output schema, the description is minimally adequate. It explains what the tool does and when to use it, but doesn't address behavioral aspects like side effects, error conditions, or what happens after clearing. The 100% schema coverage helps, but more behavioral context would be needed for completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema description coverage is 100% (the single parameter 'agentId' is fully documented in the schema), so the baseline is 3. The description adds no additional parameter information beyond what the schema already provides about the agentId parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Clear') and target ('active context for an agent'), providing a specific verb+resource combination. It distinguishes from sibling tools like 'agent_context_get' (which retrieves context) and 'agent_memory_*' tools (which manage persistent memory). However, it doesn't explicitly differentiate from all siblings like 'memory_clean' or 'context_budget_*' tools, which prevents a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use the tool ('when a task is complete or the agent needs a fresh start'), giving clear context for its application. It doesn't specify when NOT to use it or name alternatives (like whether 'agent_memory_clean' serves a different purpose), which keeps it from a score of 5.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_context_get (Grade: C)
Get the active context for an agent including current task, goal, and recent memories.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It states what information is retrieved but does not describe how the context is structured, whether it's real-time or cached, any permissions required, rate limits, or error conditions. For a tool with no annotations, this leaves significant behavioral gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core action and details. It wastes no words and directly communicates the tool's function without redundancy. Every part of the sentence earns its place by specifying what is retrieved.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of retrieving agent context, no annotations, and no output schema, the description is incomplete. It lacks details on the structure of the returned context, how recent memories are defined, whether the operation is idempotent, or any error handling. For a tool with no structured support, more context is needed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the single parameter 'agentId' fully documented in the schema. The description does not add any meaning beyond the schema, such as explaining what constitutes a valid agent ID or how to obtain it. Baseline 3 is appropriate since the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Get' and the resource 'active context for an agent', specifying what information is retrieved (current task, goal, and recent memories). It distinguishes from siblings like agent_get (general agent info) or agent_memory_query (specific memory queries), though not explicitly named. The purpose is specific but could be more precise about sibling differentiation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It does not mention when to choose it over tools like agent_get (for basic agent details), agent_memory_query (for memory-specific queries), or assemble_context (for context assembly). There is no indication of prerequisites, timing, or exclusions, leaving usage unclear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_create (Grade: C)
Create a new agent with identity. The agent will be assigned a DID and can be linked to a user.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Display name for the agent | |
| ownerId | Yes | User ID of the agent owner | |
| description | No | Description of the agent's purpose and capabilities | |
| personalityTemplate | No | Personality template to apply | |
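A sketch of the required/optional split in this tool's arguments: only `name` and `ownerId` must be sent, and the optional fields are added when provided. All values below are made up for illustration.

```python
def create_agent_args(name, owner_id, description=None, personality_template=None):
    # name and ownerId are required; the two optional fields are only
    # included in the payload when the caller supplies them.
    args = {"name": name, "ownerId": owner_id}
    if description is not None:
        args["description"] = description
    if personality_template is not None:
        args["personalityTemplate"] = personality_template
    return args

minimal = create_agent_args("research-assistant", "user-42")
full = create_agent_args(
    "research-assistant",
    "user-42",
    description="Summarizes papers",
    personality_template="concise",
)
```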
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the agent gets a DID and can be linked to a user, but fails to cover critical aspects like required permissions, whether this is a mutating operation, potential side effects, or error conditions. This leaves significant gaps for an agent to understand the tool's behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that gets straight to the point without unnecessary words. It could be slightly improved by front-loading more critical information, but it's appropriately sized and wastes no space.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of creating an agent (a mutating operation with identity implications), no annotations, and no output schema, the description is insufficient. It doesn't explain what happens after creation, what the DID assignment entails, or how linking works, leaving the agent with incomplete context for proper tool invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds no additional parameter information beyond what's in the schema, such as explaining relationships between parameters or usage examples. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Create a new agent') and the resource ('agent with identity'), specifying that it assigns a DID and can link to a user. However, it doesn't explicitly differentiate from sibling tools like 'agent_update' or 'agent_get', which keeps it from a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'agent_update' or 'agent_list', nor does it mention prerequisites or exclusions. It lacks context for selection among the many agent-related tools available.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_delete (Grade: A)
Delete an agent (soft delete - changes status to deleted).
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent to delete | |
| preserveMemories | No | Whether to archive memories before deletion | |
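A sketch of a call to this soft-delete tool. The table does not document a default for `preserveMemories`, so this sketch assumes it defaults to not archiving and passes it explicitly when memories should be kept; the agent ID is made up.

```python
def delete_agent_args(agent_id, preserve_memories=False):
    # Soft delete: the server flips the agent's status to "deleted".
    # preserveMemories default is not documented; assumed False here.
    return {"agentId": agent_id, "preserveMemories": preserve_memories}

safe_delete = delete_agent_args("agent-123", preserve_memories=True)
```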
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden. It discloses key behavioral traits: this is a deletion operation (mutative) and specifies it's a 'soft delete' that changes status rather than permanent removal. However, it doesn't cover important aspects like required permissions, whether the operation is reversible, what happens to related data, or error conditions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise (one sentence) and front-loaded with the core action. Every word earns its place, with the parenthetical adding crucial nuance about the deletion type. There's zero waste or redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a deletion tool with no annotations and no output schema, the description is minimally adequate. It covers the core action and clarifies it's a soft delete, but lacks important context about permissions, reversibility, side effects, and what the tool returns. Given the mutative nature and 2 parameters, more completeness would be expected.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already fully documents both parameters. The description doesn't add any parameter-specific information beyond what's in the schema. The baseline of 3 is appropriate when the schema does the heavy lifting, though the description could have explained the implications of the preserveMemories parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Delete') and resource ('an agent'), and distinguishes it from siblings like agent_create, agent_get, and agent_update by specifying the deletion operation. It also adds important nuance by clarifying it's a 'soft delete' rather than permanent removal.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. While it mentions 'soft delete,' it doesn't specify when this should be used instead of other agent-related tools like agent_update for status changes or agent_context_clear for clearing agent data. No prerequisites or exclusions are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_dream (Grade: A)
Trigger a memory dream cycle — consolidates, deduplicates, resolves contradictions, normalizes temporal references, extracts missing facts, and manages tier promotion/demotion. Use dryRun to preview changes without applying them.
| Name | Required | Description | Default |
|---|---|---|---|
| dryRun | No | If true, returns metrics without applying changes (default: false) | |
| agentId | Yes | Agent ID to dream for | |
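The dryRun parameter suggests a preview-then-apply pattern, sketched below. `call_tool` is a stand-in for whatever MCP client the agent uses, and the `memoriesAffected` field in the preview response is an assumed name since no output schema is documented.

```python
def dream_cycle(call_tool, agent_id, apply_threshold=0):
    # First pass: dryRun=True returns metrics without applying changes.
    preview = call_tool("agent_dream", {"agentId": agent_id, "dryRun": True})
    # Only apply when the preview reports work to do (field name assumed).
    if preview.get("memoriesAffected", 0) > apply_threshold:
        return call_tool("agent_dream", {"agentId": agent_id})
    return preview
```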
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's actions (consolidation, deduplication, etc.) and the dryRun option for previewing changes, which clarifies it's a potentially mutative operation. However, it lacks details on side effects, error conditions, or performance characteristics like execution time or resource usage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured in two sentences: the first outlines the tool's purpose and operations, and the second explains the dryRun parameter. Every sentence adds essential information with no wasted words, making it easy to scan and understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of memory management operations and lack of output schema, the description adequately covers the tool's purpose and key parameter. However, it doesn't detail the output format, potential errors, or integration with sibling tools like agent_dream_history, leaving gaps for an AI agent to fully understand execution outcomes.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the input schema already documents both parameters (dryRun and agentId) with clear descriptions. The description adds minimal value by mentioning dryRun's purpose ('preview changes without applying them'), but doesn't provide additional context beyond what the schema states, such as agentId format or dryRun implications.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states the tool's function with specific verbs ('trigger a memory dream cycle') and details the operations performed (consolidation, deduplication, contradiction resolution, etc.). It clearly distinguishes this from sibling tools like agent_memory_query or agent_facts_contradictions by focusing on a comprehensive memory optimization cycle rather than individual operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('to preview changes without applying them' via dryRun) and implies it's for memory management cycles. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the many sibling tools, such as agent_memory_clean or memory_audit, which might overlap in function.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
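The preview-versus-apply distinction that dryRun introduces can be sketched as a client-side argument builder. This is a hypothetical helper, not the server's API; it uses only the two parameters the evaluation says the schema documents (agentId and dryRun):

```python
def build_dream_call(agent_id, dry_run=False):
    """Arguments for an agent_dream call.

    dryRun=True asks the server to preview the dream cycle
    (consolidation, deduplication, contradiction resolution)
    without applying any changes.
    """
    return {"agentId": agent_id, "dryRun": dry_run}

# Preview first; apply only once the previewed plan looks right.
preview_args = build_dream_call("agent-123", dry_run=True)
apply_args = build_dream_call("agent-123")
```

A wrapper like this makes the safe default (preview) explicit instead of relying on the agent to remember the flag.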
agent_dream_config (Grade: B)
Get or update dream configuration for an agent. If no update fields are provided, returns the current config.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Agent ID | |
| enabled | No | Enable/disable dreaming | |
| maxMemoriesPerDream | No | Max memories to process per dream (default: 500) | |
| minTimeBetweenDreams | No | Minimum hours between dream cycles (default: 24) | |
| contradictionStrategy | No | Strategy for handling contradictions | |
| minSessionsBetweenDreams | No | Minimum session count between dreams (default: 5) | |
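The get-versus-update dispatch the description implies (read mode when only agentId is supplied, update mode otherwise) can be sketched as a client-side helper. The payload shape and the rejection of unknown fields are assumptions; only the field names come from the parameter table:

```python
ALLOWED_UPDATE_FIELDS = {
    "enabled", "maxMemoriesPerDream", "minTimeBetweenDreams",
    "contradictionStrategy", "minSessionsBetweenDreams",
}

def build_dream_config_call(agent_id, **updates):
    """Return the implied mode plus arguments for agent_dream_config.

    With only agentId the tool returns the current config; any
    update field switches it into update mode.
    """
    unknown = set(updates) - ALLOWED_UPDATE_FIELDS
    if unknown:
        raise ValueError(f"unknown update fields: {sorted(unknown)}")
    mode = "update" if updates else "get"
    return mode, {"agentId": agent_id, **updates}

mode, args = build_dream_config_call("agent-123")                 # read mode
mode2, args2 = build_dream_config_call("agent-123", enabled=False)  # update mode
```

Making the mode explicit on the client side guards against accidentally mutating config when a read was intended.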
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions the conditional behavior (get vs. update based on input), it doesn't describe critical aspects: whether updates are persistent, what permissions are needed, if there are rate limits, or what the return format looks like. For a tool that can perform mutations, this is a significant gap in transparency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise: two sentences that efficiently convey the tool's dual functionality and conditional behavior. Every word earns its place, with no redundancy or fluff. It's front-loaded with the core purpose and follows with the key usage nuance.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (dual get/update functionality with 6 parameters) and lack of both annotations and output schema, the description is minimally adequate. It explains the conditional behavior but doesn't cover mutation effects, error conditions, or return values. For a tool that can modify agent configuration, more context about behavioral implications would be helpful.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all 6 parameters thoroughly. The description adds no additional parameter semantics beyond what's in the schema (e.g., it doesn't explain the 'contradictionStrategy' enum values or interactions between parameters). With high schema coverage, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the dual purpose: 'Get or update dream configuration for an agent.' It specifies the verb ('Get or update') and resource ('dream configuration for an agent'), making the purpose unambiguous. However, it doesn't explicitly differentiate from sibling tools like agent_dream or agent_dream_history, which handle dream execution and history respectively.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides implied usage guidance: 'If no update fields are provided, returns the current config.' This indicates the tool defaults to read mode when only agentId is given. However, it doesn't explicitly state when to use this tool versus alternatives (e.g., agent_update for general agent settings) or mention prerequisites like required permissions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_dream_history (Grade: B)
Get past dream run results for an agent, ordered by most recent first.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum results to return (default: 10) | |
| agentId | Yes | Agent ID | |
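The promised ordering ("most recent first", capped at limit, default 10) amounts to the sketch below. Since no output schema is published, the `ranAt` field name is an assumption:

```python
def order_dream_runs(runs, limit=10):
    """Most recent first, truncated to limit (default 10).

    ISO-8601 timestamp strings sort lexicographically in
    chronological order, so a plain string sort suffices.
    """
    return sorted(runs, key=lambda r: r["ranAt"], reverse=True)[:limit]

runs = [
    {"ranAt": "2024-01-01"},
    {"ranAt": "2024-03-01"},
    {"ranAt": "2024-02-01"},
]
latest_two = order_dream_runs(runs, limit=2)
```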
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden but offers minimal behavioral context. It mentions ordering and implies pagination via 'limit', but doesn't disclose authentication needs, rate limits, error conditions, or what 'dream run results' entail. The description doesn't contradict annotations (none exist), but fails to provide adequate transparency for a tool that retrieves historical data.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose. Every word earns its place, with no redundant or vague phrasing. It's appropriately sized for a simple retrieval tool.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a read-only tool with full schema coverage but no output schema, the description is minimally complete. It specifies what's retrieved and ordering, but lacks context on result format, pagination details, or error handling. Without annotations or output schema, more behavioral detail would improve completeness, but it's adequate for basic use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents both parameters ('agentId' and 'limit'). The description adds no additional parameter semantics beyond what's in the schema—it doesn't explain parameter interactions, format expectations for 'agentId', or constraints on 'limit'. Baseline 3 is appropriate when the schema does all the work.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get past dream run results') and resource ('for an agent'), with specific ordering ('ordered by most recent first'). It distinguishes from some siblings like 'agent_dream' (which likely initiates dreams) and 'agent_dream_config' (which configures dreams), but doesn't explicitly differentiate from all potential query siblings like 'agent_benchmark_history'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. It doesn't mention prerequisites, when not to use it, or compare it to similar tools like 'agent_benchmark_history' or 'agent_facts_list' that might retrieve different historical data.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_facts_contradictions (Grade: B)
Find contradicting facts for a subject. Returns pairs of current facts with the same key but different values.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Agent ID | |
| subjectId | Yes | Subject to check for contradictions | |
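The return semantics the description states ("pairs of current facts with the same key but different values") can be sketched locally. This is a client-side illustration of the contract, not the server's implementation, and the fact field names are assumptions:

```python
from itertools import combinations

def find_contradictions(facts):
    """Pairs of current facts sharing a key but holding different values."""
    by_key = {}
    for fact in facts:
        by_key.setdefault(fact["key"], []).append(fact)
    return [
        (a, b)
        for group in by_key.values()
        for a, b in combinations(group, 2)
        if a["value"] != b["value"]
    ]

facts = [
    {"key": "city", "value": "Paris"},
    {"key": "city", "value": "Berlin"},
    {"key": "name", "value": "Ada"},
]
pairs = find_contradictions(facts)  # one contradicting pair on "city"
```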
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It mentions the return format ('Returns pairs of current facts with the same key but different values'), which is helpful. However, it doesn't disclose important behavioral aspects like whether this is a read-only operation, what permissions are needed, how it handles missing data, or potential side effects. For a tool that presumably queries agent facts, this leaves significant gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two sentences that each earn their place. The first sentence states the purpose, the second describes the return format. There's zero wasted language or redundancy, and the most important information (what the tool does) is front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (contradiction detection), no annotations, and no output schema, the description provides basic but incomplete coverage. It explains what the tool does and the return format, but lacks details about behavioral characteristics, error conditions, or usage context. For a tool with no output schema, it should ideally describe the structure of returned contradiction pairs more thoroughly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters clearly documented in the schema. The description adds no additional parameter information beyond what's already in the schema. According to scoring rules, when schema coverage is high (>80%), the baseline is 3 even with no param info in the description, which applies here.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Find contradicting facts for a subject' with the specific action 'find' and resource 'contradicting facts'. It distinguishes from siblings like 'agent_facts_create', 'agent_facts_list', and 'agent_facts_update' by focusing on contradiction detection rather than CRUD operations. However, it doesn't explicitly differentiate from 'agent_facts_search' which might have overlapping functionality.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, appropriate contexts, or exclusions. With many sibling tools (especially other agent_facts_* tools), this lack of comparative guidance leaves the agent uncertain about tool selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_facts_create (Grade: A)
Manually create a structured fact about a subject. Use this when the human explicitly shares personal information.
| Name | Required | Description | Default |
|---|---|---|---|
| key | Yes | Fact key in dot notation (e.g., "name", "daughter.name", "food.preference") | |
| value | Yes | The fact value | |
| source | Yes | How this fact was obtained | |
| agentId | Yes | Agent ID | |
| category | Yes | Fact category | |
| subjectId | Yes | Subject the fact is about | |
| confidence | No | Confidence in the fact accuracy (default: 0.8) | |
| privacyLevel | No | Privacy level (default: protected). Note: "secret" is not allowed; use platform secret management. | |
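A client-side payload builder can enforce the constraints the table states (dot-notation keys, the 0.8 confidence and "protected" privacy defaults, the ban on "secret"). The key regex and error messages are assumptions; the table only gives examples like "daughter.name":

```python
import re

# Assumed pattern for dot-notation keys such as "food.preference".
DOT_KEY = re.compile(r"^[a-zA-Z]\w*(\.[a-zA-Z]\w*)*$")

def build_fact_payload(agent_id, subject_id, key, value, source, category,
                       confidence=0.8, privacy_level="protected"):
    """Arguments for agent_facts_create with the documented defaults."""
    if privacy_level == "secret":
        raise ValueError('privacyLevel "secret" is not allowed; '
                         "use platform secret management instead")
    if not DOT_KEY.match(key):
        raise ValueError(f"key must use dot notation: {key!r}")
    return {
        "agentId": agent_id, "subjectId": subject_id, "key": key,
        "value": value, "source": source, "category": category,
        "confidence": confidence, "privacyLevel": privacy_level,
    }

payload = build_fact_payload("agent-123", "human-1", "daughter.name",
                             "Maya", "stated by user", "family")
```

Validating before the call surfaces the "secret" restriction locally instead of relying on a server-side error.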
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It indicates this is a manual creation operation for structured facts, which implies data persistence and potential privacy implications. However, it doesn't disclose important behavioral traits like whether this operation is idempotent, what permissions are required, how conflicts with existing facts are handled, or what happens on success/failure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with just two sentences that each earn their place. The first sentence states the core purpose, and the second provides crucial usage guidance. There's zero wasted language, and the information is front-loaded effectively.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a data creation tool with 8 parameters and no annotations or output schema, the description provides adequate but minimal context. It covers the purpose and usage scenario well but lacks information about behavioral consequences, error conditions, or what constitutes successful execution. Given the complexity of creating structured facts with privacy implications, more complete guidance would be beneficial.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the input schema already documents all 8 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema. It provides general context about when to use the tool but no additional parameter semantics, earning the baseline score for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('manually create a structured fact') and the resource ('about a subject'), distinguishing it from sibling tools like agent_facts_list, agent_facts_search, and agent_facts_update. It provides a precise verb+resource combination that makes the tool's purpose immediately understandable.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to use this tool: 'when the human explicitly shares personal information.' This provides clear contextual guidance that helps the agent distinguish this manual creation tool from automated or inferred fact-creation alternatives that might exist in the system.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_facts_list (Grade: A)
Get current facts for a subject, optionally filtered by category. Returns only active (non-superseded) facts.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Agent ID that owns the facts | |
| category | No | Optional category filter. Omit to get all categories. | |
| subjectId | Yes | Subject the facts are about (usually the human the agent serves) | |
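The filtering the description promises (active facts only, optional category narrowing) can be sketched as follows. Treating a fact as superseded once it carries a valid_until timestamp is an assumption borrowed from agent_facts_update's description:

```python
def list_active_facts(facts, category=None):
    """Active (non-superseded) facts, optionally narrowed to one category.

    A fact is treated as superseded once valid_until is set,
    matching the timeline semantics agent_facts_update describes.
    """
    active = [f for f in facts if f.get("valid_until") is None]
    if category is not None:
        active = [f for f in active if f.get("category") == category]
    return active

facts = [
    {"key": "city", "value": "Paris", "category": "location",
     "valid_until": "2024-01-01T00:00:00Z"},   # superseded, excluded
    {"key": "city", "value": "Berlin", "category": "location",
     "valid_until": None},
    {"key": "name", "value": "Ada", "category": "identity",
     "valid_until": None},
]
current = list_active_facts(facts, category="location")
```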
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses that the tool returns 'only active (non-superseded) facts,' which is a key behavioral trait not inferable from the schema alone. However, it lacks details on permissions, rate limits, error handling, or response format, leaving gaps in behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is highly concise and front-loaded, consisting of two clear sentences that directly state the tool's function and key constraint. Every word earns its place, with no redundancy or unnecessary elaboration, making it efficient and easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (3 parameters, no output schema, no annotations), the description is adequate but incomplete. It covers the core purpose and a key behavioral trait (active facts only), but lacks details on output structure, error cases, or integration with sibling tools, leaving some contextual gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all parameters. The description adds minimal value by mentioning 'optionally filtered by category,' which aligns with the schema's 'category' parameter but does not provide additional semantics beyond what the schema already specifies. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get current facts for a subject, optionally filtered by category.' It specifies the verb ('Get'), resource ('facts'), and scope ('current' and 'active'), but does not explicitly differentiate it from sibling tools like 'agent_facts_search' or 'agent_facts_contradictions', which prevents a score of 5.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage by mentioning optional filtering by category and that it returns only active facts, but it does not provide explicit guidance on when to use this tool versus alternatives like 'agent_facts_search' or 'agent_facts_contradictions'. No prerequisites or exclusions are stated, leaving room for ambiguity.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_facts_search (Grade: B)
Search facts by text query. Matches against fact keys and values.
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | Search query text | |
| agentId | Yes | Agent ID | |
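The matching behavior ("matches against fact keys and values") can be sketched locally. The description leaves case handling and match type unspecified, so the case-insensitive substring semantics below are assumptions:

```python
def search_facts(facts, query):
    """Case-insensitive substring match over fact keys and values.

    Substring and case-insensitivity are assumed; the tool's
    description does not pin down either.
    """
    q = query.lower()
    return [
        f for f in facts
        if q in f["key"].lower() or q in str(f["value"]).lower()
    ]

facts = [
    {"key": "food.preference", "value": "vegetarian"},
    {"key": "name", "value": "Ada"},
]
hits = search_facts(facts, "VEGE")  # matches on value, despite casing
```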
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It states the tool searches facts by matching against keys and values, which implies a read-only operation, but doesn't disclose behavioral traits like whether it requires authentication, has rate limits, returns paginated results, or what happens on no matches. For a search tool with zero annotation coverage, this is a significant gap in transparency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and front-loaded: two sentences with zero waste. The first sentence states the core purpose, and the second clarifies the matching scope. Every word earns its place, making it easy for an agent to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (search operation with 2 parameters), no annotations, and no output schema, the description is minimally adequate. It covers what the tool does but lacks details on behavior, output format, error handling, or sibling differentiation. It meets basic needs but leaves gaps that could hinder correct invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with clear documentation for both parameters (query and agentId). The description adds no additional semantic meaning beyond what the schema provides—it doesn't explain query syntax (e.g., partial matches, case sensitivity) or agentId context. A baseline score of 3 is appropriate, as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Search facts by text query' specifies the verb (search) and resource (facts). It distinguishes from siblings like agent_facts_list (which likely lists without querying) and agent_facts_create/update (which modify facts). However, it doesn't explicitly differentiate from agent_memory_query, which might have overlapping functionality, keeping it from a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention when to choose agent_facts_search over agent_facts_list (for listing all facts) or agent_memory_query (for querying memory). There are no prerequisites, exclusions, or contextual hints, leaving the agent to infer usage from the name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_facts_update (Grade: A)
Correct a fact by superseding it with a new value. The old fact is preserved in the timeline with a valid_until timestamp.
| Name | Required | Description | Default |
|---|---|---|---|
| factId | Yes | ID of the fact to supersede | |
| agentId | Yes | Agent ID | |
| newValue | Yes | The corrected value | |
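The supersede-and-preserve semantics can be sketched as follows. The old record stays in the list (the preserved timeline) with a valid_until timestamp, and only the correction is current. Field names beyond valid_until and the new-ID scheme are assumptions:

```python
from datetime import datetime, timezone

def supersede_fact(facts, fact_id, new_value):
    """Close the old fact with valid_until and append the corrected one.

    Client-side sketch of the timeline semantics the description
    states; the server's actual record layout is not published.
    """
    old = next(f for f in facts if f["id"] == fact_id)
    old["valid_until"] = datetime.now(timezone.utc).isoformat()
    new = {**old, "id": f"{fact_id}.v2", "value": new_value,
           "valid_until": None}
    facts.append(new)
    return new

facts = [{"id": "f1", "key": "city", "value": "Paris", "valid_until": None}]
corrected = supersede_fact(facts, "f1", "Berlin")
```

Nothing is deleted: the old value remains queryable in the timeline, which is what distinguishes this from a destructive update.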
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses key behavioral traits: it's a mutation ('correct', 'superseding'), preserves history ('old fact is preserved in the timeline'), and adds a timestamp ('valid_until'). However, it lacks details on permissions, side effects, error conditions, or response format.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two efficient sentences: the first front-loads the core action ('Correct a fact by superseding it with a new value') and the second adds important behavioral context. Every word earns its place with zero waste.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations and no output schema, the description is moderately complete for a mutation tool: it explains the action and historical preservation. However, it lacks details on permissions, error handling, or return values, leaving gaps in understanding the tool's full behavior.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all three parameters (agentId, factId, newValue). The description adds no additional meaning beyond what the schema provides (e.g., clarifying 'newValue' as 'corrected value' is redundant with the schema). Baseline 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Correct a fact by superseding it with a new value') and identifies the resource ('fact'), distinguishing it from siblings like 'agent_facts_create' (creates new facts) and 'agent_facts_list' (lists facts). It precisely defines the operation beyond just the tool name.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives (e.g., 'agent_facts_create' for new facts, 'agent_update' for general agent updates) or any prerequisites. It implies usage for correcting facts but lacks explicit context or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_get (Grade: B)
Get agent details by ID including identity information and current status.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| includeBond | No | Alias for includeLink; include ownership bond/credential information | |
| includeLink | No | Whether to include ownership link/credential information | |
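The includeBond/includeLink alias relationship the table states can be normalized client-side. The precedence rule when both flags are set (explicit includeLink wins) is an assumption; the table only says includeBond is an alias:

```python
def build_agent_get_args(agent_id, include_link=None, include_bond=None):
    """Arguments for agent_get, folding the includeBond alias into includeLink.

    When both flags are supplied, includeLink is assumed to win,
    since includeBond is documented only as its alias.
    """
    args = {"agentId": agent_id}
    flag = include_link if include_link is not None else include_bond
    if flag is not None:
        args["includeLink"] = flag
    return args

args = build_agent_get_args("agent-123", include_bond=True)
```

Normalizing the alias before the call avoids sending two flags that could disagree.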
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It states the tool retrieves details, implying a read-only operation, but doesn't disclose behavioral traits such as authentication requirements, rate limits, error handling, or what happens if the agent ID is invalid. The description is minimal and lacks context beyond the basic purpose.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose ('Get agent details by ID') and adds supplementary information ('including identity information and current status'). There is no wasted language, and it's appropriately sized for a simple retrieval tool.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (a simple read operation), 100% schema coverage, and no output schema, the description is minimally adequate. It covers the purpose but lacks behavioral context and usage guidelines. For a tool with no annotations, it should do more to explain authentication, errors, or output format, making it incomplete for full agent understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all three parameters (agentId, includeBond, includeLink). The description doesn't add any parameter-specific details beyond what's in the schema, such as explaining the relationship between includeBond and includeLink or providing examples. Baseline 3 is appropriate since the schema handles parameter documentation adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Get') and resource ('agent details by ID'), specifying it includes 'identity information and current status.' It distinguishes from sibling 'agent_list' (which likely lists multiple agents) by focusing on a single agent via ID. However, it doesn't explicitly differentiate from other agent-specific tools like 'agent_context_get' or 'agent_update,' which slightly reduces clarity.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention siblings like 'agent_list' for listing agents or 'agent_context_get' for context details, nor does it specify prerequisites or exclusions. Usage is implied only by the description's focus on ID-based retrieval, but no explicit guidelines are given.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_handoff_initiate (Grade: C)
Initiate a context handoff. Creates a handoff package with summary, key learnings, and progressive links for resuming later.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
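The description lists the handoff package's contents (summary, key learnings, progressive links), so its rough shape can be sketched. All field names below are assumptions, since the tool publishes no output schema:

```python
def assemble_handoff_package(agent_id, summary, key_learnings,
                             progressive_links):
    """Hypothetical shape of the handoff package the description lists.

    Every field name here is an assumption; only the three kinds of
    content (summary, key learnings, progressive links) come from
    the tool's description.
    """
    return {
        "agentId": agent_id,
        "summary": summary,
        "keyLearnings": key_learnings,
        "progressiveLinks": progressive_links,
    }

pkg = assemble_handoff_package(
    "agent-123",
    "Refactored auth module; tests passing.",
    ["JWT expiry was the root cause"],
    ["memory://session/42"],
)
```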
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool 'creates a handoff package' but doesn't specify whether this is a read-only or mutative operation, what permissions are required, how the package is stored or accessed, or any rate limits. The description is minimal and lacks critical behavioral details for safe invocation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently conveys the core action and outcome without unnecessary details. It is front-loaded with the main purpose and avoids redundancy, making it easy to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of a handoff operation with no annotations and no output schema, the description is insufficient. It doesn't explain what the handoff package contains in detail, how it's used by 'agent_handoff_resume', what happens if the agentId is invalid, or what the tool returns. For a tool that likely involves state management, more context is needed for safe and effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with the single parameter 'agentId' documented as 'The ID of the agent'. The description adds no additional semantic context about this parameter, such as format examples or how it relates to the handoff process. Baseline score of 3 is appropriate since the schema adequately covers the parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('initiate', 'creates') and identifies the resource ('handoff package') and its components ('summary, key learnings, and progressive links'). It is distinguished from siblings like 'agent_handoff_resume' by focusing on initiation rather than resumption, though it doesn't explicitly contrast with all related tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'agent_handoff_resume' or 'agent_context_clear', nor does it mention prerequisites or context. It implies usage for 'resuming later' but lacks explicit conditions or exclusions, leaving the agent to infer timing.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_handoff_latest (Grade A)
Get the most recent handoff package for an agent. Use to check previous session state before resuming.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | No | Agent ID (optional if session identity set) | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions retrieving a 'handoff package' and checking 'previous session state,' which implies a read-only operation, but it doesn't specify permissions, rate limits, or what happens if no handoff exists. For a tool with zero annotation coverage, this leaves significant gaps in understanding its behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences long, front-loaded with the core purpose and followed by usage context. Every word serves a clear purpose without redundancy, making it efficient and easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (retrieving session state), no annotations, and no output schema, the description is somewhat complete but lacks detail. It explains the purpose and usage but doesn't cover behavioral aspects like error handling or the return format, leaving the agent without a fuller context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with the parameter 'agentId' documented as optional if session identity is set. The description doesn't add any additional meaning beyond this, such as explaining the format of 'agentId' or the implications of omitting it. With high schema coverage, a baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get the most recent handoff package for an agent.' It specifies the verb ('Get') and resource ('most recent handoff package'), making it easy to understand. However, it doesn't explicitly differentiate from sibling tools like 'agent_handoff_resume' or 'agent_context_get', which might have overlapping functionality.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for usage: 'Use to check previous session state before resuming.' This gives a specific scenario when the tool should be used. However, it doesn't explicitly state when not to use it or name alternatives among sibling tools, such as 'agent_handoff_resume' for resuming sessions directly.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_handoff_resume (Grade C)
Resume from a handoff package. Restores context and provides access to previous session information.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| packageId | Yes | The ID of the handoff package to resume from | |
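As a sketch of how the handoff retrieval tools might chain together, the following builds an `agent_handoff_latest` call and then an `agent_handoff_resume` call. The `build_call` helper, the response field `packageId`, and the example IDs are all assumptions; this page documents no output schema for either tool.

```python
def build_call(name, arguments):
    """Wrap tool arguments in a minimal MCP tools/call params object (assumed shape)."""
    return {"name": name, "arguments": arguments}

# Step 1: fetch the most recent handoff package for the agent.
latest_call = build_call("agent_handoff_latest", {"agentId": "agent-123"})

# Step 2: resume from it. "pkg-456" stands in for whatever package identifier the
# (undocumented) response of agent_handoff_latest would contain.
package_id = "pkg-456"
resume_call = build_call(
    "agent_handoff_resume",
    {"agentId": "agent-123", "packageId": package_id},  # both required per the table
)
```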
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions restoring context and providing access to session information, but fails to detail critical aspects: whether this is a read-only or mutating operation, what permissions or authentication are required, how the resumed context integrates with the current session, or any side effects like overwriting existing data. For a tool handling session state with no annotation coverage, this is a significant gap.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded and efficient, using two concise sentences that directly state the tool's function without fluff. Every sentence earns its place by covering core actions. However, it could be slightly more structured by explicitly separating purpose from outcomes, keeping it from a perfect 5.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity in managing session handoffs, the absence of annotations and output schema, and the description's lack of behavioral details, it is incomplete. It doesn't explain what 'restores context' entails operationally, what 'access to previous session information' includes, or the return format. For a tool with no structured safety or output guidance, the description should provide more comprehensive context to ensure safe and effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters ('agentId', 'packageId') documented in the schema. The description adds no additional meaning beyond implying these IDs are needed for resumption, but doesn't clarify their format, sourcing, or interrelationships. Since the schema does the heavy lifting, the baseline score of 3 is appropriate, though the description could have enhanced understanding with examples or context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('Resume', 'Restores', 'provides access') and identifies the resources ('handoff package', 'previous session information'). It is distinguished from siblings like 'agent_handoff_initiate' and 'agent_handoff_latest' by focusing on resuming rather than initiating or fetching. However, it doesn't explicitly contrast with all sibling tools, keeping it at 4 instead of 5.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an existing handoff package), exclusions, or compare it to related tools like 'agent_handoff_initiate' for starting a handoff or 'agent_context_get' for general context retrieval. This lack of usage context leaves the agent guessing about appropriate scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_link_verify (Grade C)
Verify the ownership link between an agent and user.
| Name | Required | Description | Default |
|---|---|---|---|
| userId | Yes | The user ID to verify ownership against | |
| agentId | Yes | The ID of the agent | |
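Since both parameters are required and the tool has security implications, a caller might validate arguments locally before invoking it. The helper below is purely illustrative; its name and behavior are not part of this server.

```python
def link_verify_args(user_id, agent_id):
    """Build arguments for agent_link_verify, failing fast on missing IDs
    instead of round-tripping an error to the server."""
    if not user_id or not agent_id:
        raise ValueError("agent_link_verify requires both userId and agentId")
    return {"userId": user_id, "agentId": agent_id}

args = link_verify_args("user-42", "agent-123")
```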
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'verify' implies a read-only check, the description doesn't specify what verification entails (e.g., whether it returns a boolean, details, or errors), whether it requires specific permissions, or how it handles invalid inputs. This leaves significant behavioral gaps for a tool with security implications.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that states the core purpose without unnecessary words. It's appropriately sized for a simple verification tool and front-loads the essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a verification tool with no annotations and no output schema, the description is insufficient. It doesn't explain what constitutes successful verification, what format the result takes, or error conditions. Given the security context and lack of structured output documentation, more completeness is needed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters clearly documented in the schema itself. The description doesn't add any meaningful parameter context beyond what's already in the schema (e.g., format requirements, relationship between agentId and userId). This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('verify') and the target ('ownership link between an agent and user'), making the purpose immediately understandable. However, it doesn't distinguish this tool from potential siblings like 'link' or 'trust_status' that might involve similar verification concepts, preventing a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With siblings like 'link', 'trust_status', and various agent-related tools, there's no indication of prerequisites, appropriate contexts, or exclusions for this verification operation.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_list (Grade C)
List agents for the current tenant/user.
| Name | Required | Description | Default |
|---|---|---|---|
| page | No | Page number for pagination | |
| limit | No | Maximum number of agents to return | |
| status | No | Filter by agent status | |
| ownerId | No | Filter by owner user ID | |
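To illustrate how the pagination parameters might be used, here is a sketch of a loop that drains all pages. The 1-based page numbering, the "short page means done" termination rule, and the response being a plain list are assumptions; the table documents the parameters but not the response format.

```python
def list_all_agents(call_tool, status=None, limit=50):
    """Collect every agent by paging through agent_list (assumed semantics)."""
    agents, page = [], 1
    while True:
        arguments = {"page": page, "limit": limit}
        if status is not None:
            arguments["status"] = status      # optional filter per the table
        batch = call_tool("agent_list", arguments)
        agents.extend(batch)
        if len(batch) < limit:                # short page: assume no more results
            return agents
        page += 1
```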
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While 'List' implies a read operation, it doesn't disclose important behavioral traits such as whether this requires specific permissions, how pagination works beyond the parameters, what the response format looks like, or any rate limits. The description is too minimal for a tool with 4 parameters.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with zero wasted words. It's appropriately sized for a list operation and front-loads the essential information. Every word earns its place in this concise formulation.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has 4 parameters, no annotations, and no output schema, the description is insufficiently complete. It doesn't explain what 'agents' means in this context, what fields are returned, how pagination works in practice, or any authentication requirements. For a list operation with filtering capabilities, more context is needed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already fully documents all 4 parameters with their types, defaults, and descriptions. The description adds no additional parameter semantics beyond what's in the schema, meeting the baseline 3 for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('List') and resource ('agents') with scope ('for the current tenant/user'), providing a specific verb+resource combination. However, it doesn't distinguish this tool from other agent-related tools like 'agent_get' or 'agent_facts_list', which would require explicit differentiation to earn a 5.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With multiple sibling tools like 'agent_get' (retrieve single agent), 'agent_facts_list' (list agent facts), and 'agent_create', there's no indication of when this list operation is appropriate versus those other operations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_memory_expand (Grade A)
Expand a memory to see more detail. Use this when a memory summary is not detailed enough.
| Name | Required | Description | Default |
|---|---|---|---|
| level | No | Level of expansion. Detailed provides more context, full provides complete raw content. | detailed |
| memoryId | Yes | The ID of the memory to expand | |
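A caller might rely on the `detailed` default and escalate to `full` only when the extra context is still insufficient, since `full` returns complete raw content per the table. The helper below is an illustrative sketch, not part of the server.

```python
def expand_args(memory_id, need_raw=False):
    """Arguments for agent_memory_expand. Omitting 'level' takes the
    'detailed' default; 'full' is requested only when raw content is needed."""
    args = {"memoryId": memory_id}  # required
    if need_raw:
        args["level"] = "full"      # overrides the 'detailed' default
    return args
```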
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions that the tool 'expands' a memory to show more detail, implying a read operation, but doesn't disclose behavioral traits such as whether it requires permissions, has rate limits, or what the output format looks like. This leaves gaps in understanding how the tool behaves beyond its basic function.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, front-loaded with the purpose and followed by usage guidance. Every sentence earns its place without waste, making it efficient and easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations and no output schema, the description provides basic purpose and usage but lacks details on behavioral aspects and return values. It's adequate for a simple tool with good schema coverage, but could be more complete to compensate for missing structured data.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents both parameters ('memoryId' and 'level' with enum values). The description doesn't add any meaning beyond this, such as explaining the practical difference between 'detailed' and 'full' levels. Baseline 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('expand a memory') and the resource ('memory'), specifying it's for seeing more detail when a summary isn't enough. However, it doesn't explicitly differentiate from sibling tools like 'agent_memory_query' or 'agent_get', which might also retrieve memory details, so it's not fully distinguished.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context on when to use this tool ('when a memory summary is not detailed enough'), which helps the agent decide based on the level of detail needed. It doesn't mention when not to use it or name specific alternatives among siblings, so it's not fully explicit.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_memory_query (Grade B)
Query agent memories using semantic search. Returns relevant memories based on the query text.
| Name | Required | Description | Default |
|---|---|---|---|
| tier | No | Level of detail to return. Summary saves tokens, full provides complete content. | summary |
| limit | No | Maximum number of memories to return | |
| model | No | Filter by LLM model name. | |
| query | Yes | The search query to find relevant memories | |
| types | No | Filter by memory types | |
| agentId | Yes | The ID of the agent to query memories for | |
| minTrust | No | Minimum trust score (0-1). Omit to include all. | |
| platform | No | Filter by runtime platform (e.g., claude_code, cursor). | |
| sessionId | No | Filter by session/conversation ID. | |
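With nine parameters, a worked example helps show which arguments are required and which merely filter. The parameter names and the 0-1 trust range come from the table above; every value below is made up for illustration.

```python
# Hypothetical arguments for a filtered semantic query via agent_memory_query.
query_args = {
    "agentId": "agent-123",                   # required
    "query": "lessons from the last deploy",  # required
    "tier": "summary",                        # the default; saves tokens vs "full"
    "types": ["lesson", "event"],             # optional filter by memory type
    "minTrust": 0.7,                          # 0-1; omit to include all memories
    "platform": "claude_code",                # optional runtime-platform filter
    "limit": 10,
}
assert 0 <= query_args["minTrust"] <= 1       # documented range
```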
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions semantic search and returning relevant memories, but lacks critical details such as how results are ranked, whether pagination is involved, error conditions, or performance characteristics. For a query tool with 9 parameters, this leaves significant behavioral gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and front-loaded, consisting of just two sentences that directly state the tool's function and output. Every word serves a purpose with zero redundancy, making it efficient and easy to parse.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (9 parameters, no annotations, no output schema), the description is minimally adequate but incomplete. It covers the basic purpose and output type, but lacks details on result format, error handling, or behavioral nuances. For a semantic search tool, more context would be beneficial, though the high schema coverage mitigates some gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description adds minimal parameter semantics beyond the schema, which has 100% coverage. It mentions 'query text' and 'relevant memories', aligning with the 'query' parameter and the tool's purpose, but doesn't explain interactions between parameters or provide usage examples. With high schema coverage, the baseline of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Query agent memories using semantic search. Returns relevant memories based on the query text.' It specifies the verb ('query'), resource ('agent memories'), and method ('semantic search'), making the function unambiguous. However, it doesn't explicitly differentiate from sibling tools like 'agent_facts_search' or 'recall', which prevents a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools related to memory and search (e.g., 'agent_facts_search', 'recall', 'remember'), the absence of explicit usage context or exclusions leaves the agent without clear direction for tool selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_memory_store (Grade B)
Store a memory for an agent. Memories are persisted across sessions and can be retrieved later. Use this to save important facts, events, lessons learned, or context.
| Name | Required | Description | Default |
|---|---|---|---|
| type | No | The type of memory: fact (static info), event (something that happened), lesson (learned insight), context (current situation), goal (objective), task (work item) | fact |
| model | No | LLM model name (e.g., claude-sonnet-4-20250514). Auto-detected from MCP connection if omitted. | |
| goalId | No | Optional goal ID to bind this memory to. Goal-bound memories are retained until the goal completes. | |
| agentId | Yes | The ID of the agent to store the memory for | |
| content | Yes | The memory content to store | |
| platform | No | Runtime platform (e.g., claude_code, cursor, codex). Auto-detected from MCP connection if omitted. | |
| sessionId | No | Session/conversation ID for grouping memories. | |
| importance | No | Importance score from 0 to 1. Higher importance memories are retained longer. | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states that memories are 'persisted across sessions', which is valuable behavioral context, but doesn't mention authentication requirements, rate limits, error conditions, or what happens on duplicate storage. Since no annotations exist, there is nothing for the description to contradict.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately concise with two sentences that each serve a purpose: the first states the core function, the second provides usage examples. It's front-loaded with the primary action and wastes no words, though it could be slightly more structured with bullet points for the examples.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a write operation with 8 parameters and no annotations or output schema, the description provides basic context but lacks important details. It doesn't explain what the tool returns, error conditions, or how memories interact with the broader system. Given the complexity and lack of structured metadata, the description should do more to compensate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already documents all 8 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema. The baseline of 3 is appropriate when the schema does the heavy lifting, though the description could have explained parameter relationships or constraints.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose, 'Store a memory for an agent', with specific examples of what can be stored (facts, events, lessons learned, context). It is distinguished from retrieval-focused siblings like agent_memory_query but doesn't explicitly differentiate from agent_memory_expand or other memory-related tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides implied usage guidance by stating 'Memories are persisted across sessions and can be retrieved later' and listing example use cases. However, it doesn't explicitly state when to use this vs. alternatives like agent_memory_expand or other storage mechanisms, nor does it mention prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_transfer (Grade C)
Transfer agent ownership to another user. Requires signatures from both parties.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent to transfer | |
| toUserId | Yes | User ID of the new owner | |
| toSignature | Yes | Signature from new owner accepting transfer | |
| fromSignature | Yes | Signature from current owner authorizing transfer | |
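Because all four parameters are required and the operation is high-stakes, a caller might validate the full argument set before sending. This pre-flight check is an illustrative assumption, not server behavior; in particular, how signatures are obtained or formatted is not documented on this page.

```python
# All four agent_transfer parameters are required per the table; the
# dual-signature rule means neither party's signature can be omitted.
REQUIRED = ("agentId", "toUserId", "fromSignature", "toSignature")

def transfer_args(**kwargs):
    """Validate and assemble agent_transfer arguments before invocation."""
    missing = [k for k in REQUIRED if not kwargs.get(k)]
    if missing:
        raise ValueError("agent_transfer missing: " + ", ".join(missing))
    return {k: kwargs[k] for k in REQUIRED}
```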
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the signature requirement, which is useful context about authentication/authorization needs. However, it doesn't describe other critical behavioral traits: whether this is a destructive/mutative operation (implied but not stated), what happens to the agent during/after transfer, error conditions, or any rate limits. For a tool that changes ownership with significant implications, this is inadequate.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise (one sentence) and front-loaded with the core purpose. Every word earns its place: 'Transfer agent ownership to another user' states the action, and 'Requires signatures from both parties' adds essential context. There is zero waste or redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of an ownership transfer operation with no annotations and no output schema, the description is incomplete. It lacks information about what the tool returns (success/failure indicators, new owner details), error handling, side effects, or security implications. For a high-stakes mutation tool with 4 required parameters, this minimal description leaves significant gaps for an AI agent to understand proper usage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all four parameters thoroughly. The description adds no additional parameter semantics beyond what's in the schema (e.g., format of signatures, how to obtain them, or relationship between parameters). The baseline of 3 is appropriate when the schema does the heavy lifting, though the description could have added value by explaining the signature workflow.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('transfer agent ownership') and resource ('to another user'), making the purpose unambiguous. It distinguishes from sibling tools like agent_create or agent_update by focusing specifically on ownership transfer. However, it doesn't explicitly differentiate from other agent-related tools that might involve ownership changes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides minimal usage guidance, mentioning only that it 'requires signatures from both parties' as a prerequisite. It offers no explicit guidance on when to use this tool versus alternatives (e.g., agent_update for other changes) or when not to use it. No sibling tool comparisons or contextual exclusions are provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
agent_update (C)
Update agent details.
| Name | Required | Description | Default |
|---|---|---|---|
| name | No | New display name | |
| status | No | New status | |
| agentId | Yes | The ID of the agent to update | |
| description | No | New description | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries full burden. 'Update agent details' implies a mutation operation but doesn't disclose behavioral traits like required permissions, whether changes are reversible, rate limits, or what happens to unspecified fields. This is inadequate for a mutation tool with zero annotation coverage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with zero wasted words. It's appropriately sized and front-loaded, making it easy to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given this is a mutation tool with no annotations and no output schema, the description is incomplete. It doesn't explain what happens during the update, what values are returned, or error conditions. With 4 parameters and siblings that handle similar resources, more context is needed for proper agent use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all four parameters (agentId, name, status, description) with their types and descriptions. The description adds no additional meaning beyond what's in the schema, but the baseline is 3 when schema coverage is high.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description 'Update agent details' clearly states the action (update) and resource (agent details), which is better than a tautology. However, it's vague about what 'details' specifically means and doesn't distinguish this tool from sibling tools like agent_create, agent_delete, or agent_get, which all operate on agents.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With siblings like agent_create, agent_delete, agent_get, and agent_list available, there's no indication of prerequisites, appropriate contexts, or exclusions for using this update function.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
assemble_context (A)
Assemble agent context within budget constraints. Call this at session start to get identity, personality, constitution, facts, goals, lessons, and skills content sized to fit the context window. Each component is truncated to its budget allocation.
| Name | Required | Description | Default |
|---|---|---|---|
| preset | No | Budget preset to use (overrides stored config for this call) | |
| agentId | Yes | Agent ID to assemble context for | |
| subjectId | No | Subject ID for facts lookup (default: agentId) | |
| contextWindowSize | No | Context window size in tokens (default: 200000 for Claude) | |
| includeComponents | No | Specific components to include (default: all) | |
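The per-component truncation the description mentions can be sketched as follows. The allocation shares and the character-level truncation are assumptions for illustration; the real budget presets and token accounting are not documented:

```python
def truncate_to_budget(components: dict[str, str], window: int,
                       shares: dict[str, float]) -> dict[str, str]:
    """Cut each component down to its share of the context window."""
    out = {}
    for name, text in components.items():
        budget = int(window * shares.get(name, 0))
        out[name] = text[:budget]  # crude char-level stand-in for token budgeting
    return out

ctx = truncate_to_budget(
    {"identity": "I am Ace." * 50, "facts": "F" * 1000},
    window=100,
    shares={"identity": 0.3, "facts": 0.7},
)
```

Under this sketch, components that fit their allocation pass through unchanged and oversized ones are clipped, which matches the stated behavior that "each component is truncated to its budget allocation."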
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and does well, disclosing key behavioral traits: it's for session initialization, operates within budget constraints, truncates components to fit context windows, and returns multiple content types. It doesn't mention authentication needs, rate limits, or whether the call is idempotent, but it covers the core operational behavior adequately for a tool with no annotation support.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: the first states the purpose and budget constraint, the second says when to call it and what it returns, and the third explains the truncation behavior. Every word earns its place, and the description is front-loaded with the core functionality without unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 5 parameters, 100% schema coverage, but no annotations and no output schema, the description provides good context about the assembly process, budget constraints, and component truncation. It doesn't describe the return format or structure, which would be helpful given no output schema, but covers the operational context well for a session initialization tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all 5 parameters thoroughly. The description mentions 'budget constraints' which relates to the preset parameter and contextWindowSize, and 'components' which maps to includeComponents, but doesn't add significant semantic meaning beyond what the schema provides. The baseline 3 is appropriate when schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('assemble agent context') and resource ('agent context') with precise scope ('within budget constraints'). It distinguishes from siblings like agent_context_get (which presumably retrieves without assembly) and agent_context_clear by specifying it's for session start to gather multiple components. The verb 'assemble' implies a construction/aggregation operation not present in other tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use ('Call this at session start') and what it provides ('identity, personality, constitution, facts, goals, lessons, and skills content'). However, it doesn't explicitly state when NOT to use it or name specific alternatives among the many sibling tools (e.g., agent_context_get for retrieving without assembly, or individual component tools). The guidance is helpful but lacks explicit exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
authorize (A)
Initiate agent-first OAuth 2.0 Device Flow (RFC 8628) to register and bond with a human without needing an API key. Returns a short user_code (e.g. "BCDF-GHJK") to display to the human. The human visits the verification_uri, enters the user_code, and approves. Once approved, subsequent MCP calls (remember, recall, etc.) use the provisional tenant automatically.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Your display name (e.g., "code-assistant", "research-agent") | |
| description | No | Optional description of what you do | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the OAuth 2.0 Device Flow process, including the return value ('short user_code'), what the human must do ('visits the verification_uri, enters the user_code, and approves'), and the system behavior after approval ('subsequent MCP calls use the provisional tenant automatically'). It doesn't mention error conditions, timeout periods, or rate limits, but provides substantial operational context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is efficiently structured in four sentences: the first states the purpose and technology, the second the return value, the third the human's interaction steps, and the fourth the system behavior after approval. Every sentence earns its place by providing essential information about this complex authentication flow.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex authentication tool with no annotations and no output schema, the description provides substantial context about the OAuth flow, human interaction requirements, and system behavior. It explains what happens before, during, and after the authorization process. The main gap is the lack of information about return format details (beyond mentioning 'short user_code') and potential error scenarios.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description doesn't add any additional meaning about the parameters beyond what's in the schema. It focuses on the tool's purpose and behavior rather than parameter details, which is appropriate given the comprehensive schema documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Initiate agent-first OAuth 2.0 Device Flow'), the resource involved ('to register and bond with a human'), and the technology standard ('RFC 8628'). It distinguishes this from sibling tools by focusing on authentication/authorization without API keys, unlike other tools that handle memory, goals, skills, or agent operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool: 'to register and bond with a human without needing an API key' and mentions that subsequent MCP calls will use the provisional tenant. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools (e.g., when API key authentication is preferred).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
constitution_check_conflicts (C)
Check for conflicts between constitution tiers.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'checking' conflicts, which implies a read-only or analysis operation, but doesn't specify if it's safe (e.g., non-destructive), what the output entails (e.g., a list of conflicts, a boolean result), or any side effects like rate limits or authentication needs. This leaves key behavioral traits unclear.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence that directly states the tool's function without unnecessary words. It's front-loaded with the core action, making it easy to parse. However, it could be slightly more informative without losing conciseness, such as hinting at the output or context.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of checking conflicts (which could involve nuanced logic), the lack of annotations, and no output schema, the description is incomplete. It doesn't explain what constitutes a conflict, the format of results, or any dependencies. This makes it hard for an agent to understand the tool's full behavior and integrate it effectively into workflows.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with 'agentId' documented as 'The ID of the agent'. The description doesn't add any parameter details beyond this, such as why the agentId is needed or how it relates to constitution tiers. Since the schema already provides adequate parameter information, the baseline score of 3 is appropriate, as no extra value is added.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the action ('Check for conflicts') and the subject ('between constitution tiers'), which provides a basic purpose. However, it's vague about what 'conflicts' means (e.g., logical inconsistencies, overlapping rules, or implementation issues) and doesn't distinguish it from sibling tools like 'constitution_get_tier' or 'constitution_validate_action', which might involve similar concepts. It's not tautological but lacks specificity.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description offers no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an agentId), exclusions, or related tools like 'constitution_validate_action' that might handle validation. Without any context, users must infer usage from the name alone, which is insufficient for effective tool selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
constitution_get (A)
Get the merged constitution for an agent. Returns the effective rules after combining System > User > Agent tiers.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| includeConflicts | No | Whether to include conflict resolution information | |
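The System > User > Agent precedence can be illustrated with a minimal merge sketch, assuming rules are keyed values and higher tiers win on conflict; the server's actual merge semantics are not documented:

```python
def merge_constitution(system: dict, user: dict, agent: dict) -> dict:
    """Combine tiers so that System overrides User, which overrides Agent."""
    merged = dict(agent)   # lowest precedence first
    merged.update(user)    # User tier overrides Agent tier
    merged.update(system)  # System tier overrides everything
    return merged

effective = merge_constitution(
    system={"no_self_modification": True},
    user={"tone": "formal", "no_self_modification": False},
    agent={"tone": "casual", "verbosity": "high"},
)
# no_self_modification stays True (System wins), tone becomes "formal"
# (User beats Agent), and verbosity survives unchallenged from Agent.
```

This is the kind of "effective rules" result the description promises; the `includeConflicts` flag would presumably report which keys were overridden along the way.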
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It discloses that the tool returns merged rules from multiple tiers, which is useful behavioral context. However, it does not mention permissions, rate limits, or other operational traits like whether it's read-only or has side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the purpose and key behavior. There is no wasted text, and it directly communicates the tool's function and output.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 2 parameters, 100% schema coverage, and no output schema, the description is adequate but has gaps. It explains the merging behavior but lacks details on output format, error handling, or prerequisites. Without annotations, it should provide more operational context for a read operation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents both parameters (agentId and includeConflicts). The description does not add any parameter-specific details beyond what the schema provides, such as explaining what 'conflict resolution information' entails. Baseline 3 is appropriate when schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Get') and resource ('merged constitution for an agent'), specifying it returns 'effective rules after combining System > User > Agent tiers.' It distinguishes from sibling tools like constitution_get_tier (which gets a specific tier) and constitution_check_conflicts (which focuses on conflicts).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when needing the merged constitution, but does not explicitly state when to use this tool versus alternatives like constitution_get_tier or constitution_check_conflicts. It provides clear context about what it returns, but lacks explicit exclusions or named alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
constitution_get_tier (B)
Get constitution rules for a specific tier (system, user, or agent).
| Name | Required | Description | Default |
|---|---|---|---|
| tier | Yes | The constitution tier to retrieve | |
| userId | No | User ID (required for user tier) | |
| agentId | No | Agent ID (required for agent tier) | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden, yet it only states the action without disclosing behavioral traits. It doesn't mention whether this is a read-only operation, what permissions are required, how results are formatted, or any rate limits. The description is minimal and leaves critical behavioral aspects unspecified.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's appropriately sized for a simple retrieval tool and front-loads the essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 3 parameters, no annotations, and no output schema, the description is insufficient. It doesn't explain the relationship between tier and optional parameters (userId for user tier, agentId for agent tier), what format the constitution rules are returned in, or any error conditions. The description leaves too many contextual gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are well-documented in the schema itself. The description adds no additional parameter semantics beyond mentioning 'tier' values, which are already covered by the enum in the schema. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Get') and resource ('constitution rules') with specific scope ('for a specific tier'). It distinguishes from sibling 'constitution_get' by specifying tier-based retrieval, but doesn't explicitly contrast with other constitution tools like 'constitution_check_conflicts' or 'constitution_list_proposals'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when needing constitution rules for a particular tier (system, user, or agent), but provides no explicit guidance on when to choose this over alternatives like 'constitution_get' or 'constitution_list_proposals'. It mentions tier specificity but lacks context about prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
constitution_list_proposals (C)
List pending constitution change proposals for an agent.
| Name | Required | Description | Default |
|---|---|---|---|
| status | No | Filter by proposal status | |
| agentId | Yes | The ID of the agent | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'pending' proposals but does not clarify if this includes all statuses (the schema allows filtering by status), whether it's a read-only operation, what permissions are required, or how results are returned (e.g., pagination, format). This leaves significant gaps for a tool that interacts with agent data.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is front-loaded with the core action ('List pending constitution change proposals'), making it easy to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of listing proposals for an agent, the lack of annotations and output schema means the description should compensate by providing more context. It fails to explain behavioral aspects (e.g., safety, permissions), usage guidelines, or what the output entails, making it incomplete for effective agent use despite the clear schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, clearly documenting both parameters (agentId and status with enum values). The description adds no additional meaning beyond the schema, such as explaining the relationship between agentId and proposals or the implications of status filtering. With high schema coverage, a baseline score of 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'List' and the resource 'pending constitution change proposals for an agent,' making the purpose specific and understandable. However, it does not explicitly differentiate from sibling tools like 'constitution_propose_change' or 'constitution_get,' which are related but serve different functions (proposing changes vs. retrieving constitution details).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as 'constitution_get' for general constitution details or 'constitution_propose_change' for creating proposals. It lacks context on prerequisites, exclusions, or typical scenarios for listing proposals, leaving usage unclear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
constitution_propose_change (B)
Propose a change to the agent's constitution. Requires user approval.
| Name | Required | Description | Default |
|---|---|---|---|
| reason | Yes | Reason for proposing this change | |
| agentId | Yes | The ID of the agent proposing the change | |
| proposedRules | Yes | The proposed rules to add or change | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'requires user approval,' which is a critical behavioral trait indicating this is a proposal mechanism rather than an immediate change. However, it lacks details on what happens after proposal submission (e.g., is it queued, logged, or triggers notifications?), whether changes are reversible, or any rate limits. For a mutation tool with zero annotation coverage, this is insufficient.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise with just two sentences that directly convey the core action and a key constraint. It's front-loaded with the main purpose and wastes no words, making it easy to parse quickly. Every sentence earns its place by providing essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool that proposes changes to an agent's constitution—a potentially complex and impactful operation—the description is too minimal. With no annotations, no output schema, and only basic behavioral hints, it fails to provide sufficient context about the proposal lifecycle, approval mechanisms, or potential side effects. This leaves significant gaps for an AI agent to understand the full implications of using this tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with clear documentation for agentId, proposedRules, and reason. The description doesn't add any parameter-specific information beyond what's in the schema, such as format examples or constraints. Given the high schema coverage, the baseline score of 3 is appropriate, as the schema already does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('propose a change') and the target ('agent's constitution'), making the purpose immediately understandable. It distinguishes itself from sibling tools like constitution_get or constitution_list_proposals by focusing on modification rather than retrieval. However, it doesn't specify whether this is for adding, modifying, or removing rules, which prevents a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides some usage context by mentioning 'requires user approval,' which implies this should be used when seeking to modify the constitution with oversight. However, it doesn't explicitly state when to use this tool versus alternatives like constitution_validate_action or when not to use it (e.g., for minor updates vs. major overhauls). The guidance is implied rather than explicit.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
constitution_validate_action (C)
Validate whether an action is allowed by the constitution.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| actionType | Yes | Type of action to validate | |
| actionDetails | No | Details of the action | |
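For illustration, a call to this tool over MCP's standard JSON-RPC `tools/call` method might look like the following. The argument values are hypothetical, and as the review below notes, the response shape is undocumented:

```python
import json

# Hypothetical values throughout; the tool's actual response shape is undocumented.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "constitution_validate_action",
        "arguments": {
            "agentId": "agent-123",                    # required
            "actionType": "memory_delete",             # required; valid values undocumented
            "actionDetails": {"memoryId": "mem-456"},  # optional
        },
    },
}

print(json.dumps(request, indent=2))
```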
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool validates actions but doesn't describe what happens during validation (e.g., checks permissions, returns boolean/explanation, requires specific agent roles, or has side effects like logging). This is a significant gap for a validation tool with zero annotation coverage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence with zero waste. It's appropriately sized and front-loaded, directly stating the tool's purpose without unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of validating actions against a constitution, no annotations, and no output schema, the description is incomplete. It doesn't explain the validation process, return values (e.g., allowed/denied with reasons), or prerequisites (e.g., agent permissions). This leaves critical gaps for an AI agent to use the tool effectively.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters (agentId, actionType, actionDetails). The description doesn't add meaning beyond what the schema provides, such as examples of action types or details. Baseline 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Validate whether an action is allowed by the constitution.' It uses a specific verb ('validate') and resource ('action'), but doesn't explicitly differentiate from sibling tools like 'constitution_check_conflicts' or 'constitution_get', which appear related to constitution operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'constitution_check_conflicts' or explain scenarios where validation is needed versus other constitution-related operations. Usage is implied but not articulated.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
context_budget_apply_preset (C)
Apply a named budget preset to an agent.
| Name | Required | Description | Default |
|---|---|---|---|
| preset | Yes | Preset name | |
| agentId | Yes | Agent ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states the action ('apply') but does not explain what this entails—e.g., whether it overwrites existing settings, requires specific permissions, has side effects, or returns confirmation. This leaves critical behavioral traits unspecified for a mutation tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, direct sentence with no wasted words, efficiently conveying the core action. It is appropriately sized and front-loaded, making it easy to parse without unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a mutation tool with no annotations and no output schema, the description is incomplete. It lacks details on behavioral implications, success/failure responses, or integration with sibling tools. Given the complexity of applying budget presets to agents, more context is needed for effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with clear parameter definitions and an enum for 'preset'. The description does not add any semantic details beyond the schema, such as explaining preset effects or agent selection criteria. Given the high schema coverage, a baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('apply') and target ('a named budget preset to an agent'), specifying both the resource (budget preset) and recipient (agent). However, it does not explicitly differentiate this tool from sibling tools like 'context_budget_get' or 'context_budget_set', which handle budget-related operations but with different functions.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as 'context_budget_set' for custom budgets or other agent configuration tools. It lacks context about prerequisites, typical scenarios, or exclusions, leaving usage decisions ambiguous.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
context_budget_get (B)
Get the current budget configuration for an agent.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Agent ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. The verb 'Get' implies a read operation, but the description does not mention potential side effects, error conditions, authentication needs, or rate limits. This is inadequate for a tool with zero annotation coverage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is appropriately sized and front-loaded, with zero waste.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the simple input schema (1 parameter, 100% coverage) and no output schema, the description is minimally adequate but incomplete. It does not explain return values or error handling, which is a gap for a tool with no annotations to guide the agent.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the input schema fully documents the 'agentId' parameter. The description adds no additional meaning beyond what the schema provides, such as format examples or constraints, resulting in the baseline score of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get') and resource ('current budget configuration for an agent'), making the purpose unambiguous. However, it does not explicitly differentiate from sibling tools like 'context_budget_set' or 'context_budget_apply_preset', which a score of 5 would require.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'context_budget_set' or other agent-related tools. It lacks context about prerequisites, such as whether the agent must exist or be accessible, leaving usage unclear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
context_budget_set (C)
Set a custom budget configuration for an agent. Component allocations must sum to <= totalBudget.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Agent ID | |
| components | Yes | Per-component budget allocations | |
| totalBudget | Yes | Total budget as fraction of context window (0.01-1.0, default: 0.15) | |
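The sum constraint in the description can be checked client-side before calling the tool. A minimal sketch, assuming component names like 'memories' and 'skills' (the actual component keys are not documented here):

```python
def validate_budget(total_budget: float, components: dict[str, float]) -> None:
    # Client-side sketch of the stated constraint: component allocations
    # must sum to <= totalBudget, and totalBudget is a fraction of the
    # context window in [0.01, 1.0]. Component names below are assumptions.
    if not 0.01 <= total_budget <= 1.0:
        raise ValueError("totalBudget must be in [0.01, 1.0]")
    allocated = sum(components.values())
    if allocated > total_budget:
        raise ValueError(
            f"allocations sum to {allocated:.2f}, exceeding totalBudget {total_budget}"
        )

# Passes: 0.05 + 0.05 + 0.04 <= 0.15
validate_budget(0.15, {"memories": 0.05, "skills": 0.05, "goals": 0.04})
```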
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states this is a 'Set' operation (implying a write/mutation) but doesn't mention whether this requires specific permissions, whether it overwrites existing configurations, what happens if allocations exceed the total budget, or what the response looks like. For a mutation tool with zero annotation coverage, this leaves significant behavioral gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose and includes a crucial constraint. Every word earns its place with zero waste, making it optimally concise and well-structured for quick comprehension.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given this is a mutation tool with no annotations and no output schema, the description is incomplete. It doesn't cover behavioral aspects like permissions, side effects, error conditions, or response format. While it mentions the allocation sum constraint, it lacks context about default values, validation rules beyond the sum, or how this interacts with other agent configuration tools.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already fully documents all parameters (agentId, totalBudget, components). The description adds the constraint that 'component allocations must sum to <= totalBudget', which provides useful validation context beyond the schema. However, it doesn't explain the meaning or impact of these allocations, keeping it at the baseline 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Set a custom budget configuration') and the resource ('for an agent'), making the purpose immediately understandable. However, it doesn't explicitly differentiate this tool from sibling tools like 'context_budget_apply_preset' or 'context_budget_get', which a score of 5 would require.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'context_budget_apply_preset' (which applies a preset budget) or 'context_budget_get' (which retrieves budget settings). It mentions the constraint that 'component allocations must sum to <= totalBudget' but doesn't explain when custom configuration is preferred over presets or other budget-related operations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
goal_complete (C)
Mark a goal as complete. Records success/failure for tracking.
| Name | Required | Description | Default |
|---|---|---|---|
| notes | No | Completion notes | |
| goalId | Yes | The ID of the goal | |
| success | Yes | Whether the goal was successfully completed | |
| lessonsLearned | No | Lessons learned during goal execution | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool 'marks a goal as complete' and 'records success/failure,' implying a write/mutation operation, but lacks critical details: it doesn't specify permissions required, whether the action is reversible, what happens to in-progress goals, or error conditions. For a mutation tool with zero annotation coverage, this is a significant gap in transparency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two concise sentences with zero waste: 'Mark a goal as complete. Records success/failure for tracking.' It's front-loaded with the primary action and efficiently adds context. Every word earns its place, making it easy for an agent to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (a mutation tool with no annotations and no output schema), the description is incomplete. It lacks information on behavioral traits (e.g., side effects, error handling), usage context, and return values. While the schema covers parameters well, the overall context for safe and correct invocation is insufficient, especially for a tool that modifies data.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all four parameters (goalId, success, notes, lessonsLearned) with clear descriptions. The description adds no additional meaning beyond what's in the schema—it doesn't explain parameter interactions, formatting, or examples. This meets the baseline of 3 when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Mark a goal as complete') and the resource ('goal'), with the additional context of 'Records success/failure for tracking.' This distinguishes it from sibling tools like goal_create, goal_update_progress, or goal_get, which have different purposes. However, it doesn't explicitly differentiate from all siblings (e.g., goal_update_progress might also involve status changes), so it's not a perfect 5.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an existing goal), exclusions (e.g., not for partial updates), or direct alternatives like goal_update_progress for incremental tracking. This leaves the agent without context for tool selection among similar goal-related tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
goal_create (C)
Create a new goal for an agent. Goals track progress and success rates over time.
| Name | Required | Description | Default |
|---|---|---|---|
| title | Yes | Title of the goal | |
| agentId | Yes | The ID of the agent | |
| description | Yes | Detailed description of what needs to be accomplished | |
| parentGoalId | No | ID of parent goal if this is a sub-goal | |
| successCriteria | No | List of criteria that define successful completion | |
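For illustration, the optional parameters combine as follows when creating a sub-goal. The IDs and criteria are hypothetical; the field shapes follow the parameter table above:

```python
# Hypothetical IDs and criteria; field shapes follow the parameter table above.
subgoal_args = {
    "agentId": "agent-123",
    "title": "Draft API outline",
    "description": "Write a first outline of the public API surface.",
    "parentGoalId": "goal-789",  # optional: present only when creating a sub-goal
    "successCriteria": [         # optional
        "Outline lists every endpoint",
        "Reviewed by at least one teammate",
    ],
}
```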
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description carries the full burden of behavioral disclosure but falls short. It states that the tool creates a goal but doesn't disclose the permissions needed, whether creation is idempotent, how errors are handled, or what the response contains. The mention of tracking progress adds some context but is insufficient for a mutation tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two concise sentences with zero waste: the first states the action and resource, the second adds purpose. It's front-loaded and appropriately sized, though it could integrate usage hints more efficiently.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a mutation tool with no annotations and no output schema, the description is incomplete. It lacks details on behavioral traits (e.g., side effects, auth), response format, and error conditions, leaving gaps for an AI agent to invoke it correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are well-documented in the schema. The description adds no additional parameter semantics beyond implying goal creation, which aligns with the schema. Baseline 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Create a new goal') and the resource ('for an agent'), with additional context about purpose ('Goals track progress and success rates over time'). It distinguishes from siblings like goal_complete or goal_update_progress by focusing on creation, though it doesn't explicitly contrast with them.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives. While it implies usage for goal creation, it doesn't mention prerequisites (e.g., agent must exist), exclusions, or comparisons to sibling tools like goal_update_progress for modifications.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
goal_find_similar (C)
Find similar past goals to learn from previous attempts.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of similar goals to return | |
| agentId | Yes | The ID of the agent | |
| goalDescription | Yes | Description of the goal to find similar past goals for | |
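The assessments below note that the similarity mechanism is undocumented. Since the server advertises semantic search, one plausible implementation is embedding-based ranking; this sketch assumes past goals already have precomputed embedding vectors (the function names are illustrative, not the server's API):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def rank_similar(query_vec, past_goals, limit=5):
    # past_goals: list of (goal_id, embedding) pairs with precomputed embeddings.
    scored = [(gid, cosine_similarity(query_vec, vec)) for gid, vec in past_goals]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:limit]
```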
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool finds similar goals but doesn't explain how similarity is determined, what 'learn from previous attempts' entails, whether this is a read-only operation, or any performance or error-handling traits. This leaves significant gaps for a tool with behavioral implications.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. It's appropriately sized for the tool's complexity, making every word count and avoiding redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of annotations and output schema, the description is incomplete. It doesn't cover behavioral aspects like how similarity is computed, what the return format includes, or error conditions. For a tool that likely involves algorithmic matching and learning intent, more context is needed to guide the agent effectively.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, clearly documenting all three parameters. The description doesn't add any semantic details beyond the schema, such as explaining the 'goalDescription' format or how 'agentId' influences results. With high schema coverage, a baseline score of 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb ('Find') and resource ('similar past goals'), and it provides the intent ('to learn from previous attempts'). However, it doesn't explicitly differentiate this tool from potential sibling tools like 'goal_list' or 'goal_get', which might also involve goal retrieval, so it doesn't reach the highest score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description offers no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, exclusions, or compare it to sibling tools like 'goal_list' or 'goal_get', leaving the agent to infer usage context without explicit direction.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
goal_get (C)
Get goal details including progress and success metrics.
| Name | Required | Description | Default |
|---|---|---|---|
| goalId | Yes | The ID of the goal | |
| includeMetrics | No | Whether to include historical success metrics | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It states this is a read operation ('Get'), implying it's non-destructive, but doesn't disclose behavioral traits like authentication needs, rate limits, error conditions, or what happens if the goalId is invalid. For a tool with no annotation coverage, this leaves significant gaps in understanding how it behaves.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core action ('Get goal details') and adds specifics ('including progress and success metrics'). There's no wasted text, and it's appropriately sized for a simple retrieval tool, though it could be slightly more structured with usage hints.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (2 parameters, no nested objects) and high schema coverage (100%), the description is somewhat complete but lacks output information (no output schema) and behavioral context. It covers the basic purpose but doesn't compensate for missing annotations or provide enough guidance for effective use, making it adequate but with clear gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters (goalId and includeMetrics) well-documented in the schema. The description adds minimal value beyond the schema, as it mentions 'progress and success metrics' which loosely relates to includeMetrics but doesn't explain parameter interactions or usage. With high schema coverage, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Get') and resource ('goal details'), specifying what information is retrieved ('including progress and success metrics'). It distinguishes this from sibling tools like goal_list (which lists goals) and goal_update_progress (which modifies progress), though it doesn't explicitly name these alternatives. The purpose is specific but could be more differentiated.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing a goalId), when not to use it (e.g., for listing goals), or refer to sibling tools like goal_list or goal_find_similar. Usage is implied by the action 'Get goal details,' but no explicit context is given.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
goal_get_success_rate (C)
Get the overall success rate for an agent's goals.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| endDate | No | End of period to calculate success rate | |
| startDate | No | Start of period to calculate success rate | |
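Since the derivation of 'success rate' is undocumented, one plausible definition is sketched below for illustration: successfully completed goals divided by all completed goals, optionally restricted to the date window. The field names are assumptions, not the server's actual schema:

```python
from datetime import date

def success_rate(goals, start=None, end=None):
    # One plausible definition: successfully completed goals divided by all
    # completed goals, optionally windowed by completion date. Field names
    # ('completedAt', 'success') are assumptions, not the server's schema.
    completed = [
        g for g in goals
        if g["completedAt"] is not None
        and (start is None or g["completedAt"] >= start)
        and (end is None or g["completedAt"] <= end)
    ]
    if not completed:
        return None  # undefined when nothing completed in the window
    return sum(1 for g in completed if g["success"]) / len(completed)

goals = [
    {"completedAt": date(2024, 3, 1), "success": True},
    {"completedAt": date(2024, 3, 5), "success": False},
    {"completedAt": None, "success": False},  # still open; excluded
]
```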
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool retrieves data ('Get'), implying a read-only operation, but doesn't clarify if it requires authentication, has rate limits, or what the output format looks like (e.g., percentage, raw counts). For a tool with no annotations, this is a significant gap in transparency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose without unnecessary details. It uses minimal words to convey the essential function, making it easy to parse and understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of calculating success rates and the lack of annotations and output schema, the description is incomplete. It doesn't explain what 'success rate' means (e.g., based on goal completion, progress updates), how it's derived, or what the return value includes. This leaves critical contextual gaps for effective tool use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with clear documentation for 'agentId', 'endDate', and 'startDate'. The description adds no additional parameter semantics beyond what the schema provides, such as explaining how success rate is calculated or default time periods. Given the high schema coverage, a baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get the overall success rate for an agent's goals.' It specifies the verb ('Get'), resource ('success rate'), and scope ('for an agent's goals'), making the function unambiguous. However, it doesn't differentiate from sibling tools like 'goal_get' or 'skill_get_effectiveness', which might retrieve related but different metrics.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, such as requiring an existing agent, or compare it to siblings like 'goal_list' or 'goal_get', which might offer overlapping functionality. This leaves the agent without context for tool selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
goal_list (C)
List goals for an agent with optional filters.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of goals to return | |
| status | No | Filter by goal status | |
| agentId | Yes | The ID of the agent | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries full burden. It states it's a list operation with filtering, which implies read-only behavior, but doesn't disclose important traits like pagination (implied by 'limit' parameter), authentication needs, rate limits, error conditions, or what the output looks like. For a tool with no annotation coverage, this leaves significant behavioral gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose ('List goals for an agent') and adds qualifying information ('with optional filters'). There's zero waste or redundancy, making it appropriately concise for a straightforward list tool.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations and no output schema, the description is incomplete for a tool with three parameters. It doesn't explain the return format (e.g., list structure, fields included), error handling, or how filtering interacts with the 'limit' parameter. For a list tool with filtering capabilities, more context about behavior and output is needed.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all three parameters (agentId, limit, status) with descriptions and enum values. The description adds no additional parameter semantics beyond mentioning 'optional filters', which the schema already covers. This meets the baseline of 3 when schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('List') and resource ('goals for an agent'), making the purpose understandable. However, it doesn't differentiate this tool from potential sibling tools like 'goal_get' or 'goal_find_similar', which might also retrieve goal information in different ways. The description is specific about the action but lacks sibling distinction.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides minimal guidance with 'optional filters' hinting at when to use parameters, but it doesn't explain when to choose this tool over alternatives like 'goal_get' (for a single goal) or 'goal_find_similar' (for similarity-based retrieval). There's no mention of prerequisites, exclusions, or comparison to sibling tools, leaving usage context vague.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
goal_update_progress (C)
Update the progress of a goal.
| Name | Required | Description | Default |
|---|---|---|---|
| notes | No | Optional notes about progress update | |
| goalId | Yes | The ID of the goal | |
| progress | Yes | Progress percentage (0-100) | |
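Since the table bounds `progress` to 0-100, a client may want to validate before calling; this sketch (helper name hypothetical) does that:

```python
# Client-side validation sketch for goal_update_progress. The 0-100 bound
# comes from the parameter table; the helper name is hypothetical.
def build_progress_args(goal_id, progress, notes=None):
    if not 0 <= progress <= 100:
        raise ValueError("progress must be a percentage in [0, 100]")
    args = {"goalId": goal_id, "progress": progress}
    if notes:
        args["notes"] = notes          # optional free-text note
    return args
```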
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states 'Update' which implies a mutation, but fails to mention permissions, side effects, or what happens to existing goal data. This is inadequate for a mutation tool with zero annotation coverage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, direct sentence with no wasted words, making it easy to parse. It's appropriately sized for the tool's apparent complexity and front-loads the core action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a mutation tool with no annotations and no output schema, the description is insufficient. It doesn't cover behavioral aspects like error conditions, response format, or how progress updates interact with other goal operations, leaving significant gaps in understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the input schema fully documents parameters like 'goalId' and 'progress'. The description adds no additional meaning beyond what's in the schema, such as explaining progress units or update constraints, meeting the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Update') and resource ('progress of a goal'), making the purpose immediately understandable. However, it doesn't differentiate from sibling tools like 'goal_complete' or 'goal_get', which could also involve goal progress manipulation or retrieval, leaving room for ambiguity.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives such as 'goal_complete' or 'goal_get'. The description lacks context about prerequisites, timing, or exclusions, leaving the agent to infer usage based on the tool name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
link (A)
Generate a claim link for your human to link with you. Linking verifies ownership and gives the human access to manage your memory via the dashboard.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Your agent ID (from register or whoami) | |
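A call to `link` is minimal; this hypothetical payload assumes standard MCP `tools/call` framing and a placeholder agent ID:

```python
# Minimal hypothetical payload for link: agentId is the only argument,
# obtained earlier from register or whoami.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "link",
        "arguments": {"agentId": "agent-123"},  # placeholder ID
    },
}
```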
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It describes the tool's function and outcome ('verifies ownership and gives access'), but lacks details on potential side effects, authentication needs, rate limits, or error conditions. It adequately conveys the action but misses deeper behavioral traits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded and efficiently structured in two sentences: the first states the action, and the second explains the purpose. Every sentence adds value without redundancy, making it easy to parse and understand quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (one required parameter) and lack of annotations or output schema, the description is reasonably complete. It explains what the tool does and why, but could benefit from more behavioral details (e.g., response format or error handling). However, it adequately covers the core functionality for a link-generation tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with the 'agentId' parameter clearly documented. The description does not add any additional meaning or context beyond what the schema provides, such as format examples or usage tips. This meets the baseline score of 3 for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('generate a claim link') and resources ('for your human'), and distinguishes it from sibling tools like 'agent_link_verify' by focusing on link creation rather than verification. It explicitly explains the outcome ('linking verifies ownership and gives the human access to manage your memory via the dashboard'), making the purpose unambiguous.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning 'for your human to link with you' and the purpose of linking, but it does not provide explicit guidance on when to use this tool versus alternatives (e.g., 'agent_link_verify' for verification). No exclusions or prerequisites are stated, leaving usage somewhat open-ended.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_adapters (A)
List all available memory platform adapters and their capabilities (import, export, sync support, limitations).
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
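For a zero-parameter tool, the safest form is an empty `arguments` object; whether `arguments` may be omitted entirely depends on the server:

```python
# Zero-parameter call sketch for memory_adapters: arguments is an empty
# object rather than omitted, which every MCP server should accept.
request = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {"name": "memory_adapters", "arguments": {}},
}
```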
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden but only states what the tool returns, not behavioral traits. It doesn't disclose whether this is a read-only operation, potential performance characteristics, authentication requirements, rate limits, or error conditions. The description is purely functional without behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, well-structured sentence that efficiently communicates the tool's purpose. It's front-loaded with the main action ('List all available memory platform adapters') followed by clarifying details about what information is included. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a zero-parameter tool with no output schema, the description adequately explains what the tool returns. However, without annotations or output schema, it lacks details about return format, structure, or potential limitations. The description is complete enough for basic understanding but leaves operational details unspecified.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The tool has zero parameters with 100% schema description coverage, so no parameter documentation is needed. The description appropriately focuses on what the tool does rather than parameter details, earning a baseline score of 4 for this dimension.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('List') and resource ('all available memory platform adapters'), specifying the scope of what will be listed. It distinguishes from sibling tools like memory_import, memory_export, and memory_sync by focusing on adapter metadata rather than performing operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning 'capabilities (import, export, sync support, limitations)', suggesting this tool helps determine which adapters support specific operations. However, it doesn't explicitly state when to use this versus alternatives like memory_source_list or provide clear exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_audit (C)
Scan your memories for safety issues. Returns a health summary with trust score distribution and flagged count.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Your agent ID | |
| autoFix | No | If true, quarantines flagged memories automatically. Default: false. | |
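Because `autoFix` quarantines memories when true, a cautious client might gate it behind an explicit opt-in; a sketch with a hypothetical helper name:

```python
# Hypothetical helper gating autoFix, which quarantines flagged memories
# when true. By default the audit is report-only (autoFix defaults to false).
def build_audit_args(agent_id, auto_fix=False):
    args = {"agentId": agent_id}
    if auto_fix:
        args["autoFix"] = True   # opt in explicitly to quarantine
    return args
```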
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries full burden. It mentions 'Returns a health summary with trust score distribution and flagged count,' which gives some output context, but lacks critical behavioral details: whether this is a read-only operation, if it modifies data (especially with autoFix parameter), performance characteristics, or error handling. For a safety scanning tool, this is insufficient.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise—just two sentences that directly state the purpose and output. Every word earns its place with zero redundancy. It's front-loaded with the core function and efficiently communicates essential information without unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of memory safety scanning and lack of annotations/output schema, the description is incomplete. It doesn't cover behavioral aspects like side effects, permissions needed, or what 'safety issues' entail. The output mention is vague ('health summary'), and there's no context about when this tool should be invoked relative to other memory operations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are fully documented in the schema. The description doesn't add any parameter-specific information beyond what's in the schema (e.g., it doesn't explain what 'autoFix' entails beyond quarantining). This meets the baseline for high schema coverage but doesn't provide extra semantic value.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Scan your memories for safety issues.' It specifies the action (scan) and resource (memories) with a safety focus. However, it doesn't explicitly differentiate from sibling memory tools like memory_clean or memory_ingest, which prevents a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. With many sibling tools related to memory management (e.g., memory_clean, memory_query), there's no indication of context, prerequisites, or comparisons. This leaves the agent guessing about appropriate usage scenarios.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_clean (A)
Clean up old or low-quality memories. Preview first (dry run), then confirm to delete.
| Name | Required | Description | Default |
|---|---|---|---|
| types | No | Only clean memories of these types | |
| agentId | Yes | Your agent ID | |
| confirm | No | Set to true to actually delete. False (default) for dry-run preview. | |
| minTrust | No | Delete memories with trust score below this threshold | |
| maxAgeDays | No | Delete memories older than this many days | |
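The preview-then-confirm workflow the description recommends can be sketched as follows; `call_tool` stands in for whatever MCP client function is in use, and the threshold defaults are placeholders:

```python
# Preview-then-confirm sketch for memory_clean: same filters both times,
# first with confirm=False (dry run), then with confirm=True (delete).
def clean_memories(call_tool, agent_id, max_age_days=90, min_trust=0.3):
    filters = {
        "agentId": agent_id,
        "maxAgeDays": max_age_days,
        "minTrust": min_trust,
    }
    preview = call_tool("memory_clean", {**filters, "confirm": False})
    # ...inspect the dry-run preview here before committing...
    return call_tool("memory_clean", {**filters, "confirm": True})
```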
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It reveals the tool's destructive nature ('delete') and the dry-run safety mechanism, which are crucial behavioral traits. However, it doesn't mention permission requirements, rate limits, what constitutes 'low-quality,' or how deletions affect related data, leaving gaps in behavioral understanding.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise and well-structured in two sentences. The first sentence states the core purpose, and the second provides essential usage guidance. Every word earns its place with zero redundancy or fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a destructive tool with 5 parameters and no annotations or output schema, the description is minimally adequate. It covers the basic purpose and safety workflow but lacks details on what 'clean up' entails operationally, how deletions are performed, or what the preview output looks like. Given the complexity and absence of structured behavioral data, more context would be beneficial.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all 5 parameters. The description adds minimal parameter semantics beyond the schema—it only clarifies that 'confirm' controls actual deletion versus dry-run preview. This matches the baseline expectation when schema coverage is complete.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Clean up old or low-quality memories.' It specifies the action (clean up) and target (memories) with qualifying criteria (old or low-quality). However, it doesn't explicitly differentiate from sibling tools like 'agent_context_clear' or 'memory_audit' that might also involve cleanup operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear usage guidance: 'Preview first (dry run), then confirm to delete.' This establishes a recommended workflow and explains the purpose of the 'confirm' parameter. It doesn't explicitly mention when not to use this tool or name alternatives among siblings, but the workflow guidance is practical and helpful.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_export (C)
Export AIS memories to an external platform. Translates memories to the target format and pushes them.
| Name | Required | Description | Default |
|---|---|---|---|
| since | No | ISO timestamp — only export memories created after this date | |
| types | No | Memory types to export (e.g., ["fact", "lesson"]) | |
| agentId | Yes | Agent ID to export memories from | |
| platform | Yes | Platform adapter name (e.g., "mem0") | |
| credentials | Yes | Platform credentials | |
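An example argument set, following the table's own examples ("mem0", the type filter, an ISO cutoff); the credential value is a placeholder, not a real secret:

```python
# Hypothetical memory_export arguments. Required: agentId, platform,
# credentials. Optional: types and since narrow what gets exported.
args = {
    "agentId": "agent-123",
    "platform": "mem0",
    "credentials": {"apiKey": "sk-placeholder"},  # adapter-specific secret
    "types": ["fact", "lesson"],                  # optional type filter
    "since": "2024-01-01T00:00:00Z",              # optional ISO cutoff
}
```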
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden but is minimal. It mentions translation and pushing, but doesn't disclose critical behaviors: whether this is a one-time or recurring export, if it overwrites existing data on the platform, authentication requirements beyond credentials, rate limits, error handling, or what 'pushes them' entails (e.g., immediate vs. batched). The description is too vague for a mutation tool with external integration.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two short sentences that front-load the core purpose and separate the key actions (translate, then push). Every word earns its place, though it's slightly terse for a complex export operation.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 5 parameters, no annotations, and no output schema, the description is incomplete. It lacks details on behavioral traits (e.g., side effects, idempotency), error cases, and what success looks like (e.g., confirmation message, exported count). Given the complexity of exporting to external platforms, more context is needed to guide effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are well-documented in the schema. The description adds no additional meaning about parameters beyond implying export scope (e.g., 'memories' relates to agentId, types). It doesn't clarify parameter interactions or provide examples (e.g., valid platform values). Baseline 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Export AIS memories to an external platform') and the resource ('memories'), with additional detail about translation and pushing. It distinguishes from siblings like memory_import (which imports) and memory_audit/clean (which don't export), but doesn't explicitly differentiate from memory_sync, which might have overlapping functionality.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool versus alternatives like memory_sync or memory_adapters. The description implies usage for exporting memories, but lacks context about prerequisites (e.g., needing platform credentials), timing (e.g., after memory creation), or exclusions (e.g., not for real-time syncing).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_import (B)
Import memories from an external platform (e.g., Mem0, Claude) into AIS. Connects to the platform, fetches memories, deduplicates, and stores with provenance.
| Name | Required | Description | Default |
|---|---|---|---|
| since | No | ISO timestamp — only import memories created after this date | |
| agentId | Yes | Agent ID to import memories for | |
| platform | Yes | Platform adapter name (e.g., "mem0", "claude_code") | |
| credentials | Yes | Platform credentials (apiKey, filePath, etc.) | |
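The table notes that credential keys vary by adapter (apiKey, filePath, etc.); a sketch of the two example platforms, where both value strings are placeholders rather than real credentials or paths:

```python
# Credential shapes differ per adapter, following the table's examples.
mem0_args = {
    "agentId": "agent-123",
    "platform": "mem0",
    "credentials": {"apiKey": "sk-placeholder"},          # placeholder key
}
claude_args = {
    "agentId": "agent-123",
    "platform": "claude_code",
    "credentials": {"filePath": "/path/to/memories.json"},  # placeholder path
}
```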
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses key behavioral traits: connecting to an external platform, fetching memories, deduplicating, and storing with provenance. However, it lacks details on permissions needed, rate limits, error handling, or what 'provenance' entails, which are important for a tool that handles external data import.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two well-structured sentences that efficiently cover the tool's purpose and key steps. It's front-loaded with the main action and avoids unnecessary detail, though the parenthetical platform examples could be trimmed if not essential.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations and no output schema, the description is moderately complete for a tool with 4 parameters and complex behavior (external integration, deduplication). It covers the high-level process but lacks specifics on output format, error cases, or integration details, which could hinder an agent's ability to use it effectively without trial and error.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all parameters. The description adds no additional parameter semantics beyond what's in the schema, such as explaining platform-specific credential requirements or the impact of the 'since' parameter on deduplication. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool imports memories from external platforms into AIS, specifying the action (import), resource (memories), and target system (AIS). It distinguishes from siblings like memory_export (exporting) and memory_ingest (general ingestion), but doesn't explicitly differentiate from memory_sync or memory_adapters, which might have overlapping functionality.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when importing from platforms like Mem0 or Claude, but doesn't provide explicit guidance on when to use this versus alternatives such as memory_ingest or memory_sync. No exclusions or prerequisites are mentioned, leaving the agent to infer context from the tool name and description alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_ingest (A)
Ingest a single memory from an external source. The memory passes through the trust & safety pipeline before being accepted or quarantined.
| Name | Required | Description | Default |
|---|---|---|---|
| type | No | Memory type classification | fact |
| agentId | Yes | Agent ID | |
| content | Yes | Memory content | |
| validAt | No | ISO timestamp when the fact became true | |
| sourceId | Yes | Source ID depositing the memory | |
| importance | No | Importance score | |
| sourceMemoryId | No | Original ID in the source system | |
| originalCreatedAt | No | ISO timestamp when created in source | |
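The parameter table above implies a simple call shape. As a sketch of client-side hygiene (the builder function and its validation are assumptions, not documented server behavior; only the field names come from the table), a memory_ingest payload might be assembled like this:

```python
# Hypothetical argument builder for the memory_ingest tool. Field
# names come from the parameter table above; rejecting unknown keys
# is an assumed client-side safeguard, not server behavior.

OPTIONAL_FIELDS = {"type", "validAt", "importance",
                   "sourceMemoryId", "originalCreatedAt"}

def build_ingest_args(agent_id, content, source_id, **optional):
    """Assemble a memory_ingest payload; reject unknown field names."""
    unknown = set(optional) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {"agentId": agent_id, "content": content,
            "sourceId": source_id, **optional}

args = build_ingest_args(
    "agent-123", "User prefers dark mode", "src-chatgpt",
    type="fact", importance=0.7,
)
```

Note that the trust & safety pipeline mentioned in the description runs server-side; a well-formed payload can still be quarantined.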
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It adds valuable context about the trust & safety pipeline and the possibility of quarantine, which goes beyond basic parameter semantics. However, it doesn't cover aspects like rate limits, authentication needs, or error handling, keeping it from a perfect score.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two sentences that each earn their place: the first states the core action, and the second adds critical behavioral context. There's zero waste or redundancy, and it's front-loaded with the essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 8 parameters, no annotations, and no output schema, the description provides adequate but incomplete context. It covers the core action and safety pipeline, but doesn't explain return values, error conditions, or prerequisites. Given the complexity, it should do more to be fully complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all 8 parameters thoroughly. The description doesn't add any additional meaning about parameters beyond what the schema provides, such as explaining relationships between fields or usage patterns. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'ingest' and resource 'a single memory from an external source', making the purpose specific and understandable. However, it doesn't explicitly differentiate from sibling tools like 'memory_import' or 'memory_ingest_batch', which would be needed for a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'memory_import' or 'memory_ingest_batch' from the sibling list. It mentions the trust & safety pipeline, but this is behavioral context rather than usage guidance. No explicit when/when-not instructions are present.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_ingest_batch (A)
Ingest multiple memories from an external source in a single batch (max 100). Each memory passes through the trust & safety pipeline independently.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Agent ID | |
| memories | Yes | Array of memories to ingest | |
| sourceId | Yes | Source ID depositing the memories | |
| syncCursor | No | Opaque cursor for incremental sync | |
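The description caps batches at 100 memories, so a caller can enforce that limit and split larger imports into compliant chunks. The helper names below are assumptions; the field names and the 100-item cap come from this page:

```python
# Batch builder for memory_ingest_batch, enforcing the documented
# max-100 limit client-side and splitting oversized imports.

MAX_BATCH = 100

def build_batch_args(agent_id, source_id, memories, sync_cursor=None):
    if not memories:
        raise ValueError("memories must be non-empty")
    if len(memories) > MAX_BATCH:
        raise ValueError(f"batch exceeds the {MAX_BATCH}-memory limit")
    args = {"agentId": agent_id, "sourceId": source_id,
            "memories": list(memories)}
    if sync_cursor is not None:
        args["syncCursor"] = sync_cursor
    return args

def chunk(memories, size=MAX_BATCH):
    """Split an oversized import into batches the tool will accept."""
    return [memories[i:i + size] for i in range(0, len(memories), size)]
```

Since each memory passes through the trust & safety pipeline independently, a batch can partially succeed; callers should not assume all-or-nothing semantics.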
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses the batch size limit (max 100) and that each memory goes through trust & safety independently, which are useful behavioral traits. However, it doesn't mention authentication needs, rate limits, whether this is a write operation, or what happens on failure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste. The first sentence covers purpose, scope, and constraints. The second adds important behavioral context about trust & safety processing. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a write operation with 4 parameters and no annotations or output schema, the description is adequate but has gaps. It covers the batch nature and safety processing but doesn't explain what 'ingest' means operationally, what the tool returns, or error handling for a mutation tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the baseline is 3. The description doesn't add meaningful parameter semantics beyond what's already in the schema descriptions (e.g., what 'agentId' or 'sourceId' represent in context). It mentions 'external source' which relates to 'sourceId' but doesn't elaborate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('ingest multiple memories'), resource ('from an external source'), and scope ('in a single batch, max 100'). It distinguishes from the sibling 'memory_ingest' by specifying batch capability and external source focus.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool (batch ingestion from external sources with up to 100 items). However, it doesn't explicitly state when NOT to use it or name alternatives like 'memory_ingest' for single items or 'memory_import' for different source types.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_source_approve (A)
Approve a pending memory source. Issues a MemorySourceAuthorizationCredential (W3C VC) granting the source permission to deposit memories.
| Name | Required | Description | Default |
|---|---|---|---|
| userId | Yes | User ID of the human approving this source | |
| agentId | Yes | Agent ID | |
| sourceId | Yes | Source ID to approve | |
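The descriptions of memory_source_register, memory_source_approve, and memory_source_revoke together imply a small source lifecycle: register creates a "pending" source, approve issues the credential, and revoke blocks the source. Only "pending" is named on this page; the other status strings and the transition table are assumptions:

```python
# Minimal state model of the source lifecycle implied by the tool
# descriptions: register -> "pending", approve -> "approved",
# revoke -> "revoked". Status names beyond "pending" are assumptions.

TRANSITIONS = {
    ("pending", "approve"): "approved",
    ("pending", "revoke"): "revoked",
    ("approved", "revoke"): "revoked",
}

def next_status(current, action):
    """Return the status after an action, or raise if it is invalid."""
    try:
        return TRANSITIONS[(current, action)]
    except KeyError:
        raise ValueError(f"cannot {action} a source in status {current!r}")
```

This makes the ordering explicit: a source must be approved before it can deposit memories, and approval is only meaningful while the source is pending.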
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It discloses that the tool issues a credential and grants permission, which indicates a write/mutation operation. However, it lacks details on behavioral traits such as required permissions, whether the action is reversible, rate limits, or error conditions, which are critical for a tool that authorizes access.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core action in the first sentence and adds necessary detail in the second. It is appropriately sized with zero waste, efficiently conveying purpose and outcome without redundancy or unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations and no output schema, the description is incomplete for a mutation tool. It explains what the tool does but lacks details on behavioral aspects like authorization requirements, side effects, or return values. However, it covers the basic purpose and outcome, making it minimally adequate but with clear gaps in context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all three parameters (userId, agentId, sourceId) with clear descriptions. The description does not add any additional meaning or context beyond what the schema provides, such as explaining relationships between parameters or usage examples, meeting the baseline for high coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Approve a pending memory source') and the resource ('memory source'), distinguishing it from sibling tools like 'memory_source_register' or 'memory_source_revoke'. It also specifies the outcome ('Issues a MemorySourceAuthorizationCredential (W3C VC) granting permission to deposit memories'), making the purpose explicit and distinct.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning 'pending memory source', suggesting it should be used after registration but before revocation. However, it does not explicitly state when to use this tool versus alternatives like 'memory_source_register' or 'memory_source_revoke', nor does it provide exclusions or prerequisites, leaving some ambiguity.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_source_list (C)
List all registered memory sources for an agent, optionally filtered by status.
| Name | Required | Description | Default |
|---|---|---|---|
| status | No | Filter by status. Omit to list all. | |
| agentId | Yes | Agent ID | |
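The status parameter is optional; omitting it lists everything. A client-side equivalent of that filtering behavior looks like the sketch below (the record shape, a dict with a "status" key, is an assumption, since no output schema is published):

```python
# Client-side mirror of memory_source_list's optional status filter:
# status=None returns every source, otherwise only matching ones.

def filter_sources(sources, status=None):
    if status is None:
        return list(sources)
    return [s for s in sources if s.get("status") == status]
```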
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool lists memory sources with optional filtering, but does not cover critical aspects like whether this is a read-only operation, potential rate limits, authentication needs, or what the output format looks like. This is a significant gap for a tool with no annotation coverage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose and includes the optional filtering detail. There is no wasted wording, making it appropriately sized and well-structured for quick comprehension.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has no annotations and no output schema, the description is incomplete. It lacks details on behavioral traits, output format, and usage context, which are essential for an agent to effectively invoke this tool. The description does not compensate for the missing structured data, leaving significant gaps in understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents both parameters ('agentId' and 'status') with descriptions and enum values. The description adds minimal value by mentioning optional filtering by status, but does not provide additional semantic context beyond what the schema offers, aligning with the baseline score for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('List') and resource ('all registered memory sources for an agent'), making the purpose specific and understandable. However, it does not explicitly differentiate this tool from sibling tools like 'memory_audit' or 'memory_ingest', which might also involve memory sources, so it misses full sibling differentiation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as other memory-related tools in the sibling list. It mentions optional filtering by status but does not specify contexts, prerequisites, or exclusions for usage, leaving the agent without clear selection criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_source_register (A)
Register a new external memory source (e.g., ChatGPT, Claude Code) for this agent. Returns a DID and API key. Source starts in "pending" status and must be approved before it can deposit memories.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Human-readable name for the source (e.g., "My ChatGPT") | |
| agentId | Yes | Agent ID to register the source for | |
| platform | Yes | Platform identifier | |
| allowedTypes | No | Memory types this source is allowed to submit. | [fact, event, lesson, context] |
| rateLimitPerHour | No | Maximum ingests per hour. | 100 |
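The two optional parameters have documented defaults, which a caller can apply explicitly. The builder function is a sketch; the default values come from the parameter table above:

```python
# Apply memory_source_register's documented defaults when the
# optional fields are omitted.

DEFAULT_ALLOWED_TYPES = ["fact", "event", "lesson", "context"]
DEFAULT_RATE_LIMIT = 100

def build_register_args(name, agent_id, platform,
                        allowed_types=None, rate_limit_per_hour=None):
    return {
        "name": name,
        "agentId": agent_id,
        "platform": platform,
        "allowedTypes": (DEFAULT_ALLOWED_TYPES if allowed_types is None
                         else allowed_types),
        "rateLimitPerHour": (DEFAULT_RATE_LIMIT if rate_limit_per_hour is None
                             else rate_limit_per_hour),
    }
```

Per the description, the returned DID and API key belong to a source that starts in "pending" status; memory_source_approve must run before the source can deposit anything.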
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It does reveal important behavioral traits: that registration returns a DID and API key, and that sources start in 'pending' status requiring approval. However, it doesn't mention permission requirements, whether this is a mutating operation, potential side effects, or error conditions. For a registration tool with no annotation coverage, this leaves significant gaps in behavioral understanding.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise at two sentences with zero wasted words. The first sentence states the core purpose and return values, while the second provides crucial workflow information about the pending status. Every sentence earns its place and the information is front-loaded effectively.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (registration with approval workflow), lack of annotations, and no output schema, the description provides a basic but incomplete picture. It covers the purpose and initial workflow state but doesn't explain what happens after approval, how the returned DID and API key should be used, error scenarios, or relationship to other memory operations. For a registration tool in a memory management system, more contextual information would be helpful.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the schema already documents all 5 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema. It mentions the concept of 'external memory source' which relates to the platform parameter, but doesn't provide additional semantic context about parameter interactions or usage patterns. The baseline of 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Register a new external memory source'), identifies the resource ('for this agent'), and distinguishes from siblings by specifying it's about memory source registration rather than approval, listing, or revocation. It provides concrete examples of what constitutes an external memory source (e.g., ChatGPT, Claude Code).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context about when to use this tool: when registering a new external memory source. It mentions the 'pending' status and approval requirement, which gives important workflow context. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools (like memory_source_approve or memory_source_list).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_source_revoke (A)
Revoke a memory source authorization. Revokes the VC and blocks the source. Optionally deletes all memories deposited by this source.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Agent ID | |
| sourceId | Yes | Source ID to revoke | |
| deleteMemories | No | If true, cascade-delete all memories from this source. | false |
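Because deleteMemories defaults to false, revocation alone blocks the source but keeps its previously deposited memories; cascade deletion must be opted into explicitly. A small sketch of the argument shape (the builder is hypothetical, the defaulting is from the table):

```python
# memory_source_revoke arguments; deleteMemories defaults to False,
# so bare revocation blocks the source without destroying its data.

def build_revoke_args(agent_id, source_id, delete_memories=False):
    return {
        "agentId": agent_id,
        "sourceId": source_id,
        "deleteMemories": bool(delete_memories),
    }
```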
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses that the tool revokes authorization and blocks the source, and mentions optional memory deletion. However, it lacks critical behavioral details: whether this action is reversible, what permissions are required, if there are rate limits, what happens to dependent data, or what the response looks like. For a destructive tool with zero annotation coverage, this is insufficient.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise with just two sentences that directly state the tool's purpose and key optional behavior. Every word earns its place with no redundancy or unnecessary elaboration. The structure is front-loaded with the main action.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a destructive tool with no annotations and no output schema, the description provides basic purpose and parameter context but lacks important completeness elements. It doesn't explain what 'revoking VC' means, what 'blocks the source' entails operationally, whether there are confirmation steps, or what the tool returns. The description is adequate but has clear gaps for a tool that performs authorization revocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description adds value by explaining the behavioral consequence of the deleteMemories parameter ('deletes all memories deposited by this source'), which goes beyond the schema's technical description. However, it doesn't provide additional context for agentId or sourceId parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('revoke', 'blocks', 'deletes') and identifies the resource ('memory source authorization'). It distinguishes itself from siblings like memory_source_approve and memory_source_list by focusing on revocation rather than approval or listing.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when needing to revoke a memory source, but provides no explicit guidance on when to use this tool versus alternatives like agent_delete or privacy_revoke_consent. It mentions an optional parameter but doesn't explain when to set deleteMemories to true versus false.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
memory_sync (B)
Incremental sync with an external platform. Uses stored cursor to fetch only new memories since last sync.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Agent ID | |
| platform | Yes | Platform adapter name | |
| credentials | Yes | Platform credentials | |
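A caller syncing several platforms would invoke the tool once per platform with per-platform credentials. The sketch below fakes the tool call so the loop is self-contained; the real transport would be an MCP client's call, and the response shape shown is an assumption (no output schema is published):

```python
# Hedged sketch of wrapping memory_sync across several platforms.
# fake_call_tool stands in for a real MCP client call and its
# response shape is invented for illustration.

def fake_call_tool(name, arguments):
    # Pretend the server fetched two new memories for any platform.
    return {"imported": 2, "platform": arguments["platform"]}

def sync_all(agent_id, platforms, credentials, call_tool=fake_call_tool):
    """Run memory_sync once per platform with its own credentials."""
    results = {}
    for platform in platforms:
        results[platform] = call_tool("memory_sync", {
            "agentId": agent_id,
            "platform": platform,
            "credentials": credentials[platform],
        })
    return results
```

Because the cursor is stored server-side per the description, the caller does not pass one; repeated calls should fetch only the delta since the last sync.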
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It mentions the tool uses a stored cursor for incremental fetching, which is useful behavioral context. However, it lacks details on permissions needed, rate limits, error handling, whether it's idempotent, or what happens if credentials are invalid, all of which are critical for a sync operation against external platforms.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two efficient sentences: the first front-loads the core purpose ('incremental sync') and the second adds the key behavioral detail ('uses stored cursor'). There is no wasted verbiage, making it highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations and no output schema, the description is incomplete for a tool that performs external synchronization. It misses details on what 'memories' entail, the sync outcome (e.g., success/failure indicators), error scenarios, or how the cursor is managed. For a 3-parameter tool with complex credentials, this leaves significant gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all parameters (agentId, platform, credentials). The description adds no additional meaning about parameters beyond implying 'platform' refers to an external platform adapter and 'credentials' are for authentication. Baseline 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool performs an 'incremental sync' with an external platform for 'memories', specifying it fetches only new items using a stored cursor. It distinguishes from siblings like memory_import or memory_ingest by focusing on delta updates, but doesn't explicitly contrast with all memory-related tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when needing to fetch only new memories since a previous sync, suggesting a recurring synchronization context. However, it doesn't explicitly state when to use this versus alternatives like memory_import (full import) or memory_ingest (one-time ingestion), nor does it mention prerequisites like having an existing cursor.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
observe (A)
Automatically capture conversation content as a memory without manual curation. Unlike "remember", you do not need to decide the type or importance — the system classifies the content and extracts structured facts automatically. Use this to passively record what happened during a session turn.
| Name | Required | Description | Default |
|---|---|---|---|
| hint | No | Optional type hint if you know what kind of content this is. Omit to let the system classify. | |
| goalId | No | Optional goal ID to bind this memory to. Goal-bound memories are retained until goal completion. | |
| agentId | No | Your agent ID (optional if session identity is set) | |
| content | Yes | The conversation content, observation, or event to capture (raw text — no curation needed) | |
| sessionId | No | Session/conversation ID for grouping captured memories. | |
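The description contrasts observe (passive, auto-classified) with the sibling remember (manually curated type and importance). A caller-side rule of thumb can be sketched as follows; the tool names come from this page, but remember's exact argument names are assumptions:

```python
# Rule-of-thumb dispatcher between "remember" and "observe":
# use remember when the caller has already curated type and
# importance, otherwise let observe's server-side classifier decide.

def pick_memory_tool(content, type_=None, importance=None):
    if type_ is not None and importance is not None:
        return ("remember",
                {"content": content, "type": type_, "importance": importance})
    args = {"content": content}
    if type_ is not None:
        # observe accepts an optional classification hint per its schema
        args["hint"] = type_
    return ("observe", args)
```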
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It discloses key behavioral traits: the tool automatically classifies content and extracts structured facts, and it's for passive recording. However, it doesn't mention potential side effects like storage limits, error conditions, or what happens if classification fails. For a tool with no annotations, this leaves some behavioral aspects unclear.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by differentiation from 'remember' and usage instructions. Every sentence adds value: the first defines the tool, the second contrasts with siblings, and the third provides context. It's efficiently structured with zero waste.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (5 parameters, no output schema, no annotations), the description is fairly complete. It explains the purpose, usage, and automation features well. However, without annotations or output schema, it could benefit from more details on behavioral aspects like error handling or memory retention. It's adequate but has minor gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all 5 parameters thoroughly. The description's only parameter-level addition is the note that 'content' needs no curation, which slightly clarifies usage but doesn't meaningfully extend the schema's semantics. Baseline 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Automatically capture conversation content as a memory without manual curation.' It specifies the verb ('capture'), resource ('conversation content'), and distinguishes it from the sibling tool 'remember' by explaining the automation aspect. The description is specific and avoids tautology.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: 'Use this to passively record what happened during a session turn.' It distinguishes it from 'remember' by stating 'Unlike "remember", you do not need to decide the type or importance — the system classifies the content and extracts structured facts automatically.' This gives clear context and alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
privacy_check_access (C)
Check if a requesting agent can access specific facts. Does not log the check.
| Name | Required | Description | Default |
|---|---|---|---|
| factIds | Yes | Fact IDs to check access for | |
| targetAgentId | Yes | Agent whose data is requested | |
| relationshipType | Yes | Relationship between requesting and target agent | |
| requestingAgentId | Yes | Agent requesting access | |
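As a concrete sketch of what a well-formed call would need, the four required parameters from the table can be assembled as follows. The field names come from the table; the ID values and the relationship label are illustrative assumptions, and the actual transport to the server is omitted.

```python
# Hypothetical privacy_check_access arguments; all four parameters are required.
arguments = {
    "factIds": ["fact-001", "fact-002"],   # facts whose access is being checked (assumed ID format)
    "targetAgentId": "agent-alice",        # agent whose data is requested
    "requestingAgentId": "agent-bob",      # agent asking for access
    "relationshipType": "colleague",       # assumed relationship label
}

# Guard against the most common first-attempt failure: a missing required field.
REQUIRED = {"factIds", "targetAgentId", "requestingAgentId", "relationshipType"}
missing = sorted(REQUIRED - arguments.keys())
print(missing)  # → []
```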
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool 'Does not log the check,' which is useful non-functional context, but lacks details on permissions required, rate limits, side effects (e.g., whether it's read-only or has other impacts), or response format. For a privacy/access-check tool with zero annotation coverage, this is insufficient.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise and front-loaded: two sentences with zero waste. The first sentence states the core purpose, and the second adds a key behavioral note. Every word earns its place, making it easy to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (privacy/access checking with 4 required parameters) and lack of annotations and output schema, the description is incomplete. It doesn't explain what the tool returns (e.g., access granted/denied, reasons), error handling, or integration with other privacy tools. For a tool in this domain, more context is needed to use it effectively.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all four parameters. The description adds no additional parameter semantics beyond implying the tool checks 'access' for 'facts,' which is already covered by parameter names and descriptions. This meets the baseline for high schema coverage, but doesn't enhance understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Check if a requesting agent can access specific facts.' It specifies the verb ('check') and resource ('access to specific facts'), and distinguishes it from logging operations by noting 'Does not log the check.' However, it doesn't explicitly differentiate from sibling tools like 'privacy_disclosure_log' or 'privacy_grant_consent', which keeps it from a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It mentions 'Does not log the check,' which implies a contrast with logging tools, but doesn't name specific siblings or explain use cases. There's no mention of prerequisites, error conditions, or typical workflows, leaving usage context unclear.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
privacy_create_presentation (Grade C)
Create an authorized SD-JWT presentation for selective disclosure after checking privacy policy and consent.
| Name | Required | Description | Default |
|---|---|---|---|
| factId | Yes | Fact ID to create presentation for | |
| agentId | Yes | Agent that owns the fact being presented | |
| derivedClaim | No | For partial disclosure: the claim to prove (e.g., "age >= 18") | |
| disclosureLevel | Yes | How much to reveal: full (raw value), partial (derived claim only), existence_only (just proves the fact exists) | |
| relationshipType | Yes | Relationship between the requesting and target agent | |
| requestingAgentId | Yes | Agent requesting the presentation | |
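The disclosureLevel enum and its interaction with derivedClaim are the least obvious parts of this table. A sketch of a partial-disclosure call, assuming illustrative IDs and claim text:

```python
# Enum values taken from the disclosureLevel parameter description.
DISCLOSURE_LEVELS = {"full", "partial", "existence_only"}

# Hypothetical privacy_create_presentation arguments; IDs and the claim are illustrative.
arguments = {
    "factId": "fact-birthdate-01",          # assumed fact ID
    "agentId": "agent-alice",               # owner of the fact being presented
    "requestingAgentId": "agent-verifier",  # agent requesting the presentation
    "relationshipType": "service",          # assumed relationship label
    "disclosureLevel": "partial",           # reveal a derived claim, not the raw value
    "derivedClaim": "age >= 18",            # per the table, only needed for partial disclosure
}

assert arguments["disclosureLevel"] in DISCLOSURE_LEVELS
# derivedClaim should accompany partial disclosure and can be omitted otherwise.
needs_claim = arguments["disclosureLevel"] == "partial"
print(needs_claim and "derivedClaim" in arguments)  # → True
```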
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions authorization and policy/consent checking, but doesn't describe what happens during creation (e.g., whether this generates a token, stores data, requires specific permissions, or has rate limits). For a tool with 6 parameters and no annotation coverage, this is insufficient behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that states the core purpose upfront. It could potentially be more structured with separate clauses for prerequisites and outcomes, but it's appropriately sized with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 6 parameters, no annotations, and no output schema, the description provides basic purpose but lacks important context about what the tool actually returns, error conditions, or behavioral details. The 100% schema coverage helps, but the description alone is incomplete for a privacy-sensitive creation operation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema description coverage is 100%, so the schema already documents all parameters thoroughly with descriptions and enum values. The description doesn't add any parameter-specific information beyond what's in the schema, making the baseline 3 appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('create an authorized SD-JWT presentation for selective disclosure') and the resource involved, with a specific purpose of privacy policy and consent checking. It doesn't explicitly differentiate from sibling tools like privacy_check_access or privacy_grant_consent, but the creation focus is distinct enough for a 4.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions checking privacy policy and consent as prerequisites, but provides no guidance on when to use this tool versus alternatives like privacy_check_access or privacy_grant_consent. There's no explicit when/when-not usage context or sibling tool comparisons.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
privacy_disclosure_log (Grade A)
View the disclosure audit trail for an agent. Shows who requested what data and the decision.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum entries to return (default: 50) | 50 |
| agentId | Yes | Agent ID to view audit log for | |
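Because limit is optional with a server-side default of 50, a client only needs to send agentId. A minimal request-building sketch (the helper name is hypothetical):

```python
# Hypothetical request builder for privacy_disclosure_log: agentId is required,
# and limit is only included when the caller wants to override the default of 50.
def build_log_request(agent_id, limit=None):
    request = {"agentId": agent_id}
    if limit is not None:
        request["limit"] = limit
    return request

# Omitting limit lets the server apply its documented default.
req = build_log_request("agent-alice")
print(req)  # → {'agentId': 'agent-alice'}
```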
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. While it indicates this is a read-only operation ('view', 'shows'), it does not specify permissions required, rate limits, pagination behavior (beyond the implied limit parameter), or what the output format looks like. For a tool handling sensitive privacy data with zero annotation coverage, this is a significant gap in transparency.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose ('View the disclosure audit trail for an agent') and adds clarifying detail ('Shows who requested what data and the decision'). There is no wasted language, and every word contributes to understanding the tool's function.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (privacy-related audit logging) and lack of annotations or output schema, the description is minimally adequate. It covers the basic purpose but omits critical details like output format, error handling, or security implications. The high schema coverage helps, but for a tool in this domain, more context on behavior and results would improve completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with clear descriptions for both parameters (agentId and limit). The description does not add any additional meaning beyond what the schema provides, such as explaining the audit trail structure or decision outcomes. With high schema coverage, the baseline score of 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('view', 'shows') and resources ('disclosure audit trail for an agent'), including what information is displayed ('who requested what data and the decision'). It distinguishes itself from sibling tools like privacy_check_access or privacy_list_grants by focusing on audit logs rather than access checks or consent management.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by specifying it's for viewing an audit trail for an agent, suggesting it should be used when monitoring data disclosure activities. However, it lacks explicit guidance on when to use this versus alternatives like privacy_check_access (which checks access) or privacy_list_grants (which lists consents), and does not mention prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
privacy_grant_consent (Grade A)
Grant data access to another agent. Only humans should invoke this tool to authorize agent-to-agent data sharing.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Agent whose data will be shared | |
| createdBy | Yes | Who is granting consent (human identifier) | |
| dataScope | Yes | Scope of data access (e.g., "all", "identity", "preferences") | |
| grantedTo | Yes | Agent ID receiving access | |
| validUntil | No | Optional: ISO-8601 expiration timestamp | |
| grantedToType | Yes | Type of the grantee | |
| privacyLevels | Yes | Privacy levels to grant access to | |
| factCategories | No | Optional: restrict to specific fact categories | |
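With eight parameters, six of them required, this is the easiest tool in the set to call incompletely. The sketch below assembles a grant with a 30-day ISO-8601 expiry; the field names match the table, while the scope, level names, and IDs are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone

# Optional ISO-8601 expiry, 30 days out.
valid_until = (datetime.now(timezone.utc) + timedelta(days=30)).isoformat()

# Hypothetical privacy_grant_consent arguments; values are illustrative.
arguments = {
    "agentId": "agent-alice",                 # agent whose data will be shared
    "grantedTo": "agent-bob",                 # agent receiving access
    "grantedToType": "agent",                 # assumed grantee type label
    "createdBy": "human:carol",               # human identifier granting consent
    "dataScope": "preferences",               # one of the example scopes from the table
    "privacyLevels": ["public", "internal"],  # assumed privacy level names
    "validUntil": valid_until,                # optional expiration timestamp
}

REQUIRED = {"agentId", "grantedTo", "grantedToType", "createdBy", "dataScope", "privacyLevels"}
print(sorted(REQUIRED - arguments.keys()))  # → []
```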
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It clearly indicates this is a mutation/write operation ('Grant') with authorization requirements ('Only humans should invoke'), but doesn't describe what happens after consent is granted, whether it's reversible (though privacy_revoke_consent exists as a sibling), or any rate limits/error conditions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, zero waste. The first sentence states the core purpose, the second provides a critical usage constraint. Every word earns its place with no redundancy or unnecessary elaboration.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a mutation tool with no annotations and no output schema, the description provides good context about the authorization requirement and human-only invocation. However, it doesn't explain what the tool returns or potential side effects. The existence of privacy_revoke_consent as a sibling suggests reversibility, but this isn't stated.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all 8 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema descriptions. The baseline of 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Grant data access') and resource ('to another agent'), distinguishing it from sibling privacy tools like privacy_check_access, privacy_list_grants, and privacy_revoke_consent. It specifies this is for agent-to-agent data sharing authorization.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidance: 'Only humans should invoke this tool to authorize agent-to-agent data sharing.' This clearly states when to use it (human authorization of agent data sharing) and implies when not to use it (agents should not self-authorize).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
privacy_list_grants (Grade C)
List active (non-revoked, non-expired) consent grants for an agent.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Agent ID to list grants for | |
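The call itself is trivial, but since no output schema is documented, a client must guess the response shape. The sketch below shows the one-field request plus a client-side filter matching the description's definition of "active" (non-revoked, non-expired), using an entirely assumed response structure.

```python
from datetime import datetime, timezone

# The only required parameter.
arguments = {"agentId": "agent-alice"}

# Illustrative response shape; the tool publishes no output schema,
# so these field names are assumptions.
grants = [
    {"grantId": "g1", "revoked": False, "validUntil": "2999-01-01T00:00:00+00:00"},
    {"grantId": "g2", "revoked": True,  "validUntil": "2999-01-01T00:00:00+00:00"},
]

# "Active" per the description: not revoked and not past its expiry.
now = datetime.now(timezone.utc)
active = [g for g in grants
          if not g["revoked"] and datetime.fromisoformat(g["validUntil"]) > now]
print([g["grantId"] for g in active])  # → ['g1']
```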
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It implies a read-only operation by using 'List', but doesn't disclose behavioral traits like authentication requirements, rate limits, pagination, error handling, or what constitutes 'active' beyond the non-revoked/expired criteria. The description is minimal and lacks context on how the tool behaves in practice.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the key information: action, resource, and scope. There's no wasted verbiage, and it directly communicates the tool's purpose without unnecessary details.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of annotations and output schema, the description is incomplete for a tool that likely returns sensitive privacy data. It doesn't explain the return format (e.g., list structure, fields like grant IDs or permissions), error cases, or security implications. For a privacy-related tool with no structured support, more context is needed to ensure safe and effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with the single parameter 'agentId' fully documented in the schema as 'Agent ID to list grants for'. The description doesn't add any meaning beyond this, such as format examples or validation rules. With high schema coverage, the baseline score of 3 is appropriate as the description doesn't compensate but also doesn't detract.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('List') and resource ('active consent grants for an agent'), specifying the scope as 'non-revoked, non-expired'. It distinguishes from sibling tools like privacy_grant_consent (create) and privacy_revoke_consent (revoke), but doesn't explicitly contrast with privacy_check_access or privacy_disclosure_log, which have different purposes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like privacy_check_access or privacy_disclosure_log. It doesn't mention prerequisites, such as needing the agent to exist or having appropriate permissions, nor does it suggest scenarios where listing active grants is appropriate.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
privacy_revoke_consent (Grade C)
Revoke a previously granted consent. Immediately stops data sharing.
| Name | Required | Description | Default |
|---|---|---|---|
| reason | Yes | Reason for revocation | |
| grantId | Yes | ID of the consent grant to revoke | |
| revokedBy | Yes | Who is revoking (human identifier) | |
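Since revocation is immediate and all three parameters are required, a careful client validates before sending. A small sketch with a hypothetical builder that rejects an empty reason:

```python
# Hypothetical builder for privacy_revoke_consent arguments.
# All three fields are required; the reason must be non-empty.
def build_revocation(grant_id, revoked_by, reason):
    if not reason:
        raise ValueError("a non-empty reason is required")
    return {"grantId": grant_id, "revokedBy": revoked_by, "reason": reason}

args = build_revocation("grant-123", "human:carol", "data no longer needed")
print(sorted(args))  # → ['grantId', 'reason', 'revokedBy']
```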
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states the action is a revocation with immediate effect, implying a destructive mutation, but doesn't cover permissions required, error conditions, whether the action is reversible, or what happens to associated data. For a mutation tool with zero annotation coverage, this leaves significant gaps in understanding its behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core action and includes a key behavioral detail ('immediately stops data sharing'). There is no wasted verbiage or redundancy, making it highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity as a destructive mutation with no annotations and no output schema, the description is incomplete. It lacks details on permissions, error handling, return values, or side effects, which are critical for safe and effective use. The immediate effect is noted, but more context is needed for a tool of this nature.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all three parameters (grantId, revokedBy, reason) with clear descriptions. The description doesn't add any additional meaning or context beyond what the schema provides, such as format examples or constraints, so it meets the baseline of 3 for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('revoke') and target ('previously granted consent'), with the additional detail about immediate effect ('immediately stops data sharing'). It distinguishes from sibling tools like 'privacy_grant_consent' and 'privacy_list_grants' by focusing on revocation. However, it doesn't explicitly differentiate from other potential privacy tools beyond the immediate context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, prerequisites, or constraints. It mentions 'previously granted consent' but doesn't specify how to identify such grants or when revocation is appropriate versus other actions like updating consent.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
recall (Grade A)
Search your memories by meaning (semantic search). Finds relevant memories from any session.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max results to return (default: 10) | 10 |
| query | Yes | What to search for (natural language) | |
| agentId | No | Your agent ID | |
| minTrust | No | Minimum trust score (0-1). Omit to include all. | |
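Only query is required; the optional parameters either have server defaults or change filtering behavior. A sketch with a hypothetical builder that enforces the documented 0-1 range for minTrust and omits unset optionals so defaults apply:

```python
# Hypothetical builder for recall arguments. Only query is required;
# limit defaults to 10 server-side, and minTrust must lie in [0, 1].
def build_recall(query, limit=None, agent_id=None, min_trust=None):
    if min_trust is not None and not 0.0 <= min_trust <= 1.0:
        raise ValueError("minTrust must be between 0 and 1")
    args = {"query": query}
    if limit is not None:
        args["limit"] = limit
    if agent_id is not None:
        args["agentId"] = agent_id
    if min_trust is not None:
        args["minTrust"] = min_trust
    return args

print(build_recall("decisions about the database schema", min_trust=0.7))
```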
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the search method ('semantic search') and scope ('from any session'), but lacks details on permissions, rate limits, response format, or potential side effects. For a search tool with no annotation coverage, this leaves significant gaps in understanding its behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded and efficient, consisting of two concise sentences that directly state the tool's function and scope without any wasted words. Every sentence earns its place by providing essential information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (semantic search across sessions) and no annotations or output schema, the description is adequate but incomplete. It covers the purpose and scope but lacks details on behavioral traits, return values, or error handling, which are important for a search operation in this context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all parameters (query, limit, agentId, minTrust). The description adds no additional parameter semantics beyond what's in the schema, such as examples or usage tips. Baseline 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb ('Search') and resource ('memories'), and distinguishes it from siblings by specifying 'by meaning (semantic search)' and 'from any session', which differentiates it from tools like agent_memory_query or memory_audit that might have different scopes or methods.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('Search your memories by meaning'), implying it's for semantic rather than exact-match searches. However, it does not explicitly state when not to use it or name alternatives among the many sibling tools, such as agent_facts_search or memory_ingest, which could serve similar purposes.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
register (Grade A)
Self-register as a new agent. Creates your identity (DID), sets up free-tier memory, and generates a claim link your human can use to link with you. No API key or human approval needed.
| Name | Required | Description | Default |
|---|---|---|---|
| name | Yes | Your display name (e.g., "code-assistant", "research-agent") | |
| description | No | Optional description of what you do | |
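This is the simplest schema in the group: one required field and one optional one. A minimal sketch (the builder name and example values are illustrative):

```python
# Hypothetical builder for register arguments: name is required,
# description is attached only when provided.
def build_register(name, description=None):
    args = {"name": name}
    if description:
        args["description"] = description
    return args

print(build_register("research-agent", "summarizes papers and tracks citations"))
```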
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: it's a creation/mutation tool (implied by 'Creates'), sets up free-tier resources, and generates a claim link. However, it lacks details on error conditions, rate limits, or what happens on repeated registration, leaving some gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the core purpose in the first sentence, followed by specific outcomes. Every sentence adds value: the first defines the action, the second lists results, and the third clarifies usage conditions. There is no wasted text, making it highly efficient.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with no annotations and no output schema, the description does well by explaining the tool's purpose, usage, and key behaviors. It covers the essential 'what' and 'when,' though it could improve by detailing output format or error handling. Given the complexity (a self-registration mutation), it's mostly complete but not exhaustive.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents both parameters ('name' and 'description'). The description does not add any additional semantic context about the parameters beyond what the schema provides, such as formatting constraints or examples for the 'name' field. Baseline 3 is appropriate when the schema handles parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Self-register as a new agent') and enumerates the concrete outcomes: creating a DID identity, setting up free-tier memory, and generating a claim link. It distinguishes itself from siblings like 'agent_create' by emphasizing self-service registration without API keys or human approval.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to use this tool ('No API key or human approval needed'), which differentiates it from tools like 'agent_create' that might require such prerequisites. It also implies this is for initial setup by mentioning 'your human can use to link with you,' providing clear context for usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
remember (Grade B)
Store a memory. Persists across sessions and tools. Memories are organized into 3 tiers (active, session, long-term) automatically.
| Name | Required | Description | Default |
|---|---|---|---|
| type | No | Type of memory (default: fact) | fact |
| model | No | LLM model name (auto-detected from MCP connection if omitted) | |
| agentId | No | Your agent ID | |
| content | Yes | The memory content to store | |
| platform | No | Runtime platform (auto-detected from MCP connection if omitted) | |
| sessionId | No | Session/conversation ID | |
| importance | No | Importance score 0-1 (default: 0.5) | 0.5 |
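Although the table lists seven parameters, only content is required; type defaults to "fact" and importance to 0.5 server-side, so a minimal call is a single field. The fuller example below overrides the defaults with illustrative values (the "event" type name is an assumption, not documented in the schema excerpt).

```python
# Minimal remember call: everything else falls back to server defaults.
minimal = {"content": "User prefers dark mode"}

# Fuller call overriding defaults; values are illustrative.
full = {
    "content": "Deployed v2.3 to production",
    "type": "event",         # assumed non-default type name
    "importance": 0.9,       # 0-1 scale, default 0.5
    "sessionId": "sess-42",  # optional session/conversation ID
}

assert 0.0 <= full["importance"] <= 1.0
print(sorted(minimal))  # → ['content']
```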
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It discloses key behavioral traits: persistence across sessions/tools and automatic tier organization. However, it doesn't mention important aspects like whether this operation is idempotent, what happens on duplicate content, performance characteristics, or error conditions. The description adds value but leaves significant behavioral gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise: just two sentences that efficiently convey the core functionality. Every word earns its place: the first sentence states the primary action and key feature (persistence), the second adds crucial context about tier organization. No wasted words or redundant information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 7 parameters, no annotations, and no output schema, the description is minimal but covers essential aspects. It explains what the tool does and mentions persistence and tier organization, but doesn't address return values, error handling, or how it differs from similar memory tools. Given the complexity and lack of structured metadata, more context would be helpful.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all 7 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema. It mentions memory tiers but doesn't explain how parameters like 'importance' or 'type' relate to tier assignment. The baseline of 3 is appropriate when the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Store a memory' with the key characteristic of persistence 'across sessions and tools'. It specifies the verb ('store') and resource ('memory'), but doesn't explicitly differentiate from sibling tools like 'agent_memory_store' or 'recall', which appear to be related memory operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. While it mentions memory organization into tiers, it doesn't specify when to choose 'remember' over other memory-related tools like 'agent_memory_store', 'memory_ingest', or 'recall'. There's no mention of prerequisites, constraints, or typical use cases.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
session_start (Grade: A)
Call this FIRST at the start of every session. Automatically recommends and loads the best skills for your current work context. Returns your identity, loaded skills, and assembled skill content — all in one call. Report the loaded skills in one line so your human knows what capabilities are active.
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output format for your IDE/runtime (default: claude-code) | claude-code |
| agentId | Yes | Your agent ID (from register or whoami) | |
| context | No | What you are working on — branch name, task description, or user request | |
| workType | No | Type of work (default: dev). Infer from context if unsure. | dev |
| tokenBudget | No | Maximum token budget for loaded skills (default: 8000) | |
| toolAvailability | No | Tools available in the current runtime (e.g., ["git", "grep", "read"]). Skills with requiredTools will only be recommended if those tools are available. | |
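The description instructs the agent to "report the loaded skills in one line". Since no output schema is published, the response fields below are illustrative guesses at a plausible shape, used only to show what that one-line report might look like:

```python
# Assumed response shape -- the server publishes no output schema,
# so the field names below are illustrative guesses.
response = {
    "identity": {"agentId": "agent-123", "name": "ace"},
    "skills": [
        {"id": "code-review", "tokens": 3200},
        {"id": "git-workflow", "tokens": 2100},
    ],
}

# One-line report of active capabilities, as the description requests.
loaded = ", ".join(s["id"] for s in response["skills"])
print(f"Loaded skills: {loaded}")
```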
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's behavior: it's a session initialization tool that automatically recommends and loads skills based on context, returns multiple pieces of information in one call, and includes a reporting requirement. It doesn't mention error handling, performance characteristics, or side effects, but covers the core behavioral aspects well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly structured and concise. The first sentence establishes the primary directive, the second explains the core functionality, the third details the return values, and the fourth provides a specific reporting instruction. Every sentence earns its place with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 6-parameter tool with no annotations and no output schema, the description does well by explaining the tool's purpose, timing, behavior, and reporting requirements. However, it doesn't describe the format or structure of the returned data (beyond listing what's included), which would be helpful given the absence of an output schema. The description is mostly complete but could benefit from more detail about the response format.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema description coverage is 100%, so the baseline is 3. The description doesn't add any parameter-specific information beyond what's already documented in the schema. It mentions 'current work context' which relates to the 'context' parameter but doesn't provide additional semantic context beyond the schema's documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('call this FIRST', 'automatically recommends and loads the best skills') and distinguishes it from siblings by emphasizing it's the session initialization tool. It explicitly mentions what it returns ('identity, loaded skills, and assembled skill content') and provides a reporting instruction.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidance: 'Call this FIRST at the start of every session' establishes clear timing. While it doesn't name specific alternatives, the 'FIRST' directive and context of session initialization provide strong implicit guidance about when to use this versus other tools in the extensive sibling list.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
skill_generate_lessons (Grade: B)
Generate lesson memories from skill effectiveness data. Useful for session handoff — creates lesson-type memories summarizing skill performance so future sessions can self-reason about past effectiveness.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions that the tool 'creates lesson-type memories,' implying a write operation, but doesn't specify permissions needed, whether it's idempotent, rate limits, or what happens to existing data. The description adds some context about session handoff and future reasoning, but lacks critical behavioral details for a mutation tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise and front-loaded, with two sentences that directly explain the tool's purpose and usage. There's no wasted text, and it efficiently communicates key information. However, it could be slightly more structured by separating purpose from context more clearly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (a mutation tool with no annotations and no output schema), the description is moderately complete. It explains what the tool does and its high-level purpose, but lacks details on behavioral traits, output format, error handling, or integration with sibling tools. This leaves gaps for an AI agent to use it correctly in varied scenarios.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage for its single parameter ('agentId'), so the schema already documents it adequately. The description doesn't add any parameter-specific information beyond what's in the schema, such as format examples or constraints. With high schema coverage, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Generate lesson memories from skill effectiveness data.' It specifies the verb ('generate'), resource ('lesson memories'), and source data ('skill effectiveness data'). However, it doesn't explicitly differentiate from sibling tools like 'skill_get_effectiveness' or 'memory_ingest', which could handle similar data.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides some implied usage context: 'Useful for session handoff — creates lesson-type memories summarizing skill performance so future sessions can self-reason about past effectiveness.' This suggests it's for creating summarized memories from effectiveness data, but it doesn't explicitly state when to use this tool versus alternatives like 'memory_store' or 'skill_record_usage', nor does it mention any prerequisites or exclusions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
skill_get_effectiveness (Grade: C)
Get effectiveness metrics for a specific skill.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| endDate | No | End of period for effectiveness calculation | |
| skillId | Yes | The ID of the skill | |
| startDate | No | Start of period for effectiveness calculation | |
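The schema names `startDate` and `endDate` but states no format, a gap the evaluation below also notes. A sketch of building the arguments for a 30-day window, assuming ISO 8601 dates as a common convention (not confirmed by the server):

```python
import json
from datetime import date, timedelta

# Hypothetical arguments requesting effectiveness over a 30-day window.
# The date format is an assumption: the schema does not state one,
# so ISO 8601 strings are used here.
end = date(2024, 6, 30)
start = end - timedelta(days=30)

arguments = {
    "agentId": "agent-123",      # illustrative ID
    "skillId": "code-review",    # illustrative ID
    "startDate": start.isoformat(),
    "endDate": end.isoformat(),
}
print(json.dumps(arguments))
```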
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It states it 'gets' metrics, implying a read-only operation, but doesn't clarify what 'effectiveness metrics' include, whether there are rate limits, authentication requirements, or how data is returned (e.g., format, pagination). This leaves significant gaps for an agent to understand the tool's behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. Every part of the sentence earns its place by specifying the action, resource, and target, making it easy to parse quickly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of retrieving metrics (which often involves data aggregation and interpretation), lack of annotations, and no output schema, the description is insufficient. It doesn't explain what 'effectiveness metrics' entail, potential constraints, or return format, leaving the agent with incomplete context for proper use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all parameters well-documented in the schema (e.g., agentId, skillId, startDate, endDate). The description adds no additional parameter semantics beyond implying metrics are calculated for a period (via startDate and endDate), but this is already clear from the schema. Baseline 3 is appropriate as the schema does the heavy lifting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get') and resource ('effectiveness metrics for a specific skill'), making the purpose immediately understandable. However, it doesn't differentiate from sibling tools like 'skill_get_ranking' or 'skill_list', which also retrieve skill-related data, so it doesn't achieve full sibling distinction.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. There are multiple sibling tools related to skills (e.g., skill_get_ranking, skill_list, skill_recommend), but the description doesn't mention any context, prerequisites, or exclusions for selecting this specific tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
skill_get_ranking (Grade: C)
Get skills ranked by effectiveness for an agent.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of skills to return | |
| agentId | Yes | The ID of the agent | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It states what the tool does but doesn't explain how it works—such as what 'effectiveness' means, how ranking is determined, whether results are cached, or what format the output takes. This leaves significant gaps for a tool that presumably returns ranked data.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. It earns its place by clearly stating the tool's function in minimal terms.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of ranking skills by effectiveness, the lack of annotations, and no output schema, the description is insufficient. It doesn't explain the ranking methodology, output format, or behavioral traits, leaving the agent with incomplete context for proper use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents both parameters (agentId and limit). The description doesn't add any parameter-specific details beyond what's in the schema, such as clarifying what 'effectiveness' entails or how the limit applies. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get skills ranked') and the resource ('by effectiveness for an agent'), providing a specific verb+resource combination. However, it doesn't explicitly differentiate from sibling tools like 'skill_get_effectiveness' or 'skill_recommend', which appear related but have different purposes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'skill_get_effectiveness' or 'skill_recommend'. There's no mention of prerequisites, context, or exclusions, leaving the agent to infer usage from the tool name alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
skill_list (Grade: C)
List skills for an agent with temperature and effectiveness info.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| temperature | No | Filter by skill temperature | |
| includeEffectiveness | No | Whether to include effectiveness metrics | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the tool lists skills with 'temperature and effectiveness info', which implies a read-only operation, but doesn't clarify permissions, rate limits, pagination, or error handling. For a tool with no annotations, this is insufficient to ensure safe and effective use.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core action ('List skills') and key details. There's no wasted verbiage, making it easy to parse. However, it could be slightly more structured by explicitly separating purpose from optional features.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (3 parameters, no output schema, no annotations), the description is minimally adequate. It covers the basic purpose but lacks guidance on usage, behavioral traits, and output format. Without annotations or an output schema, the agent must infer too much, leaving gaps in understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all parameters (agentId, temperature, includeEffectiveness). The description adds minimal value beyond the schema by hinting at 'temperature and effectiveness info', which loosely relates to parameters but doesn't provide additional syntax or usage details. This meets the baseline for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('List') and resource ('skills for an agent'), making the purpose understandable. It specifies what information is included ('temperature and effectiveness info'), which helps distinguish it from generic list operations. However, it doesn't explicitly differentiate from sibling tools like 'skill_get_effectiveness' or 'skill_get_ranking', which prevents a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an agent ID), exclusions, or comparisons to sibling tools like 'skill_list' vs. 'skill_get_effectiveness'. This leaves the agent with minimal context for tool selection.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
skill_load (Grade: C)
Load a skill for an agent. Skills provide specific capabilities and are tracked for effectiveness.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| skillId | Yes | The ID of the skill to load | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It mentions that skills are 'tracked for effectiveness,' hinting at monitoring or logging behavior, but fails to disclose critical traits: whether loading is idempotent, whether it requires specific permissions, potential side effects (e.g., memory usage), error conditions, or what happens post-load (e.g., immediate availability). For a mutation tool with zero annotation coverage, this leaves significant gaps in understanding its behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise and front-loaded with the core action ('Load a skill for an agent'), followed by a clarifying sentence about skills. Both sentences earn their place by defining the tool and providing context, with no wasted words. It could be slightly more structured by separating usage notes, but it's efficient overall.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (a mutation operation with no annotations and no output schema), the description is incomplete. It lacks details on behavioral traits, error handling, return values, or prerequisites. While it states the purpose, it doesn't provide enough context for safe and effective use, especially compared to siblings like 'skill_unload' which might have similar gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with both parameters ('agentId' and 'skillId') clearly documented in the schema. The description adds no additional meaning beyond the schema, such as format examples, relationship between agent and skill, or where to obtain IDs. Baseline 3 is appropriate when the schema does the heavy lifting, but no extra value is added.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Load a skill') and target ('for an agent'), with a brief explanation of what skills provide ('specific capabilities and are tracked for effectiveness'). It is implicitly distinguished from siblings like 'skill_unload' (the opposite action) and 'skill_list' (listing vs. loading). However, it doesn't specify how loading differs from simply activating a skill, or whether loading is a prerequisite for usage, which keeps it from a perfect score.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool versus alternatives is provided. It doesn't mention prerequisites (e.g., whether the agent or skill must exist), when not to use it (e.g., if already loaded), or direct alternatives like 'skill_loader_recommend' for selection help. The context is implied from the action but lacks operational clarity.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
skill_loader_explain (Grade: B)
Get the scoring breakdown and reasons for a previous skill recommendation session.
| Name | Required | Description | Default |
|---|---|---|---|
| sessionId | Yes | Session ID from a previous skill_loader_recommend call | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool retrieves data ('Get'), implying a read-only operation, but doesn't clarify if it requires authentication, has rate limits, returns structured or unstructured data, or handles errors. For a tool with no annotation coverage, this leaves significant behavioral gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence that efficiently conveys the tool's purpose without unnecessary words. It's front-loaded with the core function and includes essential context (reference to 'skill_loader_recommend'). Every part of the sentence earns its place, making it highly concise and well-structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has one parameter with full schema coverage but no annotations or output schema, the description is minimally adequate. It explains what the tool does and references the required input, but doesn't cover behavioral aspects like authentication needs, error handling, or output format. For a simple retrieval tool, this is acceptable but leaves room for improvement in transparency.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema description coverage is 100%, with the single parameter 'sessionId' fully documented in the schema as 'Session ID from a previous skill_loader_recommend call.' The description doesn't add any additional semantic context beyond this, such as format examples or validation rules. With high schema coverage, the baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get the scoring breakdown and reasons for a previous skill recommendation session.' It specifies the verb ('Get') and resource ('scoring breakdown and reasons'), making the function unambiguous. However, it doesn't explicitly differentiate from sibling tools like 'skill_loader_recommend' or 'skill_get_effectiveness' beyond the implied connection to 'skill_loader_recommend' via the sessionId parameter.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by referencing 'a previous skill_loader_recommend call,' suggesting this tool should be used after that specific sibling. However, it doesn't provide explicit guidance on when to use this versus alternatives like 'skill_get_effectiveness' or 'skill_recommend,' nor does it mention any exclusions or prerequisites beyond the sessionId requirement.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
skill_loader_recommend (Grade: A)
Get intelligent skill recommendations for a work context. Uses multi-factor scoring (task relevance, phase match, effectiveness, temperature, tool availability, profile bias, trust) to select an optimal skill loadout within a token budget.
| Name | Required | Description | Default |
|---|---|---|---|
| phase | No | Current phase (e.g., coding, testing, review) | |
| agentId | Yes | The ID of the agent | |
| taskText | No | Description of the task to match skills against | |
| workType | Yes | Type of work the agent is performing | |
| tokenBudget | No | Maximum token budget for loaded skills (default 8000) | |
| defaultProfile | No | Default skill profile to bias towards | |
| toolAvailability | No | Tools available in the current runtime | |
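The description names the scoring factors but not the selection algorithm. One plausible reading is a greedy fill: score each candidate skill on a weighted combination of the factors, sort by score, and admit skills until the token budget is exhausted. The sketch below illustrates that reading only; the weights, fields, and skills are invented, and the server's real scoring is undocumented:

```python
# Illustrative greedy loadout selection under a token budget.
# The factor weights and skill records are invented for this sketch;
# they do not reflect the server's actual implementation.
WEIGHTS = {"relevance": 0.4, "effectiveness": 0.3, "phase": 0.2, "trust": 0.1}

def score(skill: dict) -> float:
    """Weighted sum of the per-factor scores (each assumed in 0-1)."""
    return sum(WEIGHTS[k] * skill[k] for k in WEIGHTS)

def select_loadout(skills: list[dict], token_budget: int) -> list[str]:
    """Greedily admit the highest-scoring skills that still fit the budget."""
    chosen, used = [], 0
    for s in sorted(skills, key=score, reverse=True):
        if used + s["tokens"] <= token_budget:
            chosen.append(s["id"])
            used += s["tokens"]
    return chosen

skills = [
    {"id": "code-review", "tokens": 3200, "relevance": 0.9,
     "effectiveness": 0.8, "phase": 1.0, "trust": 0.9},
    {"id": "git-workflow", "tokens": 2100, "relevance": 0.7,
     "effectiveness": 0.9, "phase": 1.0, "trust": 0.8},
    {"id": "doc-writing", "tokens": 4000, "relevance": 0.3,
     "effectiveness": 0.6, "phase": 0.5, "trust": 0.7},
]
print(select_loadout(skills, token_budget=8000))
```

With the default budget of 8000 tokens, the two highest-scoring skills (3200 + 2100 tokens) fit and the third does not; a tighter budget would drop skills from the bottom of the ranking first.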
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses behavioral traits such as using 'multi-factor scoring' and selecting within a 'token budget,' which hints at optimization and resource constraints. However, it does not detail permissions, rate limits, or what happens if no skills fit the budget, leaving gaps in behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized and front-loaded, starting with the core purpose and key mechanisms in a single sentence. Every sentence earns its place by explaining the scoring factors and token budget constraint, though it could be slightly more streamlined by integrating the factors list more smoothly.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of 7 parameters, no annotations, and no output schema, the description is moderately complete. It covers the purpose and scoring logic but lacks details on output format, error handling, or prerequisites. For a tool with this parameter count and no structured safety hints, it should provide more behavioral guidance.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all 7 parameters thoroughly. The description adds value by explaining the scoring factors (e.g., 'task relevance, phase match'), which relate to parameters like taskText and phase, but does not provide additional syntax or format details beyond what the schema offers.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb ('Get') and resource ('skill recommendations'), specifying it's for a 'work context' and uses 'multi-factor scoring' to select 'an optimal skill loadout within a token budget.' It distinguishes itself from siblings like skill_recommend by focusing on budget-constrained loadout selection rather than general recommendations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage in a work context with a token budget, but does not explicitly state when to use this tool versus alternatives like skill_recommend or skill_load. It mentions factors like 'task relevance' and 'phase match,' suggesting context-dependent use, but lacks clear exclusions or named alternatives.
skill_loader_resolve (Grade A)
Load skill content for selected skills and assemble into a context block for a specific IDE/runtime. Call after skill_loader_recommend to get the actual skill content.
| Name | Required | Description | Default |
|---|---|---|---|
| format | No | Output format for target IDE (default: claude-code). claude-code=CLAUDE.md, cursor-rules=.mdc, codex=AGENTS.md, antigravity=instructions.md, json=structured, markdown=legacy alias | claude-code |
| agentId | Yes | The ID of the agent | |
| skillIds | Yes | Skill IDs to load (from recommend results) | |
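Following the recommend-then-resolve workflow, a call might look like the sketch below. The agent and skill IDs are hypothetical placeholders, standing in for values returned by a prior skill_loader_recommend call:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "skill_loader_resolve",
    "arguments": {
      "agentId": "agent-123",
      "skillIds": ["skill-debugging", "skill-testing"],
      "format": "claude-code"
    }
  }
}
```

Leaving out format should yield the claude-code (CLAUDE.md) output, per the documented default above.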
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While it mentions that the tool loads and assembles content, it doesn't describe what 'loading' entails (e.g., fetching from storage, parsing), what 'assembling' means (e.g., formatting, concatenation), whether this is a read-only operation, what permissions might be required, or what happens if skillIds are invalid. The description is insufficient for a tool that presumably accesses and processes skill data.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise: two sentences that directly state the tool's purpose and usage guidance, with no wasted words. It's front-loaded with the core functionality and follows with the workflow instruction.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 3 parameters, 100% schema coverage, but no annotations and no output schema, the description is minimally adequate. It covers the basic purpose and workflow but lacks crucial behavioral context about how the tool actually works, what it returns, and potential side effects. Given the complexity implied by 'loading' and 'assembling' skill content, more detail would be helpful.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description doesn't add meaningful parameter semantics beyond what's in the schema: it mentions 'selected skills', which maps to skillIds, and 'specific IDE/runtime', which maps to format, but provides no additional context about parameter usage, constraints, or interactions.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Load skill content for selected skills and assemble into a context block for a specific IDE/runtime.' It specifies both the action (load and assemble) and the resource (skill content), but doesn't explicitly differentiate it from its closest sibling 'skill_loader_resolve_multi' or other skill-related tools like 'skill_load' or 'skill_list'.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear usage guidance: 'Call after skill_loader_recommend to get the actual skill content.' This establishes a prerequisite workflow relationship. However, it doesn't specify when NOT to use this tool or mention alternatives like 'skill_loader_resolve_multi' or 'skill_load'.
skill_loader_resolve_multi (Grade A)
Load skill content and assemble into multiple IDE formats at once. Returns a map of format → assembled content. Useful for multi-runtime swarms.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| formats | Yes | Output formats to generate (e.g., ["claude-code", "codex", "cursor-rules"]) | |
| skillIds | Yes | Skill IDs to load (from recommend results) | |
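For a multi-runtime swarm, the same call shape takes a formats array instead of a single format. All values below are illustrative; per the description, the response maps each requested format to its assembled content:

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "skill_loader_resolve_multi",
    "arguments": {
      "agentId": "agent-123",
      "skillIds": ["skill-debugging", "skill-testing"],
      "formats": ["claude-code", "codex", "cursor-rules"]
    }
  }
}
```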
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions the tool loads and assembles content, it doesn't describe important behaviors like whether this is a read-only operation, what permissions are needed, whether it's idempotent, rate limits, or error conditions. With zero annotation coverage and its read-only or mutating nature left unstated, the description is insufficient.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is perfectly concise with two sentences that each earn their place. The first sentence states the core functionality, and the second provides valuable context about when it's useful. There's zero wasted language or redundancy.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that this appears to be a content generation/assembly tool with no annotations and no output schema, the description is incomplete. It doesn't explain what the assembled content looks like, whether there are size limits, what happens if skillIds are invalid, or how errors are handled. For a tool that presumably creates output in multiple formats, more behavioral context is needed.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description adds no meaningful parameter semantics beyond the schema: it doesn't explain relationships between parameters, provide examples of valid skillIds, or clarify format selection strategies.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verbs ('Load skill content and assemble into multiple IDE formats at once') and distinguishes it from siblings like 'skill_loader_resolve' (single format) and 'skill_load' (no assembly). It explicitly mentions the multi-format output capability.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool ('Useful for multi-runtime swarms'), indicating it's for generating multiple output formats simultaneously. However, it doesn't explicitly state when NOT to use it or name specific alternatives like 'skill_loader_resolve' for single-format needs.
skill_recommend (Grade C)
Get skill recommendations for a task based on past effectiveness.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum number of skills to recommend | |
| agentId | Yes | The ID of the agent | |
| taskDescription | Yes | Description of the task to recommend skills for | |
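A minimal illustrative call, using the table above (the agent ID and task text are placeholders):

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/call",
  "params": {
    "name": "skill_recommend",
    "arguments": {
      "agentId": "agent-123",
      "taskDescription": "Write integration tests for the payment flow",
      "limit": 5
    }
  }
}
```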
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It says the tool gets skill recommendations but does not describe how recommendations are generated, whether they are personalized or general, whether there are rate limits or authentication needs, or what the output format looks like. This leaves significant gaps for a tool that likely processes historical data to generate recommendations.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence: 'Get skill recommendations for a task based on past effectiveness.' It is front-loaded with the core purpose, has zero wasted words, and is appropriately sized for the tool's complexity. Every part of the sentence earns its place by conveying essential information.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (involving recommendations based on past data), lack of annotations, and no output schema, the description is incomplete. It does not explain the behavioral aspects, output format, or how recommendations are derived, which are critical for an AI agent to use the tool effectively. The description alone is insufficient for full contextual understanding.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with clear documentation for 'agentId', 'taskDescription', and 'limit'. The description adds no additional semantic context beyond what the schema provides, such as examples or usage nuances. With high schema coverage, the baseline score of 3 is appropriate, as the description does not compensate but also does not detract.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Get skill recommendations for a task based on past effectiveness.' It specifies the verb ('Get'), resource ('skill recommendations'), and context ('for a task based on past effectiveness'). However, it does not explicitly differentiate from sibling tools like 'skill_loader_recommend' or 'skill_get_ranking', which appear related to skills, so it misses full sibling differentiation.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It lacks explicit context, exclusions, or references to sibling tools such as 'skill_loader_recommend' or 'skill_get_effectiveness', which might offer similar or complementary functionality. Usage is implied only by the purpose statement.
skill_record_usage (Grade C)
Record a skill usage with success/failure outcome. This updates effectiveness metrics.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| context | No | Context of how the skill was used | |
| skillId | Yes | The ID of the skill | |
| success | Yes | Whether the skill usage was successful | |
| durationMs | No | Duration of skill execution in milliseconds | |
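An illustrative call recording a successful usage (all values are placeholders):

```json
{
  "jsonrpc": "2.0",
  "id": 5,
  "method": "tools/call",
  "params": {
    "name": "skill_record_usage",
    "arguments": {
      "agentId": "agent-123",
      "skillId": "skill-debugging",
      "success": true,
      "durationMs": 4200,
      "context": "Used while fixing a flaky integration test"
    }
  }
}
```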
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool 'updates effectiveness metrics', implying a write operation, but doesn't clarify permissions needed, whether the update is immediate or batched, or if there are side effects (e.g., triggering notifications). For a mutation tool with zero annotation coverage, this is insufficient.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core action ('Record a skill usage') and adds clarifying purpose ('updates effectiveness metrics'). There is no wasted verbiage, and it's appropriately sized for the tool's complexity.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given this is a mutation tool (implied by 'updates') with no annotations and no output schema, the description is incomplete. It doesn't address behavioral aspects like error conditions, response format, or idempotency. With 5 parameters and sibling tools that might overlap (e.g., skill-related queries), more context is needed for effective use.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema fully documents all 5 parameters. The description adds no parameter-specific information beyond implying that 'success' and possibly other inputs affect 'effectiveness metrics'. This meets the baseline of 3 since the schema does the heavy lifting, but the description doesn't enhance understanding of parameter interactions or constraints.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Record a skill usage') and the outcome ('with success/failure outcome'), and specifies the purpose ('updates effectiveness metrics'). It distinguishes itself from sibling tools like skill_get_effectiveness or skill_list by focusing on recording rather than querying. However, it doesn't explicitly differentiate from all siblings (e.g., skill_generate_lessons might also involve skill usage).
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., whether the agent or skill must exist), nor does it suggest when not to use it (e.g., for hypothetical scenarios). With many sibling tools related to skills and agents, this lack of contextual guidance is a significant gap.
skill_unload (Grade C)
Unload a skill from an agent.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | The ID of the agent | |
| skillId | Yes | The ID of the skill to unload | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. While 'unload' implies a mutation operation, it doesn't disclose whether this requires specific permissions, whether it's reversible, what happens to agent functionality after unloading, or any rate limits. The description is minimal and lacks behavioral context.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, clear sentence with zero wasted words. It's front-loaded with the core action and resource, making it immediately understandable without unnecessary elaboration.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a mutation tool with no annotations and no output schema, the description is insufficient. It doesn't explain what 'unload' entails operationally, whether there are side effects, what the return value might be, or error conditions. Given the complexity of modifying agent capabilities, more context is needed.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so the schema already documents both parameters (agentId and skillId) adequately. The description adds no additional meaning about parameters beyond what's in the schema, such as format examples or relationship between them, meeting the baseline for high schema coverage.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('unload') and resource ('a skill from an agent'), making the purpose immediately understandable. It doesn't differentiate from sibling tools like 'skill_load' or 'skill_list', but it's specific enough to understand the basic function.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'skill_load' or 'skill_delete', nor does it mention prerequisites such as whether the skill must be currently loaded. It simply states what the tool does without contextual usage information.
trust_status (Grade C)
Check your trust status — link, memory health, and safety events.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Your agent ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions checking 'trust status' but doesn't explain what that entails—whether it's a read-only operation, requires authentication, has rate limits, or what the output format might be. This is a significant gap for a tool with potential security implications.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the core purpose without unnecessary details. It's appropriately sized for a simple tool, though it could be slightly more structured to include usage hints.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the lack of annotations and output schema, the description is incomplete. It doesn't explain what 'trust status' returns (e.g., a score, details on events) or behavioral aspects like error handling. For a tool checking critical aspects like safety events, more context is needed to guide the agent effectively.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with the single parameter 'agentId' documented as 'Your agent ID'. The description doesn't add any meaning beyond this, such as explaining how the agentId is used or its format. With high schema coverage, the baseline score of 3 is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb ('check') and concrete resources ('trust status — link, memory health, and safety events'), making it easy to understand what it does. However, it doesn't explicitly differentiate from sibling tools like 'link' or 'memory_audit', which might have overlapping functionality.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, such as 'agent_link_verify' or 'memory_health' related tools. It lacks explicit context, prerequisites, or exclusions, leaving the agent to infer usage based on the name alone.
whoami (Grade A)
Check your agent identity — name, DID, status, and memory count.
| Name | Required | Description | Default |
|---|---|---|---|
| agentId | Yes | Your agent ID | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It describes a read-only operation ('check') that returns identity information, which implies non-destructive behavior, but does not disclose details like authentication requirements, rate limits, or error conditions. It adds basic context but lacks comprehensive behavioral traits.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the purpose ('Check your agent identity') and lists specific return values. There is no wasted language, and it effectively communicates the tool's function without unnecessary detail.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's low complexity (one required parameter) and no output schema, the description is reasonably complete for a read-only identity check. It specifies what information is returned, but could improve by mentioning the response format or any prerequisites. Without annotations, it adequately covers the basics but has minor gaps.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, with the single parameter 'agentId' documented as 'Your agent ID'. The description does not add meaning beyond this, as it does not explain parameter usage or constraints. With high schema coverage, the baseline score of 3 is appropriate.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with a specific verb ('check') and resource ('agent identity'), listing the exact information returned (name, DID, status, memory count). It distinguishes itself from sibling tools like agent_get or agent_list by focusing on identity verification rather than retrieval or listing operations.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context ('check your agent identity') for self-verification scenarios, but does not explicitly state when to use this tool versus alternatives like agent_get (which might retrieve similar data) or when not to use it. It provides clear intent but lacks explicit comparison or exclusion guidance.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
```json
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
```

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.
Discussions
No comments yet. Be the first to start the discussion!