
Ace Memory

Server Details

Persistent memory for AI agents. Semantic search, memory graph, W3C DID identity.

Status: Healthy
Transport: Streamable HTTP

Tool Descriptions: B

Average 3.3/5 across 82 of 82 tools scored. Lowest: 2.6/5.

Server Coherence: B
Disambiguation: 2/5

The tool set has significant overlap and ambiguity, particularly in memory and context management. For example, 'agent_memory_store', 'remember', and 'observe' all handle memory storage with unclear distinctions, while 'assemble_context' and 'session_start' both assemble context but with different approaches. This overlap makes it difficult for an agent to reliably choose the correct tool without deep domain knowledge.

Naming Consistency: 4/5

Most tools follow a consistent verb_noun or noun_verb pattern (e.g., 'agent_create', 'memory_export', 'privacy_check_access'), with clear prefixes like 'agent_', 'memory_', 'skill_' for grouping. However, there are minor deviations such as 'link', 'observe', 'recall', and 'whoami' that break this pattern, slightly reducing overall consistency.

Tool Count: 2/5

With 82 tools, the count is excessive for a memory management server, leading to bloat and potential confusion. Many tools could be consolidated (e.g., multiple memory storage and context assembly tools) or omitted without losing functionality. This large number overwhelms the core purpose and suggests poor scoping.

Completeness: 5/5

The tool set provides comprehensive coverage for agent memory management, including creation, updating, deletion, benchmarking, dreaming, privacy, skills, and context assembly. It supports full CRUD operations across agents, memories, goals, and other entities, with no apparent gaps in the domain's lifecycle or workflows.

Available Tools

82 tools
agent_benchmark_history (Grade: B)

Get benchmark run history for the tenant. Optionally filter by run type.

Parameters (JSON Schema)
Name     Required  Description
limit    No        Maximum results to return. Default: 50.
runType  No        Filter by benchmark type. Omit to get all.
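As a concrete illustration, a call to this tool over the Streamable HTTP transport is a JSON-RPC 2.0 `tools/call` request. A minimal sketch; the `runType` value and the request `id` are illustrative, not taken from the server's schema:

```python
import json

# Sketch of an MCP tools/call request for agent_benchmark_history.
# The "internal" runType value is assumed for illustration; omit the
# field entirely to retrieve history for all run types.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "agent_benchmark_history",
        "arguments": {
            "limit": 10,            # optional; server default is 50
            "runType": "internal",  # optional filter
        },
    },
}
print(json.dumps(request, indent=2))
```

The same envelope applies to every tool on this page; only `name` and `arguments` change.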
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions optional filtering but fails to describe key traits such as pagination behavior (implied by 'limit' parameter), rate limits, authentication requirements, or the format of returned history data. This leaves significant gaps for an agent to understand operational constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('Get benchmark run history for the tenant') and adds a concise optional feature ('Optionally filter by run type'). There is no wasted wording, making it easy for an agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (2 parameters, no output schema, no annotations), the description is adequate but incomplete. It covers the basic purpose and optional filtering, but without annotations or output schema, it misses details on behavioral traits (e.g., data format, pagination) that would help an agent invoke it correctly. This results in a minimal viable description with clear gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with clear documentation for 'limit' (maximum results, default 50) and 'runType' (filter by benchmark type, enum values). The description adds minimal value beyond the schema by noting the optional filtering, but it doesn't provide additional context like typical use cases or parameter interactions. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get benchmark run history for the tenant' specifies the verb ('Get') and resource ('benchmark run history'), with an optional filtering capability. It distinguishes from most siblings (e.g., agent_benchmark_run, agent_benchmark_trend) by focusing on historical data retrieval, though it doesn't explicitly differentiate from agent_dream_history or similar history tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides implied usage through 'Optionally filter by run type,' suggesting when to apply the runType parameter. However, it lacks explicit guidance on when to use this tool versus alternatives like agent_benchmark_trend (which might analyze trends) or agent_list (which could list other entities), and no exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_benchmark_run (Grade: A)

Run a memory benchmark. Types: "external" (MemoryBench adapter), "internal" (curated test corpus), "production" (telemetry snapshot). Returns MemScore triple (accuracy, latencyMs, contextTokens).

Parameters (JSON Schema)
Name     Required  Description
type     Yes       Benchmark type to run
agentId  Yes       Agent ID to benchmark
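Since the description names the MemScore fields, a caller can unpack the result directly. A hedged sketch: the agentId and the numeric values are invented for illustration, while the field names come from the description itself.

```python
# Both parameters are required. "internal" is one of the three documented
# benchmark types; the agentId format is an assumption.
arguments = {"type": "internal", "agentId": "agent-123"}

# The tool returns a MemScore triple (accuracy, latencyMs, contextTokens).
# Illustrative response values:
result = {"accuracy": 0.91, "latencyMs": 120, "contextTokens": 842}
accuracy = result["accuracy"]
latency_ms = result["latencyMs"]
context_tokens = result["contextTokens"]
```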
Behavior: 3/5

With no annotations provided, the description carries the full burden. It discloses the return format (MemScore triple) and benchmark types, which is useful behavioral context. However, it omits details like execution time, resource consumption, permissions needed, or whether it's a read-only or mutating operation, leaving gaps for a tool that performs active benchmarking.


Conciseness: 5/5

The description is efficiently structured in two sentences: the first states the purpose and types, the second specifies the return value. Every phrase adds value without redundancy, and it's front-loaded with the core action.


Completeness: 3/5

Given no annotations and no output schema, the description partially compensates by explaining the return format (MemScore triple) and benchmark types. However, for a tool that likely involves significant computation or system impact, it lacks details on error conditions, side effects, or output structure beyond the triple names, leaving room for improvement.


Parameters: 3/5

Schema description coverage is 100%, so the schema already fully documents both parameters (type with enum values, agentId). The description adds no additional parameter semantics beyond what's in the schema, such as explaining what 'external' vs 'internal' entails or agentId format expectations, meeting the baseline for high schema coverage.


Purpose: 5/5

The description clearly states the specific action ('Run a memory benchmark') and resource ('memory benchmark'), with explicit differentiation from siblings by specifying the unique benchmark types and return format. It distinguishes from tools like agent_benchmark_history or agent_benchmark_trend by focusing on execution rather than historical analysis.


Usage Guidelines: 4/5

The description provides clear context by listing the three benchmark types and their meanings, which implicitly guides when to use each. However, it lacks explicit when-not-to-use guidance or named alternatives among sibling tools (e.g., when to use agent_benchmark_history instead for past results).


agent_benchmark_trend (Grade: B)

Get score trend for a specific benchmark over time. Shows how MemScore has changed across runs.

Parameters (JSON Schema)
Name           Required  Description
limit          No        Number of recent runs to include. Default: 20.
agentId        Yes       Agent ID to scope the trend to
benchmarkName  Yes       Benchmark name (e.g., "internal-eval", "memorybench-basic", "production-telemetry")
Behavior: 2/5

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions that the tool shows 'how MemScore has changed across runs,' which implies read-only behavior and time-series data, but doesn't address permissions, rate limits, data freshness, or error conditions. For a tool with no annotations, this leaves significant behavioral gaps unaddressed.


Conciseness: 5/5

The description is extremely concise - just two sentences that directly state the tool's purpose and what it shows. Every word earns its place with zero redundancy or unnecessary elaboration. It's front-loaded with the core functionality.


Completeness: 3/5

Given 3 parameters, 100% schema coverage, no output schema, and no annotations, the description provides adequate basic purpose but lacks important context. It doesn't explain what format the trend data returns, how time periods are determined, or what 'MemScore' represents. For a trend analysis tool, more output context would be helpful despite the good schema coverage.


Parameters: 3/5

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description adds no additional parameter information beyond what's in the schema - it doesn't explain relationships between parameters or provide usage examples. This meets the baseline 3 when schema coverage is complete.


Purpose: 4/5

The description clearly states the tool's purpose: 'Get score trend for a specific benchmark over time' specifies the action (get trend) and resource (benchmark scores). It distinguishes from siblings like agent_benchmark_history or agent_benchmark_run by focusing on trend analysis rather than raw history or individual runs. However, it doesn't explicitly differentiate from all possible alternatives, keeping it at 4 rather than 5.


Usage Guidelines: 2/5

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention when to choose agent_benchmark_trend over agent_benchmark_history or other benchmarking tools, nor does it specify prerequisites or exclusions. The lack of usage context leaves the agent without clear decision criteria.


agent_context_clear (Grade: A)

Clear the active context for an agent. Use when a task is complete or the agent needs a fresh start.

Parameters (JSON Schema)
Name     Required  Description
agentId  Yes       The ID of the agent
Behavior: 2/5

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool 'clears' context, implying a destructive mutation, but doesn't specify whether this action is reversible, what exactly gets cleared (e.g., conversation history, temporary variables), or what permissions are required. For a mutation tool with zero annotation coverage, this leaves significant behavioral gaps.


Conciseness: 5/5

The description is perfectly concise with two sentences that each earn their place: the first states the purpose, the second provides usage guidance. It's front-loaded with the core action and wastes no words.


Completeness: 3/5

Given the tool's moderate complexity (a destructive operation with one parameter) and the absence of both annotations and output schema, the description is minimally adequate. It explains what the tool does and when to use it, but doesn't address behavioral aspects like side effects, error conditions, or what happens after clearing. The 100% schema coverage helps, but more behavioral context would be needed for completeness.


Parameters: 3/5

The schema description coverage is 100% (the single parameter 'agentId' is fully documented in the schema), so the baseline is 3. The description adds no additional parameter information beyond what the schema already provides about the agentId parameter.


Purpose: 4/5

The description clearly states the action ('Clear') and target ('active context for an agent'), providing a specific verb+resource combination. It distinguishes from sibling tools like 'agent_context_get' (which retrieves context) and 'agent_memory_*' tools (which manage persistent memory). However, it doesn't explicitly differentiate from all siblings like 'memory_clean' or 'context_budget_*' tools, which prevents a perfect score.


Usage Guidelines: 4/5

The description provides explicit guidance on when to use the tool ('when a task is complete or the agent needs a fresh start'), giving clear context for its application. It doesn't specify when NOT to use it or name alternatives (like whether 'agent_memory_clean' serves a different purpose), which keeps it from a score of 5.


agent_context_get (Grade: C)

Get the active context for an agent including current task, goal, and recent memories.

Parameters (JSON Schema)
Name     Required  Description
agentId  Yes       The ID of the agent
Behavior: 2/5

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states what information is retrieved but does not describe how the context is structured, whether it's real-time or cached, any permissions required, rate limits, or error conditions. For a tool with no annotations, this leaves significant behavioral gaps.


Conciseness: 5/5

The description is a single, efficient sentence that front-loads the core action and details. It wastes no words and directly communicates the tool's function without redundancy. Every part of the sentence earns its place by specifying what is retrieved.


Completeness: 2/5

Given the complexity of retrieving agent context, no annotations, and no output schema, the description is incomplete. It lacks details on the structure of the returned context, how recent memories are defined, whether the operation is idempotent, or any error handling. For a tool with no structured support, more context is needed.


Parameters: 3/5

Schema description coverage is 100%, with the single parameter 'agentId' fully documented in the schema. The description does not add any meaning beyond the schema, such as explaining what constitutes a valid agent ID or how to obtain it. Baseline 3 is appropriate since the schema does the heavy lifting.


Purpose: 4/5

The description clearly states the verb 'Get' and the resource 'active context for an agent', specifying what information is retrieved (current task, goal, and recent memories). It distinguishes from siblings like agent_get (general agent info) or agent_memory_query (specific memory queries), though not explicitly named. The purpose is specific but could be more precise about sibling differentiation.


Usage Guidelines: 2/5

The description provides no guidance on when to use this tool versus alternatives. It does not mention when to choose it over tools like agent_get (for basic agent details), agent_memory_query (for memory-specific queries), or assemble_context (for context assembly). There is no indication of prerequisites, timing, or exclusions, leaving usage unclear.


agent_create (Grade: C)

Create a new agent with identity. The agent will be assigned a DID and can be linked to a user.

Parameters (JSON Schema)
Name                 Required  Description
name                 Yes       Display name for the agent
ownerId              Yes       User ID of the agent owner
description          No        Description of the agent's purpose and capabilities
personalityTemplate  No        Personality template to apply
Behavior: 2/5

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the agent gets a DID and can be linked to a user, but fails to cover critical aspects like required permissions, whether this is a mutating operation, potential side effects, or error conditions. This leaves significant gaps for an agent to understand the tool's behavior.


Conciseness: 4/5

The description is a single, efficient sentence that gets straight to the point without unnecessary words. It could be slightly improved by front-loading more critical information, but it's appropriately sized and wastes no space.


Completeness: 2/5

Given the complexity of creating an agent (a mutating operation with identity implications), no annotations, and no output schema, the description is insufficient. It doesn't explain what happens after creation, what the DID assignment entails, or how linking works, leaving the agent with incomplete context for proper tool invocation.


Parameters: 3/5

Schema description coverage is 100%, so the schema already documents all parameters thoroughly. The description adds no additional parameter information beyond what's in the schema, such as explaining relationships between parameters or usage examples. This meets the baseline for high schema coverage.


Purpose: 4/5

The description clearly states the action ('Create a new agent') and the resource ('agent with identity'), specifying that it assigns a DID and can link to a user. However, it doesn't explicitly differentiate from sibling tools like 'agent_update' or 'agent_get', which keeps it from a perfect score.


Usage Guidelines: 2/5

The description provides no guidance on when to use this tool versus alternatives like 'agent_update' or 'agent_list', nor does it mention prerequisites or exclusions. It lacks context for selection among the many agent-related tools available.


agent_delete (Grade: A)

Delete an agent (soft delete - changes status to deleted).

Parameters (JSON Schema)
Name              Required  Description
agentId           Yes       The ID of the agent to delete
preserveMemories  No        Whether to archive memories before deletion
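For illustration, the arguments for a soft delete that archives memories first might look like this. The agentId is hypothetical, and the preserveMemories behavior is inferred from its parameter description:

```python
# Soft delete: the agent's status changes to "deleted" rather than the
# record being removed. preserveMemories asks the server to archive the
# agent's memories before deletion (per the parameter description).
arguments = {
    "agentId": "agent-123",   # hypothetical ID
    "preserveMemories": True,
}
```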
Behavior: 3/5

With no annotations provided, the description carries full burden. It discloses key behavioral traits: this is a deletion operation (mutative) and specifies it's a 'soft delete' that changes status rather than permanent removal. However, it doesn't cover important aspects like required permissions, whether the operation is reversible, what happens to related data, or error conditions.


Conciseness: 5/5

The description is extremely concise (one sentence) and front-loaded with the core action. Every word earns its place, with the parenthetical adding crucial nuance about the deletion type. There's zero waste or redundancy.


Completeness: 3/5

For a deletion tool with no annotations and no output schema, the description is minimally adequate. It covers the core action and clarifies it's a soft delete, but lacks important context about permissions, reversibility, side effects, and what the tool returns. Given the mutative nature and 2 parameters, more completeness would be expected.


Parameters: 3/5

Schema description coverage is 100%, so the schema already fully documents both parameters. The description doesn't add any parameter-specific information beyond what's in the schema. The baseline of 3 is appropriate when the schema does the heavy lifting, though the description could have explained the implications of the preserveMemories parameter.


Purpose: 5/5

The description clearly states the specific action ('Delete') and resource ('an agent'), and distinguishes it from siblings like agent_create, agent_get, and agent_update by specifying the deletion operation. It also adds important nuance by clarifying it's a 'soft delete' rather than permanent removal.


Usage Guidelines: 2/5

The description provides no guidance on when to use this tool versus alternatives. While it mentions 'soft delete,' it doesn't specify when this should be used instead of other agent-related tools like agent_update for status changes or agent_context_clear for clearing agent data. No prerequisites or exclusions are mentioned.


agent_dream (Grade: A)

Trigger a memory dream cycle — consolidates, deduplicates, resolves contradictions, normalizes temporal references, extracts missing facts, and manages tier promotion/demotion. Use dryRun to preview changes without applying them.

Parameters (JSON Schema)
Name     Required  Description
dryRun   No        If true, returns metrics without applying changes (default: false)
agentId  Yes       Agent ID to dream for
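The dryRun flag supports a preview-then-apply pattern. A minimal sketch (the agentId is hypothetical):

```python
# Preview a dream cycle without mutating anything: per the parameter
# description, the server returns metrics describing what would change.
preview_args = {"agentId": "agent-123", "dryRun": True}

# Apply for real: dryRun defaults to false, so it can be omitted.
apply_args = {"agentId": "agent-123"}
```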
Behavior: 3/5

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's actions (consolidation, deduplication, etc.) and the dryRun option for previewing changes, which clarifies it's a potentially mutative operation. However, it lacks details on side effects, error conditions, or performance characteristics like execution time or resource usage.


Conciseness: 5/5

The description is efficiently structured in two sentences: the first outlines the tool's purpose and operations, and the second explains the dryRun parameter. Every sentence adds essential information with no wasted words, making it easy to scan and understand quickly.


Completeness: 3/5

Given the complexity of memory management operations and lack of output schema, the description adequately covers the tool's purpose and key parameter. However, it doesn't detail the output format, potential errors, or integration with sibling tools like agent_dream_history, leaving gaps for an AI agent to fully understand execution outcomes.


Parameters: 3/5

Schema description coverage is 100%, so the input schema already documents both parameters (dryRun and agentId) with clear descriptions. The description adds minimal value by mentioning dryRun's purpose ('preview changes without applying them'), but doesn't provide additional context beyond what the schema states, such as agentId format or dryRun implications.


Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool's function with specific verbs ('trigger a memory dream cycle') and details the operations performed (consolidation, deduplication, contradiction resolution, etc.). It clearly distinguishes this from sibling tools like agent_memory_query or agent_facts_contradictions by focusing on a comprehensive memory optimization cycle rather than individual operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('to preview changes without applying them' via dryRun) and implies it's for memory management cycles. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the many sibling tools, such as agent_memory_clean or memory_audit, which might overlap in function.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_dream_config (B)

Get or update dream configuration for an agent. If no update fields are provided, returns the current config.

Parameters (JSON Schema)

Name | Required | Description
agentId | Yes | Agent ID
enabled | No | Enable/disable dreaming
maxMemoriesPerDream | No | Max memories to process per dream (default: 500)
minTimeBetweenDreams | No | Minimum hours between dream cycles (default: 24)
contradictionStrategy | No | Strategy for handling contradictions
minSessionsBetweenDreams | No | Minimum session count between dreams (default: 5)
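The get-vs-update conditional the description mentions can be sketched as a small Python model. This is an illustrative assumption about the semantics, not the server's code; the `DEFAULTS` values come from the parameter table above, and the in-memory `store` dict is a stand-in for persistence.

```python
DEFAULTS = {
    "enabled": True,
    "maxMemoriesPerDream": 500,
    "minTimeBetweenDreams": 24,   # hours
    "minSessionsBetweenDreams": 5,
}

def dream_config(store, agent_id, **updates):
    """Get-or-update: with no update fields, this is a pure read."""
    cfg = store.setdefault(agent_id, dict(DEFAULTS))
    cfg.update(updates)  # no-op when updates is empty, so a read stays a read
    return dict(cfg)
```

Calling with only `agent_id` returns the current config; passing any keyword field flips the call into update mode, which is exactly the conditional behavior the description discloses.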
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions the conditional behavior (get vs. update based on input), it doesn't describe critical aspects: whether updates are persistent, what permissions are needed, if there are rate limits, or what the return format looks like. For a tool that can perform mutations, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise: two sentences that efficiently convey the tool's dual functionality and conditional behavior. Every word earns its place, with no redundancy or fluff. It's front-loaded with the core purpose and follows with the key usage nuance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (dual get/update functionality with 6 parameters) and lack of both annotations and output schema, the description is minimally adequate. It explains the conditional behavior but doesn't cover mutation effects, error conditions, or return values. For a tool that can modify agent configuration, more context about behavioral implications would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 6 parameters thoroughly. The description adds no additional parameter semantics beyond what's in the schema (e.g., it doesn't explain the 'contradictionStrategy' enum values or interactions between parameters). With high schema coverage, the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the dual purpose: 'Get or update dream configuration for an agent.' It specifies the verb ('Get or update') and resource ('dream configuration for an agent'), making the purpose unambiguous. However, it doesn't explicitly differentiate from sibling tools like agent_dream or agent_dream_history, which handle dream execution and history respectively.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides implied usage guidance: 'If no update fields are provided, returns the current config.' This indicates the tool defaults to read mode when only agentId is given. However, it doesn't explicitly state when to use this tool versus alternatives (e.g., agent_update for general agent settings) or mention prerequisites like required permissions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_dream_history (B)

Get past dream run results for an agent, ordered by most recent first.

Parameters (JSON Schema)

Name | Required | Description
limit | No | Maximum results to return (default: 10)
agentId | Yes | Agent ID
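The ordering and limit semantics the description promises are simple enough to sketch. A hypothetical Python model (the `ts` timestamp field is an assumption; the real result shape is undocumented):

```python
def dream_history(runs, limit=10):
    """Past dream runs, most recent first, capped at `limit` (default 10)."""
    return sorted(runs, key=lambda r: r["ts"], reverse=True)[:limit]
```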
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden but offers minimal behavioral context. It mentions ordering and implies pagination via 'limit', but doesn't disclose authentication needs, rate limits, error conditions, or what 'dream run results' entail. The description doesn't contradict the annotations (none exist), but it fails to provide adequate transparency for a tool that retrieves historical data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose. Every word earns its place, with no redundant or vague phrasing. It's appropriately sized for a simple retrieval tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only tool with full schema coverage but no output schema, the description is minimally complete. It specifies what's retrieved and ordering, but lacks context on result format, pagination details, or error handling. Without annotations or output schema, more behavioral detail would improve completeness, but it's adequate for basic use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents both parameters ('agentId' and 'limit'). The description adds no additional parameter semantics beyond what's in the schema—it doesn't explain parameter interactions, format expectations for 'agentId', or constraints on 'limit'. Baseline 3 is appropriate when schema does all the work.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get past dream run results') and resource ('for an agent'), with specific ordering ('ordered by most recent first'). It distinguishes from some siblings like 'agent_dream' (which likely initiates dreams) and 'agent_dream_config' (which configures dreams), but doesn't explicitly differentiate from all potential query siblings like 'agent_benchmark_history'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. It doesn't mention prerequisites, when not to use it, or compare it to similar tools like 'agent_benchmark_history' or 'agent_facts_list' that might retrieve different historical data.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_facts_contradictions (B)

Find contradicting facts for a subject. Returns pairs of current facts with the same key but different values.

Parameters (JSON Schema)

Name | Required | Description
agentId | Yes | Agent ID
subjectId | Yes | Subject to check for contradictions
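"Pairs of current facts with the same key but different values" has a precise meaning that can be sketched directly. The following is a hypothetical Python model of that return contract; the fact dict shape (`key`, `value`) is assumed, since no output schema is published.

```python
from itertools import combinations

def find_contradictions(facts):
    """Pairs of current facts sharing a key but holding different values."""
    by_key = {}
    for f in facts:
        by_key.setdefault(f["key"], []).append(f)
    pairs = []
    for group in by_key.values():
        for a, b in combinations(group, 2):
            if a["value"] != b["value"]:
                pairs.append((a, b))
    return pairs
```

Facts agreeing on both key and value produce no pair, which is why the tool is a detection query rather than a CRUD operation.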
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions the return format ('Returns pairs of current facts with the same key but different values'), which is helpful. However, it doesn't disclose important behavioral aspects like whether this is a read-only operation, what permissions are needed, how it handles missing data, or potential side effects. For a tool that presumably queries agent facts, this leaves significant gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with two sentences that each earn their place. The first sentence states the purpose, the second describes the return format. There's zero wasted language or redundancy, and the most important information (what the tool does) is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (contradiction detection), no annotations, and no output schema, the description provides basic but incomplete coverage. It explains what the tool does and the return format, but lacks details about behavioral characteristics, error conditions, or usage context. For a tool with no output schema, it should ideally describe the structure of returned contradiction pairs more thoroughly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters clearly documented in the schema. The description adds no parameter information beyond what's already in the schema. Under the scoring rules, when schema coverage is high (>80%), the baseline is 3 even when the description adds nothing further, which applies here.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Find contradicting facts for a subject' with the specific action 'find' and resource 'contradicting facts'. It distinguishes from siblings like 'agent_facts_create', 'agent_facts_list', and 'agent_facts_update' by focusing on contradiction detection rather than CRUD operations. However, it doesn't explicitly differentiate from 'agent_facts_search' which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, appropriate contexts, or exclusions. With many sibling tools (especially other agent_facts_* tools), this lack of comparative guidance leaves the agent uncertain about tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_facts_create (A)

Manually create a structured fact about a subject. Use this when the human explicitly shares personal information.

Parameters (JSON Schema)

Name | Required | Description
key | Yes | Fact key in dot notation (e.g., "name", "daughter.name", "food.preference")
value | Yes | The fact value
source | Yes | How this fact was obtained
agentId | Yes | Agent ID
category | Yes | Fact category
subjectId | Yes | Subject the fact is about
confidence | No | Confidence in the fact accuracy (default: 0.8)
privacyLevel | No | Privacy level (default: protected). Note: "secret" is not allowed — use platform secret management.
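Putting the parameter table together, a call payload might be assembled as below. This is an illustrative Python sketch only: the field names mirror the schema, the defaults (0.8, "protected") come from the table, and the rejection of "secret" models the schema's note; nothing else about the server's validation is known.

```python
def make_fact(agent_id, subject_id, key, value, source, category,
              confidence=0.8, privacy_level="protected"):
    """Assemble an agent_facts_create payload with the documented defaults."""
    if privacy_level == "secret":
        # The schema forbids "secret"; platform secret management handles those.
        raise ValueError("'secret' is not allowed; use platform secret management")
    return {
        "agentId": agent_id,
        "subjectId": subject_id,
        "key": key,  # dot notation, e.g. "daughter.name"
        "value": value,
        "source": source,
        "category": category,
        "confidence": confidence,
        "privacyLevel": privacy_level,
    }
```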
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It indicates this is a manual creation operation for structured facts, which implies data persistence and potential privacy implications. However, it doesn't disclose important behavioral traits like whether this operation is idempotent, what permissions are required, how conflicts with existing facts are handled, or what happens on success/failure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with just two sentences that each earn their place. The first sentence states the core purpose, and the second provides crucial usage guidance. There's zero wasted language, and the information is front-loaded effectively.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a data creation tool with 8 parameters and no annotations or output schema, the description provides adequate but minimal context. It covers the purpose and usage scenario well but lacks information about behavioral consequences, error conditions, or what constitutes successful execution. Given the complexity of creating structured facts with privacy implications, more complete guidance would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the input schema already documents all 8 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema. It provides general context about when to use the tool but no additional parameter semantics, earning the baseline score for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('manually create a structured fact') and the resource ('about a subject'), distinguishing it from sibling tools like agent_facts_list, agent_facts_search, and agent_facts_update. It provides a precise verb+resource combination that makes the tool's purpose immediately understandable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: 'when the human explicitly shares personal information.' This provides clear contextual guidance that helps the agent distinguish this manual creation tool from automated or inferred fact-creation alternatives that might exist in the system.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_facts_list (A)

Get current facts for a subject, optionally filtered by category. Returns only active (non-superseded) facts.

Parameters (JSON Schema)

Name | Required | Description
agentId | Yes | Agent ID that owns the facts
category | No | Optional category filter. Omit to get all categories.
subjectId | Yes | Subject the facts are about (usually the human the agent serves)
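The "active (non-superseded) facts, optionally filtered by category" contract can be sketched as a filter. This hypothetical Python model borrows the `valid_until` marker from agent_facts_update's description; the exact field names in real responses are an assumption.

```python
def list_facts(facts, category=None):
    """Active (non-superseded) facts, optionally filtered by category."""
    return [
        f for f in facts
        if f.get("valid_until") is None           # superseded facts carry a timestamp
        and (category is None or f["category"] == category)
    ]
```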
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool returns 'only active (non-superseded) facts,' which is a key behavioral trait not inferable from the schema alone. However, it lacks details on permissions, rate limits, error handling, or response format, leaving gaps in behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise and front-loaded, consisting of two clear sentences that directly state the tool's function and key constraint. Every word earns its place, with no redundancy or unnecessary elaboration, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (3 parameters, no output schema, no annotations), the description is adequate but incomplete. It covers the core purpose and a key behavioral trait (active facts only), but lacks details on output structure, error cases, or integration with sibling tools, leaving some contextual gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all parameters. The description adds minimal value by mentioning 'optionally filtered by category,' which aligns with the schema's 'category' parameter but does not provide additional semantics beyond what the schema already specifies. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get current facts for a subject, optionally filtered by category.' It specifies the verb ('Get'), resource ('facts'), and scope ('current' and 'active'), but does not explicitly differentiate it from sibling tools like 'agent_facts_search' or 'agent_facts_contradictions', which prevents a score of 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by mentioning optional filtering by category and that it returns only active facts, but it does not provide explicit guidance on when to use this tool versus alternatives like 'agent_facts_search' or 'agent_facts_contradictions'. No prerequisites or exclusions are stated, leaving room for ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_facts_update (A)

Correct a fact by superseding it with a new value. The old fact is preserved in the timeline with a valid_until timestamp.

Parameters (JSON Schema)

Name | Required | Description
factId | Yes | ID of the fact to supersede
agentId | Yes | Agent ID
newValue | Yes | The corrected value
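The supersede-and-preserve behavior the description claims can be modeled directly. A hypothetical Python sketch (the fact shape and the decision to reuse the id are assumptions; a real server would assign fresh ids and timestamps itself):

```python
def supersede_fact(facts, fact_id, new_value, now):
    """Close the old fact with valid_until and append the corrected version."""
    old = next(f for f in facts
               if f["id"] == fact_id and f["valid_until"] is None)
    old["valid_until"] = now  # the old fact stays in the timeline
    new = {**old, "value": new_value, "valid_until": None}
    facts.append(new)
    return new
```

Nothing is deleted: the old value remains queryable in the timeline, while only the new fact counts as active.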
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses key behavioral traits: it's a mutation ('correct', 'superseding'), preserves history ('old fact is preserved in the timeline'), and adds a timestamp ('valid_until'). However, it lacks details on permissions, side effects, error conditions, or response format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description front-loads the core action ('Correct a fact by superseding it with a new value') in its first sentence and follows with important behavioral context about timeline preservation in its second. Every word earns its place with zero waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is moderately complete for a mutation tool: it explains the action and historical preservation. However, it lacks details on permissions, error handling, or return values, leaving gaps in understanding the tool's full behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters (agentId, factId, newValue). The description adds no additional meaning beyond what the schema provides (e.g., clarifying 'newValue' as 'corrected value' is redundant with schema). Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Correct a fact by superseding it with a new value') and identifies the resource ('fact'), distinguishing it from siblings like 'agent_facts_create' (creates new facts) and 'agent_facts_list' (lists facts). It precisely defines the operation beyond just the tool name.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., 'agent_facts_create' for new facts, 'agent_update' for general agent updates) or any prerequisites. It implies usage for correcting facts but lacks explicit context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_get (B)

Get agent details by ID including identity information and current status.

Parameters (JSON Schema)

Name | Required | Description
agentId | Yes | The ID of the agent
includeBond | No | Alias for includeLink; include ownership bond/credential information
includeLink | No | Whether to include ownership link/credential information
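The includeBond/includeLink aliasing is the one non-obvious parameter interaction here. A hypothetical Python sketch of how a client might normalize the two flags; the precedence rule (an explicit includeLink wins over the alias) is an assumption, since the schema does not state one.

```python
def agent_get_args(agent_id, include_bond=None, include_link=None):
    """Normalize includeBond (alias) and includeLink into one flag."""
    return {
        "agentId": agent_id,
        # Explicit includeLink takes precedence; otherwise fall back to the alias.
        "includeLink": bool(include_link if include_link is not None
                            else include_bond),
    }
```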
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the tool retrieves details, implying a read-only operation, but doesn't disclose behavioral traits such as authentication requirements, rate limits, error handling, or what happens if the agent ID is invalid. The description is minimal and lacks context beyond the basic purpose.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('Get agent details by ID') and adds supplementary information ('including identity information and current status'). There is no wasted language, and it's appropriately sized for a simple retrieval tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (a simple read operation), 100% schema coverage, and no output schema, the description is minimally adequate. It covers the purpose but lacks behavioral context and usage guidance. For a tool with no annotations, it should do more to explain authentication, errors, and output format, leaving an agent without a full picture of the tool's behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all three parameters (agentId, includeBond, includeLink). The description doesn't add any parameter-specific details beyond what's in the schema, such as explaining the relationship between includeBond and includeLink or providing examples. Baseline 3 is appropriate since the schema handles parameter documentation adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Get') and resource ('agent details by ID'), specifying it includes 'identity information and current status.' It distinguishes from sibling 'agent_list' (which likely lists multiple agents) by focusing on a single agent via ID. However, it doesn't explicitly differentiate from other agent-specific tools like 'agent_context_get' or 'agent_update,' which slightly reduces clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention siblings like 'agent_list' for listing agents or 'agent_context_get' for context details, nor does it specify prerequisites or exclusions. Usage is implied only by the description's focus on ID-based retrieval, but no explicit guidelines are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_handoff_initiate (C)

Initiate a context handoff. Creates a handoff package with summary, key learnings, and progressive links for resuming later.

Parameters (JSON Schema)

Name | Required | Description
agentId | Yes | The ID of the agent
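The handoff package's stated contents (summary, key learnings, progressive links) suggest a shape like the following. This is purely an illustrative Python sketch; the field names are guesses, since the tool publishes no output schema.

```python
def build_handoff(agent_id, summary, learnings, links):
    """Assemble a handoff package: summary, key learnings, progressive links."""
    return {
        "agentId": agent_id,
        "summary": summary,
        "keyLearnings": list(learnings),
        "progressiveLinks": list(links),  # pointers for resuming later
    }
```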
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool 'creates a handoff package' but doesn't specify whether this is a read-only or mutative operation, what permissions are required, how the package is stored or accessed, or any rate limits. The description is minimal and lacks critical behavioral details for safe invocation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two well-structured sentences that efficiently convey the core action and outcome without unnecessary detail. It is front-loaded with the main purpose and avoids redundancy, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a handoff operation with no annotations and no output schema, the description is insufficient. It doesn't explain what the handoff package contains in detail, how it's used by 'agent_handoff_resume', what happens if the agentId is invalid, or what the tool returns. For a tool that likely involves state management, more context is needed for safe and effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the single parameter 'agentId' documented as 'The ID of the agent'. The description adds no additional semantic context about this parameter, such as format examples or how it relates to the handoff process. Baseline score of 3 is appropriate since the schema adequately covers the parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('initiate', 'creates') and identifies the resource ('handoff package') and its components ('summary, key learnings, and progressive links'). It distinguishes from siblings like 'agent_handoff_resume' by focusing on initiation rather than resumption, though it doesn't explicitly contrast with all related tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'agent_handoff_resume' or 'agent_context_clear', nor does it mention prerequisites or context. It implies usage for 'resuming later' but lacks explicit conditions or exclusions, leaving the agent to infer timing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_handoff_latest (grade A)

Get the most recent handoff package for an agent. Use to check previous session state before resuming.

Parameters (JSON Schema):
- agentId (optional): Agent ID (optional if session identity set)
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions retrieving a 'handoff package' and checking 'previous session state,' which implies a read-only operation, but it doesn't specify permissions, rate limits, or what happens if no handoff exists. For a tool with zero annotation coverage, this leaves significant gaps in understanding its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loaded with the core purpose and followed by usage context. Every word serves a clear purpose without redundancy, making it efficient and easy to parse.

Completeness: 3/5

Given the tool's moderate complexity (retrieving session state), no annotations, and no output schema, the description is somewhat complete but lacks details. It explains the purpose and usage but doesn't cover behavioral aspects like error handling or return format. This leaves room for improvement in providing a fuller context for the agent.

Parameters: 3/5

The input schema has 100% description coverage, with the parameter 'agentId' documented as optional if session identity is set. The description doesn't add any additional meaning beyond this, such as explaining the format of 'agentId' or the implications of omitting it. With high schema coverage, a baseline score of 3 is appropriate.

Purpose: 4/5

The description clearly states the tool's purpose: 'Get the most recent handoff package for an agent.' It specifies the verb ('Get') and resource ('most recent handoff package'), making it easy to understand. However, it doesn't explicitly differentiate from sibling tools like 'agent_handoff_resume' or 'agent_context_get', which might have overlapping functionality.

Usage Guidelines: 4/5

The description provides clear context for usage: 'Use to check previous session state before resuming.' This gives a specific scenario when the tool should be used. However, it doesn't explicitly state when not to use it or name alternatives among sibling tools, such as 'agent_handoff_resume' for resuming sessions directly.

agent_handoff_resume (grade C)

Resume from a handoff package. Restores context and provides access to previous session information.

Parameters (JSON Schema):
- agentId (required): The ID of the agent
- packageId (required): The ID of the handoff package to resume from
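
Both parameters are required, so a call maps directly onto standard MCP `tools/call` framing. A minimal sketch in Python; the agent and package IDs are made-up placeholders, not values from this server:

```python
import json

# Hypothetical IDs; both fields are required by the schema above.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "agent_handoff_resume",
        "arguments": {
            "agentId": "agent-42",       # agent whose session is being resumed
            "packageId": "handoff-001",  # produced by a prior handoff initiation
        },
    },
}
print(json.dumps(request, indent=2))
```
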
Behavior: 2/5

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions restoring context and providing access to session information, but fails to detail critical aspects: whether this is a read-only or mutating operation, what permissions or authentication are required, how the resumed context integrates with the current session, or any side effects like overwriting existing data. For a tool handling session state with no annotation coverage, this is a significant gap.

Conciseness: 4/5

The description is front-loaded and efficient, using two concise sentences that directly state the tool's function without fluff. Every sentence earns its place by covering core actions. However, it could be slightly more structured by explicitly separating purpose from outcomes, keeping it from a perfect 5.

Completeness: 2/5

Given the tool's complexity in managing session handoffs, the absence of annotations and output schema, and the description's lack of behavioral details, it is incomplete. It doesn't explain what 'restores context' entails operationally, what 'access to previous session information' includes, or the return format. For a tool with no structured safety or output guidance, the description should provide more comprehensive context to ensure safe and effective use.

Parameters: 3/5

Schema description coverage is 100%, with both parameters ('agentId', 'packageId') documented in the schema. The description adds no additional meaning beyond implying these IDs are needed for resumption, but doesn't clarify their format, sourcing, or interrelationships. Since the schema does the heavy lifting, the baseline score of 3 is appropriate, though the description could have enhanced understanding with examples or context.

Purpose: 4/5

The description clearly states the tool's purpose with specific verbs ('Resume', 'Restores', 'provides access') and identifies the resource ('handoff package', 'previous session information'). It distinguishes from siblings like 'agent_handoff_initiate' and 'agent_handoff_latest' by focusing on resuming rather than initiating or fetching. However, it doesn't explicitly contrast with all sibling tools, keeping it at 4 instead of 5.

Usage Guidelines: 2/5

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an existing handoff package) or exclusions, nor does it compare the tool to related siblings like 'agent_handoff_initiate' for starting a handoff or 'agent_context_get' for general context retrieval. This lack of usage context leaves the agent guessing about appropriate scenarios.

agent_list (grade C)

List agents for the current tenant/user.

Parameters (JSON Schema):
- page (optional): Page number for pagination
- limit (optional): Maximum number of agents to return
- status (optional): Filter by agent status
- ownerId (optional): Filter by owner user ID
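
All four parameters are optional, so a caller would typically build the arguments object by dropping unset filters rather than sending nulls. A sketch under that assumption; the filter values are hypothetical:

```python
def build_agent_list_args(page=None, limit=None, status=None, ownerId=None):
    """Build the arguments object for agent_list, omitting unset filters."""
    candidate = {"page": page, "limit": limit, "status": status, "ownerId": ownerId}
    return {key: value for key, value in candidate.items() if value is not None}

# Hypothetical filters: second page, 20 per page, active agents only.
args = build_agent_list_args(page=2, limit=20, status="active")
print(args)  # {'page': 2, 'limit': 20, 'status': 'active'}
```
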
Behavior: 2/5

With no annotations provided, the description carries the full burden of behavioral disclosure. While 'List' implies a read operation, it doesn't disclose important behavioral traits like whether this requires specific permissions, how pagination works beyond the parameters, what the response format looks like, or any rate limits. The description is too minimal for a tool with 4 parameters.

Conciseness: 5/5

The description is a single, efficient sentence with zero wasted words. It's appropriately sized for a list operation and front-loads the essential information. Every word earns its place in this concise formulation.

Completeness: 2/5

Given the tool has 4 parameters, no annotations, and no output schema, the description is insufficiently complete. It doesn't explain what 'agents' means in this context, what fields are returned, how pagination works in practice, or any authentication requirements. For a list operation with filtering capabilities, more context is needed.

Parameters: 3/5

Schema description coverage is 100%, so the schema already fully documents all 4 parameters with their types, defaults, and descriptions. The description adds no additional parameter semantics beyond what's in the schema, meeting the baseline 3 for high schema coverage.

Purpose: 4/5

The description clearly states the action ('List') and resource ('agents') with scope ('for the current tenant/user'), providing a specific verb+resource combination. However, it doesn't distinguish this tool from other agent-related tools like 'agent_get' or 'agent_facts_list', which would require explicit differentiation to earn a 5.

Usage Guidelines: 2/5

The description provides no guidance on when to use this tool versus alternatives. With multiple sibling tools like 'agent_get' (retrieve single agent), 'agent_facts_list' (list agent facts), and 'agent_create', there's no indication of when this list operation is appropriate versus those other operations.

agent_memory_expand (grade A)

Expand a memory to see more detail. Use this when a memory summary is not detailed enough.

Parameters (JSON Schema):
- level (optional, default: detailed): Level of expansion. Detailed provides more context, full provides complete raw content.
- memoryId (required): The ID of the memory to expand
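
The two-value level enum can be guarded client-side. A sketch, assuming the accepted values are the lowercase 'detailed' and 'full' implied by the default column; the memory ID is hypothetical:

```python
VALID_LEVELS = {"detailed", "full"}  # assumed enum values; "detailed" is the default

def expand_args(memory_id, level="detailed"):
    """Arguments for agent_memory_expand; memoryId is the only required field."""
    if level not in VALID_LEVELS:
        raise ValueError(f"level must be one of {sorted(VALID_LEVELS)}")
    return {"memoryId": memory_id, "level": level}

print(expand_args("mem-123", "full"))  # {'memoryId': 'mem-123', 'level': 'full'}
```
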
Behavior: 2/5

No annotations are provided, so the description carries the full burden. It mentions the tool 'expands' a memory to show more detail, implying a read operation, but doesn't disclose behavioral traits like whether it requires permissions, has rate limits, or what the output format looks like. This leaves gaps in understanding how the tool behaves beyond its basic function.

Conciseness: 5/5

The description is two sentences, front-loaded with the purpose and followed by usage guidance. Every sentence earns its place without waste, making it efficient and easy to parse.

Completeness: 3/5

Given no annotations and no output schema, the description provides basic purpose and usage but lacks details on behavioral aspects and return values. It's adequate for a simple tool with good schema coverage, but could be more complete to compensate for missing structured data.

Parameters: 3/5

Schema description coverage is 100%, so the schema already documents both parameters ('memoryId' and 'level' with enum values). The description doesn't add any meaning beyond this, such as explaining the practical difference between 'detailed' and 'full' levels. Baseline 3 is appropriate as the schema does the heavy lifting.

Purpose: 4/5

The description clearly states the action ('expand a memory') and the resource ('memory'), specifying it's for seeing more detail when a summary isn't enough. However, it doesn't explicitly differentiate from sibling tools like 'agent_memory_query' or 'agent_get', which might also retrieve memory details, so it's not fully distinguished.

Usage Guidelines: 4/5

The description provides clear context on when to use this tool ('when a memory summary is not detailed enough'), which helps the agent decide based on the level of detail needed. It doesn't mention when not to use it or name specific alternatives among siblings, so it's not fully explicit.

agent_memory_query (grade B)

Query agent memories using semantic search. Returns relevant memories based on the query text.

Parameters (JSON Schema):
- tier (optional, default: summary): Level of detail to return. Summary saves tokens, full provides complete content.
- limit (optional): Maximum number of memories to return
- model (optional): Filter by LLM model name.
- query (required): The search query to find relevant memories
- types (optional): Filter by memory types
- agentId (required): The ID of the agent to query memories for
- minTrust (optional): Minimum trust score (0-1). Omit to include all.
- platform (optional): Filter by runtime platform (e.g., claude_code, cursor).
- sessionId (optional): Filter by session/conversation ID.
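
Only agentId and query are required; the remaining seven fields narrow the search. A sketch of a `tools/call` request under standard MCP framing; the IDs, query text, and filter values are invented for illustration:

```python
import json

# Hypothetical values; only "agentId" and "query" are required by the schema above.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "agent_memory_query",
        "arguments": {
            "agentId": "agent-42",                    # required
            "query": "lessons from the last deploy",  # required
            "types": ["lesson"],                      # optional: restrict memory types
            "limit": 5,                               # optional: cap result count
            "minTrust": 0.5,                          # optional: drop low-trust memories
        },
    },
}
print(json.dumps(request, indent=2))
```
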
Behavior: 2/5

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions semantic search and returning relevant memories, but lacks critical details such as how results are ranked, whether pagination is involved, error conditions, or performance characteristics. For a query tool with 9 parameters, this leaves significant behavioral gaps.

Conciseness: 5/5

The description is extremely concise and front-loaded, consisting of just two sentences that directly state the tool's function and output. Every word serves a purpose with zero redundancy, making it efficient and easy to parse.

Completeness: 3/5

Given the tool's complexity (9 parameters, no annotations, no output schema), the description is minimally adequate but incomplete. It covers the basic purpose and output type, but lacks details on result format, error handling, or behavioral nuances. For a semantic search tool, more context would be beneficial, though the high schema coverage mitigates some gaps.

Parameters: 3/5

The description adds minimal parameter semantics beyond the schema, which has 100% coverage. It mentions 'query text' and 'relevant memories', aligning with the 'query' parameter and the tool's purpose, but doesn't explain interactions between parameters or provide usage examples. With high schema coverage, the baseline of 3 is appropriate.

Purpose: 4/5

The description clearly states the tool's purpose: 'Query agent memories using semantic search. Returns relevant memories based on the query text.' It specifies the verb ('query'), resource ('agent memories'), and method ('semantic search'), making the function unambiguous. However, it doesn't explicitly differentiate from sibling tools like 'agent_facts_search' or 'recall', which prevents a perfect score.

Usage Guidelines: 2/5

The description provides no guidance on when to use this tool versus alternatives. With many sibling tools related to memory and search (e.g., 'agent_facts_search', 'recall', 'remember'), the absence of explicit usage context or exclusions leaves the agent without clear direction for tool selection.

agent_memory_store (grade B)

Store a memory for an agent. Memories are persisted across sessions and can be retrieved later. Use this to save important facts, events, lessons learned, or context.

Parameters (JSON Schema):
- type (optional, default: fact): The type of memory: fact (static info), event (something that happened), lesson (learned insight), context (current situation), goal (objective), task (work item)
- model (optional): LLM model name (e.g., claude-sonnet-4-20250514). Auto-detected from MCP connection if omitted.
- goalId (optional): Optional goal ID to bind this memory to. Goal-bound memories are retained until the goal completes.
- agentId (required): The ID of the agent to store the memory for
- content (required): The memory content to store
- platform (optional): Runtime platform (e.g., claude_code, cursor, codex). Auto-detected from MCP connection if omitted.
- sessionId (optional): Session/conversation ID for grouping memories.
- importance (optional): Importance score from 0 to 1. Higher importance memories are retained longer.
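
The 0-to-1 importance range documented above can be enforced before the call goes out. A sketch of client-side argument construction; the helper name and the example values are hypothetical:

```python
def store_memory_args(agent_id, content, memory_type="fact", importance=None):
    """Arguments for agent_memory_store; importance must stay within 0..1."""
    args = {"agentId": agent_id, "content": content, "type": memory_type}
    if importance is not None:
        if not 0.0 <= importance <= 1.0:
            raise ValueError("importance must be between 0 and 1")
        args["importance"] = importance
    return args

# Hypothetical lesson memory with above-average retention priority.
args = store_memory_args("agent-42", "Deploys require the migration step first",
                         memory_type="lesson", importance=0.8)
print(args["type"])  # lesson
```
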
Behavior: 3/5

With no annotations provided, the description carries the full burden of behavioral disclosure. It states that memories are 'persisted across sessions', which is valuable behavioral context, but doesn't mention authentication requirements, rate limits, error conditions, or what happens on duplicate storage. The description doesn't contradict annotations since none exist.

Conciseness: 4/5

The description is appropriately concise with two sentences that each serve a purpose: the first states the core function, the second provides usage examples. It's front-loaded with the primary action and wastes no words, though it could be slightly more structured with bullet points for the examples.

Completeness: 3/5

For a write operation with 8 parameters and no annotations or output schema, the description provides basic context but lacks important details. It doesn't explain what the tool returns, error conditions, or how memories interact with the broader system. Given the complexity and lack of structured metadata, the description should do more to compensate.

Parameters: 3/5

With 100% schema description coverage, the schema already documents all 8 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema. The baseline of 3 is appropriate when the schema does the heavy lifting, though the description could have explained parameter relationships or constraints.

Purpose: 4/5

The description clearly states the tool's purpose: 'Store a memory for an agent' with specific examples of what can be stored (facts, events, lessons learned, context). It distinguishes from retrieval-focused siblings like agent_memory_query but doesn't explicitly differentiate from agent_memory_expand or other memory-related tools.

Usage Guidelines: 3/5

The description provides implied usage guidance by stating 'Memories are persisted across sessions and can be retrieved later' and listing example use cases. However, it doesn't explicitly state when to use this vs. alternatives like agent_memory_expand or other storage mechanisms, nor does it mention prerequisites or exclusions.

agent_transfer (grade C)

Transfer agent ownership to another user. Requires signatures from both parties.

Parameters (JSON Schema):
- agentId (required): The ID of the agent to transfer
- toUserId (required): User ID of the new owner
- toSignature (required): Signature from new owner accepting transfer
- fromSignature (required): Signature from current owner authorizing transfer
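
Since all four fields are required and the call is a high-stakes mutation, a client would reasonably fail fast on any missing value. A sketch; the helper and the empty-check policy are assumptions, not documented server behavior:

```python
REQUIRED = ("agentId", "toUserId", "toSignature", "fromSignature")

def transfer_args(agent_id, to_user_id, from_signature, to_signature):
    """Arguments for agent_transfer; rejects empty or missing required fields."""
    args = {
        "agentId": agent_id,
        "toUserId": to_user_id,
        "fromSignature": from_signature,  # current owner authorizes the transfer
        "toSignature": to_signature,      # new owner accepts the transfer
    }
    missing = [key for key in REQUIRED if not args.get(key)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return args
```
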
Behavior: 2/5

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the signature requirement, which is useful context about authentication/authorization needs. However, it doesn't describe other critical behavioral traits: whether this is a destructive/mutative operation (implied but not stated), what happens to the agent during/after transfer, error conditions, or any rate limits. For a tool that changes ownership with significant implications, this is inadequate.

Conciseness: 5/5

The description is extremely concise (two short sentences) and front-loaded with the core purpose. Every word earns its place: 'Transfer agent ownership to another user' states the action, and 'Requires signatures from both parties' adds essential context. There is zero waste or redundancy.

Completeness: 2/5

Given the complexity of an ownership transfer operation with no annotations and no output schema, the description is incomplete. It lacks information about what the tool returns (success/failure indicators, new owner details), error handling, side effects, or security implications. For a high-stakes mutation tool with 4 required parameters, this minimal description leaves significant gaps for an AI agent to understand proper usage.

Parameters: 3/5

Schema description coverage is 100%, so the schema already documents all four parameters thoroughly. The description adds no additional parameter semantics beyond what's in the schema (e.g., format of signatures, how to obtain them, or relationship between parameters). The baseline of 3 is appropriate when the schema does the heavy lifting, though the description could have added value by explaining the signature workflow.

Purpose: 4/5

The description clearly states the action ('transfer agent ownership') and resource ('to another user'), making the purpose unambiguous. It distinguishes from sibling tools like agent_create or agent_update by focusing specifically on ownership transfer. However, it doesn't explicitly differentiate from other agent-related tools that might involve ownership changes.

Usage Guidelines: 2/5

The description provides minimal usage guidance, mentioning only that it 'requires signatures from both parties' as a prerequisite. It offers no explicit guidance on when to use this tool versus alternatives (e.g., agent_update for other changes) or when not to use it. No sibling tool comparisons or contextual exclusions are provided.

agent_update (grade C)

Update agent details.

Parameters (JSON Schema):
- name (optional): New display name
- status (optional): New status
- agentId (required): The ID of the agent to update
- description (optional): New description
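
With only agentId required, a client would typically send just the fields being changed. A sketch of partial-update argument construction; the helper name and field whitelist are assumptions drawn from the table above:

```python
def update_args(agent_id, **changes):
    """Arguments for agent_update: send only the fields being changed."""
    allowed = {"name", "status", "description"}  # optional fields from the schema
    unknown = set(changes) - allowed
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    return {"agentId": agent_id, **changes}

print(update_args("agent-42", name="Research Bot"))
# {'agentId': 'agent-42', 'name': 'Research Bot'}
```
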
Behavior: 2/5

No annotations are provided, so the description carries the full burden. 'Update agent details' implies a mutation operation but doesn't disclose behavioral traits like required permissions, whether changes are reversible, rate limits, or what happens to unspecified fields. This is inadequate for a mutation tool with zero annotation coverage.

Conciseness: 5/5

The description is a single, efficient sentence with zero wasted words. It's appropriately sized and front-loaded, making it easy to parse quickly.

Completeness: 2/5

Given this is a mutation tool with no annotations and no output schema, the description is incomplete. It doesn't explain what happens during the update, what values are returned, or error conditions. With 4 parameters and siblings that handle similar resources, more context is needed for proper agent use.

Parameters: 3/5

Schema description coverage is 100%, so the schema already documents all four parameters (agentId, name, status, description) with their types and descriptions. The description adds no additional meaning beyond what's in the schema, but the baseline is 3 when schema coverage is high.

Purpose: 3/5

The description 'Update agent details' clearly states the action (update) and resource (agent details), which is better than a tautology. However, it's vague about what 'details' specifically means and doesn't distinguish this tool from sibling tools like agent_create, agent_delete, or agent_get, which all operate on agents.

Usage Guidelines: 2/5

The description provides no guidance on when to use this tool versus alternatives. With siblings like agent_create, agent_delete, agent_get, and agent_list available, there's no indication of prerequisites, appropriate contexts, or exclusions for using this update function.

assemble_context (A)

Assemble agent context within budget constraints. Call this at session start to get identity, personality, constitution, facts, goals, lessons, and skills content sized to fit the context window. Each component is truncated to its budget allocation.

Parameters (JSON Schema)
Name | Required | Description
preset | No | Budget preset to use (overrides stored config for this call)
agentId | Yes | Agent ID to assemble context for
subjectId | No | Subject ID for facts lookup (default: agentId)
contextWindowSize | No | Context window size in tokens (default: 200000 for Claude)
includeComponents | No | Specific components to include (default: all)
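The budget-allocated truncation the description promises can be sketched in a few lines. The component weights and the char-per-token ratio below are illustrative assumptions, not Ace Memory's actual algorithm:

```python
# Hypothetical sketch of per-component budget truncation, as the
# assemble_context description implies. Weights are assumptions only.
COMPONENT_WEIGHTS = {
    "identity": 0.05, "personality": 0.05, "constitution": 0.15,
    "facts": 0.30, "goals": 0.15, "lessons": 0.15, "skills": 0.15,
}

def assemble_context(components: dict[str, str],
                     context_window_size: int = 200_000) -> dict[str, str]:
    """Truncate each component to its share of the token budget
    (roughly 1 token per 4 characters, a crude heuristic)."""
    assembled = {}
    for name, text in components.items():
        budget_tokens = int(context_window_size * COMPONENT_WEIGHTS.get(name, 0))
        assembled[name] = text[: budget_tokens * 4]  # char-based truncation
    return assembled
```

Under this sketch, an oversized "facts" component is cut to its allocation while short components pass through untouched.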
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden and does well by disclosing key behavioral traits: it's for session initialization, operates within budget constraints, truncates components to fit context windows, and returns multiple content types. It doesn't mention authentication needs, rate limits, or whether this is idempotent, but covers the core operational behavior adequately for a tool with no annotation support.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three tight sentences with zero waste: the first states purpose and key constraint, the second lists the components returned, the third explains the truncation behavior. Every word earns its place. The description is front-loaded with the core functionality and efficiently covers the essential details without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters, 100% schema coverage, but no annotations and no output schema, the description provides good context about the assembly process, budget constraints, and component truncation. It doesn't describe the return format or structure, which would be helpful given no output schema, but covers the operational context well for a session initialization tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 5 parameters thoroughly. The description mentions 'budget constraints' which relates to the preset parameter and contextWindowSize, and 'components' which maps to includeComponents, but doesn't add significant semantic meaning beyond what the schema provides. The baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('assemble agent context') and resource ('agent context') with precise scope ('within budget constraints'). It distinguishes from siblings like agent_context_get (which presumably retrieves without assembly) and agent_context_clear by specifying it's for session start to gather multiple components. The verb 'assemble' implies a construction/aggregation operation not present in other tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use ('Call this at session start') and what it provides ('identity, personality, constitution, facts, goals, lessons, and skills content'). However, it doesn't explicitly state when NOT to use it or name specific alternatives among the many sibling tools (e.g., agent_context_get for retrieving without assembly, or individual component tools). The guidance is helpful but lacks explicit exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

authorize (A)

Initiate agent-first OAuth 2.0 Device Flow (RFC 8628) to register and bond with a human without needing an API key. Returns a short user_code (e.g. "BCDF-GHJK") to display to the human. The human visits the verification_uri, enters the user_code, and approves. Once approved, subsequent MCP calls (remember, recall, etc.) use the provisional tenant automatically.

Parameters (JSON Schema)
Name | Required | Description
name | Yes | Your display name (e.g., "code-assistant", "research-agent")
description | No | Optional description of what you do
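The RFC 8628 flow the description outlines amounts to a start-then-poll loop. A minimal client-side sketch follows; the `request` callable stands in for an HTTP POST, and the endpoint paths are assumptions rather than Ace Memory's documented API:

```python
import time

# Hypothetical sketch of the OAuth 2.0 Device Flow (RFC 8628) described
# above. `request(path, body)` abstracts the HTTP transport; the paths
# "/device_authorization" and "/token" are illustrative assumptions.
def authorize(request, name: str, poll_limit: int = 60):
    """Start the device flow, surface the user_code, poll until approved."""
    grant = request("/device_authorization", {"client_name": name})
    print(f"Visit {grant['verification_uri']} and enter code {grant['user_code']}")
    for _ in range(poll_limit):
        token = request("/token", {
            "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
            "device_code": grant["device_code"],
        })
        if "access_token" in token:
            return token  # later calls can ride on the provisional tenant
        if token.get("error") != "authorization_pending":
            raise RuntimeError(token.get("error", "device flow failed"))
        time.sleep(grant.get("interval", 5))
    raise TimeoutError("user never approved the device code")
```

The `authorization_pending` error and the slow-polling `interval` field are the standard RFC 8628 responses; everything server-specific here is assumed.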
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the OAuth 2.0 Device Flow process, including the return value ('short user_code'), what the human must do ('visits the verification_uri, enters the user_code, and approves'), and the system behavior after approval ('subsequent MCP calls use the provisional tenant automatically'). It doesn't mention error conditions, timeout periods, or rate limits, but provides substantial operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured in four sentences: the first states the purpose and technology, the middle two describe the returned user_code and the human's approval steps, and the last explains the system behavior after approval. Every sentence earns its place by providing essential information about this complex authentication flow.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex authentication tool with no annotations and no output schema, the description provides substantial context about the OAuth flow, human interaction requirements, and system behavior. It explains what happens before, during, and after the authorization process. The main gap is the lack of information about return format details (beyond mentioning 'short user_code') and potential error scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters thoroughly. The description doesn't add any additional meaning about the parameters beyond what's in the schema. It focuses on the tool's purpose and behavior rather than parameter details, which is appropriate given the comprehensive schema documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Initiate agent-first OAuth 2.0 Device Flow'), the resource involved ('to register and bond with a human'), and the technology standard ('RFC 8628'). It distinguishes this from sibling tools by focusing on authentication/authorization without API keys, unlike other tools that handle memory, goals, skills, or agent operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: 'to register and bond with a human without needing an API key' and mentions that subsequent MCP calls will use the provisional tenant. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools (e.g., when API key authentication is preferred).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

constitution_check_conflicts (C)

Check for conflicts between constitution tiers.

Parameters (JSON Schema)
Name | Required | Description
agentId | Yes | The ID of the agent
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'checking' conflicts, which implies a read-only or analysis operation, but doesn't specify if it's safe (e.g., non-destructive), what the output entails (e.g., a list of conflicts, a boolean result), or any side effects like rate limits or authentication needs. This leaves key behavioral traits unclear.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that directly states the tool's function without unnecessary words. It's front-loaded with the core action, making it easy to parse. However, it could be slightly more informative without losing conciseness, such as hinting at the output or context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of checking conflicts (which could involve nuanced logic), the lack of annotations, and no output schema, the description is incomplete. It doesn't explain what constitutes a conflict, the format of results, or any dependencies. This makes it hard for an agent to understand the tool's full behavior and integrate it effectively into workflows.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with 'agentId' documented as 'The ID of the agent'. The description doesn't add any parameter details beyond this, such as why the agentId is needed or how it relates to constitution tiers. Since the schema already provides adequate parameter information, the baseline score of 3 is appropriate, as no extra value is added.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the action ('Check for conflicts') and the subject ('between constitution tiers'), which provides a basic purpose. However, it's vague about what 'conflicts' means (e.g., logical inconsistencies, overlapping rules, or implementation issues) and doesn't distinguish it from sibling tools like 'constitution_get_tier' or 'constitution_validate_action', which might involve similar concepts. It's not tautological but lacks specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description offers no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an agentId), exclusions, or related tools like 'constitution_validate_action' that might handle validation. Without any context, users must infer usage from the name alone, which is insufficient for effective tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

constitution_get (A)

Get the merged constitution for an agent. Returns the effective rules after combining System > User > Agent tiers.

Parameters (JSON Schema)
Name | Required | Description
agentId | Yes | The ID of the agent
includeConflicts | No | Whether to include conflict resolution information
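The System > User > Agent precedence the description states reduces to "higher tiers win on conflicting keys." A minimal sketch, assuming rules are keyed dicts (the server's real rule format and merge logic may differ):

```python
# Hypothetical tier merge implementing System > User > Agent precedence.
# Representing each tier as {rule_id: rule_text} is an assumption.
def merge_constitution(system: dict, user: dict, agent: dict) -> dict:
    """Start from the lowest-precedence tier and let each higher
    tier overwrite any conflicting rule keys."""
    merged = dict(agent)
    merged.update(user)
    merged.update(system)
    return merged
```

With this ordering, a rule defined in both the system and user tiers resolves to the system version, matching the stated precedence.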
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool returns merged rules from multiple tiers, which is useful behavioral context. However, it does not mention permissions, rate limits, or other operational traits like whether it's read-only or has side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the purpose and key behavior. There is no wasted text, and it directly communicates the tool's function and output.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 2 parameters, 100% schema coverage, and no output schema, the description is adequate but has gaps. It explains the merging behavior but lacks details on output format, error handling, or prerequisites. Without annotations, it should provide more operational context for a read operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters (agentId and includeConflicts). The description does not add any parameter-specific details beyond what the schema provides, such as explaining what 'conflict resolution information' entails. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Get') and resource ('merged constitution for an agent'), specifying it returns 'effective rules after combining System > User > Agent tiers.' It distinguishes from sibling tools like constitution_get_tier (which gets a specific tier) and constitution_check_conflicts (which focuses on conflicts).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing the merged constitution, but does not explicitly state when to use this tool versus alternatives like constitution_get_tier or constitution_check_conflicts. It provides clear context about what it returns, but lacks explicit exclusions or named alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

constitution_get_tier (B)

Get constitution rules for a specific tier (system, user, or agent).

Parameters (JSON Schema)
Name | Required | Description
tier | Yes | The constitution tier to retrieve
userId | No | User ID (required for user tier)
agentId | No | Agent ID (required for agent tier)
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but only states the action without disclosing behavioral traits. It doesn't mention whether this is a read-only operation, what permissions are required, how results are formatted, or any rate limits. The description is minimal and leaves critical behavioral aspects unspecified.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It's appropriately sized for a simple retrieval tool and front-loads the essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 parameters, no annotations, and no output schema, the description is insufficient. It doesn't explain the relationship between tier and optional parameters (userId for user tier, agentId for agent tier), what format the constitution rules are returned in, or any error conditions. The description leaves too many contextual gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are well-documented in the schema itself. The description adds no additional parameter semantics beyond mentioning 'tier' values, which are already covered by the enum in the schema. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Get') and resource ('constitution rules') with specific scope ('for a specific tier'). It distinguishes from sibling 'constitution_get' by specifying tier-based retrieval, but doesn't explicitly contrast with other constitution tools like 'constitution_check_conflicts' or 'constitution_list_proposals'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing constitution rules for a particular tier (system, user, or agent), but provides no explicit guidance on when to choose this over alternatives like 'constitution_get' or 'constitution_list_proposals'. It mentions tier specificity but lacks context about prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

constitution_list_proposals (C)

List pending constitution change proposals for an agent.

Parameters (JSON Schema)
Name | Required | Description
status | No | Filter by proposal status
agentId | Yes | The ID of the agent
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions 'pending' proposals but does not clarify if this includes all statuses (the schema allows filtering by status), whether it's a read-only operation, what permissions are required, or how results are returned (e.g., pagination, format). This leaves significant gaps for a tool that interacts with agent data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is front-loaded with the core action ('List pending constitution change proposals'), making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of listing proposals for an agent, the lack of annotations and output schema means the description should compensate by providing more context. It fails to explain behavioral aspects (e.g., safety, permissions), usage guidelines, or what the output entails, making it incomplete for effective agent use despite the clear schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, clearly documenting both parameters (agentId and status with enum values). The description adds no additional meaning beyond the schema, such as explaining the relationship between agentId and proposals or the implications of status filtering. With high schema coverage, a baseline score of 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List' and the resource 'pending constitution change proposals for an agent,' making the purpose specific and understandable. However, it does not explicitly differentiate from sibling tools like 'constitution_propose_change' or 'constitution_get,' which are related but serve different functions (proposing changes vs. retrieving constitution details).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as 'constitution_get' for general constitution details or 'constitution_propose_change' for creating proposals. It lacks context on prerequisites, exclusions, or typical scenarios for listing proposals, leaving usage unclear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

constitution_propose_change (B)

Propose a change to the agent's constitution. Requires user approval.

Parameters (JSON Schema)
Name | Required | Description
reason | Yes | Reason for proposing this change
agentId | Yes | The ID of the agent proposing the change
proposedRules | Yes | The proposed rules to add or change
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions 'requires user approval,' which is a critical behavioral trait indicating this is a proposal mechanism rather than an immediate change. However, it lacks details on what happens after proposal submission (e.g., is it queued, logged, or triggers notifications?), whether changes are reversible, or any rate limits. For a mutation tool with zero annotation coverage, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with just two sentences that directly convey the core action and a key constraint. It's front-loaded with the main purpose and wastes no words, making it easy to parse quickly. Every sentence earns its place by providing essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool that proposes changes to an agent's constitution—a potentially complex and impactful operation—the description is too minimal. With no annotations, no output schema, and only basic behavioral hints, it fails to provide sufficient context about the proposal lifecycle, approval mechanisms, or potential side effects. This leaves significant gaps for an AI agent to understand the full implications of using this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with clear documentation for agentId, proposedRules, and reason. The description doesn't add any parameter-specific information beyond what's in the schema, such as format examples or constraints. Given the high schema coverage, the baseline score of 3 is appropriate, as the schema already does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('propose a change') and the target ('agent's constitution'), making the purpose immediately understandable. It distinguishes itself from sibling tools like constitution_get or constitution_list_proposals by focusing on modification rather than retrieval. However, it doesn't specify whether this is for adding, modifying, or removing rules, which prevents a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some usage context by mentioning 'requires user approval,' which implies this should be used when seeking to modify the constitution with oversight. However, it doesn't explicitly state when to use this tool versus alternatives like constitution_validate_action or when not to use it (e.g., for minor updates vs. major overhauls). The guidance is implied rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

constitution_validate_action (C)

Validate whether an action is allowed by the constitution.

Parameters (JSON Schema)
Name | Required | Description
agentId | Yes | The ID of the agent
actionType | Yes | Type of action to validate
actionDetails | No | Details of the action
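Since the tool publishes no output schema, the shape of a validation verdict is anyone's guess. One plausible sketch, where both the rule format and the allowed/reason result are illustrative assumptions rather than Ace Memory's actual contract:

```python
# Hypothetical validation of an action against constitution rules.
# Rule entries {action, effect, reason} and the verdict dict are
# assumed shapes, not the server's documented output.
def validate_action(rules: list[dict], action_type: str) -> dict:
    """Return a deny verdict for the first matching deny rule,
    otherwise allow the action."""
    for rule in rules:
        if rule["action"] == action_type and rule["effect"] == "deny":
            return {"allowed": False,
                    "reason": rule.get("reason", "denied by rule")}
    return {"allowed": True, "reason": None}
```

A richer result like this (verdict plus reason) is what the Behavior critique below argues the description should at least hint at.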
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool validates actions but doesn't describe what happens during validation (e.g., checks permissions, returns boolean/explanation, requires specific agent roles, or has side effects like logging). This is a significant gap for a validation tool with zero annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with zero waste. It's appropriately sized and front-loaded, directly stating the tool's purpose without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of validating actions against a constitution, no annotations, and no output schema, the description is incomplete. It doesn't explain the validation process, return values (e.g., allowed/denied with reasons), or prerequisites (e.g., agent permissions). This leaves critical gaps for an AI agent to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters (agentId, actionType, actionDetails). The description doesn't add meaning beyond what the schema provides, such as examples of action types or details. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Validate whether an action is allowed by the constitution.' It uses a specific verb ('validate') and resource ('action'), but doesn't explicitly differentiate from sibling tools like 'constitution_check_conflicts' or 'constitution_get', which appear related to constitution operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention sibling tools like 'constitution_check_conflicts' or explain scenarios where validation is needed versus other constitution-related operations. Usage is implied but not articulated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

context_budget_apply_preset (C)

Apply a named budget preset to an agent.

Parameters (JSON Schema)
Name | Required | Description
preset | Yes | Preset name
agentId | Yes | Agent ID
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the action ('apply') but does not explain what this entails—e.g., whether it overwrites existing settings, requires specific permissions, has side effects, or returns confirmation. This leaves critical behavioral traits unspecified for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, direct sentence with no wasted words, efficiently conveying the core action. It is appropriately sized and front-loaded, making it easy to parse without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is incomplete. It lacks details on behavioral implications, success/failure responses, or integration with sibling tools. Given the complexity of applying budget presets to agents, more context is needed for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with clear parameter definitions and an enum for 'preset'. The description does not add any semantic details beyond the schema, such as explaining preset effects or agent selection criteria. Given the high schema coverage, a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('apply') and target ('a named budget preset to an agent'), specifying both the resource (budget preset) and recipient (agent). However, it does not explicitly differentiate this tool from sibling tools like 'context_budget_get' or 'context_budget_set', which handle budget-related operations but with different functions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as 'context_budget_set' for custom budgets or other agent configuration tools. It lacks context about prerequisites, typical scenarios, or exclusions, leaving usage decisions ambiguous.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
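Since the server documents neither what a preset contains nor how it relates to a custom budget, the following is a purely hypothetical sketch: the preset names, component keys, and values are invented for illustration, on the assumption that a named preset expands into the same fields context_budget_set accepts.

```python
# Hypothetical presets; names, keys, and values are invented for illustration.
PRESETS = {
    "minimal":  {"totalBudget": 0.05, "components": {"memories": 0.03, "skills": 0.02}},
    "balanced": {"totalBudget": 0.15, "components": {"memories": 0.08, "skills": 0.07}},
}

def apply_preset_args(preset: str, agent_id: str) -> dict:
    """Build the arguments a context_budget_set call would need from a preset."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset: {preset}")
    return {"agentId": agent_id, **PRESETS[preset]}

print(apply_preset_args("balanced", "agent-123"))
```

Under this reading, context_budget_apply_preset is a convenience wrapper and context_budget_set is the general form; the server's descriptions do not confirm that relationship.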

context_budget_get (grade B)

Get the current budget configuration for an agent.

Parameters (JSON Schema):
- agentId (required): Agent ID

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. The verb 'Get' implies a read operation, but the description does not mention potential side effects, error conditions, authentication needs, or rate limits. This is inadequate for a tool with zero annotation coverage.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that directly states the tool's purpose without unnecessary words. It is appropriately sized and front-loaded, with zero waste.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple input schema (1 parameter, 100% coverage) and no output schema, the description is minimally adequate but lacks completeness. It does not explain return values or error handling, which is a gap for a tool with no annotations to guide the agent.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema fully documents the 'agentId' parameter. The description adds no additional meaning beyond what the schema provides, such as format examples or constraints, resulting in the baseline score of 3.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get') and resource ('current budget configuration for an agent'), making the purpose unambiguous. However, it does not explicitly differentiate from sibling tools like 'context_budget_set' or 'context_budget_apply_preset', which a score of 5 would require.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'context_budget_set' or other agent-related tools. It lacks context about prerequisites, such as whether the agent must exist or be accessible, leaving usage unclear.

context_budget_set (grade C)

Set a custom budget configuration for an agent. Component allocations must sum to <= totalBudget.

Parameters (JSON Schema):
- agentId (required): Agent ID
- components (required): Per-component budget allocations
- totalBudget (required): Total budget as fraction of context window (0.01-1.0, default: 0.15)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states this is a 'Set' operation (implying a write/mutation) but doesn't mention whether this requires specific permissions, whether it overwrites existing configurations, what happens if allocations exceed the total budget, or what the response looks like. For a mutation tool with zero annotation coverage, this leaves significant behavioral gaps.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose and includes a crucial constraint. Every word earns its place with zero waste, making it optimally concise and well-structured for quick comprehension.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given this is a mutation tool with no annotations and no output schema, the description is incomplete. It doesn't cover behavioral aspects like permissions, side effects, error conditions, or response format. While it mentions the allocation sum constraint, it lacks context about default values, validation rules beyond the sum, or how this interacts with other agent configuration tools.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents all parameters (agentId, totalBudget, components). The description adds the constraint that 'component allocations must sum to <= totalBudget', which provides useful validation context beyond the schema. However, it doesn't explain the meaning or impact of these allocations, keeping it at the baseline 3.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Set a custom budget configuration') and the resource ('for an agent'), making the purpose immediately understandable. However, it doesn't explicitly differentiate this tool from sibling tools like 'context_budget_apply_preset' or 'context_budget_get', which a score of 5 would require.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'context_budget_apply_preset' (which applies a preset budget) or 'context_budget_get' (which retrieves budget settings). It mentions the constraint that 'component allocations must sum to <= totalBudget' but doesn't explain when custom configuration is preferred over presets or other budget-related operations.
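The two constraints the tool does document (totalBudget in 0.01-1.0 with a default of 0.15, and component allocations summing to at most totalBudget) can be checked client-side before calling the tool. A minimal sketch, assuming components is a flat name-to-fraction map as the schema suggests:

```python
def validate_budget(components: dict[str, float], total_budget: float = 0.15) -> None:
    """Check the documented context_budget_set constraints locally.

    totalBudget is a fraction of the context window in [0.01, 1.0]
    (default 0.15); component allocations must sum to <= totalBudget.
    """
    if not 0.01 <= total_budget <= 1.0:
        raise ValueError("totalBudget must be between 0.01 and 1.0")
    allocated = sum(components.values())
    if allocated > total_budget:
        raise ValueError(
            f"component allocations ({allocated}) exceed totalBudget ({total_budget})"
        )

# Passes: 0.08 + 0.05 <= 0.15
validate_budget({"memories": 0.08, "skills": 0.05}, total_budget=0.15)
```

Whether the server rejects an over-allocated request with an error or silently clamps it is exactly the behavioral detail the review flags as undocumented.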

goal_complete (grade C)

Mark a goal as complete. Records success/failure for tracking.

Parameters (JSON Schema):
- goalId (required): The ID of the goal
- success (required): Whether the goal was successfully completed
- notes (optional): Completion notes
- lessonsLearned (optional): Lessons learned during goal execution

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool 'marks a goal as complete' and 'records success/failure,' implying a write/mutation operation, but lacks critical details: it doesn't specify permissions required, whether the action is reversible, what happens to in-progress goals, or error conditions. For a mutation tool with zero annotation coverage, this is a significant gap in transparency.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences with zero waste: 'Mark a goal as complete. Records success/failure for tracking.' It's front-loaded with the primary action and efficiently adds context. Every word earns its place, making it easy for an agent to parse quickly.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (a mutation tool with no annotations and no output schema), the description is incomplete. It lacks information on behavioral traits (e.g., side effects, error handling), usage context, and return values. While the schema covers parameters well, the overall context for safe and correct invocation is insufficient, especially for a tool that modifies data.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all four parameters (goalId, success, notes, lessonsLearned) with clear descriptions. The description adds no additional meaning beyond what's in the schema; it doesn't explain parameter interactions, formatting, or examples. This meets the baseline of 3 when the schema does the heavy lifting.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Mark a goal as complete') and the resource ('goal'), with the additional context of 'Records success/failure for tracking.' This distinguishes it from sibling tools like goal_create, goal_update_progress, or goal_get, which have different purposes. However, it doesn't explicitly differentiate from all siblings (e.g., goal_update_progress might also involve status changes), so it's not a perfect 5.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an existing goal), exclusions (e.g., not for partial updates), or direct alternatives like goal_update_progress for incremental tracking. This leaves the agent without context for tool selection among similar goal-related tools.

goal_create (grade C)

Create a new goal for an agent. Goals track progress and success rates over time.

Parameters (JSON Schema):
- title (required): Title of the goal
- agentId (required): The ID of the agent
- description (required): Detailed description of what needs to be accomplished
- parentGoalId (optional): ID of parent goal if this is a sub-goal
- successCriteria (optional): List of criteria that define successful completion

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden of behavioral disclosure but lacks the details. It states the tool creates a goal but doesn't disclose permissions needed, whether creation is idempotent, error handling, or what the response contains. The mention of tracking progress adds some context but is insufficient for a mutation tool.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with zero waste: the first states the action and resource, the second adds purpose. It's front-loaded and appropriately sized, though it could integrate usage hints more efficiently.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is incomplete. It lacks details on behavioral traits (e.g., side effects, auth), response format, and error conditions, leaving gaps for an AI agent to invoke it correctly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are well-documented in the schema. The description adds no additional parameter semantics beyond implying goal creation, which aligns with the schema. Baseline 3 is appropriate as the schema does the heavy lifting.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Create a new goal') and the resource ('for an agent'), with additional context about purpose ('Goals track progress and success rates over time'). It distinguishes itself from siblings like goal_complete or goal_update_progress by focusing on creation, though it doesn't explicitly contrast with them.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. While it implies usage for goal creation, it doesn't mention prerequisites (e.g., the agent must exist), exclusions, or comparisons to sibling tools like goal_update_progress for modifications.

goal_find_similar (grade C)

Find similar past goals to learn from previous attempts.

Parameters (JSON Schema):
- agentId (required): The ID of the agent
- goalDescription (required): Description of the goal to find similar past goals for
- limit (optional): Maximum number of similar goals to return

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool finds similar goals but doesn't explain how similarity is determined, what 'learn from previous attempts' entails, whether this is a read-only operation, or any performance or error-handling traits. This leaves significant gaps for a tool with behavioral implications.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. It's appropriately sized for the tool's complexity, making every word count and avoiding redundancy.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of annotations and output schema, the description is incomplete. It doesn't cover behavioral aspects like how similarity is computed, what the return format includes, or error conditions. For a tool that likely involves algorithmic matching and learning intent, more context is needed to guide the agent effectively.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, clearly documenting all three parameters. The description doesn't add any semantic details beyond the schema, such as explaining the 'goalDescription' format or how 'agentId' influences results. With high schema coverage, a baseline score of 3 is appropriate as the schema does the heavy lifting.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Find') and resource ('similar past goals'), and it provides the intent ('to learn from previous attempts'). However, it doesn't explicitly differentiate this tool from potential sibling tools like 'goal_list' or 'goal_get', which might also involve goal retrieval, so it doesn't reach the highest score.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description offers no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, exclusions, or compare it to sibling tools like 'goal_list' or 'goal_get', leaving the agent to infer usage context without explicit direction.

goal_get (grade C)

Get goal details including progress and success metrics.

Parameters (JSON Schema):
- goalId (required): The ID of the goal
- includeMetrics (optional): Whether to include historical success metrics

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states this is a read operation ('Get'), implying it's non-destructive, but doesn't disclose behavioral traits like authentication needs, rate limits, error conditions, or what happens if the goalId is invalid. For a tool with no annotation coverage, this leaves significant gaps in understanding how it behaves.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action ('Get goal details') and adds specifics ('including progress and success metrics'). There's no wasted text, and it's appropriately sized for a simple retrieval tool, though it could be slightly more structured with usage hints.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (2 parameters, no nested objects) and high schema coverage (100%), the description is somewhat complete but lacks output information (no output schema) and behavioral context. It covers the basic purpose but doesn't compensate for missing annotations or provide enough guidance for effective use, making it adequate but with clear gaps.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters (goalId and includeMetrics) well-documented in the schema. The description adds minimal value beyond the schema, as it mentions 'progress and success metrics' which loosely relates to includeMetrics but doesn't explain parameter interactions or usage. With high schema coverage, the baseline score of 3 is appropriate.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Get') and resource ('goal details'), specifying what information is retrieved ('including progress and success metrics'). It distinguishes this from sibling tools like goal_list (which lists goals) and goal_update_progress (which modifies progress), though it doesn't explicitly name these alternatives. The purpose is specific but could be more differentiated.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing a goalId), when not to use it (e.g., for listing goals), or refer to sibling tools like goal_list or goal_find_similar. Usage is implied by the action 'Get goal details,' but no explicit context is given.

goal_get_success_rate (grade C)

Get the overall success rate for an agent's goals.

Parameters (JSON Schema):
- agentId (required): The ID of the agent
- startDate (optional): Start of period to calculate success rate
- endDate (optional): End of period to calculate success rate

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool retrieves data ('Get'), implying a read-only operation, but doesn't clarify if it requires authentication, has rate limits, or what the output format looks like (e.g., percentage, raw counts). For a tool with no annotations, this is a significant gap in transparency.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary details. It uses minimal words to convey the essential function, making it easy to parse and understand quickly.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of calculating success rates and the lack of annotations and output schema, the description is incomplete. It doesn't explain what 'success rate' means (e.g., based on goal completion, progress updates), how it's derived, or what the return value includes. This leaves critical contextual gaps for effective tool use.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with clear documentation for 'agentId', 'endDate', and 'startDate'. The description adds no additional parameter semantics beyond what the schema provides, such as explaining how success rate is calculated or default time periods. Given the high schema coverage, a baseline score of 3 is appropriate.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get the overall success rate for an agent's goals.' It specifies the verb ('Get'), resource ('success rate'), and scope ('for an agent's goals'), making the function unambiguous. However, it doesn't differentiate from sibling tools like 'goal_get' or 'skill_get_effectiveness', which might retrieve related but different metrics.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites, such as requiring an existing agent, or compare it to siblings like 'goal_list' or 'goal_get', which might offer overlapping functionality. This leaves the agent without context for tool selection.
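The server never documents how the success rate is derived. One plausible reading, shown here strictly as an assumption (the goal record shape is invented), is successes divided by goals completed within the optional startDate/endDate window, mirroring the success flag that goal_complete records:

```python
from datetime import date

def success_rate(goals, start=None, end=None):
    """Assumed formula: successes / completed goals in the optional window."""
    completed = [
        g for g in goals
        if g.get("completedOn") is not None
        and (start is None or g["completedOn"] >= start)
        and (end is None or g["completedOn"] <= end)
    ]
    if not completed:
        return None  # undefined when nothing completed in the window
    return sum(1 for g in completed if g["success"]) / len(completed)

goals = [
    {"success": True,  "completedOn": date(2024, 1, 10)},
    {"success": False, "completedOn": date(2024, 2, 5)},
    {"success": True,  "completedOn": None},  # still open, ignored
]
print(success_rate(goals))  # 0.5 over the two completed goals
```

Even this toy version surfaces the ambiguities the review flags: whether open goals count, and what the tool returns when no goals fall inside the window.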

goal_listCInspect

List goals for an agent with optional filters.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNoMaximum number of goals to return
statusNoFilter by goal status
agentIdYesThe ID of the agent
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It states it's a list operation with filtering, which implies read-only behavior, but doesn't disclose important traits like pagination (implied by 'limit' parameter), authentication needs, rate limits, error conditions, or what the output looks like. For a tool with no annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('List goals for an agent') and adds qualifying information ('with optional filters'). There's zero waste or redundancy, making it appropriately concise for a straightforward list tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a tool with three parameters. It doesn't explain the return format (e.g., list structure, fields included), error handling, or how filtering interacts with the 'limit' parameter. For a list tool with filtering capabilities, more context about behavior and output is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all three parameters (agentId, limit, status) with descriptions and enum values. The description adds no additional parameter semantics beyond mentioning 'optional filters', which the schema already covers. This meets the baseline of 3 when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('goals for an agent'), making the purpose understandable. However, it doesn't differentiate this tool from potential sibling tools like 'goal_get' or 'goal_find_similar', which might also retrieve goal information in different ways. The description is specific about the action but lacks sibling distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides minimal guidance with 'optional filters' hinting at when to use parameters, but it doesn't explain when to choose this tool over alternatives like 'goal_get' (for a single goal) or 'goal_find_similar' (for similarity-based retrieval). There's no mention of prerequisites, exclusions, or comparison to sibling tools, leaving usage context vague.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
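The gaps flagged above (undocumented filter interaction and limit behavior) can be handled defensively on the client side. A minimal sketch in Python, assuming the schema's parameter names (agentId, status, limit); the tool-call envelope shape and the "active" status value are illustrative assumptions, not documented by the server.

```python
def build_goal_list_call(agent_id, status=None, limit=None):
    """Build a goal_list tool-call payload from the schema's three parameters.

    The {"name": ..., "arguments": ...} envelope and the example status
    value are assumptions; the server's schema is authoritative.
    """
    args = {"agentId": agent_id}
    if status is not None:
        args["status"] = status  # schema documents an enum of status values
    if limit is not None:
        # Pagination behavior is undocumented, so at least reject nonsense values.
        if not isinstance(limit, int) or limit < 1:
            raise ValueError("limit should be a positive integer")
        args["limit"] = limit
    return {"name": "goal_list", "arguments": args}

call = build_goal_list_call("agent-123", status="active", limit=10)
```

Validating client-side is a workaround for the missing documentation, not a substitute for it.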

goal_update_progress (Grade: C)

Update the progress of a goal.

Parameters (JSON Schema)
Name | Required | Description | Default
notes | No | Optional notes about progress update
goalId | Yes | The ID of the goal
progress | Yes | Progress percentage (0-100)
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states 'Update' which implies a mutation, but fails to mention permissions, side effects, or what happens to existing goal data. This is inadequate for a mutation tool with zero annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, direct sentence with no wasted words, making it easy to parse. It's appropriately sized for the tool's apparent complexity and front-loads the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is insufficient. It doesn't cover behavioral aspects like error conditions, response format, or how progress updates interact with other goal operations, leaving significant gaps in understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema fully documents parameters like 'goalId' and 'progress'. The description adds no additional meaning beyond what's in the schema, such as explaining progress units or update constraints, meeting the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Update') and resource ('progress of a goal'), making the purpose immediately understandable. However, it doesn't differentiate from sibling tools like 'goal_complete' or 'goal_get', which could also involve goal progress manipulation or retrieval, leaving room for ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives such as 'goal_complete' or 'goal_get'. The description lacks context about prerequisites, timing, or exclusions, leaving the agent to infer usage based on the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
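Since the description doesn't say how the server reacts to out-of-range values, a cautious client can enforce the schema's 0-100 range before calling. A hypothetical payload builder; the envelope shape is an assumption.

```python
def build_progress_update(goal_id, progress, notes=None):
    # The schema states progress is a percentage (0-100). Server behavior on
    # out-of-range values is undocumented, so reject them client-side.
    if not (0 <= progress <= 100):
        raise ValueError("progress must be within 0-100")
    args = {"goalId": goal_id, "progress": progress}
    if notes is not None:
        args["notes"] = notes  # optional free-text note, per the schema
    return {"name": "goal_update_progress", "arguments": args}

update = build_progress_update("goal-42", 75, notes="core module done")
```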

memory_adapters (Grade: A)

List all available memory platform adapters and their capabilities (import, export, sync support, limitations).

Parameters (JSON Schema)

No parameters

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but only states what the tool returns, not behavioral traits. It doesn't disclose whether this is a read-only operation, potential performance characteristics, authentication requirements, rate limits, or error conditions. The description is purely functional without behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that efficiently communicates the tool's purpose. It's front-loaded with the main action ('List all available memory platform adapters') followed by clarifying details about what information is included. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a zero-parameter tool with no output schema, the description adequately explains what the tool returns. However, without annotations or output schema, it lacks details about return format, structure, or potential limitations. The description is complete enough for basic understanding but leaves operational details unspecified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has zero parameters with 100% schema description coverage, so no parameter documentation is needed. The description appropriately focuses on what the tool does rather than parameter details, earning a baseline score of 4 for this dimension.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('all available memory platform adapters'), specifying the scope of what will be listed. It distinguishes from sibling tools like memory_import, memory_export, and memory_sync by focusing on adapter metadata rather than performing operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by mentioning 'capabilities (import, export, sync support, limitations)', suggesting this tool helps determine which adapters support specific operations. However, it doesn't explicitly state when to use this versus alternatives like memory_source_list or provide clear exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_audit (Grade: C)

Scan your memories for safety issues. Returns a health summary with trust score distribution and flagged count.

Parameters (JSON Schema)
Name | Required | Description | Default
agentId | Yes | Your agent ID
autoFix | No | If true, quarantines flagged memories automatically. Default: false.
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions 'Returns a health summary with trust score distribution and flagged count,' which gives some output context, but lacks critical behavioral details: whether this is a read-only operation, if it modifies data (especially with autoFix parameter), performance characteristics, or error handling. For a safety scanning tool, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—just two sentences that directly state the purpose and output. Every word earns its place with zero redundancy. It's front-loaded with the core function and efficiently communicates essential information without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of memory safety scanning and lack of annotations/output schema, the description is incomplete. It doesn't cover behavioral aspects like side effects, permissions needed, or what 'safety issues' entail. The output mention is vague ('health summary'), and there's no context about when this tool should be invoked relative to other memory operations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are fully documented in the schema. The description doesn't add any parameter-specific information beyond what's in the schema (e.g., it doesn't explain what 'autoFix' entails beyond quarantining). This meets the baseline for high schema coverage but doesn't provide extra semantic value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Scan your memories for safety issues.' It specifies the action (scan) and resource (memories) with a safety focus. However, it doesn't explicitly differentiate from sibling memory tools like memory_clean or memory_ingest, which prevents a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With many sibling tools related to memory management (e.g., memory_clean, memory_query), there's no indication of context, prerequisites, or comparisons. This leaves the agent guessing about appropriate usage scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
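Because autoFix defaults to false, a plain call behaves as a scan, and quarantining can be gated on the scan result. A sketch of that two-pass pattern; call_tool stands in for any MCP client invocation, and the flaggedCount field name is inferred from the description's "flagged count", not from a documented output schema.

```python
def audit_then_fix(call_tool, agent_id, fix_threshold=1):
    """Scan first; only opt into automatic quarantine if enough memories are flagged.

    call_tool(name, arguments) is any callable that executes an MCP tool call
    and returns the parsed result dict (hypothetical client interface).
    """
    # First pass: autoFix omitted, so it defaults to false (scan only).
    summary = call_tool("memory_audit", {"agentId": agent_id})
    if summary.get("flaggedCount", 0) >= fix_threshold:
        # Second pass: explicitly opt into quarantining flagged memories.
        return call_tool("memory_audit", {"agentId": agent_id, "autoFix": True})
    return summary
```

Gating the autoFix pass keeps the default path side-effect free, which is exactly the distinction the description leaves implicit.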

memory_clean (Grade: A)

Clean up old or low-quality memories. Preview first (dry run), then confirm to delete.

Parameters (JSON Schema)
Name | Required | Description | Default
types | No | Only clean memories of these types
agentId | Yes | Your agent ID
confirm | No | Set to true to actually delete. False (default) for dry-run preview.
minTrust | No | Delete memories with trust score below this threshold
maxAgeDays | No | Delete memories older than this many days
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It reveals the tool's destructive nature ('delete') and the dry-run safety mechanism, which are crucial behavioral traits. However, it doesn't mention permission requirements, rate limits, what constitutes 'low-quality,' or how deletions affect related data, leaving gaps in behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise and well-structured in two sentences. The first sentence states the core purpose, and the second provides essential usage guidance. Every word earns its place with zero redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive tool with 5 parameters and no annotations or output schema, the description is minimally adequate. It covers the basic purpose and safety workflow but lacks details on what 'clean up' entails operationally, how deletions are performed, or what the preview output looks like. Given the complexity and absence of structured behavioral data, more context would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all 5 parameters. The description adds minimal parameter semantics beyond the schema—it only clarifies that 'confirm' controls actual deletion versus dry-run preview. This matches the baseline expectation when schema coverage is complete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Clean up old or low-quality memories.' It specifies the action (clean up) and target (memories) with qualifying criteria (old or low-quality). However, it doesn't explicitly differentiate from sibling tools like 'agent_context_clear' or 'memory_audit' that might also involve cleanup operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage guidance: 'Preview first (dry run), then confirm to delete.' This establishes a recommended workflow and explains the purpose of the 'confirm' parameter. It doesn't explicitly mention when not to use this tool or name alternatives among siblings, but the workflow guidance is practical and helpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
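The preview-then-confirm workflow the description prescribes can be wrapped so the destructive call never fires without an explicit approval step. A sketch under stated assumptions: call_tool is a placeholder for an MCP client call, and the preview's contents are undocumented, so the approval decision is delegated to a caller-supplied predicate.

```python
def clean_with_preview(call_tool, agent_id, approve, **filters):
    """Run memory_clean as a dry run, then delete only if approve(preview) is truthy.

    filters may include maxAgeDays, minTrust, or types, per the schema.
    """
    args = {"agentId": agent_id, **filters}
    # confirm defaults to false, so this first call is a dry-run preview.
    preview = call_tool("memory_clean", args)
    if not approve(preview):
        return preview, None  # nothing deleted
    # Only now make the destructive call.
    deleted = call_tool("memory_clean", {**args, "confirm": True})
    return preview, deleted
```

Separating the two calls in code mirrors the two-step safety mechanism the description already documents.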

memory_export (Grade: C)

Export AIS memories to an external platform. Translates memories to the target format and pushes them.

Parameters (JSON Schema)
Name | Required | Description | Default
since | No | ISO timestamp — only export memories created after this date
types | No | Memory types to export (e.g., ["fact", "lesson"])
agentId | Yes | Agent ID to export memories from
platform | Yes | Platform adapter name (e.g., "mem0")
credentials | Yes | Platform credentials
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden but is minimal. It mentions translation and pushing, but doesn't disclose critical behaviors: whether this is a one-time or recurring export, whether it overwrites existing data on the platform, authentication requirements beyond credentials, rate limits, error handling, or what 'pushes them' entails (e.g., immediate vs. batched). The description is too vague for a mutation tool with external integration.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences that front-load the core purpose and avoid redundancy. Every word earns its place, though the phrasing is slightly terse for a complex export operation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters, no annotations, and no output schema, the description is incomplete. It lacks details on behavioral traits (e.g., side effects, idempotency), error cases, and what success looks like (e.g., confirmation message, exported count). Given the complexity of exporting to external platforms, more context is needed to guide effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are well-documented in the schema. The description adds no additional meaning about parameters beyond implying export scope (e.g., 'memories' relates to agentId, types). It doesn't clarify parameter interactions or provide examples (e.g., valid platform values). Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Export AIS memories to an external platform') and the resource ('memories'), with additional detail about translation and pushing. It distinguishes from siblings like memory_import (which imports) and memory_audit/clean (which don't export), but doesn't explicitly differentiate from memory_sync, which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like memory_sync or memory_adapters. The description implies usage for exporting memories, but lacks context about prerequisites (e.g., needing platform credentials), timing (e.g., after memory creation), or exclusions (e.g., not for real-time syncing).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
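An incremental export, as the since and types parameters suggest, might be assembled like this. A payload sketch only; the "mem0" platform name comes from the schema's own example, and the envelope shape is an assumption.

```python
from datetime import datetime, timezone

def build_export_call(agent_id, platform, credentials, since=None, types=None):
    """Build a memory_export payload, normalizing 'since' to an ISO timestamp."""
    args = {"agentId": agent_id, "platform": platform, "credentials": credentials}
    if since is not None:
        # The schema asks for an ISO timestamp string; accept a datetime too.
        if isinstance(since, datetime):
            since = since.astimezone(timezone.utc).isoformat()
        args["since"] = since
    if types:
        args["types"] = list(types)  # e.g. ["fact", "lesson"], per the schema
    return {"name": "memory_export", "arguments": args}
```

Normalizing timestamps client-side sidesteps the undocumented question of which ISO variants the server accepts.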

memory_import (Grade: B)

Import memories from an external platform (e.g., Mem0, Claude) into AIS. Connects to the platform, fetches memories, deduplicates, and stores with provenance.

Parameters (JSON Schema)
Name | Required | Description | Default
since | No | ISO timestamp — only import memories created after this date
agentId | Yes | Agent ID to import memories for
platform | Yes | Platform adapter name (e.g., "mem0", "claude_code")
credentials | Yes | Platform credentials (apiKey, filePath, etc.)
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses key behavioral traits: connecting to an external platform, fetching memories, deduplicating, and storing with provenance. However, it lacks details on permissions needed, rate limits, error handling, or what 'provenance' entails, which are important for a tool that handles external data import.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two well-structured sentences that efficiently cover the tool's purpose and key steps. It's front-loaded with the main action and avoids unnecessary detail, though it could be marginally tighter by dropping the parenthetical example.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is moderately complete for a tool with 4 parameters and complex behavior (external integration, deduplication). It covers the high-level process but lacks specifics on output format, error cases, or integration details, which could hinder an agent's ability to use it effectively without trial and error.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all parameters. The description adds no additional parameter semantics beyond what's in the schema, such as explaining platform-specific credential requirements or the impact of the 'since' parameter on deduplication. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool imports memories from external platforms into AIS, specifying the action (import), resource (memories), and target system (AIS). It distinguishes from siblings like memory_export (exporting) and memory_ingest (general ingestion), but doesn't explicitly differentiate from memory_sync or memory_adapters, which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when importing from platforms like Mem0 or Claude, but doesn't provide explicit guidance on when to use this versus alternatives such as memory_ingest or memory_sync. No exclusions or prerequisites are mentioned, leaving the agent to infer context from the tool name and description alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_ingest (Grade: A)

Ingest a single memory from an external source. The memory passes through the trust & safety pipeline before being accepted or quarantined.

Parameters (JSON Schema)
Name | Required | Description | Default
type | No | Memory type classification | fact
agentId | Yes | Agent ID
content | Yes | Memory content
validAt | No | ISO timestamp when the fact became true
sourceId | Yes | Source ID depositing the memory
importance | No | Importance score
sourceMemoryId | No | Original ID in the source system
originalCreatedAt | No | ISO timestamp when created in source
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It adds valuable context about the trust & safety pipeline and the possibility of quarantine, which goes beyond basic parameter semantics. However, it doesn't cover aspects like rate limits, authentication needs, or error handling, keeping it from a perfect score.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with two sentences that each earn their place: the first states the core action, and the second adds critical behavioral context. There's zero waste or redundancy, and it's front-loaded with the essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 8 parameters, no annotations, and no output schema, the description provides adequate but incomplete context. It covers the core action and safety pipeline, but doesn't explain return values, error conditions, or prerequisites. Given the complexity, it should do more to be fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 8 parameters thoroughly. The description doesn't add any additional meaning about parameters beyond what the schema provides, such as explaining relationships between fields or usage patterns. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'ingest' and resource 'a single memory from an external source', making the purpose specific and understandable. However, it doesn't explicitly differentiate from sibling tools like 'memory_import' or 'memory_ingest_batch', which would be needed for a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'memory_import' or 'memory_ingest_batch' from the sibling list. It mentions the trust & safety pipeline, but this is behavioral context rather than usage guidance. No explicit when/when-not instructions are present.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
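The optional provenance fields (sourceMemoryId, originalCreatedAt, validAt) are what tie a stored memory back to its origin, so a builder can pass them through only when known. A sketch using the schema's parameter names; everything beyond those names (envelope shape, example values) is assumption.

```python
def build_ingest_call(agent_id, source_id, content, memory_type="fact",
                      importance=None, source_memory_id=None,
                      original_created_at=None, valid_at=None):
    """Build a memory_ingest payload; 'fact' matches the schema's default type."""
    args = {"agentId": agent_id, "sourceId": source_id,
            "content": content, "type": memory_type}
    # Optional provenance fields: include only when the source system knows them.
    if importance is not None:
        args["importance"] = importance
    if source_memory_id is not None:
        args["sourceMemoryId"] = source_memory_id
    if original_created_at is not None:
        args["originalCreatedAt"] = original_created_at
    if valid_at is not None:
        args["validAt"] = valid_at
    return {"name": "memory_ingest", "arguments": args}
```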

memory_ingest_batch (Grade: A)

Ingest multiple memories from an external source in a single batch (max 100). Each memory passes through the trust & safety pipeline independently.

Parameters (JSON Schema)
Name | Required | Description | Default
agentId | Yes | Agent ID
memories | Yes | Array of memories to ingest
sourceId | Yes | Source ID depositing the memories
syncCursor | No | Opaque cursor for incremental sync
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses the batch size limit (max 100) and that each memory goes through trust & safety independently, which are useful behavioral traits. However, it doesn't mention authentication needs, rate limits, whether this is a write operation, or what happens on failure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with zero waste. The first sentence covers purpose, scope, and constraints. The second adds important behavioral context about trust & safety processing. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a write operation with 4 parameters and no annotations or output schema, the description is adequate but has gaps. It covers the batch nature and safety processing but doesn't explain what 'ingest' means operationally, what the tool returns, or how errors are handled for a mutation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description doesn't add meaningful parameter semantics beyond what's already in the schema descriptions (e.g., what 'agentId' or 'sourceId' represent in context). It mentions 'external source' which relates to 'sourceId' but doesn't elaborate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('ingest multiple memories'), resource ('from an external source'), and scope ('in a single batch, max 100'). It distinguishes from the sibling 'memory_ingest' by specifying batch capability and external source focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (batch ingestion from external sources with up to 100 items). However, it doesn't explicitly state when NOT to use it or name alternatives like 'memory_ingest' for single items or 'memory_import' for different source types.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
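Since a batch is capped at 100 memories, a client ingesting more has to chunk. A sketch assuming a generic call_tool callable; the per-batch result shape is undocumented, so results are simply collected.

```python
def batch_ingest(call_tool, agent_id, source_id, memories, batch_size=100):
    """Split a memory list into batches of at most 100 and ingest each batch.

    call_tool(name, arguments) is a hypothetical MCP client invocation.
    """
    # The description caps a single call at 100 memories.
    if not 1 <= batch_size <= 100:
        raise ValueError("batch_size must be between 1 and 100")
    results = []
    for i in range(0, len(memories), batch_size):
        chunk = memories[i:i + batch_size]
        results.append(call_tool("memory_ingest_batch", {
            "agentId": agent_id,
            "sourceId": source_id,
            "memories": chunk,
        }))
    return results
```

Because each memory passes the trust & safety pipeline independently, per-batch results would need to be inspected individually for quarantined items.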

memory_source_approve (Grade: A)

Approve a pending memory source. Issues a MemorySourceAuthorizationCredential (W3C VC) granting the source permission to deposit memories.

Parameters (JSON Schema)
Name | Required | Description | Default
userId | Yes | User ID of the human approving this source
agentId | Yes | Agent ID
sourceId | Yes | Source ID to approve
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool issues a credential and grants permission, which indicates a write/mutation operation. However, it lacks details on behavioral traits such as required permissions, whether the action is reversible, rate limits, or error conditions, which are critical for a tool that authorizes access.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core action in the first sentence and adds necessary detail in the second. It is appropriately sized with zero waste, efficiently conveying purpose and outcome without redundancy or unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a mutation tool. It explains what the tool does but lacks details on behavioral aspects like authorization requirements, side effects, or return values. However, it covers the basic purpose and outcome, making it minimally adequate but with clear gaps in context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters (userId, agentId, sourceId) with clear descriptions. The description does not add any additional meaning or context beyond what the schema provides, such as explaining relationships between parameters or usage examples, meeting the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Approve a pending memory source') and the resource ('memory source'), distinguishing it from sibling tools like 'memory_source_register' or 'memory_source_revoke'. It also specifies the outcome ('Issues a MemorySourceAuthorizationCredential (W3C VC) granting permission to deposit memories'), making the purpose explicit and distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by mentioning 'pending memory source', suggesting it should be used after registration but before revocation. However, it does not explicitly state when to use this tool versus alternatives like 'memory_source_register' or 'memory_source_revoke', nor does it provide exclusions or prerequisites, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_source_list (grade C)

List all registered memory sources for an agent, optionally filtered by status.

Parameters (JSON Schema)

Name | Required | Description | Default
status | No | Filter by status. Omit to list all. | -
agentId | Yes | Agent ID | -
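The two calling patterns above can be sketched as argument payloads; the agent ID is a placeholder and the status value is an assumed example ("pending" is mentioned in the register tool's description, but the full status enum is not shown here).

```python
# Hypothetical argument payloads for memory_source_list.
list_all = {
    "name": "memory_source_list",
    "arguments": {"agentId": "agent-7"},  # status omitted: list every source
}
list_pending = {
    "name": "memory_source_list",
    "arguments": {"agentId": "agent-7", "status": "pending"},  # filtered view
}
```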
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool lists memory sources with optional filtering, but does not cover critical aspects like whether this is a read-only operation, potential rate limits, authentication needs, or what the output format looks like. This is a significant gap for a tool with no annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose and includes the optional filtering detail. There is no wasted wording, making it appropriately sized and well-structured for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no annotations and no output schema, the description is incomplete. It lacks details on behavioral traits, output format, and usage context, which are essential for an agent to effectively invoke this tool. The description does not compensate for the missing structured data, leaving significant gaps in understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('agentId' and 'status') with descriptions and enum values. The description adds minimal value by mentioning optional filtering by status, but does not provide additional semantic context beyond what the schema offers, aligning with the baseline score for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('all registered memory sources for an agent'), making the purpose specific and understandable. However, it does not explicitly differentiate this tool from sibling tools like 'memory_audit' or 'memory_ingest', which might also involve memory sources, so it misses full sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as other memory-related tools in the sibling list. It mentions optional filtering by status but does not specify contexts, prerequisites, or exclusions for usage, leaving the agent without clear selection criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_source_register (grade A)

Register a new external memory source (e.g., ChatGPT, Claude Code) for this agent. Returns a DID and API key. Source starts in "pending" status and must be approved before it can deposit memories.

Parameters (JSON Schema)

Name | Required | Description | Default
name | Yes | Human-readable name for the source (e.g., "My ChatGPT") | -
agentId | Yes | Agent ID to register the source for | -
platform | Yes | Platform identifier | -
allowedTypes | No | Memory types this source is allowed to submit | [fact, event, lesson, context]
rateLimitPerHour | No | Maximum ingests per hour | 100
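A sketch of a registration call with the documented defaults spelled out explicitly; the name and platform values are invented examples, since the valid platform identifiers are not listed on this page.

```python
# Hypothetical argument payload for memory_source_register.
register = {
    "name": "memory_source_register",
    "arguments": {
        "name": "My ChatGPT",      # human-readable label
        "agentId": "agent-7",      # placeholder agent ID
        "platform": "chatgpt",     # assumed platform identifier
        # Optional fields, shown here at their documented defaults:
        "allowedTypes": ["fact", "event", "lesson", "context"],
        "rateLimitPerHour": 100,
    },
}
# Per the description, the response returns a DID and API key, and the
# source starts in "pending" status until memory_source_approve is called.
```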
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It does reveal important behavioral traits: that registration returns a DID and API key, and that sources start in 'pending' status requiring approval. However, it doesn't mention permission requirements, whether this is a mutating operation, potential side effects, or error conditions. For a registration tool with no annotation coverage, this leaves significant gaps in behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at three sentences with zero wasted words: the first states the core purpose, the second the returned DID and API key, and the third the crucial workflow detail about the pending status. Every sentence earns its place and the information is front-loaded effectively.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (registration with approval workflow), lack of annotations, and no output schema, the description provides a basic but incomplete picture. It covers the purpose and initial workflow state but doesn't explain what happens after approval, how the returned DID and API key should be used, error scenarios, or relationship to other memory operations. For a registration tool in a memory management system, more contextual information would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already documents all 5 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema. It mentions the concept of 'external memory source' which relates to the platform parameter, but doesn't provide additional semantic context about parameter interactions or usage patterns. The baseline of 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Register a new external memory source'), identifies the resource ('for this agent'), and distinguishes from siblings by specifying it's about memory source registration rather than approval, listing, or revocation. It provides concrete examples of what constitutes an external memory source (e.g., ChatGPT, Claude Code).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context about when to use this tool: when registering a new external memory source. It mentions the 'pending' status and approval requirement, which gives important workflow context. However, it doesn't explicitly state when NOT to use it or name specific alternatives among the sibling tools (like memory_source_approve or memory_source_list).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_source_revoke (grade A)

Revoke a memory source authorization. Revokes the VC and blocks the source. Optionally deletes all memories deposited by this source.

Parameters (JSON Schema)

Name | Required | Description | Default
agentId | Yes | Agent ID | -
sourceId | Yes | Source ID to revoke | -
deleteMemories | No | If true, cascade-delete all memories from this source | false
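Because deleteMemories defaults to false, the destructive cascade must be opted into explicitly. A sketch of both variants, with placeholder IDs:

```python
# Hypothetical argument payloads for memory_source_revoke.
revoke_keep = {
    "name": "memory_source_revoke",
    "arguments": {"agentId": "agent-7", "sourceId": "src-123"},
    # deleteMemories omitted: defaults to false, memories are retained
}
revoke_purge = {
    "name": "memory_source_revoke",
    "arguments": {
        "agentId": "agent-7",
        "sourceId": "src-123",
        "deleteMemories": True,  # destructive: cascade-deletes all memories
    },
}
```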
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool revokes authorization and blocks the source, and mentions optional memory deletion. However, it lacks critical behavioral details: whether this action is reversible, what permissions are required, whether there are rate limits, what happens to dependent data, or what the response looks like. For a destructive tool with zero annotation coverage, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three short sentences that state the tool's purpose, its immediate effects, and the key optional behavior. Every word earns its place with no redundancy or unnecessary elaboration. The structure is front-loaded with the main action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a destructive tool with no annotations and no output schema, the description provides basic purpose and parameter context but lacks important completeness elements. It doesn't explain what 'revoking VC' means, what 'blocks the source' entails operationally, whether there are confirmation steps, or what the tool returns. The description is adequate but has clear gaps for a tool that performs authorization revocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description adds value by explaining the behavioral consequence of the deleteMemories parameter ('deletes all memories deposited by this source'), which goes beyond the schema's technical description. However, it doesn't provide additional context for agentId or sourceId parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('revoke', 'blocks', 'deletes') and identifies the resource ('memory source authorization'). It distinguishes itself from siblings like memory_source_approve and memory_source_list by focusing on revocation rather than approval or listing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing to revoke a memory source, but provides no explicit guidance on when to use this tool versus alternatives like agent_delete or privacy_revoke_consent. It mentions an optional parameter but doesn't explain when to set deleteMemories to true versus false.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_sync (grade B)

Incremental sync with an external platform. Uses stored cursor to fetch only new memories since last sync.

Parameters (JSON Schema)

Name | Required | Description | Default
agentId | Yes | Agent ID | -
platform | Yes | Platform adapter name | -
credentials | Yes | Platform credentials | -
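The credentials parameter's shape is not documented on this page, so the sketch below assumes a generic key object; the adapter name and all values are placeholders.

```python
# Hypothetical argument payload for memory_sync.
sync = {
    "name": "memory_sync",
    "arguments": {
        "agentId": "agent-7",                # placeholder agent ID
        "platform": "chatgpt",               # assumed adapter name
        "credentials": {"apiKey": "sk-placeholder"},  # assumed shape
    },
}
# Per the description, the server resumes from a stored cursor, so
# repeated calls should fetch only memories added since the last sync.
```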
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions the tool uses a stored cursor for incremental fetching, which is useful behavioral context. However, it lacks details on permissions needed, rate limits, error handling, whether it's idempotent, or what happens if credentials are invalid, all critical for a sync operation with external platforms.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description consists of two short sentences that front-load the core purpose ('incremental sync') and add the key behavioral detail ('uses stored cursor'). There is no wasted verbiage, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description is incomplete for a tool that performs external synchronization. It misses details on what 'memories' entail, the sync outcome (e.g., success/failure indicators), error scenarios, or how the cursor is managed. For a 3-parameter tool with complex credentials, this leaves significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters (agentId, platform, credentials). The description adds no additional meaning about parameters beyond implying 'platform' refers to an external platform adapter and 'credentials' are for authentication. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs an 'incremental sync' with an external platform for 'memories', specifying it fetches only new items using a stored cursor. It distinguishes from siblings like memory_import or memory_ingest by focusing on delta updates, but doesn't explicitly contrast with all memory-related tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing to fetch only new memories since a previous sync, suggesting a recurring synchronization context. However, it doesn't explicitly state when to use this versus alternatives like memory_import (full import) or memory_ingest (one-time ingestion), nor does it mention prerequisites like having an existing cursor.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

observe (grade A)

Automatically capture conversation content as a memory without manual curation. Unlike "remember", you do not need to decide the type or importance — the system classifies the content and extracts structured facts automatically. Use this to passively record what happened during a session turn.

Parameters (JSON Schema)

Name | Required | Description | Default
hint | No | Optional type hint if you know what kind of content this is. Omit to let the system classify. | -
goalId | No | Optional goal ID to bind this memory to. Goal-bound memories are retained until goal completion. | -
agentId | No | Your agent ID (optional if session identity is set) | -
content | Yes | The conversation content, observation, or event to capture (raw text; no curation needed) | -
sessionId | No | Session/conversation ID for grouping captured memories | -
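Since content is the only required parameter (when session identity is set), a minimal call can be very small. A sketch with invented content and an optional session ID:

```python
# Hypothetical argument payload for observe. The content string and
# session ID are invented examples; classification happens server-side.
observe_call = {
    "name": "observe",
    "arguments": {
        "content": "User prefers dark mode and weekly summary emails.",
        "sessionId": "sess-1",  # optional: groups memories by conversation
        # hint, goalId, agentId all omitted: the system classifies the
        # content and uses the session identity for attribution.
    },
}
```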
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses key behavioral traits: the tool automatically classifies content and extracts structured facts, and it's for passive recording. However, it doesn't mention potential side effects like storage limits, error conditions, or what happens if classification fails. For a tool with no annotations, this leaves some behavioral aspects unclear.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by differentiation from 'remember' and usage instructions. Every sentence adds value: the first defines the tool, the second contrasts with siblings, and the third provides context. It's efficiently structured with zero waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (5 parameters, no output schema, no annotations), the description is fairly complete. It explains the purpose, usage, and automation features well. However, without annotations or output schema, it could benefit from more details on behavioral aspects like error handling or memory retention. It's adequate but has minor gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 5 parameters thoroughly. The description mentions 'content' implicitly but doesn't add meaning beyond what the schema provides for any parameters. It states 'no curation needed' for content, which slightly clarifies usage but doesn't enhance parameter semantics significantly. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Automatically capture conversation content as a memory without manual curation.' It specifies the verb ('capture'), resource ('conversation content'), and distinguishes it from the sibling tool 'remember' by explaining the automation aspect. The description is specific and avoids tautology.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'Use this to passively record what happened during a session turn.' It distinguishes it from 'remember' by stating 'Unlike "remember", you do not need to decide the type or importance — the system classifies the content and extracts structured facts automatically.' This gives clear context and alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

privacy_check_access (grade C)

Check if a requesting agent can access specific facts. Does not log the check.

Parameters (JSON Schema)

Name | Required | Description | Default
factIds | Yes | Fact IDs to check access for | -
targetAgentId | Yes | Agent whose data is requested | -
relationshipType | Yes | Relationship between requesting and target agent | -
requestingAgentId | Yes | Agent requesting access | -
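All four parameters are required. A sketch with placeholder IDs; the relationshipType value is an assumed example, since the valid enum members are not shown on this page.

```python
# Hypothetical argument payload for privacy_check_access.
check = {
    "name": "privacy_check_access",
    "arguments": {
        "factIds": ["fact-1", "fact-2"],   # placeholder fact IDs
        "targetAgentId": "agent-7",        # whose data is requested
        "requestingAgentId": "agent-9",    # who wants access
        "relationshipType": "colleague",   # assumed example value
    },
}
# Per the description, this check is not logged; contrast with the
# disclosure flow, which does record an audit trail.
```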
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool 'Does not log the check,' which is useful non-functional context, but lacks details on permissions required, rate limits, side effects (e.g., whether it's read-only or has other impacts), or response format. For a privacy/access-check tool with zero annotation coverage, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise and front-loaded: two sentences with zero waste. The first sentence states the core purpose, and the second adds a key behavioral note. Every word earns its place, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (privacy/access checking with 4 required parameters) and lack of annotations and output schema, the description is incomplete. It doesn't explain what the tool returns (e.g., access granted/denied, reasons), error handling, or integration with other privacy tools. For a tool in this domain, more context is needed to use it effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all four parameters. The description adds no additional parameter semantics beyond implying the tool checks 'access' for 'facts,' which is already covered by parameter names and descriptions. This meets the baseline for high schema coverage, but doesn't enhance understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Check if a requesting agent can access specific facts.' It specifies the verb ('check') and resource ('access to specific facts'), and distinguishes it from logging operations by noting 'Does not log the check.' However, it doesn't explicitly differentiate from sibling tools like 'privacy_disclosure_log' or 'privacy_grant_consent', which keeps it from a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions 'Does not log the check,' which implies a contrast with logging tools, but doesn't name specific siblings or explain use cases. There's no mention of prerequisites, error conditions, or typical workflows, leaving usage context unclear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

privacy_create_presentation (grade C)

Create an authorized SD-JWT presentation for selective disclosure after checking privacy policy and consent.

Parameters (JSON Schema)

Name | Required | Description | Default
factId | Yes | Fact ID to create presentation for | -
agentId | Yes | Agent that owns the fact being presented | -
derivedClaim | No | For partial disclosure: the claim to prove (e.g., "age >= 18") | -
disclosureLevel | Yes | How much to reveal: full (raw value), partial (derived claim only), existence_only (just proves the fact exists) | -
relationshipType | Yes | Relationship between the requesting and target agent | -
requestingAgentId | Yes | Agent requesting the presentation | -
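A sketch of a partial disclosure, where only a derived claim is proven rather than the raw value. IDs are placeholders and the relationshipType value is an assumed example.

```python
# Hypothetical argument payload for privacy_create_presentation.
present = {
    "name": "privacy_create_presentation",
    "arguments": {
        "factId": "fact-1",               # placeholder fact ID
        "agentId": "agent-7",             # owner of the fact
        "requestingAgentId": "agent-9",   # who receives the presentation
        "relationshipType": "colleague",  # assumed example value
        "disclosureLevel": "partial",     # full | partial | existence_only
        "derivedClaim": "age >= 18",      # only meaningful for partial
    },
}
# With "full", derivedClaim would be omitted and the raw value revealed;
# with "existence_only", only the fact's existence is proven.
```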
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions authorization and policy/consent checking, but doesn't describe what happens during creation (e.g., whether this generates a token, stores data, requires specific permissions, or has rate limits). For a tool with 6 parameters and no annotation coverage, this is insufficient behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that states the core purpose upfront. It could potentially be more structured with separate clauses for prerequisites and outcomes, but it's appropriately sized with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 6 parameters, no annotations, and no output schema, the description provides basic purpose but lacks important context about what the tool actually returns, error conditions, or behavioral details. The 100% schema coverage helps, but the description alone is incomplete for a privacy-sensitive creation operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the schema already documents all parameters thoroughly with descriptions and enum values. The description doesn't add any parameter-specific information beyond what's in the schema, making the baseline 3 appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('create an authorized SD-JWT presentation for selective disclosure') and the resource involved, with a specific purpose of privacy policy and consent checking. It doesn't explicitly differentiate from sibling tools like privacy_check_access or privacy_grant_consent, but the creation focus is distinct enough for a 4.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions checking privacy policy and consent as prerequisites, but provides no guidance on when to use this tool versus alternatives like privacy_check_access or privacy_grant_consent. There's no explicit when/when-not usage context or sibling tool comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

privacy_disclosure_log (grade A)

View the disclosure audit trail for an agent. Shows who requested what data and the decision.

Parameters (JSON Schema)

Name | Required | Description | Default
limit | No | Maximum entries to return | 50
agentId | Yes | Agent ID to view audit log for | -
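A sketch of a log query that overrides the default limit of 50; the agent ID is a placeholder.

```python
# Hypothetical argument payload for privacy_disclosure_log.
log_query = {
    "name": "privacy_disclosure_log",
    "arguments": {
        "agentId": "agent-7",  # placeholder agent ID
        "limit": 20,           # omit to use the documented default of 50
    },
}
```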
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it indicates this is a read-only operation ('view', 'shows'), it does not specify permissions required, rate limits, pagination behavior (beyond the implied limit parameter), or what the output format looks like. For a tool handling sensitive privacy data with zero annotation coverage, this is a significant gap in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose ('View the disclosure audit trail for an agent') and adds clarifying detail ('Shows who requested what data and the decision'). There is no wasted language, and every word contributes to understanding the tool's function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (privacy-related audit logging) and lack of annotations or output schema, the description is minimally adequate. It covers the basic purpose but omits critical details like output format, error handling, or security implications. The high schema coverage helps, but for a tool in this domain, more context on behavior and results would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with clear descriptions for both parameters (agentId and limit). The description does not add any additional meaning beyond what the schema provides, such as explaining the audit trail structure or decision outcomes. With high schema coverage, the baseline score of 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('view', 'shows') and resources ('disclosure audit trail for an agent'), including what information is displayed ('who requested what data and the decision'). It distinguishes itself from sibling tools like privacy_check_access or privacy_list_grants by focusing on audit logs rather than access checks or consent management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by specifying it's for viewing an audit trail for an agent, suggesting it should be used when monitoring data disclosure activities. However, it lacks explicit guidance on when to use this versus alternatives like privacy_check_access (which checks access) or privacy_list_grants (which lists consents), and does not mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

privacy_list_grants (C)

List active (non-revoked, non-expired) consent grants for an agent.

Parameters (JSON Schema)
agentId (required): Agent ID to list grants for
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It implies a read-only operation by using 'List', but doesn't disclose behavioral traits like authentication requirements, rate limits, pagination, error handling, or what constitutes 'active' beyond the non-revoked/expired criteria. The description is minimal and lacks context on how the tool behaves in practice.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the key information: action, resource, and scope. There's no wasted verbiage, and it directly communicates the tool's purpose without unnecessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of annotations and output schema, the description is incomplete for a tool that likely returns sensitive privacy data. It doesn't explain the return format (e.g., list structure, fields like grant IDs or permissions), error cases, or security implications. For a privacy-related tool with no structured support, more context is needed to ensure safe and effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with the single parameter 'agentId' fully documented in the schema as 'Agent ID to list grants for'. The description doesn't add any meaning beyond this, such as format examples or validation rules. With high schema coverage, the baseline score of 3 is appropriate as the description doesn't compensate but also doesn't detract.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List') and resource ('active consent grants for an agent'), specifying the scope as 'non-revoked, non-expired'. It distinguishes from sibling tools like privacy_grant_consent (create) and privacy_revoke_consent (revoke), but doesn't explicitly contrast with privacy_check_access or privacy_disclosure_log, which have different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like privacy_check_access or privacy_disclosure_log. It doesn't mention prerequisites, such as needing the agent to exist or having appropriate permissions, nor does it suggest scenarios where listing active grants is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall (A)

Search your memories by meaning (semantic search). Finds relevant memories from any session.

Parameters (JSON Schema)
query (required): What to search for (natural language)
limit (optional): Max results to return (default: 10)
agentId (optional): Your agent ID
minTrust (optional): Minimum trust score (0-1). Omit to include all.
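
For reference, an illustrative argument set for `recall`; only `query` is required by the schema above, and all values here are invented:

```python
# Hypothetical arguments for the `recall` tool, using the schema above.
recall_args = {
    "query": "decisions we made about the database schema",  # natural language
    "limit": 5,       # optional; default is 10
    "minTrust": 0.6,  # optional; 0-1, omit to include memories of any trust
}
# agentId is optional and simply left out here.
```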
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions the search method ('semantic search') and scope ('from any session'), but lacks details on permissions, rate limits, response format, or potential side effects. For a search tool with no annotation coverage, this leaves significant gaps in understanding its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded and efficient, consisting of two concise sentences that directly state the tool's function and scope without any wasted words. Every sentence earns its place by providing essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (semantic search across sessions) and no annotations or output schema, the description is adequate but incomplete. It covers the purpose and scope but lacks details on behavioral traits, return values, or error handling, which are important for a search operation in this context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all parameters (query, limit, agentId, minTrust). The description adds no additional parameter semantics beyond what's in the schema, such as examples or usage tips. Baseline 3 is appropriate when the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Search') and resource ('memories'), and distinguishes it from siblings by specifying 'by meaning (semantic search)' and 'from any session', which differentiates it from tools like agent_memory_query or memory_audit that might have different scopes or methods.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Search your memories by meaning'), implying it's for semantic rather than exact-match searches. However, it does not explicitly state when not to use it or name alternatives among the many sibling tools, such as agent_facts_search or memory_ingest, which could serve similar purposes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register (A)

Self-register as a new agent. Creates your identity (DID), sets up free-tier memory, and generates a claim link your human can use to link with you. No API key or human approval needed.

Parameters (JSON Schema)
name (required): Your display name (e.g., "code-assistant", "research-agent")
description (optional): Optional description of what you do
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: it's a creation/mutation tool (implied by 'Creates'), sets up free-tier resources, and generates a claim link. However, it lacks details on error conditions, rate limits, or what happens on repeated registration, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose in the first sentence, followed by specific outcomes. Every sentence adds value: the first defines the action, the second lists results, and the third clarifies usage conditions. There is no wasted text, making it highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no annotations and no output schema, the description does well by explaining the tool's purpose, usage, and key behaviors. It covers the essential 'what' and 'when,' though it could improve by detailing output format or error handling. Given the complexity (a self-registration mutation), it's mostly complete but not exhaustive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters ('name' and 'description'). The description does not add any additional semantic context about the parameters beyond what the schema provides, such as formatting constraints or examples for the 'name' field. Baseline 3 is appropriate when the schema handles parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('Self-register as a new agent') and enumerates the concrete outcomes: creating a DID identity, setting up free-tier memory, and generating a claim link. It distinguishes itself from siblings like 'agent_create' by emphasizing self-service registration without API keys or human approval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('No API key or human approval needed'), which differentiates it from tools like 'agent_create' that might require such prerequisites. It also implies this is for initial setup by mentioning 'your human can use to link with you,' providing clear context for usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remember (B)

Store a memory. Persists across sessions and tools. Memories are organized into 3 tiers (active, session, long-term) automatically.

Parameters (JSON Schema)
content (required): The memory content to store
type (optional): Type of memory (default: fact)
importance (optional): Importance score 0-1 (default: 0.5)
agentId (optional): Your agent ID
sessionId (optional): Session/conversation ID
model (optional): LLM model name (auto-detected from MCP connection if omitted)
platform (optional): Runtime platform (auto-detected from MCP connection if omitted)
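
The documented defaults (type `fact`, importance 0.5) can be made explicit with a small builder; this is an illustrative sketch, not part of the server's API:

```python
def build_remember_args(content: str, importance: float = 0.5,
                        memory_type: str = "fact", **optional) -> dict:
    """Assemble arguments for `remember`, applying the defaults documented
    in the schema above (type: fact, importance: 0.5). Illustrative only."""
    if not content:
        raise ValueError("`content` is the only required parameter")
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be within 0-1")
    # Optional keys like agentId, sessionId, model, platform pass through.
    return {"content": content, "type": memory_type,
            "importance": importance, **optional}
```

Note the schema gives no hint of how `importance` or `type` affect the automatic tier assignment the description mentions; that relationship is exactly the gap the Parameters score calls out.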
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses key behavioral traits: persistence across sessions/tools and automatic tier organization. However, it doesn't mention important aspects like whether this operation is idempotent, what happens on duplicate content, performance characteristics, or error conditions. The description adds value but leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: just two sentences that efficiently convey the core functionality. Every word earns its place: first sentence states the primary action and key feature (persistence), second sentence adds crucial context about tier organization. No wasted words or redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 7 parameters, no annotations, and no output schema, the description is minimal but covers essential aspects. It explains what the tool does and mentions persistence and tier organization, but doesn't address return values, error handling, or how it differs from similar memory tools. Given the complexity and lack of structured metadata, more context would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 7 parameters thoroughly. The description doesn't add any parameter-specific information beyond what's in the schema. It mentions memory tiers but doesn't explain how parameters like 'importance' or 'type' relate to tier assignment. Baseline 3 is appropriate when schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Store a memory' with the key characteristic of persistence 'across sessions and tools'. It specifies the verb ('store') and resource ('memory'), but doesn't explicitly differentiate from sibling tools like 'agent_memory_store' or 'recall', which appear to be related memory operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. While it mentions memory organization into tiers, it doesn't specify when to choose 'remember' over other memory-related tools like 'agent_memory_store', 'memory_ingest', or 'recall'. There's no mention of prerequisites, constraints, or typical use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

session_start (A)

Call this FIRST at the start of every session. Automatically recommends and loads the best skills for your current work context. Returns your identity, loaded skills, and assembled skill content — all in one call. Report the loaded skills in one line so your human knows what capabilities are active.

Parameters (JSON Schema)
agentId (required): Your agent ID (from register or whoami)
context (optional): What you are working on — branch name, task description, or user request
format (optional): Output format for your IDE/runtime (default: claude-code)
workType (optional): Type of work (default: dev). Infer from context if unsure.
tokenBudget (optional): Maximum token budget for loaded skills (default: 8000)
toolAvailability (optional): Tools available in the current runtime (e.g., ["git", "grep", "read"]). Skills with requiredTools will only be recommended if those tools are available.
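
For reference, an illustrative argument set for starting a development session; the agent ID and branch context are invented, and the defaults are spelled out explicitly:

```python
# Hypothetical session_start arguments for a coding session.
session_args = {
    "agentId": "agent-123",                       # required (from register/whoami)
    "context": "feature/oidc-login: add OIDC sign-in flow",
    "workType": "dev",                            # the documented default
    "tokenBudget": 8000,                          # the documented default
    "toolAvailability": ["git", "grep", "read"],  # gates skills with requiredTools
}
```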
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's behavior: it's a session initialization tool that automatically recommends and loads skills based on context, returns multiple pieces of information in one call, and includes a reporting requirement. It doesn't mention error handling, performance characteristics, or side effects, but covers the core behavioral aspects well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly structured and concise. The first sentence establishes the primary directive, the second explains the core functionality, the third details the return values, and the fourth provides a specific reporting instruction. Every sentence earns its place with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 6-parameter tool with no annotations and no output schema, the description does well by explaining the tool's purpose, timing, behavior, and reporting requirements. However, it doesn't describe the format or structure of the returned data (beyond listing what's included), which would be helpful given the absence of an output schema. The description is mostly complete but could benefit from more detail about the response format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, so the baseline is 3. The description doesn't add any parameter-specific information beyond what's already documented in the schema. It mentions 'current work context' which relates to the 'context' parameter but doesn't provide additional semantic context beyond the schema's documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('call this FIRST', 'automatically recommends and loads the best skills') and distinguishes it from siblings by emphasizing it's the session initialization tool. It explicitly mentions what it returns ('identity, loaded skills, and assembled skill content') and provides a reporting instruction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'Call this FIRST at the start of every session' establishes clear timing. While it doesn't name specific alternatives, the 'FIRST' directive and context of session initialization provide strong implicit guidance about when to use this versus other tools in the extensive sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_generate_lessons (B)

Generate lesson memories from skill effectiveness data. Useful for session handoff — creates lesson-type memories summarizing skill performance so future sessions can self-reason about past effectiveness.

Parameters (JSON Schema)
agentId (required): The ID of the agent
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions that the tool 'creates lesson-type memories,' implying a write operation, but doesn't specify permissions needed, whether it's idempotent, rate limits, or what happens to existing data. The description adds some context about session handoff and future reasoning, but lacks critical behavioral details for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded, with two sentences that directly explain the tool's purpose and usage. There's no wasted text, and it efficiently communicates key information. However, it could be slightly more structured by separating purpose from context more clearly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a mutation tool with no annotations and no output schema), the description is moderately complete. It explains what the tool does and its high-level purpose, but lacks details on behavioral traits, output format, error handling, or integration with sibling tools. This leaves gaps for an AI agent to use it correctly in varied scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for its single parameter ('agentId'), so the schema already documents it adequately. The description doesn't add any parameter-specific information beyond what's in the schema, such as format examples or constraints. With high schema coverage, the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generate lesson memories from skill effectiveness data.' It specifies the verb ('generate'), resource ('lesson memories'), and source data ('skill effectiveness data'). However, it doesn't explicitly differentiate from sibling tools like 'skill_get_effectiveness' or 'memory_ingest', which could handle similar data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some implied usage context: 'Useful for session handoff — creates lesson-type memories summarizing skill performance so future sessions can self-reason about past effectiveness.' This suggests it's for creating summarized memories from effectiveness data, but it doesn't explicitly state when to use this tool versus alternatives like 'memory_store' or 'skill_record_usage', nor does it mention any prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_get_effectiveness (C)

Get effectiveness metrics for a specific skill.

Parameters (JSON Schema)
agentId (required): The ID of the agent
skillId (required): The ID of the skill
startDate (optional): Start of period for effectiveness calculation
endDate (optional): End of period for effectiveness calculation
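
An illustrative argument set for a one-month effectiveness window. The schema does not state a date format, so ISO 8601 strings are an assumption here, and both IDs are invented:

```python
# Hypothetical arguments for skill_get_effectiveness over January 2025.
effectiveness_args = {
    "agentId": "agent-123",          # required
    "skillId": "skill-code-review",  # required; ID format is assumed
    "startDate": "2025-01-01",       # assumed ISO 8601; schema is silent
    "endDate": "2025-01-31",
}
```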
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states it 'gets' metrics, implying a read-only operation, but doesn't clarify what 'effectiveness metrics' include, whether there are rate limits, authentication requirements, or how data is returned (e.g., format, pagination). This leaves significant gaps for an agent to understand the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. Every part of the sentence earns its place by specifying the action, resource, and target, making it easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of retrieving metrics (which often involves data aggregation and interpretation), lack of annotations, and no output schema, the description is insufficient. It doesn't explain what 'effectiveness metrics' entail, potential constraints, or return format, leaving the agent with incomplete context for proper use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with all parameters well-documented in the schema (e.g., agentId, skillId, startDate, endDate). The description adds no additional parameter semantics beyond implying metrics are calculated for a period (via startDate and endDate), but this is already clear from the schema. Baseline 3 is appropriate as the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get') and resource ('effectiveness metrics for a specific skill'), making the purpose immediately understandable. However, it doesn't differentiate from sibling tools like 'skill_get_ranking' or 'skill_list', which also retrieve skill-related data, so it doesn't achieve full sibling distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. There are multiple sibling tools related to skills (e.g., skill_get_ranking, skill_list, skill_recommend), but the description doesn't mention any context, prerequisites, or exclusions for selecting this specific tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_get_ranking (C)

Get skills ranked by effectiveness for an agent.

Parameters (JSON Schema)
agentId (required): The ID of the agent
limit (optional): Maximum number of skills to return
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states what the tool does but doesn't explain how it works—such as what 'effectiveness' means, how ranking is determined, whether results are cached, or what format the output takes. This leaves significant gaps for a tool that presumably returns ranked data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary words. It earns its place by clearly stating the tool's function in minimal terms.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of ranking skills by effectiveness, the lack of annotations, and no output schema, the description is insufficient. It doesn't explain the ranking methodology, output format, or behavioral traits, leaving the agent with incomplete context for proper use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents both parameters (agentId and limit). The description doesn't add any parameter-specific details beyond what's in the schema, such as clarifying what 'effectiveness' entails or how the limit applies. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get skills ranked') and the resource ('by effectiveness for an agent'), providing a specific verb+resource combination. However, it doesn't explicitly differentiate from sibling tools like 'skill_get_effectiveness' or 'skill_recommend', which appear related but have different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'skill_get_effectiveness' or 'skill_recommend'. There's no mention of prerequisites, context, or exclusions, leaving the agent to infer usage from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_list (Grade: C)

List skills for an agent with temperature and effectiveness info.

Parameters (JSON Schema)
- agentId (required): The ID of the agent
- temperature (optional): Filter by skill temperature
- includeEffectiveness (optional): Whether to include effectiveness metrics

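A minimal Python sketch of how the two optional parameters could combine. The field names and the temperature values ("hot"/"cold") are assumptions for illustration, not Ace Memory's actual data model:

```python
# Hypothetical sketch of skill_list's optional filters; field names and
# temperature values are assumptions, not the server's real data model.
def skill_list(skills, temperature=None, include_effectiveness=False):
    rows = [s for s in skills if temperature is None or s["temperature"] == temperature]
    if not include_effectiveness:
        # Drop the effectiveness metric unless explicitly requested.
        rows = [{k: v for k, v in s.items() if k != "effectiveness"} for s in rows]
    return rows

skills = [
    {"id": "debugging", "temperature": "hot", "effectiveness": 0.9},
    {"id": "docs", "temperature": "cold", "effectiveness": 0.2},
]
print(skill_list(skills, temperature="hot"))
# [{'id': 'debugging', 'temperature': 'hot'}]
```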
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It mentions the tool lists skills with 'temperature and effectiveness info', which implies a read-only operation, but doesn't clarify permissions, rate limits, pagination, or error handling. For a tool with no annotations, this is insufficient to ensure safe and effective use.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action ('List skills') and key details. There's no wasted verbiage, making it easy to parse. However, it could be slightly more structured by explicitly separating purpose from optional features.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (3 parameters, no output schema, no annotations), the description is minimally adequate. It covers the basic purpose but lacks guidance on usage, behavioral traits, and output format. Without annotations or an output schema, the agent must infer too much, leaving gaps in understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all parameters (agentId, temperature, includeEffectiveness). The description adds minimal value beyond the schema by hinting at 'temperature and effectiveness info', which loosely relates to parameters but doesn't provide additional syntax or usage details. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('List') and resource ('skills for an agent'), making the purpose understandable. It specifies what information is included ('temperature and effectiveness info'), which helps distinguish it from generic list operations. However, it doesn't explicitly differentiate from sibling tools like 'skill_get_effectiveness' or 'skill_get_ranking', which prevents a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., needing an agent ID), exclusions, or comparisons to sibling tools like 'skill_list' vs. 'skill_get_effectiveness'. This leaves the agent with minimal context for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_load (Grade: C)

Load a skill for an agent. Skills provide specific capabilities and are tracked for effectiveness.

Parameters (JSON Schema)
- agentId (required): The ID of the agent
- skillId (required): The ID of the skill to load

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions that skills are 'tracked for effectiveness,' hinting at monitoring or logging behavior, but fails to disclose critical traits: whether loading is idempotent, whether it requires specific permissions, potential side effects (e.g., memory usage), error conditions, or what happens post-load (e.g., immediate availability). For a mutation tool with zero annotation coverage, this leaves significant gaps in understanding its behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded with the core action ('Load a skill for an agent'), followed by a clarifying sentence about skills. Both sentences earn their place by defining the tool and providing context, with no wasted words. It could be slightly more structured by separating usage notes, but it's efficient overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (a mutation operation with no annotations and no output schema), the description is incomplete. It lacks details on behavioral traits, error handling, return values, or prerequisites. While it states the purpose, it doesn't provide enough context for safe and effective use, especially compared to siblings like 'skill_unload' which might have similar gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters ('agentId' and 'skillId') clearly documented in the schema. The description adds no additional meaning beyond the schema, such as format examples, relationship between agent and skill, or where to obtain IDs. Baseline 3 is appropriate when the schema does the heavy lifting, but no extra value is added.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Load a skill') and target ('for an agent'), with a brief explanation of what skills provide ('specific capabilities and are tracked for effectiveness'). It distinguishes from siblings like 'skill_unload' (opposite action) and 'skill_list' (listing vs. loading), though not explicitly. However, it doesn't specify how loading differs from just activating or if it's a prerequisite for usage, keeping it from a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives is provided. It doesn't mention prerequisites (e.g., whether the agent or skill must exist), when not to use it (e.g., if already loaded), or direct alternatives like 'skill_loader_recommend' for selection help. The context is implied from the action but lacks operational clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_loader_explain (Grade: B)

Get the scoring breakdown and reasons for a previous skill recommendation session.

Parameters (JSON Schema)
- sessionId (required): Session ID from a previous skill_loader_recommend call

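Since sessionId must come from a prior skill_loader_recommend call, usage is inherently two-step. A hedged Python sketch of that chaining, in which the call() helper and all response shapes are hypothetical, not the server's actual API:

```python
# Hypothetical stand-in for an MCP tool call; response shapes are invented
# to illustrate recommend -> explain chaining, not taken from the server.
def call(tool, args):
    if tool == "skill_loader_recommend":
        return {"sessionId": "sess-123", "skills": ["debugging", "testing"]}
    if tool == "skill_loader_explain":
        # Scoring breakdown for the session produced by the recommend call.
        return {"sessionId": args["sessionId"],
                "reasons": {"debugging": "high task relevance"}}
    raise ValueError(f"unknown tool: {tool}")

rec = call("skill_loader_recommend", {"agentId": "agent-1", "workType": "coding"})
why = call("skill_loader_explain", {"sessionId": rec["sessionId"]})
print(why["reasons"])  # {'debugging': 'high task relevance'}
```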
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool retrieves data ('Get'), implying a read-only operation, but doesn't clarify if it requires authentication, has rate limits, returns structured or unstructured data, or handles errors. For a tool with no annotation coverage, this leaves significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that efficiently conveys the tool's purpose without unnecessary words. It's front-loaded with the core function and includes essential context (reference to 'skill_loader_recommend'). Every part of the sentence earns its place, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has one parameter with full schema coverage but no annotations or output schema, the description is minimally adequate. It explains what the tool does and references the required input, but doesn't cover behavioral aspects like authentication needs, error handling, or output format. For a simple retrieval tool, this is acceptable but leaves room for improvement in transparency.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 100%, with the single parameter 'sessionId' fully documented in the schema as 'Session ID from a previous skill_loader_recommend call.' The description doesn't add any additional semantic context beyond this, such as format examples or validation rules. With high schema coverage, the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get the scoring breakdown and reasons for a previous skill recommendation session.' It specifies the verb ('Get') and resource ('scoring breakdown and reasons'), making the function unambiguous. However, it doesn't explicitly differentiate from sibling tools like 'skill_loader_recommend' or 'skill_get_effectiveness' beyond the implied connection to 'skill_loader_recommend' via the sessionId parameter.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by referencing 'a previous skill_loader_recommend call,' suggesting this tool should be used after that specific sibling. However, it doesn't provide explicit guidance on when to use this versus alternatives like 'skill_get_effectiveness' or 'skill_recommend,' nor does it mention any exclusions or prerequisites beyond the sessionId requirement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_loader_recommend (Grade: A)

Get intelligent skill recommendations for a work context. Uses multi-factor scoring (task relevance, phase match, effectiveness, temperature, tool availability, profile bias, trust) to select an optimal skill loadout within a token budget.

Parameters (JSON Schema)
- phase (optional): Current phase (e.g., coding, testing, review)
- agentId (required): The ID of the agent
- taskText (optional): Description of the task to match skills against
- workType (required): Type of work the agent is performing
- tokenBudget (optional): Maximum token budget for loaded skills (default 8000)
- defaultProfile (optional): Default skill profile to bias towards
- toolAvailability (optional): Tools available in the current runtime

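The described mechanism (a weighted multi-factor score, then selection under a token budget) can be sketched as a greedy pack. All weights, field names, and the greedy strategy below are assumptions for illustration, not the server's actual algorithm:

```python
# Illustrative sketch only: weights, field names, and the greedy strategy
# are assumptions. Each skill gets a weighted sum over the factors the
# description lists, and the best-scoring skills are packed until the
# token budget (default 8000) would be exceeded.
WEIGHTS = {"relevance": 0.25, "phase_match": 0.2, "effectiveness": 0.2,
           "temperature": 0.1, "tool_availability": 0.1,
           "profile_bias": 0.05, "trust": 0.1}

def score(skill):
    return sum(w * skill[factor] for factor, w in WEIGHTS.items())

def recommend_loadout(skills, token_budget=8000):
    loadout, used = [], 0
    for s in sorted(skills, key=score, reverse=True):
        if used + s["tokens"] <= token_budget:
            loadout.append(s["id"])
            used += s["tokens"]
    return loadout

skills = [
    {"id": "debugging", "tokens": 5000, "relevance": 0.9, "phase_match": 1.0,
     "effectiveness": 0.8, "temperature": 0.5, "tool_availability": 1.0,
     "profile_bias": 1.0, "trust": 0.9},
    {"id": "docs", "tokens": 6000, "relevance": 0.2, "phase_match": 0.1,
     "effectiveness": 0.5, "temperature": 0.5, "tool_availability": 1.0,
     "profile_bias": 0.5, "trust": 0.9},
]
print(recommend_loadout(skills))  # ['debugging'] -- docs would exceed the budget
```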
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses behavioral traits such as using 'multi-factor scoring' and selecting within a 'token budget,' which hints at optimization and resource constraints. However, it does not detail permissions, rate limits, or what happens if no skills fit the budget, leaving gaps in behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and front-loaded, starting with the core purpose and key mechanisms in a single sentence. Every sentence earns its place by explaining the scoring factors and token budget constraint, though it could be slightly more streamlined by integrating the factors list more smoothly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of 7 parameters, no annotations, and no output schema, the description is moderately complete. It covers the purpose and scoring logic but lacks details on output format, error handling, or prerequisites. For a tool with this parameter count and no structured safety hints, it should provide more behavioral guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 7 parameters thoroughly. The description adds value by explaining the scoring factors (e.g., 'task relevance, phase match'), which relate to parameters like taskText and phase, but does not provide additional syntax or format details beyond what the schema offers.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Get') and resource ('skill recommendations'), specifying it's for a 'work context' and uses 'multi-factor scoring' to select 'an optimal skill loadout within a token budget.' It distinguishes itself from siblings like skill_recommend by focusing on loading recommendations rather than general recommendations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage in a work context with a token budget, but does not explicitly state when to use this tool versus alternatives like skill_recommend or skill_load. It mentions factors like 'task relevance' and 'phase match,' suggesting context-dependent use, but lacks clear exclusions or named alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_loader_resolve (Grade: A)

Load skill content for selected skills and assemble into a context block for a specific IDE/runtime. Call after skill_loader_recommend to get the actual skill content.

Parameters (JSON Schema)
- format (optional, default: claude-code): Output format for target IDE. claude-code=CLAUDE.md, cursor-rules=.mdc, codex=AGENTS.md, antigravity=instructions.md, json=structured, markdown=legacy alias
- agentId (required): The ID of the agent
- skillIds (required): Skill IDs to load (from recommend results)

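The format-to-target mapping is spelled out in the parameter schema itself. A small Python sketch of that lookup, where the helper function is illustrative rather than the server's code:

```python
# The format values and their targets come from the parameter schema;
# the lookup helper itself is just an illustrative sketch.
FORMAT_TARGETS = {
    "claude-code": "CLAUDE.md",        # default
    "cursor-rules": ".mdc",
    "codex": "AGENTS.md",
    "antigravity": "instructions.md",
    "json": "structured",
    "markdown": "legacy alias",
}

def target_for(fmt="claude-code"):
    if fmt not in FORMAT_TARGETS:
        raise ValueError(f"unsupported format: {fmt}")
    return FORMAT_TARGETS[fmt]

print(target_for())         # CLAUDE.md
print(target_for("codex"))  # AGENTS.md
```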
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. While it mentions the tool loads and assembles content, it doesn't describe what 'loading' entails (e.g., fetching from storage, parsing), what 'assembling' means (e.g., formatting, concatenation), whether this is a read-only operation, what permissions might be required, or what happens if skillIds are invalid. The description is insufficient for a tool that presumably accesses and processes skill data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise - two sentences that directly state the tool's purpose and usage guideline with zero wasted words. It's front-loaded with the core functionality and follows with the workflow instruction.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 parameters, 100% schema coverage, but no annotations and no output schema, the description is minimally adequate. It covers the basic purpose and workflow but lacks crucial behavioral context about how the tool actually works, what it returns, and potential side effects. Given the complexity implied by 'loading' and 'assembling' skill content, more detail would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description doesn't add any meaningful parameter semantics beyond what's in the schema - it mentions 'selected skills' which maps to skillIds and 'specific IDE/runtime' which maps to format, but provides no additional context about parameter usage, constraints, or interactions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Load skill content for selected skills and assemble into a context block for a specific IDE/runtime.' It specifies both the action (load and assemble) and the resource (skill content), but doesn't explicitly differentiate it from its closest sibling 'skill_loader_resolve_multi' or other skill-related tools like 'skill_load' or 'skill_list'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage guidance: 'Call after skill_loader_recommend to get the actual skill content.' This establishes a prerequisite workflow relationship. However, it doesn't specify when NOT to use this tool or mention alternatives like 'skill_loader_resolve_multi' or 'skill_load'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_loader_resolve_multi (Grade: A)

Load skill content and assemble into multiple IDE formats at once. Returns a map of format → assembled content. Useful for multi-runtime swarms.

Parameters (JSON Schema)
- agentId (required): The ID of the agent
- formats (required): Output formats to generate (e.g., ["claude-code", "codex", "cursor-rules"])
- skillIds (required): Skill IDs to load (from recommend results)

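The documented return shape, a map of format to assembled content, can be sketched as follows; the assembly step here is a placeholder assumption, not the server's logic:

```python
# Sketch of the documented return shape (a map of format -> assembled
# content). The assembly step is a placeholder, not the server's logic.
def resolve_multi(skill_ids, formats):
    body = "\n".join(f"## {sid}" for sid in skill_ids)  # placeholder assembly
    return {fmt: body for fmt in formats}

result = resolve_multi(["debugging", "testing"], ["claude-code", "codex"])
print(sorted(result))  # ['claude-code', 'codex']
```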
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While it mentions the tool loads and assembles content, it doesn't describe important behaviors like whether this is a read-only operation, what permissions are needed, whether it's idempotent, rate limits, or error conditions. The description is insufficient for a mutation-like operation with zero annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is perfectly concise with two sentences that each earn their place. The first sentence states the core functionality, and the second provides valuable context about when it's useful. There's zero wasted language or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that this appears to be a content generation/assembly tool with no annotations and no output schema, the description is incomplete. It doesn't explain what the assembled content looks like, whether there are size limits, what happens if skillIds are invalid, or how errors are handled. For a tool that presumably creates output in multiple formats, more behavioral context is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all three parameters thoroughly. The description doesn't add any meaningful parameter semantics beyond what's in the schema - it doesn't explain relationships between parameters, provide examples of valid skillIds, or clarify format selection strategies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('Load skill content and assemble into multiple IDE formats at once') and distinguishes it from siblings like 'skill_loader_resolve' (single format) and 'skill_load' (no assembly). It explicitly mentions the multi-format output capability.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool ('Useful for multi-runtime swarms'), indicating it's for generating multiple output formats simultaneously. However, it doesn't explicitly state when NOT to use it or name specific alternatives like 'skill_loader_resolve' for single-format needs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_recommend (Grade: C)

Get skill recommendations for a task based on past effectiveness.

Parameters (JSON Schema)
- limit (optional): Maximum number of skills to recommend
- agentId (required): The ID of the agent
- taskDescription (required): Description of the task to recommend skills for

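A minimal sketch of ranking by past effectiveness with an optional limit. Field names are assumptions, and matching against taskDescription is omitted for brevity:

```python
# Illustrative sketch: rank stored skills by an assumed "effectiveness"
# field and honor the optional limit parameter.
def skill_recommend(skills, limit=None):
    ranked = sorted(skills, key=lambda s: s["effectiveness"], reverse=True)
    return [s["id"] for s in (ranked if limit is None else ranked[:limit])]

history = [{"id": "refactoring", "effectiveness": 0.4},
           {"id": "debugging", "effectiveness": 0.9},
           {"id": "testing", "effectiveness": 0.7}]
print(skill_recommend(history, limit=2))  # ['debugging', 'testing']
```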
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states that the tool gets skill recommendations but does not describe how recommendations are generated, whether they are personalized or general, if there are rate limits, authentication needs, or what the output format looks like. This leaves significant gaps for a tool that likely involves data processing and recommendations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence: 'Get skill recommendations for a task based on past effectiveness.' It is front-loaded with the core purpose, has zero wasted words, and is appropriately sized for the tool's complexity. Every part of the sentence earns its place by conveying essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (involving recommendations based on past data), lack of annotations, and no output schema, the description is incomplete. It does not explain the behavioral aspects, output format, or how recommendations are derived, which are critical for an AI agent to use the tool effectively. The description alone is insufficient for full contextual understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with clear documentation for 'agentId', 'taskDescription', and 'limit'. The description adds no additional semantic context beyond what the schema provides, such as examples or usage nuances. With high schema coverage, the baseline score of 3 is appropriate, as the description does not compensate but also does not detract.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get skill recommendations for a task based on past effectiveness.' It specifies the verb ('Get'), resource ('skill recommendations'), and context ('for a task based on past effectiveness'). However, it does not explicitly differentiate from sibling tools like 'skill_loader_recommend' or 'skill_get_ranking', which appear related to skills, so it misses full sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It lacks explicit context, exclusions, or references to sibling tools such as 'skill_loader_recommend' or 'skill_get_effectiveness', which might offer similar or complementary functionality. Usage is implied only by the purpose statement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_record_usage (Grade: C)

Record a skill usage with success/failure outcome. This updates effectiveness metrics.

Parameters (JSON Schema)
- agentId (required): The ID of the agent
- context (optional): Context of how the skill was used
- skillId (required): The ID of the skill
- success (required): Whether the skill usage was successful
- durationMs (optional): Duration of skill execution in milliseconds

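One plausible reading of "updates effectiveness metrics" is a per-skill running success rate; the sketch below is an assumption for illustration, not the server's documented scheme:

```python
# Assumed metric model: per-skill success rate plus cumulative duration,
# updated on every recorded usage. Not the server's documented scheme.
from dataclasses import dataclass

@dataclass
class SkillMetrics:
    uses: int = 0
    successes: int = 0
    total_duration_ms: int = 0

    def record(self, success, duration_ms=0):
        self.uses += 1
        self.successes += 1 if success else 0
        self.total_duration_ms += duration_ms

    @property
    def effectiveness(self):
        return self.successes / self.uses if self.uses else 0.0

m = SkillMetrics()
m.record(success=True, duration_ms=120)
m.record(success=False, duration_ms=80)
print(m.effectiveness)  # 0.5
```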
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It states the tool 'updates effectiveness metrics', implying a write operation, but doesn't clarify permissions needed, whether the update is immediate or batched, or if there are side effects (e.g., triggering notifications). For a mutation tool with zero annotation coverage, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core action ('Record a skill usage') and adds clarifying purpose ('updates effectiveness metrics'). There is no wasted verbiage, and it's appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given this is a mutation tool (implied by 'updates') with no annotations and no output schema, the description is incomplete. It doesn't address behavioral aspects like error conditions, response format, or idempotency. With 5 parameters and sibling tools that might overlap (e.g., skill-related queries), more context is needed for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema fully documents all 5 parameters. The description adds no parameter-specific information beyond implying that 'success' and possibly other inputs affect 'effectiveness metrics'. This meets the baseline of 3 since the schema does the heavy lifting, but the description doesn't enhance understanding of parameter interactions or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Record a skill usage') and the outcome ('with success/failure outcome'), and specifies the purpose ('updates effectiveness metrics'). It distinguishes itself from sibling tools like skill_get_effectiveness or skill_list by focusing on recording rather than querying. However, it doesn't explicitly differentiate from all siblings (e.g., skill_generate_lessons might also involve skill usage).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It doesn't mention prerequisites (e.g., whether the agent or skill must exist), nor does it suggest when not to use it (e.g., for hypothetical scenarios). With many sibling tools related to skills and agents, this lack of contextual guidance is a significant gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

skill_unload (C)

Unload a skill from an agent.

Parameters (JSON Schema)

Name    | Required | Description                   | Default
agentId | Yes      | The ID of the agent           | -
skillId | Yes      | The ID of the skill to unload | -
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. While 'unload' implies a mutation operation, it doesn't disclose whether this requires specific permissions, whether it's reversible, what happens to agent functionality after unloading, or any rate limits. The description is minimal and lacks behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with zero wasted words. It's front-loaded with the core action and resource, making it immediately understandable without unnecessary elaboration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with no annotations and no output schema, the description is insufficient. It doesn't explain what 'unload' entails operationally, whether there are side effects, what the return value might be, or error conditions. Given the complexity of modifying agent capabilities, more context is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters (agentId and skillId) adequately. The description adds no additional meaning about parameters beyond what's in the schema, such as format examples or relationship between them, meeting the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('unload') and resource ('a skill from an agent'), making the purpose immediately understandable. It doesn't differentiate from sibling tools like 'skill_load' or 'skill_list', but it's specific enough to understand the basic function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'skill_load' or 'skill_delete', nor does it mention prerequisites such as whether the skill must be currently loaded. It simply states what the tool does without contextual usage information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

trust_status (C)

Check your trust status — link, memory health, and safety events.

Parameters (JSON Schema)

Name    | Required | Description   | Default
agentId | Yes      | Your agent ID | -
Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It mentions checking 'trust status' but doesn't explain what that entails—whether it's a read-only operation, requires authentication, has rate limits, or what the output format might be. This is a significant gap for a tool with potential security implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the core purpose without unnecessary details. It's appropriately sized for a simple tool, though it could be slightly more structured to include usage hints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of annotations and output schema, the description is incomplete. It doesn't explain what 'trust status' returns (e.g., a score, details on events) or behavioral aspects like error handling. For a tool checking critical aspects like safety events, more context is needed to guide the agent effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the single parameter 'agentId' documented as 'Your agent ID'. The description doesn't add any meaning beyond this, such as explaining how the agentId is used or its format. With high schema coverage, the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('check') and resources ('trust status — link, memory health, and safety events'), making it easy to understand what it does. However, it doesn't explicitly differentiate from sibling tools like 'link' or 'memory_audit', which might have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as 'agent_link_verify' or 'memory_health' related tools. It lacks explicit context, prerequisites, or exclusions, leaving the agent to infer usage based on the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

whoami (A)

Check your agent identity — name, DID, status, and memory count.

Parameters (JSON Schema)

Name    | Required | Description   | Default
agentId | Yes      | Your agent ID | -
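By contrast with the mutation tools above, whoami takes a single parameter and reads identity data. A minimal sketch of its call payload, assuming the same MCP `tools/call` envelope and using a hypothetical placeholder agent ID:

```python
# Hypothetical tools/call payload for whoami. agentId is the only
# parameter, and it is required.
whoami_request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "whoami",
        "arguments": {"agentId": "agent-123"},  # placeholder agent ID
    },
}

# whoami describes a read-only identity check, so a client can safely
# repeat this call without side effects.
```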
Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It describes a read-only operation ('check') that returns identity information, which implies non-destructive behavior, but does not disclose details like authentication requirements, rate limits, or error conditions. It adds basic context but lacks comprehensive behavioral detail.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the purpose ('Check your agent identity') and lists specific return values. There is no wasted language, and it effectively communicates the tool's function without unnecessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (one required parameter) and no output schema, the description is reasonably complete for a read-only identity check. It specifies what information is returned, but could improve by mentioning the response format or any prerequisites. Without annotations, it adequately covers the basics but has minor gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, with the single parameter 'agentId' documented as 'Your agent ID'. The description does not add meaning beyond this, as it does not explain parameter usage or constraints. With high schema coverage, the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with specific verbs ('check') and resources ('agent identity'), listing the exact information returned (name, DID, status, memory count). It distinguishes from sibling tools like agent_get or agent_list by focusing on identity verification rather than retrieval or listing operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context ('check your agent identity') for self-verification scenarios, but does not explicitly state when to use this tool versus alternatives like agent_get (which might retrieve similar data) or when not to use it. It provides clear intent but lacks explicit comparison or exclusion guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
