VaultCrux Memory Core
Server Details
VaultCrux Memory Core — 32 tools: knowledge, decisions, constraints, signals, coverage
- Status: Unhealthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Score is being calculated. Check back soon.
Available Tools
52 tools

assess_answerability (Assess Answerability), Grade A
Sufficiency gate — can this question be answered with current evidence? Pass your query and optionally the fact rows you have gathered. Returns: answerable (yes/no), missing fields, contradictory fields, recommended next tool, and confidence. Use this BEFORE forcing a best-guess answer. If answerable=false, it is better to say 'insufficient evidence' than to guess wrong.
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | | |
| candidate_rows | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
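Neither the description nor the schema shows the wire format, but under standard MCP JSON-RPC framing a call might look like the following sketch; the row shape inside candidate_rows is an assumption, since the schema leaves it undocumented:

```python
import json

# Hypothetical tools/call request for assess_answerability.
# JSON-RPC framing per the MCP spec; the candidate_rows row shape is assumed.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "assess_answerability",
        "arguments": {
            "query": "Which region had the highest Q3 signups?",  # required
            "candidate_rows": [  # optional; entry shape is an assumption
                {"fact": "EU signups Q3: 1204", "source": "signals"},
            ],
        },
    },
}
print(json.dumps(request, indent=2))
```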
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and succeeds in explaining the decision logic (sufficiency assessment) and detailed output structure (answerable boolean, missing/contradictory fields, confidence). It clearly states the behavioral consequence of a negative result (insufficient evidence response). Minor gap: does not mention idempotency or side effects, though these are likely absent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Five sentences, zero waste. Front-loaded with the core purpose ('Sufficiency gate'), followed by inputs, outputs, usage timing, and error handling. Every clause adds value beyond the structured data.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Comprehensive for a tool with 0% schema coverage and no annotations. Despite the existence of an output schema (which reduces the need for detailed return value explanation), the description helpfully summarizes the return fields. Covers purpose, parameters, behavioral logic, and usage workflow thoroughly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Given 0% schema description coverage, the description effectively compensates by mapping semantic meaning to parameters: 'query' is the question being assessed, and 'candidate_rows' are 'the fact rows you have gathered.' This provides necessary context for the optional array parameter that the schema leaves undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a clear metaphor ('Sufficiency gate') followed by a precise question format ('can this question be answered with current evidence?'). It distinguishes itself from siblings like investigate_question or check_claim by positioning the tool as a validation gate rather than an evidence-gathering or claim-verification tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit temporal guidance ('Use this BEFORE forcing a best-guess answer') and clear error-handling protocol ('If answerable=false, it is better to say 'insufficient evidence' than to guess wrong'). However, it does not explicitly name sibling alternatives to use instead, relying only on the 'recommended next tool' output field to guide users.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
assess_coverage (Assess Coverage), Grade A
Question-scoped readiness check. Given a task description, returns what the system knows and doesn't know: artefact counts by domain, freshness stats, and knowledge gaps. Use BEFORE answering to decide if you should search more or commit. If coverage is thin on the question's topic, search with different terms before answering. Addresses 'do I have enough evidence to answer this?'
| Name | Required | Description | Default |
|---|---|---|---|
| domains | No | | |
| action_types | No | | |
| task_description | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
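As a sketch of the documented contract: only task_description is required, with domains and action_types available as optional filters. The filter values below are illustrative guesses, since the schema documents neither:

```python
# Minimal vs. filtered argument payloads for assess_coverage.
minimal_args = {"task_description": "Summarise Q3 incident history"}

filtered_args = {
    "task_description": "Summarise Q3 incident history",
    "domains": ["incidents"],        # assumed filter values
    "action_types": ["postmortem"],  # assumed filter values
}
print(sorted(filtered_args))
```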
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden and successfully discloses return values ('returns what the system knows and doesn't know'). However, it lacks explicit safety classification (read-only nature) despite the implied non-destructive 'readiness check' behavior.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four tightly constructed sentences with zero redundancy. Front-loads the core function (readiness check), follows with usage timing, conditional logic, and decision criteria. Every sentence earns its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Strong coverage of purpose and workflow integration. Since an output schema exists, the summary of return values is sufficient. However, the complete omission of optional parameter semantics prevents a top score, given the tool's filtering capabilities.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Critical failure given 0% schema coverage. While it implicitly references the required 'task_description' parameter ('Given a task description'), it completely omits the two optional array parameters ('domains' and 'action_types'), leaving the agent without guidance on when or how to use these filters.
Does the description clearly state what the tool does and how it differs from similar tools?
Excellent specificity: defines it as a 'Question-scoped readiness check' that returns 'artefact counts by domain, freshness stats, and knowledge gaps.' Clearly distinguishes from sibling assess_answerability by focusing on coverage/evidence quantity rather than answerability feasibility.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit temporal guidance ('Use BEFORE answering'), actionable next steps ('search with different terms before answering'), and the specific decision criteria ('do I have enough evidence to answer this?'). Clearly positions the tool in the workflow against alternatives like commit or search.
build_timeline (Build Timeline), Grade A
Deterministic timeline constructor for temporal reasoning. Finds all dated events matching your query, normalizes dates, and returns them sorted chronologically. Use for 'what order', 'before/after', 'earliest/latest' questions. Returns unresolved events (found but no date) separately.
| Name | Required | Description | Default |
|---|---|---|---|
| as_of | No | | |
| query | Yes | | |
| relation | No | | |
| anchor_event | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
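A hedged sketch of a build_timeline call, assuming relation takes values like 'before'/'after' (inferred from the description's examples) and that as_of accepts an ISO-8601 date; none of this is confirmed by the schema:

```python
# Hypothetical arguments for build_timeline. Only query is required.
args = {
    "query": "deployment events",   # required
    "relation": "before",           # assumed enum value
    "anchor_event": "v2.0 launch",  # assumed free-text anchor
    "as_of": "2024-06-30",          # assumed ISO-8601 date
}
print(args["query"])
```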
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and succeeds well: it discloses 'deterministic' behavior, explains the normalization processing, and clarifies the dual-output structure (returning unresolved events separately). It implies read-only analytical behavior through 'constructor' and 'finds' language.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four tightly constructed sentences with zero redundancy: establishes identity (sentence 1), core mechanism (sentence 2), usage context (sentence 3), and edge case handling (sentence 4). Information is front-loaded and every clause earns its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the description appropriately focuses on behavioral semantics rather than return value structures. It adequately covers the tool's unique value proposition (temporal reasoning) and output characteristics, though it could better document the 4 parameters given the complete lack of schema descriptions.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring the description to compensate. It implicitly covers 'query' ('matching your query') and hints at 'relation' values ('before/after', 'latest'), but provides no guidance for 'as_of' or 'anchor_event' parameters. Partial compensation merits a middle score.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool as a 'deterministic timeline constructor for temporal reasoning' with specific actions (finds events, normalizes dates, sorts chronologically). It effectively distinguishes itself from siblings like check_claim or assess_answerability by focusing specifically on temporal ordering and chronological construction.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit positive guidance ('Use for what order, before/after, earliest/latest questions') that clearly signals when to select this tool. However, it lacks explicit negative constraints or named alternative tools for non-temporal queries, though the specificity of the examples strongly implies the boundaries.
check_claim (Check Claim), Grade A
Verify a proposed answer against memory before committing to it. Pass your candidate answer as claim_text. Returns supporting and contradicting evidence with confidence scores. Use as a pre-answer gate: if contradicting evidence exists or support is weak, investigate further before answering.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| agent_id | No | | |
| claim_text | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| matches | Yes | |
| verdict | Yes | |
| confidence | Yes |
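Since the output schema names matches, verdict, and confidence, an agent can gate its answer on the result. Only the three field names come from the schema; the verdict strings and 0–1 confidence range below are assumptions:

```python
# Simulated check_claim response used as a pre-answer gate.
response = {
    "matches": [{"evidence": "Q3 report cites EU as top region"}],
    "verdict": "supported",  # assumed value
    "confidence": 0.82,      # assumed 0-1 range
}

def should_answer(resp, threshold=0.7):
    """Proceed only when the claim is supported with sufficient confidence."""
    return resp["verdict"] == "supported" and resp["confidence"] >= threshold

print(should_answer(response))
```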
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It effectively discloses behavioral traits by specifying the tool returns 'supporting and contradicting evidence with confidence scores' and verifies 'against memory'. Minor gap: does not explicitly state read-only safety or potential side effects like logging.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences, zero waste. Front-loaded with core purpose, followed by parameter mapping, return value disclosure, and usage workflow. Every sentence earns its place with high information density.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given output schema exists, description appropriately summarizes return values without over-specifying. Workflow guidance is complete. Minor incompleteness regarding optional parameters, but sufficient for an agent to invoke the tool successfully given the required parameter is well-documented.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring description to compensate. It successfully maps 'claim_text' to 'candidate answer', but provides no guidance for optional parameters 'limit' (likely result limit) or 'agent_id' (likely caller identifier), leaving 2 of 3 parameters undocumented.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb 'Verify' with specific resource 'proposed answer against memory'. Explicitly positions tool as a 'pre-answer gate', distinguishing it from sibling output/commitment tools like submit_correction or final answering operations.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicit workflow guidance: 'before committing to it' establishes when to use. Clear conditional logic provided: 'if contradicting evidence exists or support is weak, investigate further before answering', specifying exactly when to halt and what alternative action to take.
check_constraints (Check Constraints), Grade A
Check an action against all active constraints. Returns matched constraints, match types (structural/semantic), and a combined verdict (pass/warn/block).
| Name | Required | Description | Default |
|---|---|---|---|
| team_id | No | | |
| metadata | No | | |
| target_resources | No | | |
| action_description | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
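The pass/warn/block verdict is documented, so a caller can map it directly onto control flow. This sketch assumes the verdict arrives as a plain string; the action text is illustrative:

```python
# Mapping the documented pass/warn/block verdicts to agent behaviour.
args = {"action_description": "Delete the staging database"}  # required

def next_step(verdict: str) -> str:
    """Translate a check_constraints verdict into an agent action."""
    return {
        "pass": "proceed",
        "warn": "proceed with caution, surface matched constraints",
        "block": "abort and report the blocking constraint",
    }[verdict]

print(next_step("warn"))
```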
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses return behavior (matched constraints, match types, verdict) and specific verdict values (pass/warn/block) not visible in the input schema. It misses explicit safety/disposition information (whether checks are logged or purely read-only).
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste: the first states purpose, the second states return values. Every word earns its place with no redundancy or filler.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a validation tool with an output schema (which excuses detailed return documentation), but incomplete due to the total lack of parameter guidance given 0% schema coverage. The description meets minimum viability but leaves significant gaps for the 4-parameter interface.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. While it implicitly references action_description via 'Check an action,' it completely omits semantics for team_id, target_resources, and metadata, leaving three-quarters of the parameter surface undocumented.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Check') and clearly identifies the resource ('action against all active constraints'). It distinguishes itself from siblings like get_constraints, declare_constraint, and update_constraint by emphasizing the validation/verification function rather than CRUD operations.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage through the phrase 'Check an action,' suggesting it's for pre-action validation. However, it lacks explicit guidance on when to use this versus similar evaluation tools like verify_before_acting or assess_answerability, or prerequisites for the action_description format.
checkpoint_decision_state (Checkpoint Decision State), Grade A
Create a receipted snapshot of your current decision state during a long-running session. Records decisions made, assumptions in effect, and open questions. Enables resumption by the same or different agent from the last checkpoint rather than replaying from zero.
| Name | Required | Description | Default |
|---|---|---|---|
| summary | Yes | | |
| session_id | Yes | | |
| open_questions | No | | |
| decisions_so_far | No | | |
| assumptions_in_effect | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
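A hedged sketch of checkpoint arguments: summary and session_id are required, and the entry shapes in the three optional arrays are assumptions, since the schema leaves them undocumented:

```python
# Hypothetical checkpoint_decision_state arguments.
args = {
    "summary": "Chose Postgres over DynamoDB; auth design still open",  # required
    "session_id": "sess-2024-001",                                      # required
    # Entry shapes below are assumed:
    "decisions_so_far": [
        {"decision": "Use Postgres", "rationale": "relational joins"},
    ],
    "assumptions_in_effect": ["traffic stays under 1k rps"],
    "open_questions": ["How do we rotate auth tokens?"],
}
print(sorted(args))
```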
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It adds useful behavioral context ('receipted snapshot', 'resumption by different agent') but fails to disclose critical operational traits: persistence guarantees, whether multiple checkpoints per session are retained or overwritten, authorization requirements, or the nature of the receipt returned (despite has_output_schema being true).
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is optimally structured with three high-information sentences: action definition, content specification, and usage benefit. There is no redundancy or filler; every clause earns its place by conveying distinct information about functionality, parameters, or operational value.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (5 parameters including nested object arrays for decisions) and absence of annotations, the description provides adequate conceptual framing but insufficient implementation detail. It does not explain the internal structure of 'decisions_so_far' objects or how the checkpoint relates to the output schema, though it successfully conveys the core user story.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. It semantically maps three parameters by stating it 'records decisions made, assumptions in effect, and open questions,' which clarifies the intent of the complex array structures. However, it completely omits explanation of 'session_id' (scope identifier) and 'summary' (checkpoint label), leaving significant gaps.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool's purpose with specific verbs ('Create a receipted snapshot') and resource ('decision state'). It effectively positions the tool as a session persistence mechanism. However, it does not explicitly differentiate from similar siblings like 'record_decision_context' or 'get_decision_context', which could cause selection ambiguity given the large toolkit.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context ('during a long-running session') and value proposition ('enables resumption... rather than replaying from zero'), suggesting when to checkpoint. However, it lacks explicit guidance on when NOT to use this versus alternatives like 'record_decision_context', and does not mention prerequisites like session initialization.
declare_available_models (Declare Available Models), Grade B
Declare which models are available in this session for orchestration routing. Called once at session start.
| Name | Required | Description | Default |
|---|---|---|---|
| models | Yes | | |
| session_id | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
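A sketch of a declaration payload. The tier and cost values (fast/balanced/capable, low/medium/high) are the enum values noted in the quality assessment below; the field names that carry them are assumptions:

```python
# Hypothetical declare_available_models arguments; both are required.
args = {
    "session_id": "sess-2024-001",
    "models": [
        # Field names assumed; enum values per the quality assessment.
        {"name": "gpt-large", "tier": "capable", "cost": "high"},
        {"name": "gpt-mini", "tier": "fast", "cost": "low"},
    ],
}
print([m["name"] for m in args["models"]])
```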
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description must carry full behavioral disclosure. It mentions the timing constraint ('once at session start') but fails to disclose side effects, idempotency guarantees, what happens to previous declarations, or how the orchestration routing consumes this data. Insufficient for a state-modifying operation.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste: first states purpose, second states usage timing. Appropriately front-loaded and efficient.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (reducing the need for return value documentation), the tool has complex nested input parameters with enums and 0% schema coverage. The minimal description fails to bridge this gap or explain how this declaration affects the session's subsequent orchestration behavior.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. While it mentions 'models' abstractly, it does not explain the 'session_id' parameter, the nested array structure, or the semantic meaning of the enum values (fast/balanced/capable, low/medium/high). It adds minimal value beyond the schema's structural definition.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Declare[s] which models are available' for 'orchestration routing' — specific verb and resource. However, it does not explicitly differentiate from sibling 'declare_constraint' or 'register_external_service', which also involve declaration/registration patterns.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit timing guidance ('Called once at session start'), indicating initialization usage. However, it lacks information on when NOT to use it, what happens if called multiple times, or alternatives to this declaration method.
declare_constraint (Declare Constraint), Grade C
Declare an organisational constraint (boundary, relationship, policy, or context flag) that agents must respect. This is a mutation operation.
| Name | Required | Description | Default |
|---|---|---|---|
| scope | No | | |
| team_id | No | | |
| evidence | No | | |
| severity | No | | |
| assertion | Yes | | |
| expires_at | No | | |
| constraint_type | Yes | | |
| assertion_structured | No | | |
| review_interval_days | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
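A minimal sketch of a declaration call. The constraint_type values come from the description's own list; the severity and expires_at formats are assumptions:

```python
# Hypothetical declare_constraint arguments; assertion and
# constraint_type are required.
args = {
    "assertion": "Agents must not modify production data without approval",
    "constraint_type": "policy",  # boundary | relationship | policy | context_flag
    "severity": "high",                     # assumed value
    "expires_at": "2025-01-01T00:00:00Z",   # assumed ISO-8601 timestamp
}
print(args["constraint_type"])
```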
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With zero annotations provided, the description carries the full burden of behavioral disclosure. It correctly identifies the operation as a mutation, but fails to disclose side effects (e.g., notifications triggered), persistence guarantees, validation rules, or what happens when declaring duplicate constraints. The mention of 'agents must respect' hints at enforcement but doesn't explain the mechanism.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences with no redundant phrases or filler. However, given the high complexity (9 parameters, nested objects, multiple enums), the brevity contributes to under-specification rather than efficient communication. The structure is front-loaded with the core action.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (reducing the need to describe return values), the description is inadequate for a complex mutation tool with zero schema annotations. It lacks explanation of required parameters' formats, the relationship between 'assertion' and 'assertion_structured', and operational context like review cycles or team scoping implied by the parameters.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate significantly. While it helpfully expands the enum values for 'constraint_type' (boundary, relationship, policy, context_flag) and implies 'assertion' as the constraint content, it completely omits semantics for 7 other parameters including 'severity', 'expires_at', 'scope', and 'evidence', leaving critical parameters undocumented.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies the action ('Declare') and resource ('organisational constraint'), and enumerates the specific constraint types (boundary, relationship, policy, context_flag). However, it fails to distinguish from sibling mutation tools like 'update_constraint' or 'suggest_constraint', leaving ambiguity about when to create versus modify or propose constraints.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description only notes 'This is a mutation operation,' which helps distinguish it from read-only siblings like 'get_constraints' but provides no guidance on choosing between 'declare_constraint', 'update_constraint', or 'suggest_constraint'. No prerequisites, idempotency notes, or conflict resolution behavior is mentioned.
derive_from_facts (Derive From Facts), Grade A
Safe math and selection over a fact row set. Operations: sum, count, difference, max, min, latest, earliest. Pass the rows from enumerate_memory_facts and get a deterministic result with a computation trace. Removes arithmetic slop from totals, comparisons, and 'which is highest' questions.
| Name | Required | Description | Default |
|---|---|---|---|
| rows | Yes | | |
| operation | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
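A sketch of a derive_from_facts call using one of the seven documented operations; the row shape is an assumption, since the description only says rows come from enumerate_memory_facts without detailing fields:

```python
# The seven operations are documented in the description.
OPERATIONS = {"sum", "count", "difference", "max", "min", "latest", "earliest"}

# Hypothetical arguments; the row shape is assumed.
args = {
    "operation": "sum",
    "rows": [
        {"fact": "EU signups Q3", "value": 1204},
        {"fact": "US signups Q3", "value": 987},
    ],
}
print(args["operation"])
```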
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Adds valuable behavioral context: 'deterministic result with a computation trace' and 'Safe math' (implies error handling). Does not indicate if read-only or mention rate limits, but covers core behavioral value well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three efficient sentences with zero waste. Front-loaded with operations list, followed by input/output contract, ending with value proposition. Every clause earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given output schema exists (not shown), description correctly omits return value details. Effectively references critical sibling tool (enumerate_memory_facts). Strong coverage for a 2-parameter tool despite 0% schema descriptions, though row field semantics could be richer.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description compensates by listing all 7 enum values for operation and specifying that rows must come from enumerate_memory_facts. Minor gap: doesn't detail expected row structure (which fields are numeric/date), but sufficient for an agent to construct valid calls given the sibling context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific operations (sum, count, difference, max, min, latest, earliest) over a specific resource (fact row set). Explicitly distinguishes from siblings by specifying it receives rows from enumerate_memory_facts, clarifying its position in the data pipeline.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states prerequisite: 'Pass the rows from enumerate_memory_facts.' Defines specific use cases: 'Removes arithmetic slop from totals, comparisons, and which is highest questions.' Clear when-to-use guidance distinguishes it from raw query tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
dismiss_skillDismiss SkillAInspect
Dismiss a pending or quarantined skill submission. The skill is marked as dismissed and will not be promoted.
| Name | Required | Description | Default |
|---|---|---|---|
| notes | No | Additional notes | |
| reason | Yes | Reason for dismissal | |
| submission_id | Yes | ID of the skill submission to dismiss |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
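A minimal argument-building sketch based only on the parameter table above; the helper name and validation logic are illustrative, not part of VaultCrux.

```python
def build_dismiss_args(submission_id, reason, notes=None):
    # submission_id and reason are required per the parameter table;
    # notes is optional and omitted from the payload when absent.
    if not submission_id or not reason:
        raise ValueError("submission_id and reason are required")
    args = {"submission_id": submission_id, "reason": reason}
    if notes is not None:
        args["notes"] = notes
    return args

args = build_dismiss_args("sub-123", "duplicate of an existing skill")
```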
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Without annotations, the description carries the full burden of behavioral disclosure. It successfully explains the persistent outcome ('marked as dismissed') and business rule implication ('will not be promoted'), but omits critical safety context such as whether the action is reversible, idempotent, or requires specific permissions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of exactly two sentences with zero redundancy. The first establishes the operation and target state; the second clarifies the terminal consequence. Every word serves a distinct purpose in aiding tool selection.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (per context signals), the description appropriately omits return value details. It adequately covers the core action for a state-transition tool, though it could be strengthened by explicitly contrasting with 'retract_skill' to prevent selection errors among the 40+ sibling tools.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the structured fields already adequately document all three parameters (submission_id, reason, notes). The description adds no additional semantic guidance—such as expected reason formats or note conventions—meriting the baseline score of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Dismiss'), target resource ('skill submission'), and qualifying states ('pending or quarantined'). It further distinguishes itself from the sibling 'promote_skill' by explicitly stating the skill 'will not be promoted,' clarifying the terminal outcome of this operation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implicitly guides usage by restricting the action to 'pending or quarantined' submissions, which helps identify applicable scenarios. However, it fails to explicitly differentiate from sibling 'retract_skill' or state when to prefer this tool over alternative rejection workflows.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
enumerate_memory_factsEnumerate Memory FactsAInspect
Deterministic fact-table extraction for aggregation questions. Returns a structured row set (subject, predicate, object, date, session_id, confidence) instead of prose. Use this for 'how many', 'total', 'list all' questions — count the rows instead of hoping the LLM enumerates correctly. Includes missing_dimensions to flag what might not have been found.
| Name | Required | Description | Default |
|---|---|---|---|
| as_of | No | ||
| limit | No | ||
| query | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
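The "count the rows" pattern the description recommends can be shown against a mocked response. The per-row fields follow the tuple named in the description; the response envelope (`rows`, `missing_dimensions` keys) is an assumption.

```python
# Mocked enumerate_memory_facts response; envelope keys are assumed.
response = {
    "rows": [
        {"subject": "user", "predicate": "subscribes_to", "object": "Disney+",
         "date": "2024-02-01", "session_id": "s1", "confidence": 0.92},
        {"subject": "user", "predicate": "subscribes_to", "object": "Netflix",
         "date": "2024-02-09", "session_id": "s2", "confidence": 0.88},
    ],
    "missing_dimensions": ["cancelled subscriptions"],
}

# 'How many subscriptions?' -> count rows deterministically
# instead of enumerating in prose.
how_many = len(response["rows"])
```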
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, description carries full burden and discloses return structure (specific tuple fields: subject, predicate, object, etc.), determinism, and output features (missing_dimensions flag). Lacks operational details like error handling or empty result behavior, but covers the critical output contract well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste: (1) purpose and return type, (2) specific use cases, (3) methodology guidance, (4) auxiliary feature. Front-loaded with the deterministic/structured nature and maintains high information density throughout.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has an output schema, the description appropriately focuses on explaining the semantic meaning of the return structure (the fact table format) and missing_dimensions feature. Only gap is the lack of parameter documentation given zero schema coverage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage, requiring the description to compensate, yet it fails to mention any of the three parameters (query, as_of, limit). While 'query' is somewhat inferable from context, temporal filtering (as_of) and pagination (limit) are completely undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific verb ('extraction') and resource ('facts'), clearly defining the deterministic table-based approach. Explicitly distinguishes from prose-based siblings like 'query_memory' by contrasting 'structured row set' versus prose and LLM enumeration.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use guidance ('how many', 'total', 'list all' questions) and methodology ('count the rows instead of hoping the LLM enumerates correctly'). Clearly indicates this is the alternative to prose-based query tools for aggregation tasks.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
escalate_with_contextEscalate With ContextAInspect
Contextual escalation — packages your full reasoning state (evidence gathered, options considered, recommended action) and routes to a human for review. Preserves work so the human responds with full context, not from scratch. Use when you hit genuine uncertainty that the system cannot evaluate.
| Name | Required | Description | Default |
|---|---|---|---|
| urgency | No | ||
| question | Yes | ||
| reasoning | Yes | ||
| session_id | No | ||
| evidence_gathered | No | ||
| options_considered | No | ||
| recommended_action | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
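A sketch of assembling the escalation payload. Required fields follow the parameter table; the `urgency` value shown is hypothetical, since the enum values are undocumented.

```python
def build_escalation(question, reasoning, recommended_action,
                     evidence_gathered=None, options_considered=None,
                     urgency=None, session_id=None):
    # question, reasoning, and recommended_action are required per the table.
    payload = {
        "question": question,
        "reasoning": reasoning,
        "recommended_action": recommended_action,
    }
    # Optional fields are included only when supplied.
    for key, value in [("evidence_gathered", evidence_gathered),
                       ("options_considered", options_considered),
                       ("urgency", urgency),
                       ("session_id", session_id)]:
        if value is not None:
            payload[key] = value
    return payload

payload = build_escalation(
    question="Which vendor quote should we accept?",
    reasoning="Two quotes conflict and memory has no tie-breaking constraint.",
    recommended_action="Accept quote A pending human confirmation",
    urgency="blocking",  # hypothetical enum value; not confirmed by the schema
)
```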
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden. It discloses that the tool 'preserves work' and routes to humans, adding valuable context beyond the name. However, it omits what happens post-escalation (blocking vs. async), reversibility, or expected response timing.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly constructed sentences: definition, value proposition ('Preserves work'), and usage condition. Every sentence earns its place with zero redundancy, though 'Contextual escalation' slightly echoes the title.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 7 parameters with complex nested objects and 0% schema coverage, the description is minimally viable. It captures the core concept adequately and defers return values appropriately (output schema exists), but the lack of guidance on optional parameters (urgency levels, session_id) and complex array structures leaves significant gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring heavy compensation. The description parenthetically maps to key parameters (evidence gathered, options considered, recommended action) and implies 'reasoning,' but fails to document the urgency enum values (blocking/advisory), session_id purpose, or the nested object structures for evidence/options.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'packages your full reasoning state... and routes to a human for review,' specifying the exact resource (reasoning state including evidence, options, action) and distinguishing it from automated analysis siblings like investigate_question by emphasizing human routing.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use guidance ('Use when you hit genuine uncertainty that the system cannot evaluate'), clearly delineating the boundary between automated processing and human escalation. Lacks specific naming of sibling alternatives like get_escalation_recommendation or assess_answerability.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
expand_hit_contextExpand Hit ContextAInspect
Session-neighborhood expansion around promising retrieval hits. When you find a relevant chunk but the specific fact (name, date, amount) is in a nearby turn, use this to fetch ±N turns from the same session. Recovers facts like 'my parents', '$6', or 'Disney+' that are near but not in the retrieved chunk.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | ||
| hit_ids | Yes | ||
| radius_turns | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
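An argument-validation sketch. The limits enforced here (at most 10 hit IDs, radius of at most 5 turns) mirror the constraint values the review quotes (maxItems: 10, maximum: 5) and should be treated as assumptions.

```python
def build_expand_args(hit_ids, radius_turns=None, mode=None):
    # hit_ids is required; the caps below are assumed from the
    # schema constraints quoted in the review.
    if not hit_ids:
        raise ValueError("hit_ids is required")
    if len(hit_ids) > 10:
        raise ValueError("at most 10 hit_ids")
    args = {"hit_ids": list(hit_ids)}
    if radius_turns is not None:
        if not 1 <= radius_turns <= 5:
            raise ValueError("radius_turns must be between 1 and 5")
        args["radius_turns"] = radius_turns
    if mode is not None:
        args["mode"] = mode
    return args

args = build_expand_args(["hit-7"], radius_turns=2)
```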
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It explains the expansion mechanic ('fetch ±N turns from the same session') and return value type ('Recovers facts'), but omits safety profile (read-only vs. destructive), rate limits, or idempotency characteristics that would be critical for an agent to know.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly constructed sentences with zero redundancy: definition, usage trigger/mechanism, and value examples. Front-loaded with the core concept ('Session-neighborhood expansion'). Every clause earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the description appropriately focuses on operational semantics rather than return structure. It adequately positions the tool within the retrieval workflow (post-hit expansion). Minor gap: could reference the constraint limits (max 5 turns) given zero schema descriptions.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring description compensation. It maps '±N turns' to radius_turns and 'promising retrieval hits' to hit_ids, and implies the 'mode' concept (turn/session). However, it fails to explain the enum values ('turn' vs 'session' vs 'window'), constraints (maxItems: 10, maximum: 5), or required vs optional parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a specific verb-noun phrase ('Session-neighborhood expansion') and clearly identifies the resource (retrieval hits). It distinguishes itself from initial retrieval siblings like 'query_memory' by specifying it operates on 'promising retrieval hits' (existing results) rather than raw queries.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit conditional trigger: 'When you find a relevant chunk but the specific fact... is in a nearby turn, use this.' This clearly signals the prerequisite state (having a hit with incomplete context). Lacks explicit naming of alternatives for initial search, though the conditionality implies the workflow sequence.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
fill_gapFill Knowledge GapBInspect
Fill a previously reported gap with new knowledge. Gap must have been reported by a different tenant for cross-tenant credit.
| Name | Required | Description | Default |
|---|---|---|---|
| content | Yes | ||
| evidence | Yes | ||
| gap_receipt_id | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
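A payload sketch for the three required parameters. The evidence item fields (source_id, source_type, excerpt) are taken from the review's reading of the schema and are assumptions, as is the example data.

```python
def build_fill_gap_args(gap_receipt_id, content, evidence):
    # All three parameters are required per the table; evidence is a list
    # of objects whose fields follow the review's reading of the schema.
    if not (gap_receipt_id and content and evidence):
        raise ValueError("gap_receipt_id, content, and evidence are required")
    return {"gap_receipt_id": gap_receipt_id, "content": content,
            "evidence": list(evidence)}

args = build_fill_gap_args(
    "gap-receipt-42",
    "The billing cutoff moved to the 25th of each month.",
    [{"source_id": "doc-9", "source_type": "policy_doc",
      "excerpt": "Cutoff is the 25th effective Q3."}],
)
```

Note the cross-tenant rule stated above: the receipt must come from a gap reported by a different tenant for credit to apply.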
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses the cross-tenant credit system and the tenant validation rule, which is valuable behavioral context. However, it omits whether filling a gap is reversible, if it triggers notifications, or how the evidence array is processed/validated.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of exactly two sentences with zero waste. The first sentence establishes the core purpose immediately; the second sentence provides the critical constraint. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (not shown but indicated), the description appropriately omits return value details. However, for a mutation tool with 0% schema coverage and no annotations, it lacks sufficient detail on parameter semantics and side effects to be fully complete, though the cross-tenant constraint is a crucial inclusion.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. It loosely maps 'gap' to gap_receipt_id and 'new knowledge' to content, but provides no guidance on the structure of the evidence array (complex objects with source_id, source_type, excerpt) or what constitutes valid content syntax.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the core action ('Fill') and resource ('previously reported gap'). It distinguishes this tool from siblings like 'get_knowledge_gaps' by specifying this is a write operation and uniquely mentioning the 'cross-tenant credit' business rule, which defines its specific scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides a clear constraint ('Gap must have been reported by a different tenant'), indicating a validation rule for invocation. However, it fails to specify what to do if the gap is from the same tenant (alternative action) or when to prefer this over 'submit_correction' or other knowledge-updating siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_active_alertsGet Active AlertsCInspect
Get active watch alerts across all watches for the tenant from the last 7 days.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
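The fixed 7-day lookback and 'active' filter the description implies can be illustrated locally; the alert dict shape is invented for this sketch, and the `limit` semantics (simple truncation) are assumed.

```python
from datetime import datetime, timedelta, timezone

def recent_active(alerts, now, days=7, limit=None):
    # Keep alerts that are still active and created within the lookback window.
    cutoff = now - timedelta(days=days)
    hits = [a for a in alerts
            if a["status"] == "active" and a["created_at"] >= cutoff]
    return hits[:limit] if limit is not None else hits

now = datetime(2024, 6, 15, tzinfo=timezone.utc)
alerts = [
    {"id": "a1", "status": "active", "created_at": now - timedelta(days=2)},
    {"id": "a2", "status": "active", "created_at": now - timedelta(days=9)},   # outside window
    {"id": "a3", "status": "resolved", "created_at": now - timedelta(days=1)},  # not active
]
fresh = recent_active(alerts, now)
```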
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It specifies the 7-day lookback window and 'active' status filter, but fails to disclose read-only status, performance characteristics, rate limits, or what constitutes a 'watch' in this domain.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the action. While appropriately concise, it is overly minimal given the lack of schema documentation and behavioral annotations.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (reducing the need for return value description), the tool suffers from zero schema coverage on inputs and no annotations. The description fails to compensate by documenting the 'limit' parameter or explaining domain-specific terms like 'watch' and 'tenant'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains one parameter ('limit') with 0% description coverage, and the description makes no mention of this parameter or its function (pagination control). The description adds zero semantic value beyond the schema structure.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action (Get), resource (active watch alerts), and scope (across all watches for the tenant from the last 7 days). However, it does not explicitly differentiate this tool from siblings like 'get_signals_feed' or 'get_pressure_status'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description defines the temporal scope (last 7 days) but provides no guidance on when to use this tool versus alternatives such as 'get_signals_feed' or 'get_escalation_recommendation', nor does it mention prerequisites or constraints beyond the implicit time window.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_audit_trailGet Audit TrailCInspect
Read VaultCrux Memory Core import audit history and linked receipt hashes for a topic.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | ||
| topic | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| items | Yes | |
| topic | Yes |
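Since the output schema documents `items` and `topic`, consuming a response can be sketched as below; the fields inside each item (`receipt_hash`, `imported_at`) are invented for illustration.

```python
def summarize_audit_trail(response):
    # 'items' and 'topic' are documented output fields; the per-item
    # fields used here are assumptions for the sake of the example.
    hashes = [item["receipt_hash"] for item in response["items"]]
    return response["topic"], hashes

response = {
    "topic": "billing",
    "items": [
        {"receipt_hash": "sha256:ab12...", "imported_at": "2024-05-01"},
        {"receipt_hash": "sha256:cd34...", "imported_at": "2024-05-09"},
    ],
}
topic, hashes = summarize_audit_trail(response)
```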
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While 'Read' implies a safe, non-destructive operation, and 'linked receipt hashes' hints at return content, the description omits pagination behavior (despite the limit parameter), error handling for invalid topics, and whether results are ordered or filtered by default.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The single-sentence structure is appropriately compact and front-loaded with the action verb. However, the dense domain jargon ('VaultCrux Memory Core') crammed into one sentence reduces scannability compared to structured multi-sentence descriptions that separate purpose from parameter guidance.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the existence of an output schema (covering return values) and only two simple parameters, the description is minimally adequate. However, the lack of parameter documentation for 'limit' and the absence of usage guidelines leave gaps that should be addressed for a tool interfacing with audit logs.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to fully compensate. It implicitly clarifies 'topic' ('for a topic') but completely omits the 'limit' parameter, leaving its purpose (pagination vs. result truncation) and interaction with the output schema undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Read') and identifies a distinct domain resource ('VaultCrux Memory Core import audit history and linked receipt hashes'). The specificity of 'import audit history' distinguishes it from siblings like query_memory or get_decision_context, though it could more explicitly contrast with general memory query tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like query_memory or list_topics. In a server with many 'get_' and query tools, the absence of selection criteria forces the agent to guess based on naming conventions alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_causal_chainGet Causal ChainBInspect
Get the causal chain graph for a specific decision, showing how decisions, actions, and supersessions relate.
| Name | Required | Description | Default |
|---|---|---|---|
| decision_id | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
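A traversal sketch over a mocked graph response. The edge representation (`superseded_by` links on nodes) is an assumption, since the output schema exposes no field descriptions.

```python
def supersession_chain(graph, decision_id):
    # Follow 'superseded_by' edges from the starting decision to the
    # current one. The node/edge shape is invented for this illustration.
    chain = [decision_id]
    seen = {decision_id}
    current = decision_id
    while True:
        nxt = graph["nodes"][current].get("superseded_by")
        if nxt is None or nxt in seen:  # guard against cycles
            return chain
        chain.append(nxt)
        seen.add(nxt)
        current = nxt

graph = {"nodes": {
    "dec-1": {"superseded_by": "dec-2"},
    "dec-2": {"superseded_by": "dec-3"},
    "dec-3": {},
}}
chain = supersession_chain(graph, "dec-1")
```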
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses that the output is a graph structure showing causal relationships including supersessions, which adds domain context. However, it omits safety traits (read-only nature), error conditions, or performance characteristics expected for a retrieval tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the action verb 'Get'; every phrase contributes specific meaning about the resource and output structure. No extraneous text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has an output schema (covering return values) and only one parameter, the description is minimally adequate. However, with zero schema coverage and no annotations, the lack of explicit parameter documentation and operational guidance (errors, auth) creates notable gaps for an AI agent attempting to invoke the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. It loosely implies the decision_id parameter through the phrase 'for a specific decision' but fails to explicitly document what decision_id represents, its format, or how to obtain it. This meets minimum viability but leaves significant documentation gaps.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool retrieves a 'causal chain graph' and specifies the content (decisions, actions, supersessions) distinguishing it from siblings like get_audit_trail or get_decision_context. However, it doesn't explicitly contrast with get_correction_chain which also involves chains.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like get_audit_trail, get_decision_context, or get_correction_chain, nor does it mention prerequisites for the decision_id parameter.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_checkpointsGet CheckpointsAInspect
Retrieve decision checkpoints for a session. Returns the linked list of checkpoints in reverse chronological order. Use this to resume work from a prior checkpoint after session failure or handoff.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| session_id | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
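Since the output schema is undocumented, handling the response requires assumptions. The description does guarantee one thing: the checkpoint list arrives in reverse chronological order, so the resume point after a failure or handoff is the head of the list. A minimal sketch, assuming hypothetical field names `checkpoint_id` and `created_at`:

```python
def latest_checkpoint(checkpoints):
    """Pick the resume point from a reverse-chronological checkpoint list.

    get_checkpoints returns newest-first, so the head of the list (if any)
    is the checkpoint to resume from after a session failure or handoff.
    """
    return checkpoints[0] if checkpoints else None


# Hypothetical response rows; the field names are assumptions, since the
# output schema is undocumented.
rows = [
    {"checkpoint_id": "cp-3", "created_at": "2024-06-03T10:00:00Z"},
    {"checkpoint_id": "cp-2", "created_at": "2024-06-02T10:00:00Z"},
]
```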
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It successfully discloses return behavior: 'linked list of checkpoints in reverse chronological order'. However, lacks mention of read-only safety, rate limits, or pagination behavior despite zero annotation coverage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with action: the first covers purpose, the second the return format, and the third usage context. Zero redundancy; every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a retrieval tool with output schema (so return values needn't be detailed). However, gaps remain: with 0% parameter schema coverage, the description should have documented the two parameters explicitly, which it did not.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage for both 'session_id' and 'limit' parameters. Description only implicitly references 'session' without explaining parameter format, and completely omits the 'limit' parameter. With low schema coverage, the description fails to compensate adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'Retrieve' with resource 'decision checkpoints' and scope 'for a session'. It clearly distinguishes from sibling tool 'checkpoint_decision_state' (which likely creates checkpoints) by emphasizing retrieval.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'to resume work from a prior checkpoint after session failure or handoff'. Provides clear contextual trigger but does not explicitly name alternative approaches (e.g., starting fresh vs. resuming).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_constraints (Get Constraints) [grade C]
List active organisational constraints, optionally filtered by type, status, or team.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| status | No | | |
| team_id | No | | |
| constraint_type | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
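All four parameters are optional, so a caller can build the arguments object by dropping unset filters rather than sending nulls. A sketch under that assumption (parameter names come from the table above; their semantics are undocumented):

```python
def constraint_args(constraint_type=None, status=None, team_id=None, limit=None):
    """Build the arguments payload for get_constraints.

    All four parameters are optional per the input table; None means
    'do not filter on this field', so unset filters are omitted entirely.
    """
    args = {
        "constraint_type": constraint_type,
        "status": status,
        "team_id": team_id,
        "limit": limit,
    }
    return {k: v for k, v in args.items() if v is not None}
```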
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden but only notes constraints are 'active' (ambiguous given the status enum includes expired/superseded). Omits pagination behavior, default limits, and safety profile entirely.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence is front-loaded and efficient, though excessively brief given the lack of schema documentation and the multiple siblings requiring differentiation.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Inadequate for 4-parameter tool with 0% schema coverage. Missing explanation of pagination (limit), return structure (despite output schema existing), and relationship to constraint lifecycle states.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Mentions 3 of 4 parameters (type, status, team) but completely omits 'limit'. With 0% schema description coverage, it fails to explain enum semantics for constraint_type or status values, though it correctly implies all parameters are optional.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb 'List' and resource 'organisational constraints', but fails to differentiate from siblings like 'check_constraints' or 'declare_constraint' which handle similar resources.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Mentions optional filtering but provides no guidance on when to use this listing tool versus 'check_constraints', 'update_constraint', or 'declare_constraint', nor any prerequisites.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_contradictions (Get Contradictions) [grade A]
Find conflicting information across the user's memory. Returns groups of artefacts that contradict each other on the same topic. Use after gathering evidence for an answer — if your evidence sources disagree, this reveals which version is correct (typically the most recent).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| items | Yes | |
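The description's recency heuristic ("typically the most recent") can be applied client-side once the contradiction groups come back. A sketch, assuming a hypothetical group shape with `artefacts` and `timestamp` fields, since the `items` schema is undocumented:

```python
def resolve_by_recency(group):
    """Pick the winning artefact from one contradiction group.

    Follows the description's heuristic that the most recent version is
    typically correct. ISO-8601 timestamps in a uniform format compare
    chronologically as plain strings.
    """
    return max(group["artefacts"], key=lambda a: a["timestamp"])


# Hypothetical group shape; the real `items` schema is undocumented.
group = {
    "topic": "home_city",
    "artefacts": [
        {"value": "Boston", "timestamp": "2024-01-10T00:00:00Z"},
        {"value": "Chicago", "timestamp": "2024-05-01T00:00:00Z"},
    ],
}
```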
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It discloses return structure ('groups of artefacts') and provides critical behavioral context about interpreting results ('reveals which version is correct'). Could improve by explicitly confirming read-only status.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tight sentences with zero waste: the first states the purpose, the second the return structure, and the third usage timing and interpretation. Information is front-loaded with the core function stated immediately.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists, return values are adequately handled. The conceptual coverage (finding contradictions, recency heuristic) is complete for the tool's complexity. However, the complete omission of the sole parameter documentation creates a practical gap in usability.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must compensate but fails entirely. The 'limit' parameter (integer 1-500) is completely undocumented in both schema and description, leaving the agent with no guidance on what gets limited (contradiction groups vs artefacts).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action (Find conflicting information/Returns groups) and resource (artefacts across memory, same topic). It distinguishes from siblings like check_claim or assess_coverage by focusing specifically on contradiction detection.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit temporal guidance ('Use after gathering evidence') and workflow logic ('if your evidence sources disagree'). Includes interpretive guidance for resolving conflicts ('typically the most recent'), helping the agent understand when and how to invoke the tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_correction_chain (Get Correction Chain) [grade A]
Trace how a fact or decision evolved over time. When you find a value (e.g. 'Rachel moved to Chicago'), call this to check if a more recent session supersedes it. Returns the full version chain with timestamps. ALWAYS use for 'current', 'now', 'most recent' questions before answering with the first value you find.
| Name | Required | Description | Default |
|---|---|---|---|
| decision_id | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
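The "ALWAYS use for 'current', 'now', 'most recent' questions" rule amounts to a pre-answer guard an agent can enforce mechanically. A minimal sketch; the trigger phrases come from the tool description, while "latest" is an added assumption:

```python
# Trigger phrases lifted from the tool description; "latest" is an
# added assumption, not listed there.
TRIGGERS = ("current", "now", "most recent", "latest")


def needs_correction_chain(question: str) -> bool:
    """Return True when a question asks for present-state facts, in which
    case get_correction_chain should run before answering with the first
    value found."""
    q = question.lower()
    return any(t in q for t in TRIGGERS)
```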
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses the return value structure ('full version chain with timestamps') and the core behavioral concept (tracking supersession of facts over time). It could be improved by explicitly stating read-only nature or any rate limiting, but the behavioral traits are well-covered.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste: purpose (sentence 1), usage context (sentence 2), return value (sentence 3), and critical usage constraint (sentence 4). Information is front-loaded with the core verb, and every sentence earns its place by adding distinct value beyond the structured fields.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's medium complexity (version tracking logic) and presence of an output schema, the description is complete. It explains the domain concept (correction chaining), provides invocation triggers, and describes return characteristics without needing to replicate the full output schema structure.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% for the single 'decision_id' parameter. The description implies the input context ('When you find a value') but does not explicitly map this to the decision_id parameter or describe its format/semantics. With only one parameter the inference is possible, but the description does not fully compensate for the lack of schema documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Trace', 'check if... supersedes') and clearly identifies the resource (fact/decision evolution over time). It distinguishes from siblings like get_causal_chain or get_audit_trail by emphasizing temporal versioning and supersession rather than causality or action logging.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use guidance ('When you find a value... call this') and specific trigger words requiring invocation ('ALWAYS use for 'current', 'now', 'most recent' questions'). It clearly states the alternative to avoid ('before answering with the first value you find'), giving the agent precise selection criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_decision_context (Get Decision Context) [grade C]
Retrieve agent session decisions from the CoreCrux Decision Plane, including decision IDs, outcomes, and cursor positions.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
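On the wire, invoking any of these tools is an MCP `tools/call` request. A sketch of the JSON-RPC 2.0 payload for this single-parameter tool, assuming standard MCP framing (the gateway handles transport and auth):

```python
import json


def decision_context_request(session_id: str, request_id: int = 1) -> str:
    """Serialise an MCP tools/call request for get_decision_context,
    using the standard JSON-RPC 2.0 framing that MCP defines."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_decision_context",
            "arguments": {"session_id": session_id},
        },
    })
```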
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While it lists the types of data retrieved (IDs, outcomes, cursor positions), it fails to mention operational aspects such as whether the operation is read-only, pagination behavior, or potential error states. It does not disclose what 'cursor positions' implies for pagination or state management.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, front-loaded sentence that efficiently communicates the core action and specific return fields without redundancy. Every clause adds value, though it could slightly improve by explicitly referencing the input parameter to strengthen the connection between the action and its required argument.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the description appropriately focuses on the tool's purpose rather than exhaustively listing return values. However, with numerous similarly-named sibling tools and zero schema coverage for the single required parameter, the description lacks critical context needed for correct invocation and selection, leaving clear gaps in completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage for the required 'session_id' parameter. The description mentions 'agent session decisions' which implicitly relates to the parameter, but it does not explicitly describe what the session_id represents (e.g., 'the unique identifier for the agent session to query') or its format requirements beyond the schema's minLength.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Retrieve') and resource ('agent session decisions'), clearly identifying the tool's function. It further specifies the data returned ('decision IDs, outcomes, and cursor positions'), which hints at the tool's scope within the Decision Plane. However, it does not explicitly differentiate from the sibling tool 'get_decisions_on_stale_context'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'get_decisions_on_stale_context' or 'record_decision_context'. There are no stated prerequisites, exclusions, or conditions that would help an agent determine if this is the correct retrieval tool among its many siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_decisions_on_stale_context (Get Decisions on Stale Context) [grade C]
Find decisions in a session that may have been made on stale memory context.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
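Since the tool never defines 'stale', a caller must guess at the detection criteria. One plausible reading, flagging decisions recorded before the most recent memory update, can be sketched client-side; this is illustrative only, not the server's actual heuristic:

```python
def stale_decisions(decisions, last_memory_update):
    """Flag decisions recorded before the most recent memory update.

    One plausible reading of 'stale context' (the tool does not define it):
    such decisions may have missed newer facts. Uniform ISO-8601 strings
    compare chronologically as plain strings.
    """
    return [d for d in decisions if d["recorded_at"] < last_memory_update]


# Hypothetical decision rows; field names are assumptions.
decisions = [
    {"id": "d1", "recorded_at": "2024-06-01T00:00:00Z"},
    {"id": "d2", "recorded_at": "2024-06-05T00:00:00Z"},
]
```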
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. The phrase 'may have been made' hints at heuristic/detection behavior, but the description fails to explain what constitutes 'stale' context, detection criteria, or whether this is a read-only audit operation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The single-sentence description is efficiently structured and front-loaded with the action verb, but given the 0% schema coverage and lack of annotations, it is overly terse and leaves significant documentation gaps.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Although an output schema exists (reducing the need for return value documentation), the description inadequately explains the domain concept of 'stale memory context' and lacks guidance on interpreting results for this conceptual audit tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema has 0% description coverage. While the description mentions 'in a session,' implying the session_id parameter, it does not define the parameter's format, source, or semantics, offering only minimal compensation for the undocumented schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Find[s] decisions in a session that may have been made on stale memory context,' providing a specific verb, resource, and condition. However, it does not differentiate from siblings like get_decision_context or get_freshness_report, which may operate on similar concepts.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives (e.g., get_decision_context for general retrieval), nor does it specify prerequisites or conditions that would trigger its use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_domain_changelog (Domain Changelog) [grade A]
Cross-artefact-type changelog for specified domains since a given timestamp. Returns constraints added/updated, knowledge changes, decisions recorded, and alerts raised/resolved. Use at session start to learn what changed in your domain since your last session.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum entries to return (default 500) | |
| since | Yes | Changelog start timestamp (max 90 days ago) | |
| domains | Yes | Domains to check for changes | |
| include | No | Optional filter for artefact types |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
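The `since` parameter is capped at 90 days ago, so a caller resuming after a long gap should clamp the requested start before invoking. A minimal sketch of that clamp (the cap is documented; the rejection behavior for out-of-range values is not):

```python
from datetime import datetime, timedelta, timezone


def clamp_since(requested: datetime, now: datetime) -> datetime:
    """Clamp the changelog start to the documented 90-day maximum window,
    so the `since` argument never predates what the tool accepts."""
    floor = now - timedelta(days=90)
    return max(requested, floor)
```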
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses what data is returned (constraints added/updated, etc.) but omits behavioral traits like read-only safety, performance characteristics, or cost implications that an agent would need to know before calling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: sentence 1 defines the operation, sentence 2 details the return payload, and sentence 3 provides usage timing. Information is front-loaded and appropriately sized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
While the description leverages the existing output schema (appropriately not duplicating return structure), it incompletely describes the tool's scope by omitting 'skills' from the list of artefact types, despite 'skills' being a valid enum value in the 'include' parameter.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage, the description adds valuable semantic context: it frames the 'since' parameter around 'session' boundaries and maps the return value categories to the 'include' filter options (constraints, knowledge, decisions, alerts), aiding agent comprehension beyond raw schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool as a 'Cross-artefact-type changelog' and specifies it covers constraints, knowledge, decisions, and alerts. The 'cross-artefact-type' phrasing implicitly distinguishes it from single-artefact siblings like get_constraints, though it does not explicitly compare against them.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
It provides explicit temporal guidance ('Use at session start') and context ('since your last session') for when to invoke the tool. However, it lacks guidance on when NOT to use it or explicit mentions of alternatives like get_constraints for single-artefact queries.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_enrichment_status (Get Enrichment Status) [grade C]
Check the status of submitted corrections (pending, corroborated, merged, retracted, expired).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| status | No | | |
| correction_id | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
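The parameter set implies two query patterns that the description never spells out: look up a single correction by ID, or list corrections filtered by status. A sketch of dispatching between them; the patterns themselves are inferred, while the status values come from the description:

```python
# Status values enumerated in the tool description.
VALID_STATUSES = {"pending", "corroborated", "merged", "retracted", "expired"}


def enrichment_args(correction_id=None, status=None, limit=None):
    """Build arguments for get_enrichment_status.

    Two likely query patterns (inferred, not documented): look up a single
    correction by ID, or list corrections filtered by status.
    """
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    if correction_id is not None:
        return {"correction_id": correction_id}
    filters = {"status": status, "limit": limit}
    return {k: v for k, v in filters.items() if v is not None}
```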
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the burden of disclosure. It valuably lists the five possible states (pending, corroborated, merged, retracted, expired), providing insight into the state machine. However, it omits safety information (read-only vs. mutation), error handling, and pagination behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no redundant phrases. However, given the complete lack of schema documentation and annotations, it may be overly terse—every word earns its place, but critical information is missing.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 3-parameter tool with zero schema descriptions and no annotations, the description is incomplete. It does not explain the query pattern (specific ID vs. list), filtering capabilities, or safety profile, leaving significant gaps the agent must infer.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. It fails to do so, never explaining that 'correction_id' queries a specific record while 'limit' and 'status' enable filtered listing, nor how they interact. The parenthetical status list ambiguously references values without clarifying if these are input filters or return values.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Check') and resource ('status of submitted corrections'), and enumerates the specific possible states. However, it does not explicitly differentiate from siblings like 'get_correction_chain' or 'submit_correction' in terms of workflow position.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives, nor when to use the filtering parameters versus querying by specific ID. The description lacks prerequisites (e.g., 'use after submitting a correction').
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_escalation_recommendation (Get Escalation Recommendation) [grade B]
Get model routing recommendation for a query based on composite confidence and difficulty profile. Returns escalation advice: none, recommended, or required.
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | | |
| session_id | No | | |
| current_model | No | | |
| query_confidence | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
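The three advice levels map naturally onto client-side routing. A sketch of acting on the response; the `advice` field name and the routing policy are assumptions, since the output schema is undocumented:

```python
def next_model(advice: str, current: str, stronger: str) -> str:
    """Act on the advice field returned by the tool: 'required' must
    switch, 'recommended' switches when a stronger model is available,
    and 'none' keeps the current model. The response field name and this
    policy are assumptions, not documented behavior."""
    if advice == "required":
        return stronger
    if advice == "recommended" and stronger != current:
        return stronger
    return current
```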
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, description discloses return values (escalation advice levels) but omits critical behavioral traits: whether the call is idempotent, if it logs/records the recommendation internally, latency expectations, or side effects on session state.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two efficiently structured sentences: first defines purpose and input logic, second specifies return values. No redundant or filler text; information density is high.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a recommendation tool with existing output schema (return values briefly noted), but incomplete regarding the 4 input parameters given zero schema documentation. Missing guidance on sibling tool relationships prevents full completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage. Description mentions 'composite confidence' (conceptually mapping to query_confidence) and 'difficulty profile' (implied from query content) but fails to document session_id, current_model, or explicit parameter semantics. Insufficient compensation for undocumented schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific function (model routing recommendation) and return values (none/recommended/required). Implicitly distinguishes from sibling 'escalate_with_context' by framing as advisory 'recommendation' rather than action, though explicit differentiation would strengthen this.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use versus alternatives like 'escalate_with_context' or 'assess_answerability'. Does not specify prerequisites (e.g., when query_confidence is required vs optional) or workflow integration.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_freshness_report (Get Freshness Report) [grade A]
Check how recent the stored knowledge is across topics. Returns staleness indicators per topic. Use this when answering time-sensitive questions to verify your evidence isn't outdated. Topics with stale data may have been superseded by newer conversations not yet retrieved.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| stale_after_days | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| items | Yes | |
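Because `stale_after_days` is undocumented, its most plausible reading is the age threshold beyond which a topic counts as stale. That inferred semantic can be sketched client-side (the `last_updated` field name is also an assumption):

```python
from datetime import datetime, timezone


def is_stale(last_updated: str, stale_after_days: int, now: datetime) -> bool:
    """Treat a topic as stale when its last update is older than the
    threshold. This mirrors the inferred meaning of `stale_after_days`,
    which is undocumented in both schema and description."""
    updated = datetime.fromisoformat(last_updated.replace("Z", "+00:00"))
    return (now - updated).days > stale_after_days
```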
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It successfully discloses that the tool returns staleness indicators per topic and importantly notes a limitation: 'Topics with stale data may have been superseded by newer conversations not yet retrieved.' This warns the agent about data completeness gaps beyond what any structured field indicates.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four tightly constructed sentences: purpose, return value, usage guidance, and behavioral limitation. Each earns its place with zero redundancy. The front-loading of core functionality (checking recency) followed by usage context is optimal for agent decision-making.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists, the description appropriately avoids detailing return values. However, with two completely undocumented parameters (0% schema coverage) and no explanation in the description, the definition has a significant gap that forces the agent to guess parameter semantics. Adequate but with clear gaps in input specification.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must compensate but fails entirely. Neither 'limit' (integer 1-500) nor 'stale_after_days' (integer 1-3650) are mentioned or explained. While 'limit' might be inferable, 'stale_after_days' is critical to the tool's function and its purpose is completely undocumented in both schema and description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool checks recency of stored knowledge across topics and returns staleness indicators. Specific verbs (Check) and resources (stored knowledge, topics) are present. It implicitly distinguishes from sibling decision-tools by focusing on knowledge freshness, though it doesn't explicitly name alternatives like get_decisions_on_stale_context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'when answering time-sensitive questions to verify your evidence isn't outdated.' This provides clear temporal context for invocation. However, it lacks explicit guidance on when NOT to use this or specific alternatives to consider for non-time-sensitive queries.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
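Because neither parameter is described in the schema, a caller has to lean on the ranges quoted in the review above (limit 1-500, stale_after_days 1-3650). A minimal sketch of what an MCP `tools/call` request might look like, with a clamp helper to keep arguments inside those reviewed ranges; the concrete values are illustrative assumptions, not documented defaults:

```python
# Hypothetical call sketch for get_freshness_report. The valid ranges
# (limit 1-500, stale_after_days 1-3650) come from the review notes
# above, not from the published schema.

def clamp(value: int, lo: int, hi: int) -> int:
    """Keep an argument inside its reviewed range before sending."""
    return max(lo, min(hi, value))

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_freshness_report",
        "arguments": {
            "limit": clamp(25, 1, 500),
            "stale_after_days": clamp(90, 1, 3650),
        },
    },
}
```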
get_knowledge_gaps (Get Knowledge Gaps), Grade C
List gap receipts (coverage + enumeration) for the tenant, filterable by topic and recency.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| since_days | No | | |
| gap_subtype | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses that the tool retrieves gap types (coverage + enumeration) but fails to mention safety profile (though implied by 'List'), pagination behavior, or what distinguishes a 'receipt' from the gap itself.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no waste. However, the brevity contributes to the parameter mismatch (topic vs gap_subtype), and the description lacks the expansion needed given the zero schema documentation.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
While an output schema exists (reducing description burden for returns), the tool has 0% schema coverage and no annotations. The description fails to fully document the three parameters, particularly omitting limit and misidentifying gap_subtype as 'topic', leaving critical gaps in contextual completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring the description to compensate. It maps 'recency' to since_days and implicitly references gap_subtype values (coverage, enumeration), but erroneously mentions a 'topic' filter not present in the schema and completely omits the limit parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'List' and resource 'gap receipts', clarifying via parenthetical that gaps encompass 'coverage + enumeration'. However, 'gap receipts' is slightly jargon-heavy and the description does not explicitly differentiate from sibling tools like assess_coverage or fill_gap.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions filtering capabilities but provides no guidance on when to use this tool versus siblings like assess_coverage, fill_gap, or enumerate_memory_facts. It also incorrectly implies a 'topic' filter that does not exist in the schema.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
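Given the mismatches the review flags, a defensive caller would validate arguments before sending. A sketch under stated assumptions: the gap_subtype values 'coverage' and 'enumeration' are inferred from the tool's own description, and there is no 'topic' parameter despite what that description implies.

```python
# Hypothetical argument builder for get_knowledge_gaps. The subtype
# values are inferred from the description ("coverage + enumeration");
# note that no 'topic' parameter exists in the schema.
VALID_SUBTYPES = {"coverage", "enumeration"}

def build_gap_args(gap_subtype: str, since_days: int = 30, limit: int = 20) -> dict:
    """Assemble a valid argument dict, rejecting unknown subtypes."""
    if gap_subtype not in VALID_SUBTYPES:
        raise ValueError(f"unknown gap_subtype: {gap_subtype!r}")
    return {"gap_subtype": gap_subtype, "since_days": since_days, "limit": limit}

args = build_gap_args("coverage", since_days=14)
```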
get_my_tasks (My Tasks), Grade B
Read-through to PlanCrux task graph. Returns tasks assigned to or relevant to the calling agent, filtered by status and priority. Includes stage progress, blockers, and linked MemoryCrux artefact counts.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max tasks to return (default 10) | |
| status | No | Filter by task status (e.g. incomplete, in_progress, testing) | |
| priority | No | Filter by priority (critical, high, medium, low) | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It implies read-only behavior via 'Read-through' and 'Returns', and adds valuable context about returned data (stage progress, blockers, MemoryCrux artefact counts). However, it lacks explicit safety declarations, rate limits, or error behaviors.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three short sentences efficiently structured: target system identification, core function with filtering, and output details. No significant waste, though 'Read-through' is slightly ambiguous and domain jargon (PlanCrux/MemoryCrux) is dense but necessary.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a 3-parameter retrieval tool. The description compensates for lack of annotations by mentioning specific return fields (stage progress, blockers). However, it fails to note that all parameters are optional (0 required), which would help the agent understand minimum invocation requirements.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all three parameters (limit, status, priority) fully documented in the schema. The description mentions 'filtered by status and priority' but adds no syntax details, format constraints, or examples beyond what the schema already provides. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies the action ('Read-through'/'Returns'), the resource (PlanCrux task graph), and the scope (tasks assigned to the calling agent). It distinguishes from siblings like `get_task_context` by emphasizing 'my tasks' relevance. However, 'Read-through' is slightly awkward phrasing that could be clearer.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool versus alternatives like `get_task_context` or when to prefer filtering via parameters versus post-processing. The description states what filtering is possible but not when to apply it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
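Since this schema is fully documented, a call can be built directly from the listed enum examples. A sketch using those documented values; note that because all three parameters are optional, an empty argument dict is also a valid minimal invocation:

```python
# Sketch of a filtered get_my_tasks call using values the schema itself
# documents. 'limit' is omitted because the schema states a default of 10.
arguments = {
    "status": "in_progress",   # schema examples: incomplete, in_progress, testing
    "priority": "high",        # schema examples: critical, high, medium, low
}

minimal_call = {}  # all parameters optional, so this is also valid
```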
get_platform_capabilities (Get Platform Capabilities), Grade A
Machine-queryable manifest of all available MemoryCrux tools, required trust tiers, and credit costs. Returns structured data for agent-to-service evaluation without reading documentation. Free (0 credits) at all tiers — discovery drives adoption.
| Name | Required | Description | Default |
|---|---|---|---|
| category | No | | |
| min_trust_tier | No | | |
| max_credit_cost | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and successfully discloses critical behavioral traits: it returns 'structured data,' is 'Free (0 credits) at all tiers,' and covers specific metadata (trust tiers/costs). It could improve by mentioning caching or rate limiting behaviors.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three efficiently structured sentences with zero waste: first defines the resource, second explains the value proposition/return format, and third discloses cost. Information is front-loaded with the manifest definition.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists and the tool has moderate complexity (3 optional params), the description adequately covers domain-specific concepts (trust tiers, credit costs) necessary for interpreting capabilities. It could explicitly note that parameters act as optional filters.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. While it mentions 'trust tiers' and 'credit costs' (corresponding to min_trust_tier/max_credit_cost parameters) and implies filtering capabilities, it does not explicitly map these concepts to the parameter names or explain their optional filtering behavior.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies this as a 'Machine-queryable manifest' covering 'all available MemoryCrux tools, required trust tiers, and credit costs,' using specific verbs and resources that clearly distinguish it from operational siblings like check_claim or build_timeline.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear usage context ('agent-to-service evaluation without reading documentation') and implies its role in discovery, but lacks explicit 'when not to use' guidance or contrast with specific alternative tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
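The review maps 'trust tiers' and 'credit costs' onto min_trust_tier and max_credit_cost, so a filtering call can be sketched, though the mapping is inferred rather than documented. The category value below is an invented placeholder:

```python
# Hypothetical filter arguments for get_platform_capabilities. The
# parameter-to-concept mapping is inferred from the review above;
# "knowledge" is a placeholder category, not a documented value.
arguments = {
    "category": "knowledge",   # placeholder category name
    "max_credit_cost": 0,      # only free tools (this tool itself costs 0 credits)
}
```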
get_pressure_status (Get Pressure Status), Grade B
Get Engine knowledge pressure status for the tenant — indicates whether knowledge bases are under update pressure.
| Name | Required | Description | Default |
|---|---|---|---|
No parameters | |||
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries full burden for behavioral disclosure. While it explains what the status indicates (update pressure), it fails to mention if the operation is read-only, cached, rate-limited, or computationally expensive. The interpretation of 'pressure' is provided, but operational characteristics are missing.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of a single efficient sentence with an em-dash for clarification. It is front-loaded with the action ('Get Engine knowledge pressure status') and wastes no words. Every clause earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has zero parameters and an output schema exists, the description adequately explains the tool's purpose and return value semantics without needing to detail return structures. It is complete for a simple state-checking utility, though usage context would improve it.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains zero parameters. According to scoring rules, 0 parameters establishes a baseline score of 4. No parameter semantics are needed or provided.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool retrieves 'Engine knowledge pressure status' for the tenant and explains this indicates whether 'knowledge bases are under update pressure.' It effectively distinguishes from siblings like get_active_alerts or get_freshness_report by specifying 'pressure' and 'update pressure' semantics.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus similar monitoring tools like get_active_alerts, get_freshness_report, or get_knowledge_gaps. No prerequisites, triggers, or exclusions are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_relevant_context (Get Relevant Context), Grade A
Task-scoped context briefing. Returns a prioritised context payload shaped by your task description, ranked by risk-if-missed. Constraints and alerts rank above general knowledge. Use at the START of reasoning about a question to get the system's best assessment of what's relevant. Complements query_memory: this gives breadth, query_memory gives depth.
| Name | Required | Description | Default |
|---|---|---|---|
| token_budget | Yes | | |
| priority_signal | No | | |
| task_description | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses ranking behavior ('Constraints and alerts rank above general knowledge,' 'ranked by risk-if-missed'). It could be improved by explaining token_budget enforcement behavior (truncation vs error) since no annotations cover this.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Five sentences, zero waste. Front-loaded with the core purpose ('Task-scoped context briefing'), followed by return value description, behavioral specifics, usage timing, and sibling differentiation. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that an output schema exists, the description appropriately focuses on the conceptual model (prioritisation, risk-if-missed) rather than field-by-field output documentation. It adequately situates the tool within the broader toolset ecosystem, though input parameter gaps prevent a 5.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. It only explicitly references 'task description' (task_description parameter). It fails to explain token_budget (critical constraint) or priority_signal (enum with three options), leaving two-thirds of the parameters undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with 'Task-scoped context briefing' and specifies it 'Returns a prioritised context payload shaped by your task description, ranked by risk-if-missed.' This provides a specific verb (returns/prioritises), resource (context payload), and unique ranking mechanism (risk-if-missed) that distinguishes it from siblings like query_memory.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states temporal positioning: 'Use at the START of reasoning about a question.' It also provides clear alternative guidance: 'Complements query_memory: this gives breadth, query_memory gives depth,' explicitly defining when to use which tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
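A minimal call therefore needs only the two required parameters. A sketch of what that might look like; priority_signal is deliberately omitted because its three enum values are undocumented (per the review above), and the task text and budget are illustrative:

```python
# Sketch of a minimal get_relevant_context call. Only the two required
# parameters are supplied; priority_signal is omitted because its enum
# values are undocumented.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "get_relevant_context",
        "arguments": {
            "task_description": "Summarise open constraints for the billing migration",
            "token_budget": 2000,
        },
    },
}
```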
get_signals_feed (Get Signals Feed), Grade C
Get the signals feed for the tenant from the WebCrux platform.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| since | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full responsibility for behavioral disclosure but offers none. It does not indicate whether the feed is real-time or batched, if results are cached, idempotency characteristics, or what happens when called without the optional time-based parameters.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, front-loaded sentence with no filler words. However, its extreme brevity contributes to under-specification; while concise, it lacks the necessary information density for a tool with undocumented parameters.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 0% schema coverage for parameters and the presence of many similar sibling tools, the description is incomplete. It does not clarify the nature of 'signals' (security events, data changes, system metrics), explain the time-window query pattern implied by the 'since' parameter, or distinguish this feed from other event sources.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%—neither 'limit' nor 'since' have descriptions in the schema. The description text fails to compensate by explaining that 'limit' controls result pagination (1-100) or that 'since' filters for signals after a specific ISO 8601 timestamp. No parameter semantics are provided.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states a basic verb ('Get') and resource ('signals feed') with scope ('for the tenant from WebCrux'), but fails to define what constitutes a 'signal' in this context. Given numerous sibling tools like get_active_alerts, get_audit_trail, and get_causal_chain, the description does not differentiate what makes this feed distinct from other monitoring or event-retrieval tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like get_active_alerts or get_relevant_context. It omits prerequisites, polling recommendations, or whether this should be used for real-time monitoring versus historical analysis.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
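The time-window pattern the review describes can be sketched as follows. Both assumptions come from the review above, not the schema: that 'since' is an ISO 8601 timestamp and that 'limit' caps out at 100.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical time-window call for get_signals_feed. That 'since' takes
# an ISO 8601 timestamp and 'limit' is capped at 100 are assumptions
# drawn from the review, not confirmed by the schema.
since = (datetime.now(timezone.utc) - timedelta(hours=24)).isoformat()

arguments = {"since": since, "limit": 100}
```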
get_task_context (Task Context), Grade A
Full task context: task metadata, stages with status and weight, active blockers, linked artefacts (constraints, decisions, knowledge), recent log entries, and pinned master plan version. Assembles the full picture from PlanCrux and MemoryCrux.
| Name | Required | Description | Default |
|---|---|---|---|
| task_id | Yes | PlanCrux task ID or title | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully indicates the tool aggregates data from two source systems (PlanCrux and MemoryCrux) and lists return components, but omits operational characteristics like caching behavior, data freshness guarantees, or whether the operation is read-only (though implied by 'get' prefix).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two dense sentences with zero waste. The first front-loads specific return value categories using a colon structure for scannability. The second provides essential provenance information (PlanCrux and MemoryCrux). Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the existence of an output schema, the description appropriately provides high-level categorical expectations (metadata, stages, blockers) without duplicating schema details. It adequately covers the complexity of cross-system aggregation (PlanCrux + MemoryCrux) for a retrieval tool, though mentioning idempotency or safety would improve it to a 5.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage for the single task_id parameter ('PlanCrux task ID or title'). The description adds no additional semantic information about the parameter (e.g., format examples, whether title is fuzzy-matched), meriting the baseline score of 3 for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description precisely enumerates what 'task context' entails: metadata, stages with status/weight, active blockers, linked artefacts (constraints, decisions, knowledge), recent log entries, and pinned master plan version. This specificity distinguishes it from siblings like get_decision_context or get_my_tasks by defining the exact scope of returned information.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
While the description implies this tool provides comprehensive detail (suggesting use when 'full picture' is needed), it offers no explicit when-to-use guidance, prerequisites, or comparison to alternatives like get_my_tasks (listing) versus this detail view. No exclusion criteria or decision flow is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_versioned_snapshot (Get Versioned Snapshot), Grade C
Get the latest versioned snapshot for a VaultCrux Memory Core topic at an optional timestamp.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| topic | Yes | | |
| timestamp | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| at | Yes | |
| items | Yes | |
| topic | Yes | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Fails to clarify the relationship between 'latest' and 'timestamp' (mutually exclusive or filtering?). Does not explain the 'limit' parameter's purpose (pagination vs. depth), creating ambiguity with the singular 'snapshot' noun. Implicitly read-only via 'Get' but lacks explicit safety disclosure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence front-loaded with the core action. Efficient length, though 'at an optional timestamp' creates slight logical awkwardness (suggesting specificity while declaring optionality).
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With 0% schema coverage, no annotations, and an output schema that exists but is undocumented here, the description leaves critical gaps. The unexplained 'limit' parameter and ambiguous temporal logic ('latest' vs. 'timestamp') leave the agent under-informed for a 3-parameter retrieval tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage, requiring heavy compensation. Description mentions 'topic' and 'timestamp' with minimal context (timestamp is 'optional'), but completely omits the 'limit' parameter despite its presence in the schema. Does not clarify timestamp format expectations beyond the regex in schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb-resource combination ('Get... versioned snapshot') and identifies the domain ('VaultCrux Memory Core topic'). Loses a point for failing to differentiate from siblings like get_checkpoints, get_decision_context, or query_memory in a crowded memory-management toolset.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Mentions 'optional timestamp' but provides no guidance on when to use it versus omitting it, nor when to prefer this over reconstruct_knowledge_state or get_checkpoints. Zero explicit guidance on prerequisites or alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
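The latest-versus-timestamp ambiguity the review flags is easiest to see in the two argument shapes a caller might try. Both are sketches: the topic name is invented, the timestamp format is an assumption (the schema's regex is not shown here), and 'limit' is omitted because its purpose is undocumented.

```python
# Two hypothetical argument shapes for get_versioned_snapshot: omit
# 'timestamp' for the latest snapshot, or supply one for a point-in-time
# read. Topic name and timestamp format are illustrative assumptions.
latest = {"topic": "billing-migration"}
point_in_time = {"topic": "billing-migration", "timestamp": "2024-06-01T00:00:00Z"}
```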
investigate_question (Investigate Question), Grade A
Composite server-side investigation tool. Pass a question and the server automatically: (1) detects intent (aggregation/temporal/ordering/knowledge-update/recall), (2) queries the entity index for structured facts, (3) builds a timeline for temporal questions, (4) retrieves memory chunks with the right scoring profile, (5) expands context around sparse hits, (6) derives counts/sums for aggregation, (7) assesses answerability, and (8) returns a recommendation. Use this as your FIRST tool for any non-trivial question — it does the multi-step investigation that would otherwise take 4-6 individual tool calls. The response includes structured facts, timeline, retrieved chunks, derived results, answerability assessment, and a recommendation for how to answer.
| Name | Required | Description | Default |
|---|---|---|---|
| question | Yes | ||
| question_date | No | ||
| scoring_profile | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
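Since the schema above leaves `question_date` and `scoring_profile` undocumented, a minimal sketch of a `tools/call` payload may help. The ISO-8601 date format and the `"recency"` profile value are assumptions; only the parameter names come from the schema.

```python
import json

# Hypothetical MCP tools/call payload for investigate_question.
# question_date format and the scoring_profile value are assumptions;
# only the argument names come from the schema above.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "investigate_question",
        "arguments": {
            "question": "When did the staging database last change owners?",
            "question_date": "2024-06-01",   # assumed ISO-8601 anchor date
            "scoring_profile": "recency",    # assumed enum value
        },
    },
}
print(json.dumps(request, indent=2))
```

The composite tool needs only the question; the two optional arguments refine temporal anchoring and retrieval weighting.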
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It comprehensively details the 8-step internal process (intent detection, entity indexing, timeline building, etc.) and explains the return structure. However, it does not explicitly state operational traits like idempotency, safety, or side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two well-structured paragraphs: first enumerates the 8-step process, second provides usage guidance. Every sentence adds value—either explaining internal mechanics or guiding selection. No redundant text despite the complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that the tool declares an output schema and replaces multiple sibling tools, the description adequately covers the composite nature and the response contents. However, the lack of parameter documentation for 'question_date' and the minimal explanation of 'scoring_profile' leave minor gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema coverage, the description must compensate. It implicitly references 'scoring_profile' in step 4 ('right scoring profile') but does not explain the enum values (balanced/recall/recency) or mention 'question_date' at all. Partial compensation for the schema gap.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool as a 'composite server-side investigation tool' and distinguishes it from siblings by stating it replaces '4-6 individual tool calls' that would otherwise be needed. It uses specific verbs (detects, queries, builds, retrieves) and identifies the resource (questions/entities).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states 'Use this as your FIRST tool for any non-trivial question' and clearly contrasts with alternatives by explaining it consolidates multi-step investigations. The guidance is prescriptive and contextual.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_external_services (List External Services) - Grade: A
List registered external services available to the calling agent. Returns service IDs, display names, allowed methods/paths, and rate limit state. Does not return credentials or credential references.
| Name | Required | Description | Default |
|---|---|---|---|
| status | No | Filter by service status |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
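A sketch of the single-parameter call, filtering by status. The value `"active"` is an assumption, since the schema does not enumerate the status values.

```python
import json

# Hypothetical call listing only active services. "active" is an assumed
# status value; the schema above does not show the enum.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "list_external_services",
        "arguments": {"status": "active"},
    },
}
print(json.dumps(request))
```

Omitting `arguments` entirely should list all services regardless of status, per the optional filter.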
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Discloses critical security behavior: 'Does not return credentials or credential references.' Also details return data structure. Lacks explicit read-only confirmation or rate limit behavior, but credential exclusion is high-value behavioral disclosure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: purpose front-loaded, return values specified, and security constraint clearly stated. Appropriate length for tool complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a single-parameter list operation with output schema present. Credential exclusion provides necessary security context. Lacks explicit read-only declaration (absent annotations), but 'List' verb and scope description provide sufficient completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% ('Filter by service status'), establishing baseline 3. Description adds no explicit parameter details, but none needed given complete schema documentation of the single optional enum parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Specific verb 'List' with clear resource 'registered external services' and scope 'available to the calling agent'. Distinct from sibling 'register_external_service' (which creates) and 'request_credentialed_call' (which executes).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context about agent scope and explicitly excludes credential returns, implying boundaries for use. However, lacks explicit 'when to use vs alternatives' guidance despite strong implicit differentiation from credential-related siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_topics (List Topics) - Grade: B
List VaultCrux Memory Core topic groups with freshness metadata.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| items | Yes |
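A minimal sketch of a call. The schema documents neither the semantics of `limit` nor a default, so treating it as a maximum number of topic groups returned is an assumption.

```python
# Hypothetical list_topics call; the meaning of 'limit' (max item count)
# is assumed, not documented in the schema above.
request = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {"name": "list_topics", "arguments": {"limit": 20}},
}
print(request["params"])
```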
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, description carries full burden. It adds valuable context by specifying 'freshness metadata' is returned, hinting at data quality/timeliness aspects. However, omits pagination behavior, caching characteristics, and what 'freshness' specifically means.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence, front-loaded with the verb, no wasted words. However, the extreme brevity leaves information gaps (the undocumented parameter, missing usage context) rather than achieving efficient communication.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Mentions key output characteristic (freshness metadata) which aligns with output schema existence, but fails to document the sole input parameter. Adequate for a simple read operation but leaves critical operational questions unanswered.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% (limit parameter is undocumented in schema), and description fails to compensate. No explanation of what 'limit' constrains (topics returned), default behavior, or recommended values despite being the sole parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb ('List') and specific resource ('VaultCrux Memory Core topic groups'). Distinguishes content type (topic groups with freshness metadata) from siblings like enumerate_memory_facts or query_memory. Could improve by contrasting explicitly with siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use versus alternatives like enumerate_memory_facts, get_freshness_report, or query_memory. No mention of prerequisites or constraints beyond the schema.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
log_progress (Log Progress) - Grade: A
Receipted write-through to PlanCrux's log endpoint. Appends a structured log entry to a task with optional evidence references and stage binding. Cannot change task or stage status (human-only), but records work done, findings, and blockers encountered.
| Name | Required | Description | Default |
|---|---|---|---|
| note | Yes | What was done | |
| task_id | Yes | PlanCrux task ID | |
| evidence | No | Evidence references for the log entry | |
| stage_id | No | Bind this log to a specific stage |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
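A sketch of a call binding a note to a stage with evidence. All IDs are placeholders, and the shape of an evidence item (`type`/`ref`) is hypothetical; the schema only says "Evidence references for the log entry".

```python
# Hypothetical log_progress payload. IDs are placeholders and the
# evidence item shape is an assumption.
request = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
        "name": "log_progress",
        "arguments": {
            "task_id": "task-123",   # placeholder PlanCrux task ID
            "note": "Reproduced the timeout; root cause is DNS caching.",
            "stage_id": "stage-2",   # placeholder stage ID
            "evidence": [{"type": "url", "ref": "https://example.com/trace"}],
        },
    },
}
print(request["params"]["arguments"]["note"])
```

Note that this only appends a log entry; per the description, task and stage status changes remain human-only.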
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Discloses key behavioral traits: 'Receipted write-through' implies acknowledgment/confirmation, 'appends' indicates non-destructive addition, and 'human-only' constraint clarifies permission boundaries. Missing minor details like idempotency or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste. Front-loaded with mechanism ('Receipted write-through'), followed by core action, then constraints/content types. Every clause earns its place; no redundant or filler text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 4 parameters with nested objects and existence of output schema, description provides sufficient context for correct invocation. Covers functionality, constraints, and content expectations. Since output schema exists, return values needn't be explained. Only minor gap is not mentioning the receipt/return structure.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% coverage, but description adds semantic value beyond technical definitions: explains that 'note' should contain 'work done, findings, and blockers' (content guidance), and clarifies 'evidence' and 'stage_id' are for 'optional evidence references and stage binding' (functional context).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('appends', 'records') with clear resource ('structured log entry to a task'). It explicitly distinguishes from siblings by clarifying 'Cannot change task or stage status (human-only)', ensuring the agent doesn't confuse this with task management or status update tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit negative constraints ('Cannot change task or stage status') that establish when NOT to use this tool. Implies appropriate use cases ('records work done, findings, and blockers'). Would be perfect if it named the specific tool for status updates, but the constraint is clearly stated.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
promote_skill (Promote Skill) - Grade: A
Promote a reviewed skill submission to Engine artifacts, making it retrievable via get_relevant_context. Only pending_review skills can be promoted. Returns the promoted artifact ID.
| Name | Required | Description | Default |
|---|---|---|---|
| visibility | No | Override target visibility | |
| review_notes | No | Notes from the reviewer | |
| submission_id | Yes | ID of the skill submission to promote |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
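A sketch of promoting a pending_review submission. The submission ID is a placeholder, and `"tenant"` is an assumed visibility value, since the schema does not enumerate the allowed visibilities.

```python
# Hypothetical promote_skill call; submission_id is a placeholder and
# the visibility value is an assumption.
request = {
    "jsonrpc": "2.0",
    "id": 5,
    "method": "tools/call",
    "params": {
        "name": "promote_skill",
        "arguments": {
            "submission_id": "sub-42",
            "visibility": "tenant",
            "review_notes": "Examples verified; approved for promotion.",
        },
    },
}
print(request["params"]["arguments"]["submission_id"])
```

Per the description, this fails unless the submission is in the pending_review state, and the response carries the promoted artifact ID.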
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Discloses critical state constraint (pending_review only), side effect (becomes retrievable via get_relevant_context), and return value (promoted artifact ID). Missing permission requirements and irreversibility details, but covers primary behavioral traits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: action/side-effect, constraint, return value. Front-loaded with primary verb and resource. Every sentence earns its place with specific operational details.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Appropriate for a state-transition tool with 3 parameters and output schema. Covers essential workflow constraint (pending_review status) and references retrieval mechanism. Could strengthen by mentioning predecessor step (submit_skill) or rejection alternative (dismiss_skill) given the rich sibling tool ecosystem.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing baseline 3. Description implies submission_id via 'skill submission' but does not elaborate on parameter semantics, formats, or provide examples beyond what the schema already documents for visibility or review_notes.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description uses specific verb 'Promote' with resource 'skill submission' and scope 'to Engine artifacts'. Distinguishes from sibling tools by specifying the target state 'pending_review' and outcome 'making it retrievable via get_relevant_context', clearly differentiating it from submit_skill or dismiss_skill.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states constraint 'Only pending_review skills can be promoted', establishing when to use the tool. References sibling tool get_relevant_context as the retrieval mechanism. Lacks explicit contrast with alternatives like dismiss_skill for rejected submissions, though the state constraint implies correct usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
query_memory (Query Memory) - Grade: A
Search the user's conversation memory. Returns ranked results with content, source timestamps, and confidence scores. For KNOWLEDGE UPDATE questions ('current', 'now', 'most recent'): make two calls — one with scoring_profile='balanced' and one with scoring_profile='recency' — then use the value from the most recent source_timestamp. For COUNTING questions ('how many', 'total'): results may not be exhaustive — search with varied terms and enumerate explicitly before counting. If all results score below 0.3, reformulate with synonyms or specific entity names from the question.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | ||
| query | Yes | ||
| topic | No | ||
| date_to | No | ||
| agent_id | No | ||
| date_from | No | ||
| date_range | No | ||
| scoring_profile | No | ||
| confidence_threshold | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| results | Yes |
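The KNOWLEDGE UPDATE recipe from the description can be sketched directly: issue two calls with different scoring profiles, merge the result lists, and keep the hit with the newest source timestamp. The result-row shape below (`content`, `source_timestamp`, `confidence`) is inferred from the description, and the sample rows are stand-ins.

```python
def build_calls(query: str) -> list[dict]:
    """The two query_memory calls the description recommends for
    'current value' questions: one balanced, one recency-weighted."""
    return [
        {
            "name": "query_memory",
            "arguments": {"query": query, "scoring_profile": profile},
        }
        for profile in ("balanced", "recency")
    ]

def newest(results: list[dict]) -> dict:
    # ISO-8601 timestamps compare correctly as strings
    return max(results, key=lambda r: r["source_timestamp"])

calls = build_calls("current on-call engineer")
merged = [  # stand-in for the two responses' merged result lists
    {"content": "Alice", "source_timestamp": "2024-03-01", "confidence": 0.8},
    {"content": "Bob", "source_timestamp": "2024-05-10", "confidence": 0.6},
]
print(newest(merged)["content"])  # the most recent source wins
```

The same scaffold extends to the low-confidence rule: if every row scores below 0.3, rebuild the calls with reformulated query terms.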
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and effectively discloses key behavioral traits: results are ranked with confidence scores, source timestamps enable recency evaluation, and result sets may be non-exhaustive for counting operations. It does not explicitly state read-only safety or idempotency, but covers the critical search semantics (ranking, confidence thresholds, exhaustiveness limits) necessary for correct usage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Despite length, every sentence delivers actionable guidance. Structure is front-loaded with core purpose ('Search...Returns...'), followed by three distinct operational patterns (Knowledge Update, Counting, Low Confidence) without filler. No tautology or repetition of the tool name; instead uses the space for specific conditional logic ('make two calls', 'enumerate explicitly').
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description provides sophisticated usage patterns that compensate partially for the lack of schema descriptions, but leaves major gaps regarding parameter semantics (temporal filters, agent/topic scoping). Given the presence of an output schema, it appropriately avoids detailing return structures, focusing instead on invocation patterns. However, for a 9-parameter tool with complex date filtering options, the lack of parameter guidance renders it minimally viable rather than comprehensive.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate for all 9 parameters. While it excellently documents 'scoring_profile' (enumerating specific values and their use cases) and implicitly references 'confidence_threshold' (via the 0.3 threshold advice), it fails to explain 'date_from/date_to' vs 'date_range' redundancy, 'agent_id' filtering, 'topic' scoping, or 'limit' constraints. For a complex 9-parameter tool with temporal filtering and nested objects, this gap is significant.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with the specific verb 'Search' and resource 'user's conversation memory', immediately clarifying scope. It distinguishes from sibling tools like 'enumerate_memory_facts' or 'build_timeline' by emphasizing ranked retrieval with confidence scores and timestamps, indicating this is a relevance-ranked search interface rather than an enumeration or reconstruction tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit operational patterns for specific query types: KNOWLEDGE UPDATE questions require dual calls with different scoring_profile values ('balanced' and 'recency'), while COUNTING questions require explicit enumeration warnings due to non-exhaustive results. Also specifies a concrete confidence threshold (0.3) triggering query reformulation, giving the agent clear decision criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
reconstruct_knowledge_state (Reconstruct Knowledge State) - Grade: A
Reconstruct what the system knew at a specific point in time. Returns both current and superseded artefacts as of that timestamp. Use for temporal reasoning: 'what was true in January?' vs 'what is true now?' Compare two calls at different timestamps to see what changed.
| Name | Required | Description | Default |
|---|---|---|---|
| decision_id | Yes | ||
| at_timestamp | No | ||
| include_superseded | No | ||
| include_confidence_landscape | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
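The "compare two calls at different timestamps" pattern can be sketched as a diff over artefact-ID sets. The response shape and the artefact IDs are hypothetical; only `decision_id`, `at_timestamp`, and `include_superseded` come from the schema, and the ISO-8601 timestamp format is an assumption.

```python
def call_args(decision_id: str, ts: str) -> dict:
    """Arguments for one hypothetical reconstruct_knowledge_state call."""
    return {
        "name": "reconstruct_knowledge_state",
        "arguments": {
            "decision_id": decision_id,
            "at_timestamp": ts,         # assumed ISO-8601
            "include_superseded": True,
        },
    }

def changed(before: set[str], after: set[str]) -> dict:
    # Diff the artefact-ID sets returned by two snapshot calls.
    return {"added": after - before, "removed": before - after}

jan = {"fact-1", "fact-2"}   # stand-in artefact IDs as of January
now = {"fact-2", "fact-3"}   # stand-in artefact IDs as of today
print(changed(jan, now))     # fact-3 added, fact-1 removed
```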
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden. It discloses that the tool 'Returns both current and superseded artefacts' which explains the behavioral output. However, it lacks disclosure on performance implications of historical reconstruction, error conditions for invalid timestamps, or read-only safety guarantees.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly constructed sentences with zero waste: purpose statement, behavioral disclosure, and usage guidelines. Front-loaded with the core action, flows logically from what it does to how to use it. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 4 parameters with 0% schema coverage and complex temporal logic, the description is incomplete regarding the required decision_id parameter and confidence landscape feature. However, it adequately covers the core temporal concept, and since an output schema exists, the lack of detailed return value explanation is acceptable.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring heavy description compensation. The description implicitly explains 'at_timestamp' ('at that timestamp') and 'include_superseded' ('superseded artefacts'), adding crucial semantic meaning. However, it fails to explain the required 'decision_id' parameter or 'include_confidence_landscape', leaving half the parameters undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'reconstruct' with clear resource 'what the system knew at a specific point in time.' It effectively distinguishes from siblings like get_versioned_snapshot or checkpoint_decision_state by emphasizing temporal reasoning capabilities and reconstruction of historical knowledge states.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit usage patterns: 'what was true in January?' vs 'what is true now?' and 'Compare two calls at different timestamps to see what changed.' This gives clear temporal reasoning context. Could be improved by explicitly naming alternative tools for current-state queries.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
record_decision_context (Record Decision Context) - Grade: C
Record a decision context event in the CoreCrux Decision Plane. This is a mutation operation.
| Name | Required | Description | Default |
|---|---|---|---|
| context | Yes | ||
| agent_id | No | ||
| session_id | Yes | ||
| decision_id | Yes | ||
| occurred_at | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
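Because the `context` object is free-form (additionalProperties), a sketch can only illustrate one plausible shape: every key inside `context` below is hypothetical, all IDs are placeholders, and the ISO-8601 `occurred_at` format is an assumption.

```python
# Hypothetical record_decision_context payload. The context keys are
# invented for illustration; the server accepts arbitrary properties.
request = {
    "jsonrpc": "2.0",
    "id": 8,
    "method": "tools/call",
    "params": {
        "name": "record_decision_context",
        "arguments": {
            "decision_id": "dec-7",     # placeholder
            "session_id": "sess-19",    # placeholder
            "context": {
                "options_considered": ["postgres", "sqlite"],
                "chosen": "postgres",
                "rationale": "needs concurrent writers",
            },
            "occurred_at": "2024-06-02T14:00:00Z",  # assumed ISO-8601
        },
    },
}
print(sorted(request["params"]["arguments"]))
```

Remember this is a mutation: each call appends a new event rather than updating an existing one.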
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It identifies the operation as a 'mutation,' but fails to disclose side effects, persistence guarantees, idempotency characteristics, or what constitutes a valid 'decision context event' beyond the operation type.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise at two sentences and 15 words. The first sentence efficiently captures the core purpose, while the second sentence ('This is a mutation operation') provides necessary behavioral context given the absence of annotations. However, extreme brevity becomes a liability given the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (excusing return value documentation), the description is inadequate for a 5-parameter tool with nested objects and no schema descriptions. It omits critical context such as the relationship between entities (session/decision/context), validation rules, and what constitutes a valid context payload.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage and contains a complex nested object structure (the 'context' parameter with additionalProperties). The description completely fails to compensate for this gap, providing no explanation of what data belongs in the free-form context object, how session_id relates to decision_id, or the purpose of optional parameters like agent_id and occurred_at.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the specific action (Record), resource (decision context event), and target system (CoreCrux Decision Plane). However, it fails to explicitly differentiate from similar sibling tools like 'checkpoint_decision_state' or 'log_progress', leaving ambiguity about when to use this specific recording mechanism versus alternatives.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, prerequisites for invocation, or conditions where it should be avoided. It does not reference the sibling retrieval tool 'get_decision_context' or explain the recording lifecycle.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
register_external_service (Register External Service) - Grade: A
Register an external service and store its credential via Vault Transit. Human-only (admin or owner role). The credential is encrypted immediately on receipt and never stored in plaintext. Returns the service registration record without the credential.
| Name | Required | Description | Default |
|---|---|---|---|
| base_url | Yes | Service base URL (must be HTTPS) | |
| auth_type | Yes | Authentication type | |
| rate_limit | No | Rate limits for this service | |
| service_id | Yes | Tenant-unique identifier for the service | |
| display_name | Yes | Human-readable service name | |
| allowed_paths | No | Allowed path patterns (default: *) | |
| auth_injection | No | How to inject the credential into requests | |
| allowed_methods | No | Allowed HTTP methods (default: GET, POST) | |
| credential_value | Yes | The credential (encrypted immediately, never stored plaintext) |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
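A sketch of a registration payload. The `auth_injection` and `rate_limit` object shapes and the `"bearer"` auth type are assumptions (the schema names these fields without detailing them); the HTTPS requirement on `base_url` does come from the schema.

```python
# Hypothetical register_external_service payload. Nested object shapes
# are assumed; the credential is a dummy and is encrypted on receipt.
request = {
    "jsonrpc": "2.0",
    "id": 9,
    "method": "tools/call",
    "params": {
        "name": "register_external_service",
        "arguments": {
            "service_id": "billing-api",
            "display_name": "Billing API",
            "base_url": "https://billing.example.com",  # must be HTTPS
            "auth_type": "bearer",                      # assumed enum value
            "credential_value": "sk-REDACTED",          # dummy credential
            "allowed_methods": ["GET"],                 # tighter than default
            "allowed_paths": ["/v1/invoices/*"],
            "auth_injection": {"header": "Authorization", "prefix": "Bearer "},
            "rate_limit": {"requests_per_minute": 60},
        },
    },
}
assert request["params"]["arguments"]["base_url"].startswith("https://")
```

Per the description this is a human-only operation, and the returned registration record omits the credential.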
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full disclosure burden excellently: it specifies Vault Transit encryption, security guarantees ('encrypted immediately...never stored in plaintext'), authorization requirements, and return value behavior (record returned without credential).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences, zero waste: opens with action, follows with authorization constraint, security details, and return behavior. Every clause delivers unique value beyond the schema.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Complete for a sensitive credential-management tool: covers purpose, security model, authorization, and return value redaction. Output schema exists, so detailed return structure doesn't need description, though the credential exclusion is helpfully noted.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
While the schema has 100% coverage, the description adds critical semantic context for the credential_value parameter (encryption guarantees and security handling) that raw schema descriptions cannot convey.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Register an external service') and mechanism ('store its credential via Vault Transit'), distinguishing it from siblings like list_external_services (read vs. write) and request_credentialed_call (usage vs. registration).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit authorization constraint ('Human-only (admin or owner role)') that clearly restricts when the tool should be invoked, but does not explicitly name alternative tools for non-admin users or automated contexts.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
request_credentialed_callCredentialed External CallAInspect
Proxied external API call with server-side credential injection. VaultCrux retrieves the credential from Vault Transit, injects it per the service's auth template, makes the call, and returns the response. The credential never appears in your context window. Request and response are receipted and audit-logged.
| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | Request path (e.g. /v1/chat/completions) | |
| method | Yes | HTTP method | |
| headers | No | Additional headers (not auth) | |
| payload | No | Request body for POST/PUT | |
| service_id | Yes | Registered service ID | |
| session_id | No | Session ID for audit trail | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and excels by disclosing: (1) the credential retrieval mechanism ('VaultCrux retrieves from Vault Transit'), (2) security isolation ('credential never appears in your context window'), and (3) side effects ('receipted and audit-logged'). This provides complete behavioral context beyond the structured schema.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, zero waste. The first establishes core function, the second explains the technical mechanism (Vault injection), and the third discloses security guarantees and audit properties. Information density is high with no redundant phrases.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (covering return values) and the description's thorough coverage of the security model, credential isolation guarantees, and audit side effects, the description is complete for a credential-proxy tool. The critical security and behavioral contexts are fully addressed despite zero annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
While the schema has 100% description coverage (establishing a baseline of 3), the description adds meaningful context by explaining how service_id relates to the Vault retrieval flow and how session_id connects to the audit trail mentioned. This elevates understanding of the parameter roles within the security architecture.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a specific verb-noun construction ('Proxied external API call') and immediately clarifies the unique mechanism ('server-side credential injection'). This clearly distinguishes it from sibling tools like register_external_service or list_external_services, which handle service lifecycle rather than proxied execution with credentials.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies prerequisites by referencing 'the service's auth template' and 'Registered service ID,' signaling that a service must be pre-registered. However, it does not explicitly name sibling alternatives (e.g., 'use register_external_service to configure new services before calling this') or explicit exclusion criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
retract_skillRetract SkillAInspect
Retract a previously promoted skill. Sets the Engine artifact's living status to 'retracted', removing it from future retrieval results. Use when a skill is found to be incorrect or outdated.
| Name | Required | Description | Default |
|---|---|---|---|
| notes | No | Additional notes | |
| reason | Yes | Reason for retraction | |
| submission_id | Yes | ID of the skill submission to retract | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses the mechanism ('sets living status to retracted') and consequence ('removing it from future retrieval results'), but fails to address reversibility, authorization requirements, or side effects for this state-changing operation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: first defines the action, second explains the technical mechanism, third states usage conditions. Information is front-loaded and appropriately sized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 100% schema coverage and the existence of an output schema, the description adequately covers the business logic. However, for a destructive, status-changing tool with no annotations, it should explicitly address safety characteristics (e.g., reversibility) to be considered complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description mentions 'previously promoted skill' which maps to submission_id conceptually, but adds no syntax, format, or semantic details beyond what the schema already provides for the parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Retract') and resource ('previously promoted skill'), and distinguishes it from sibling tools like submit_skill by specifying it operates on promoted skills. However, it does not explicitly contrast with dismiss_skill, which appears to be a related sibling.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit guidance on when to use the tool ('when a skill is found to be incorrect or outdated'). Lacks explicit 'when not to use' guidance or named alternatives (e.g., when to use dismiss_skill instead), preventing a score of 5.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
session_debriefSession DebriefAInspect
Structured session-end reflection. Routes discoveries to appropriate capture tools (suggest_constraint, submit_skill, flag_for_review). Produces a receipted debrief record. Call before closing any session longer than 10 minutes.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session ID for the debrief | |
| discoveries | No | Discoveries made during the session | |
| suggested_actions | No | Actions to route from session discoveries | |
| assumptions_validated | No | Assumptions that were validated during the session | |
| assumptions_invalidated | No | Assumptions that were invalidated during the session | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses key behavioral traits: routing to downstream tools and producing a 'receipted' record (write operation). However, it omits safety details such as idempotency, partial failure handling, and whether routing calls are synchronous or transactional.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste. Front-loaded with core function ('Structured session-end reflection'), followed by integration points, output artifact, and usage timing. Every sentence earns its place; no redundant or filler text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given complex nested parameters (arrays of discovery/action objects) and presence of output schema, the description provides sufficient workflow context without replicating schema details. Mentions receipt generation covering the output aspect. Could improve by noting idempotency or validation behavior for the session_id parameter.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing detailed field descriptions (e.g., 'Actions to route from session discoveries'). The description adds workflow context implying how discoveries relate to suggested_actions, but does not augment individual parameter semantics beyond what the schema already documents. Baseline 3 appropriate for high-coverage schemas.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Specific verb+resource combination ('session-end reflection', 'routes discoveries', 'produces receipted debrief record'). Explicitly names sibling tools it integrates with (suggest_constraint, submit_skill, flag_for_review), clearly distinguishing it from those capture tools by identifying itself as the router/reflection mechanism.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit temporal trigger ('Call before closing any session longer than 10 minutes') and indicates workflow context (session-end). Lacks explicit negative guidance (when not to call) or alternative tools for shorter sessions, but the positive guidance is specific and actionable.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
submit_correctionSubmit CorrectionAInspect
Submit a correction for a knowledge item with evidence chain. The original item is never mutated — a versioned enrichment layer is created.
| Name | Required | Description | Default |
|---|---|---|---|
| evidence | Yes | | |
| correction_type | Yes | | |
| original_item_id | Yes | | |
| corrected_content | Yes | | |
| parent_receipt_id | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
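Since the schema leaves these parameters undescribed, a correction payload can only be sketched. The shape below follows the description's "evidence chain" framing; the correction_type value and the evidence item structure are guesses, as the enum and nesting are undocumented:

```python
# Hypothetical correction payload. Per the description the original item is
# never mutated; the server layers a versioned enrichment on top of it.
correction = {
    "original_item_id": "ki_1209",                      # hypothetical item ID
    "correction_type": "factual",                       # assumed enum value
    "corrected_content": "The retention window is 90 days, not 30.",
    "evidence": [                                       # evidence chain: ordered supporting items
        {"source": "receipt:rcpt_77", "note": "observed in billing export"},
    ],
}
```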
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully communicates the non-destructive nature of the operation ('original item is never mutated') and the versioning behavior ('versioned enrichment layer'), which is critical for agent decision-making.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of exactly two high-value sentences with zero redundancy. The first establishes purpose and primary input requirement; the second discloses critical behavioral characteristics. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has an output schema (per context signals), the description appropriately omits return value details. However, with 5 parameters including a complex nested evidence array and 0% schema coverage, the description falls short: it should explain parameter relationships or enum semantics to enable correct invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate significantly. While it mentions 'evidence chain' (mapping to the evidence parameter), it fails to explain the four other parameters: correction_type enum values, original_item_id semantics, parent_receipt_id purpose, or corrected_content formatting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool submits corrections for knowledge items using an evidence chain, specifying both the action and required input method. It distinguishes itself from direct mutation tools by mentioning the 'versioned enrichment layer' mechanism, though it doesn't explicitly contrast with specific siblings like update_constraint.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies prerequisites by mentioning 'evidence chain,' suggesting evidence is required. However, it lacks explicit guidance on when to use this versus alternatives (like update_constraint), or when corrections are preferred over other knowledge management operations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
submit_skillSubmit SkillAInspect
Submit a procedural workflow skill discovered during work. Pro+ private skills auto-approve; Starter skills enter a review queue. ATAM injection scanning runs automatically — quarantined skills cannot be promoted. Returns submission ID, approval status, and scan results.
| Name | Required | Description | Default |
|---|---|---|---|
| title | Yes | Short title summarising the skill | |
| run_id | No | AgentCrux run ID | |
| content | Yes | Full procedural skill content (markdown) | |
| agent_role | No | Role of the submitting agent | |
| session_id | No | Session ID for provenance tracking | |
| skill_domains | No | Knowledge domains this skill applies to | |
| discovery_context | No | How/where the skill was discovered | |
| target_visibility | No | Visibility scope for the skill | private |
| skill_tool_references | No | Tool names this skill references | |
| supersedes_artifact_id | No | Artifact ID this skill replaces | |
| skill_trigger_description | No | When this skill should be activated | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
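A submission's arguments might be assembled as below. Only title and content are required; target_visibility defaults to "private" per the table. The helper and example skill are hypothetical:

```python
# Illustrative submission arguments for submit_skill. Everything beyond
# title/content is optional provenance or scoping metadata.
def build_skill_submission(title: str, content: str, **optional) -> dict:
    args = {"title": title, "content": content,
            "target_visibility": optional.pop("target_visibility", "private")}
    args.update(optional)
    return args

submission = build_skill_submission(
    "Retry transient 429s with jitter",                 # hypothetical skill
    "# Skill\n1. Catch 429 responses.\n2. Back off with jitter.",
    skill_domains=["http", "resilience"],
)
```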
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and discloses significant behavioral traits: ATAM injection scanning, tier-based approval differences, quarantine consequences, and return value structure (submission ID, status, scan results). Minor gap on idempotency or error handling specifics.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four dense sentences with zero waste: (1) purpose definition, (2) approval tier behavior, (3) security scanning and promotion constraints, (4) return values. Information is front-loaded with the action verb and every clause delivers necessary behavioral or workflow context.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 11 parameters and a complex approval workflow, the description adequately covers the submission lifecycle, security implications, and output structure. It compensates well for missing annotations. Minor enhancement would be explicit mapping to the skill lifecycle (submit → assess → promote/dismiss/retract).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description adds context that 'content' contains procedural workflow markdown and implies 'discovery_context' relates to work discovery, but does not elaborate on parameter formats, valid ranges, or constraints beyond the schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the core action ('Submit a procedural workflow skill') and resource type ('discovered during work'). It effectively distinguishes from sibling tools by mentioning the promotion quarantine constraint, positioning this as the initial submission entry point before using promote_skill.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear behavioral guidance on approval flows (Pro+ auto-approve vs Starter review queue) and notes the quarantine constraint that blocks promotion. However, it could explicitly state 'use this for initial submission before promote_skill' to fully clarify the lifecycle sequence against siblings like dismiss_skill and retract_skill.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
suggest_constraintSuggest ConstraintAInspect
Propose an organisational constraint discovered during work for human review. Agents can suggest boundaries, policies, or context flags they discover — humans decide whether to promote them to active constraints. Low barrier (1 credit); authority gate is on promotion, not suggestion.
| Name | Required | Description | Default |
|---|---|---|---|
| scope | No | | |
| evidence | No | | |
| severity | No | | |
| assertion | Yes | | |
| confidence | No | | |
| session_id | No | | |
| constraint_type | Yes | | |
| discovery_context | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
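With only assertion and constraint_type required, a suggestion payload can be sketched as below. The constraint_type value is an assumption (the enum is not documented), and the helper itself is illustrative:

```python
# Sketch of a suggest_constraint payload. Only assertion and constraint_type
# are required; severity, confidence, etc. are optional context.
def build_suggestion(assertion: str, constraint_type: str, **optional) -> dict:
    if not assertion.strip():
        raise ValueError("assertion must be non-empty")
    return {"assertion": assertion, "constraint_type": constraint_type, **optional}

suggestion = build_suggestion(
    "Never deploy to prod on Fridays after 15:00",       # hypothetical boundary
    "policy",                                            # assumed enum value
    severity="medium",
    discovery_context="observed a failed Friday deploy rollback",
)
```

Remember the authority gate sits on promotion, not suggestion: a human still decides whether this becomes an active constraint.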
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full disclosure burden and succeeds in explaining cost ('1 credit'), permission model ('authority gate is on promotion'), and the draft/pending nature of suggestions. However, it omits whether suggestions can be retracted, if duplicates are handled, or the exact lifecycle state created.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three efficient clauses progress logically: purpose (propose constraint), workflow (human promotion authority), and constraints/cost (1 credit, authority gate). Every sentence earns its place with zero redundancy or tautology.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description adequately covers the conceptual model and approval workflow for an 8-parameter tool with nested objects. However, given 0% schema coverage and complex parameters (nested scope/evidence objects, enums), the lack of parameter semantic guidance creates a significant operational gap, though the output schema reduces the need for return value explanation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring significant description compensation. While it maps 'boundaries, policies, or context flags' to the constraint_type enum (partially), it fails to explain the 7 other parameters including required fields 'assertion' (what format?), 'scope' (nested object structure), 'evidence' (what constitutes valid evidence?), and 'discovery_context.'
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Propose[s] an organisational constraint discovered during work for human review,' using a specific verb-resource pair. It effectively distinguishes from sibling tools like 'declare_constraint' by emphasizing that 'humans decide whether to promote them to active constraints,' clarifying this is a suggestion mechanism rather than direct creation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly defines the workflow boundary: 'authority gate is on promotion, not suggestion' with 'Low barrier (1 credit).' This establishes when to use this tool (discovering potential constraints during work) versus alternatives like declare_constraint (when authority exists to create active constraints immediately). The human-in-the-loop requirement is clearly stated.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
update_constraintUpdate ConstraintBInspect
Update an existing constraint. Content changes create a new version (append-only). Status-only changes update in place. This is a mutation operation.
| Name | Required | Description | Default |
|---|---|---|---|
| scope | No | | |
| status | No | | |
| evidence | No | | |
| severity | No | | |
| assertion | No | | |
| expires_at | No | | |
| constraint_id | Yes | | |
| assertion_structured | No | | |
| review_interval_days | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
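The versioning rule in the description — content changes are append-only, status-only changes update in place — can be sketched as a classifier over the supplied fields. The split of fields into "content" vs "status" is inferred from the parameter table, not confirmed by the server docs:

```python
# Sketch of update_constraint's documented versioning rule: any change beyond
# status creates a new version (append-only); a status-only change is in place.
def classify_update(changes: dict) -> str:
    touched = set(changes) - {"constraint_id", "status"}
    return "new_version" if touched else "in_place"
```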
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so the description carries the full burden. It effectively discloses the critical versioning behavior (append-only content changes vs. in-place status updates) and explicitly warns 'This is a mutation operation'. It does not, however, cover error handling or auth requirements.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: purpose statement, versioning logic, and mutation warning. Front-loaded and efficiently structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Inadequate for the complexity: 9 undocumented parameters with nested objects and enums require description compensation that isn't provided. Output schema exists (per context signals), but input parameter semantics are insufficiently covered for safe invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage across 9 complex parameters including nested objects. Description only provides high-level categorical mapping ('Content' vs 'Status') without explaining specific parameters, enums (active/suspended/superseded/expired), or nested object structures (scope, evidence).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific action (Update) and resource (constraint) clearly. Distinguishes operation type through versioning behavior (append-only vs in-place), though could better differentiate from sibling 'declare_constraint' explicitly.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides internal usage guidance distinguishing content changes from status-only changes. However, lacks explicit comparison to siblings (e.g., when to use 'declare_constraint' vs 'update_constraint') and prerequisites like constraint lookup.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
verify_before_actingVerify Before ActingAInspect
Pre-action and pre-conclusion verification gate. Checks Shield policy, org constraints, watch alerts, knowledge pressure, and memory freshness. Returns a combined verdict: proceed, warn, require_approval, or block. Use before committing to an answer when the stakes are high or when your evidence is thin — it catches constraint conflicts and stale-context risks that query_memory alone won't surface.
| Name | Required | Description | Default |
|---|---|---|---|
| team_id | No | | |
| metadata | No | | |
| tool_name | Yes | | |
| is_mutation | No | | |
| publisher_id | No | | |
| server_digest | No | | |
| target_resources | No | | |
| action_description | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
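The four documented verdicts (proceed, warn, require_approval, block) lend themselves to a simple dispatch on the caller's side. The verdict strings come from the description; the handler mapping below is illustrative:

```python
# Sketch of handling verify_before_acting's combined verdict. The four
# verdict strings are documented; the mapped actions are assumptions.
def handle_verdict(verdict: str) -> str:
    actions = {
        "proceed": "continue with the action",
        "warn": "continue, but surface the warning to the user",
        "require_approval": "pause and request human sign-off",
        "block": "abort the action",
    }
    try:
        return actions[verdict]
    except KeyError:
        raise ValueError(f"unknown verdict: {verdict}") from None
```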
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and discloses the verdict taxonomy (proceed, warn, require_approval, or block) and the specific subsystems interrogated (Shield, constraints, alerts, etc.). It could improve by explicitly stating whether this is a read-only check or if it modifies server state, though 'pre-action verification' implies advisory behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Every sentence earns its place: the first establishes identity and function, the second enumerates specific checks and return values, and the third provides usage context and differentiation. No redundant or filler text; information density is high without being verbose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex orchestration tool with 8 parameters and numerous siblings, the description adequately covers the behavioral contract and decision criteria without needing to exhaustively detail return values (since an output schema exists). The only material gap is the lack of parameter documentation, which prevents a perfect score given the high parameter count and zero schema coverage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Given 0% schema description coverage across 8 parameters, the description fails to compensate by explaining critical inputs like is_mutation, target_resources, server_digest, or metadata. While the description implies an action is being verified, it does not document what action_description should contain or how tool_name relates to the verification scope.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool as a 'Pre-action and pre-conclusion verification gate' with specific checks against Shield policy, org constraints, watch alerts, knowledge pressure, and memory freshness. It effectively distinguishes itself from siblings by contrasting with query_memory, explicitly stating it catches risks that query_memory alone won't surface.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit contextual guidance: 'Use before committing to an answer when the stakes are high or when your evidence is thin.' It identifies a specific alternative (query_memory) and explains the unique value proposition (catches constraint conflicts and stale-context risks), giving clear criteria for when to invoke this tool versus simpler memory queries.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail — every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control — enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management — store and rotate API keys and OAuth tokens in one place
Change alerts — get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption — public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics — see which tools are being used most, helping you prioritize development and documentation
Direct user feedback — users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
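The three failure causes listed above map roughly onto distinct symptoms when you probe the URL yourself. A rough triage sketch in Python, using only the standard library (the status-code mapping is a heuristic assumption, not Glama's actual health-check logic):

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def classify(code: int) -> str:
    # Heuristic mapping from HTTP status to likely cause; an assumption
    # for illustration, not how Glama actually classifies health.
    if code in (401, 403):
        return "credentials missing or invalid"
    if code in (404, 410):
        return "URL is likely wrong"
    if code >= 500:
        return "server-side outage"
    return "reachable; check transport/protocol configuration"

def triage(url: str) -> str:
    # Probe the connector URL and classify the outcome.
    try:
        with urlopen(url, timeout=10) as resp:
            return classify(resp.status)
    except HTTPError as e:
        return classify(e.code)
    except URLError:
        return "cannot connect: outage, DNS failure, or wrong host"
```

A connection-level failure (`URLError`) points at an outage or a wrong host, while an HTTP-level error narrows the cause to credentials or a bad path.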
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.