VaultCrux Memory Core
Server Details
VaultCrux Memory Core — 32 tools: knowledge, decisions, constraints, signals, coverage
- Status: Unhealthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Score is being calculated. Check back soon.
Available Tools
52 tools

assess_answerability (Assess Answerability), Grade A
Sufficiency gate — can this question be answered with current evidence? Pass your query and optionally the fact rows you have gathered. Returns: answerable (yes/no), missing fields, contradictory fields, recommended next tool, and confidence. Use this BEFORE forcing a best-guess answer. If answerable=false, it is better to say 'insufficient evidence' than to guess wrong.
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | | |
| candidate_rows | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
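Neither the description nor the schema shows the wire format, but under standard MCP JSON-RPC framing a call might look like the following sketch; the row shape inside candidate_rows is an assumption, since the schema leaves it undocumented:

```python
import json

# Hypothetical tools/call request for assess_answerability.
# JSON-RPC framing per the MCP spec; the candidate_rows row shape is assumed.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "assess_answerability",
        "arguments": {
            "query": "Which region had the highest Q3 signups?",  # required
            "candidate_rows": [  # optional; entry shape is an assumption
                {"fact": "EU signups Q3: 1204", "source": "signals"},
            ],
        },
    },
}
print(json.dumps(request, indent=2))
```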
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and succeeds in explaining the decision logic (sufficiency assessment) and detailed output structure (answerable boolean, missing/contradictory fields, confidence). It clearly states the behavioral consequence of a negative result (insufficient evidence response). Minor gap: does not mention idempotency or side effects, though these are likely absent.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Five sentences, zero waste. Front-loaded with the core purpose ('Sufficiency gate'), followed by inputs, outputs, usage timing, and error handling. Every clause adds value beyond the structured data.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Comprehensive for a tool with 0% schema coverage and no annotations. Despite the existence of an output schema (which reduces the need for detailed return value explanation), the description helpfully summarizes the return fields. Covers purpose, parameters, behavioral logic, and usage workflow thoroughly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Given 0% schema description coverage, the description effectively compensates by mapping semantic meaning to parameters: 'query' is the question being assessed, and 'candidate_rows' are 'the fact rows you have gathered.' This provides necessary context for the optional array parameter that the schema leaves undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a clear metaphor ('Sufficiency gate') followed by a precise question format ('can this question be answered with current evidence?'). It distinguishes itself from siblings like investigate_question or check_claim by positioning the tool as a validation gate rather than an evidence-gathering or claim-verification tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit temporal guidance ('Use this BEFORE forcing a best-guess answer') and clear error-handling protocol ('If answerable=false, it is better to say 'insufficient evidence' than to guess wrong'). However, it does not explicitly name sibling alternatives to use instead, relying only on the 'recommended next tool' output field to guide users.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
assess_coverage (Assess Coverage), Grade A
Question-scoped readiness check. Given a task description, returns what the system knows and doesn't know: artefact counts by domain, freshness stats, and knowledge gaps. Use BEFORE answering to decide if you should search more or commit. If coverage is thin on the question's topic, search with different terms before answering. Addresses 'do I have enough evidence to answer this?'
| Name | Required | Description | Default |
|---|---|---|---|
| domains | No | | |
| action_types | No | | |
| task_description | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
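As a sketch of the documented contract: only task_description is required, with domains and action_types available as optional filters. The filter values below are illustrative guesses, since the schema documents neither:

```python
# Minimal vs. filtered argument payloads for assess_coverage.
minimal_args = {"task_description": "Summarise Q3 incident history"}

filtered_args = {
    "task_description": "Summarise Q3 incident history",
    "domains": ["incidents"],        # assumed filter values
    "action_types": ["postmortem"],  # assumed filter values
}
print(sorted(filtered_args))
```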
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden and successfully discloses return values ('returns what the system knows and doesn't know'). However, it lacks explicit safety classification (read-only nature) despite the implied non-destructive 'readiness check' behavior.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four tightly constructed sentences with zero redundancy. Front-loads the core function (readiness check), follows with usage timing, conditional logic, and decision criteria. Every sentence earns its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Strong coverage of purpose and workflow integration. Since an output schema exists, the summary of return values is sufficient. However, the complete omission of optional parameter semantics prevents a top score, given the tool's filtering capabilities.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Critical failure given 0% schema coverage. While it implicitly references the required 'task_description' parameter ('Given a task description'), it completely omits the two optional array parameters ('domains' and 'action_types'), leaving the agent without guidance on when or how to use these filters.
Does the description clearly state what the tool does and how it differs from similar tools?
Excellent specificity: defines it as a 'Question-scoped readiness check' that returns 'artefact counts by domain, freshness stats, and knowledge gaps.' Clearly distinguishes from sibling assess_answerability by focusing on coverage/evidence quantity rather than answerability feasibility.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit temporal guidance ('Use BEFORE answering'), actionable next steps ('search with different terms before answering'), and the specific decision criteria ('do I have enough evidence to answer this?'). Clearly positions the tool in the workflow against alternatives like commit or search.
build_timeline (Build Timeline), Grade A
Deterministic timeline constructor for temporal reasoning. Finds all dated events matching your query, normalizes dates, and returns them sorted chronologically. Use for 'what order', 'before/after', 'earliest/latest' questions. Returns unresolved events (found but no date) separately.
| Name | Required | Description | Default |
|---|---|---|---|
| as_of | No | | |
| query | Yes | | |
| relation | No | | |
| anchor_event | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
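A hedged sketch of a build_timeline call, assuming relation takes values like 'before'/'after' (inferred from the description's examples) and that as_of accepts an ISO-8601 date; none of this is confirmed by the schema:

```python
# Hypothetical arguments for build_timeline. Only query is required.
args = {
    "query": "deployment events",   # required
    "relation": "before",           # assumed enum value
    "anchor_event": "v2.0 launch",  # assumed free-text anchor
    "as_of": "2024-06-30",          # assumed ISO-8601 date
}
print(args["query"])
```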
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and succeeds well: it discloses 'deterministic' behavior, explains the normalization processing, and clarifies the dual-output structure (returning unresolved events separately). It implies read-only analytical behavior through 'constructor' and 'finds' language.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four tightly constructed sentences with zero redundancy: establishes identity (sentence 1), core mechanism (sentence 2), usage context (sentence 3), and edge case handling (sentence 4). Information is front-loaded and every clause earns its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the description appropriately focuses on behavioral semantics rather than return value structures. It adequately covers the tool's unique value proposition (temporal reasoning) and output characteristics, though it could better document the 4 parameters given the complete lack of schema descriptions.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring the description to compensate. It implicitly covers 'query' ('matching your query') and hints at 'relation' values ('before/after', 'latest'), but provides no guidance for 'as_of' or 'anchor_event' parameters. Partial compensation merits a middle score.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool as a 'deterministic timeline constructor for temporal reasoning' with specific actions (finds events, normalizes dates, sorts chronologically). It effectively distinguishes itself from siblings like check_claim or assess_answerability by focusing specifically on temporal ordering and chronological construction.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit positive guidance ('Use for what order, before/after, earliest/latest questions') that clearly signals when to select this tool. However, it lacks explicit negative constraints or named alternative tools for non-temporal queries, though the specificity of the examples strongly implies the boundaries.
check_claim (Check Claim), Grade A
Verify a proposed answer against memory before committing to it. Pass your candidate answer as claim_text. Returns supporting and contradicting evidence with confidence scores. Use as a pre-answer gate: if contradicting evidence exists or support is weak, investigate further before answering.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| agent_id | No | | |
| claim_text | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| matches | Yes | |
| verdict | Yes | |
| confidence | Yes |
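Since the output schema names matches, verdict, and confidence, an agent can gate its answer on the result. Only the three field names come from the schema; the verdict strings and 0–1 confidence range below are assumptions:

```python
# Simulated check_claim response used as a pre-answer gate.
response = {
    "matches": [{"evidence": "Q3 report cites EU as top region"}],
    "verdict": "supported",  # assumed value
    "confidence": 0.82,      # assumed 0-1 range
}

def should_answer(resp, threshold=0.7):
    """Proceed only when the claim is supported with sufficient confidence."""
    return resp["verdict"] == "supported" and resp["confidence"] >= threshold

print(should_answer(response))
```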
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It effectively discloses behavioral traits by specifying the tool returns 'supporting and contradicting evidence with confidence scores' and verifies 'against memory'. Minor gap: does not explicitly state read-only safety or potential side effects like logging.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences, zero waste. Front-loaded with core purpose, followed by parameter mapping, return value disclosure, and usage workflow. Every sentence earns its place with high information density.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given output schema exists, description appropriately summarizes return values without over-specifying. Workflow guidance is complete. Minor incompleteness regarding optional parameters, but sufficient for an agent to invoke the tool successfully given the required parameter is well-documented.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring description to compensate. It successfully maps 'claim_text' to 'candidate answer', but provides no guidance for optional parameters 'limit' (likely result limit) or 'agent_id' (likely caller identifier), leaving 2 of 3 parameters undocumented.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb 'Verify' with specific resource 'proposed answer against memory'. Explicitly positions tool as a 'pre-answer gate', distinguishing it from sibling output/commitment tools like submit_correction or final answering operations.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicit workflow guidance: 'before committing to it' establishes when to use. Clear conditional logic provided: 'if contradicting evidence exists or support is weak, investigate further before answering', specifying exactly when to halt and what alternative action to take.
check_constraints (Check Constraints), Grade A
Check an action against all active constraints. Returns matched constraints, match types (structural/semantic), and a combined verdict (pass/warn/block).
| Name | Required | Description | Default |
|---|---|---|---|
| team_id | No | | |
| metadata | No | | |
| target_resources | No | | |
| action_description | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
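The pass/warn/block verdict is documented, so a caller can map it directly onto control flow. This sketch assumes the verdict arrives as a plain string; the action text is illustrative:

```python
# Mapping the documented pass/warn/block verdicts to agent behaviour.
args = {"action_description": "Delete the staging database"}  # required

def next_step(verdict: str) -> str:
    """Translate a check_constraints verdict into an agent action."""
    return {
        "pass": "proceed",
        "warn": "proceed with caution, surface matched constraints",
        "block": "abort and report the blocking constraint",
    }[verdict]

print(next_step("warn"))
```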
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses return behavior (matched constraints, match types, verdict) and specific verdict values (pass/warn/block) not visible in the input schema. It misses explicit safety/disposition information (whether checks are logged or purely read-only).
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste: the first states purpose, the second states return values. Every word earns its place with no redundancy or filler.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a validation tool with an output schema (which excuses detailed return documentation), but incomplete due to the total lack of parameter guidance given 0% schema coverage. The description meets minimum viability but leaves significant gaps for the 4-parameter interface.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. While it implicitly references action_description via 'Check an action,' it completely omits semantics for team_id, target_resources, and metadata, leaving three-quarters of the parameter surface undocumented.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Check') and clearly identifies the resource ('action against all active constraints'). It distinguishes itself from siblings like get_constraints, declare_constraint, and update_constraint by emphasizing the validation/verification function rather than CRUD operations.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage through the phrase 'Check an action,' suggesting it's for pre-action validation. However, it lacks explicit guidance on when to use this versus similar evaluation tools like verify_before_acting or assess_answerability, or prerequisites for the action_description format.
checkpoint_decision_state (Checkpoint Decision State), Grade A
Create a receipted snapshot of your current decision state during a long-running session. Records decisions made, assumptions in effect, and open questions. Enables resumption by the same or different agent from the last checkpoint rather than replaying from zero.
| Name | Required | Description | Default |
|---|---|---|---|
| summary | Yes | | |
| session_id | Yes | | |
| open_questions | No | | |
| decisions_so_far | No | | |
| assumptions_in_effect | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
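A hedged sketch of checkpoint arguments: summary and session_id are required, and the entry shapes in the three optional arrays are assumptions, since the schema leaves them undocumented:

```python
# Hypothetical checkpoint_decision_state arguments.
args = {
    "summary": "Chose Postgres over DynamoDB; auth design still open",  # required
    "session_id": "sess-2024-001",                                      # required
    # Entry shapes below are assumed:
    "decisions_so_far": [
        {"decision": "Use Postgres", "rationale": "relational joins"},
    ],
    "assumptions_in_effect": ["traffic stays under 1k rps"],
    "open_questions": ["How do we rotate auth tokens?"],
}
print(sorted(args))
```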
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It adds useful behavioral context ('receipted snapshot', 'resumption by different agent') but fails to disclose critical operational traits: persistence guarantees, whether multiple checkpoints per session are retained or overwritten, authorization requirements, or the nature of the receipt returned (despite has_output_schema being true).
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is optimally structured with three high-information sentences: action definition, content specification, and usage benefit. There is no redundancy or filler; every clause earns its place by conveying distinct information about functionality, parameters, or operational value.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (5 parameters including nested object arrays for decisions) and absence of annotations, the description provides adequate conceptual framing but insufficient implementation detail. It does not explain the internal structure of 'decisions_so_far' objects or how the checkpoint relates to the output schema, though it successfully conveys the core user story.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. It semantically maps three parameters by stating it 'records decisions made, assumptions in effect, and open questions,' which clarifies the intent of the complex array structures. However, it completely omits explanation of 'session_id' (scope identifier) and 'summary' (checkpoint label), leaving significant gaps.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool's purpose with specific verbs ('Create a receipted snapshot') and resource ('decision state'). It effectively positions the tool as a session persistence mechanism. However, it does not explicitly differentiate from similar siblings like 'record_decision_context' or 'get_decision_context', which could cause selection ambiguity given the large toolkit.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context ('during a long-running session') and value proposition ('enables resumption... rather than replaying from zero'), suggesting when to checkpoint. However, it lacks explicit guidance on when NOT to use this versus alternatives like 'record_decision_context', and does not mention prerequisites like session initialization.
declare_available_models (Declare Available Models), Grade B
Declare which models are available in this session for orchestration routing. Called once at session start.
| Name | Required | Description | Default |
|---|---|---|---|
| models | Yes | | |
| session_id | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
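A sketch of a declaration payload. The tier and cost values (fast/balanced/capable, low/medium/high) are the enum values noted in the quality assessment below; the field names that carry them are assumptions:

```python
# Hypothetical declare_available_models arguments; both are required.
args = {
    "session_id": "sess-2024-001",
    "models": [
        # Field names assumed; enum values per the quality assessment.
        {"name": "gpt-large", "tier": "capable", "cost": "high"},
        {"name": "gpt-mini", "tier": "fast", "cost": "low"},
    ],
}
print([m["name"] for m in args["models"]])
```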
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description must carry full behavioral disclosure. It mentions the timing constraint ('once at session start') but fails to disclose side effects, idempotency guarantees, what happens to previous declarations, or how the orchestration routing consumes this data. Insufficient for a state-modifying operation.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences with zero waste: first states purpose, second states usage timing. Appropriately front-loaded and efficient.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (reducing the need for return value documentation), the tool has complex nested input parameters with enums and 0% schema coverage. The minimal description fails to bridge this gap or explain how this declaration affects the session's subsequent orchestration behavior.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. While it mentions 'models' abstractly, it does not explain the 'session_id' parameter, the nested array structure, or the semantic meaning of the enum values (fast/balanced/capable, low/medium/high). It adds minimal value beyond the schema's structural definition.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Declare[s] which models are available' for 'orchestration routing' — specific verb and resource. However, it does not explicitly differentiate from sibling 'declare_constraint' or 'register_external_service', which also involve declaration/registration patterns.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit timing guidance ('Called once at session start'), indicating initialization usage. However, it lacks information on when NOT to use it, what happens if called multiple times, or alternatives to this declaration method.
declare_constraint (Declare Constraint), Grade C
Declare an organisational constraint (boundary, relationship, policy, or context flag) that agents must respect. This is a mutation operation.
| Name | Required | Description | Default |
|---|---|---|---|
| scope | No | | |
| team_id | No | | |
| evidence | No | | |
| severity | No | | |
| assertion | Yes | | |
| expires_at | No | | |
| constraint_type | Yes | | |
| assertion_structured | No | | |
| review_interval_days | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
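A minimal sketch of a declaration call. The constraint_type values come from the description's own list; the severity and expires_at formats are assumptions:

```python
# Hypothetical declare_constraint arguments; assertion and
# constraint_type are required.
args = {
    "assertion": "Agents must not modify production data without approval",
    "constraint_type": "policy",  # boundary | relationship | policy | context_flag
    "severity": "high",                     # assumed value
    "expires_at": "2025-01-01T00:00:00Z",   # assumed ISO-8601 timestamp
}
print(args["constraint_type"])
```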
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With zero annotations provided, the description carries the full burden of behavioral disclosure. It correctly identifies the operation as a mutation, but fails to disclose side effects (e.g., notifications triggered), persistence guarantees, validation rules, or what happens when declaring duplicate constraints. The mention of 'agents must respect' hints at enforcement but doesn't explain the mechanism.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences with no redundant phrases or filler. However, given the high complexity (9 parameters, nested objects, multiple enums), the brevity contributes to under-specification rather than efficient communication. The structure is front-loaded with the core action.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (reducing the need to describe return values), the description is inadequate for a complex mutation tool with zero schema annotations. It lacks explanation of required parameters' formats, the relationship between 'assertion' and 'assertion_structured', and operational context like review cycles or team scoping implied by the parameters.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate significantly. While it helpfully expands the enum values for 'constraint_type' (boundary, relationship, policy, context_flag) and implies 'assertion' as the constraint content, it completely omits semantics for 7 other parameters including 'severity', 'expires_at', 'scope', and 'evidence', leaving critical parameters undocumented.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies the action ('Declare') and resource ('organisational constraint'), and enumerates the specific constraint types (boundary, relationship, policy, context_flag). However, it fails to distinguish from sibling mutation tools like 'update_constraint' or 'suggest_constraint', leaving ambiguity about when to create versus modify or propose constraints.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description only notes 'This is a mutation operation,' which helps distinguish it from read-only siblings like 'get_constraints' but provides no guidance on choosing between 'declare_constraint', 'update_constraint', or 'suggest_constraint'. No prerequisites, idempotency notes, or conflict resolution behavior is mentioned.
derive_from_facts (Derive From Facts), Grade A
Safe math and selection over a fact row set. Operations: sum, count, difference, max, min, latest, earliest. Pass the rows from enumerate_memory_facts and get a deterministic result with a computation trace. Removes arithmetic slop from totals, comparisons, and 'which is highest' questions.
| Name | Required | Description | Default |
|---|---|---|---|
| rows | Yes | | |
| operation | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
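A sketch of a derive_from_facts call using one of the seven documented operations; the row shape is an assumption, since the description only says rows come from enumerate_memory_facts without detailing fields:

```python
# The seven operations are documented in the description.
OPERATIONS = {"sum", "count", "difference", "max", "min", "latest", "earliest"}

# Hypothetical arguments; the row shape is assumed.
args = {
    "operation": "sum",
    "rows": [
        {"fact": "EU signups Q3", "value": 1204},
        {"fact": "US signups Q3", "value": 987},
    ],
}
print(args["operation"])
```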
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Adds valuable behavioral context: 'deterministic result with a computation trace' and 'Safe math' (implies error handling). Does not indicate if read-only or mention rate limits, but covers core behavioral value well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three efficient sentences with zero waste. Front-loaded with operations list, followed by input/output contract, ending with value proposition. Every clause earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given output schema exists (not shown), description correctly omits return value details. Effectively references critical sibling tool (enumerate_memory_facts). Strong coverage for a 2-parameter tool despite 0% schema descriptions, though row field semantics could be richer.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description compensates by listing all 7 enum values for operation and specifying that rows must come from enumerate_memory_facts. Minor gap: doesn't detail expected row structure (which fields are numeric/date), but sufficient for an agent to construct valid calls given the sibling context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific operations (sum, count, difference, max, min, latest, earliest) over a specific resource (fact row set). Explicitly distinguishes from siblings by specifying it receives rows from enumerate_memory_facts, clarifying its position in the data pipeline.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states prerequisite: 'Pass the rows from enumerate_memory_facts.' Defines specific use cases: 'Removes arithmetic slop from totals, comparisons, and which is highest questions.' Clear when-to-use guidance distinguishes it from raw query tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
dismiss_skillDismiss SkillAInspect
Dismiss a pending or quarantined skill submission. The skill is marked as dismissed and will not be promoted.
| Name | Required | Description | Default |
|---|---|---|---|
| notes | No | Additional notes | |
| reason | Yes | Reason for dismissal | |
| submission_id | Yes | ID of the skill submission to dismiss |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
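A minimal argument-building sketch based only on the parameter table above; the helper name and validation logic are illustrative, not part of VaultCrux.

```python
def build_dismiss_args(submission_id, reason, notes=None):
    # submission_id and reason are required per the parameter table;
    # notes is optional and omitted from the payload when absent.
    if not submission_id or not reason:
        raise ValueError("submission_id and reason are required")
    args = {"submission_id": submission_id, "reason": reason}
    if notes is not None:
        args["notes"] = notes
    return args

args = build_dismiss_args("sub-123", "duplicate of an existing skill")
```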
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Without annotations, the description carries the full burden of behavioral disclosure. It successfully explains the persistent outcome ('marked as dismissed') and business rule implication ('will not be promoted'), but omits critical safety context such as whether the action is reversible, idempotent, or requires specific permissions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of exactly two sentences with zero redundancy. The first establishes the operation and target state; the second clarifies the terminal consequence. Every word serves a distinct purpose in aiding tool selection.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (per context signals), the description appropriately omits return value details. It adequately covers the core action for a state-transition tool, though it could be strengthened by explicitly contrasting with 'retract_skill' to prevent selection errors among the 40+ sibling tools.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 100% schema description coverage, the structured fields already adequately document all three parameters (submission_id, reason, notes). The description adds no additional semantic guidance—such as expected reason formats or note conventions—meriting the baseline score of 3.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Dismiss'), target resource ('skill submission'), and qualifying states ('pending or quarantined'). It further distinguishes itself from the sibling 'promote_skill' by explicitly stating the skill 'will not be promoted,' clarifying the terminal outcome of this operation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implicitly guides usage by restricting the action to 'pending or quarantined' submissions, which helps identify applicable scenarios. However, it fails to explicitly differentiate from sibling 'retract_skill' or state when to prefer this tool over alternative rejection workflows.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
enumerate_memory_factsEnumerate Memory FactsAInspect
Deterministic fact-table extraction for aggregation questions. Returns a structured row set (subject, predicate, object, date, session_id, confidence) instead of prose. Use this for 'how many', 'total', 'list all' questions — count the rows instead of hoping the LLM enumerates correctly. Includes missing_dimensions to flag what might not have been found.
| Name | Required | Description | Default |
|---|---|---|---|
| as_of | No | ||
| limit | No | ||
| query | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
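The "count the rows" pattern the description recommends can be shown against a mocked response. The per-row fields follow the tuple named in the description; the response envelope (`rows`, `missing_dimensions` keys) is an assumption.

```python
# Mocked enumerate_memory_facts response; envelope keys are assumed.
response = {
    "rows": [
        {"subject": "user", "predicate": "subscribes_to", "object": "Disney+",
         "date": "2024-02-01", "session_id": "s1", "confidence": 0.92},
        {"subject": "user", "predicate": "subscribes_to", "object": "Netflix",
         "date": "2024-02-09", "session_id": "s2", "confidence": 0.88},
    ],
    "missing_dimensions": ["cancelled subscriptions"],
}

# 'How many subscriptions?' -> count rows deterministically
# instead of enumerating in prose.
how_many = len(response["rows"])
```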
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, description carries full burden and discloses return structure (specific tuple fields: subject, predicate, object, etc.), determinism, and output features (missing_dimensions flag). Lacks operational details like error handling or empty result behavior, but covers the critical output contract well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste: (1) purpose and return type, (2) specific use cases, (3) methodology guidance, (4) auxiliary feature. Front-loaded with the deterministic/structured nature and maintains high information density throughout.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has an output schema, the description appropriately focuses on explaining the semantic meaning of the return structure (the fact table format) and missing_dimensions feature. Only gap is the lack of parameter documentation given zero schema coverage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage, requiring the description to compensate, yet it fails to mention any of the three parameters (query, as_of, limit). While 'query' is somewhat inferable from context, temporal filtering (as_of) and pagination (limit) are completely undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific verb ('extraction') and resource ('facts'), clearly defining the deterministic table-based approach. Explicitly distinguishes from prose-based siblings like 'query_memory' by contrasting 'structured row set' versus prose and LLM enumeration.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use guidance ('how many', 'total', 'list all' questions) and methodology ('count the rows instead of hoping the LLM enumerates correctly'). Clearly indicates this is the alternative to prose-based query tools for aggregation tasks.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
escalate_with_contextEscalate With ContextAInspect
Contextual escalation — packages your full reasoning state (evidence gathered, options considered, recommended action) and routes to a human for review. Preserves work so the human responds with full context, not from scratch. Use when you hit genuine uncertainty that the system cannot evaluate.
| Name | Required | Description | Default |
|---|---|---|---|
| urgency | No | ||
| question | Yes | ||
| reasoning | Yes | ||
| session_id | No | ||
| evidence_gathered | No | ||
| options_considered | No | ||
| recommended_action | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
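A sketch of assembling the escalation payload. Required fields follow the parameter table; the `urgency` value shown is hypothetical, since the enum values are undocumented.

```python
def build_escalation(question, reasoning, recommended_action,
                     evidence_gathered=None, options_considered=None,
                     urgency=None, session_id=None):
    # question, reasoning, and recommended_action are required per the table.
    payload = {
        "question": question,
        "reasoning": reasoning,
        "recommended_action": recommended_action,
    }
    # Optional fields are included only when supplied.
    for key, value in [("evidence_gathered", evidence_gathered),
                       ("options_considered", options_considered),
                       ("urgency", urgency),
                       ("session_id", session_id)]:
        if value is not None:
            payload[key] = value
    return payload

payload = build_escalation(
    question="Which vendor quote should we accept?",
    reasoning="Two quotes conflict and memory has no tie-breaking constraint.",
    recommended_action="Accept quote A pending human confirmation",
    urgency="blocking",  # hypothetical enum value; not confirmed by the schema
)
```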
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden. It discloses that the tool 'preserves work' and routes to humans, adding valuable context beyond the name. However, it omits what happens post-escalation (blocking vs. async), reversibility, or expected response timing.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly constructed sentences: definition, value proposition ('Preserves work'), and usage condition. Every sentence earns its place with zero redundancy, though 'Contextual escalation' slightly echoes the title.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 7 parameters with complex nested objects and 0% schema coverage, the description is minimally viable. It captures the core concept adequately and defers return values appropriately (output schema exists), but the lack of guidance on optional parameters (urgency levels, session_id) and complex array structures leaves significant gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring heavy compensation. The description parenthetically maps to key parameters (evidence gathered, options considered, recommended action) and implies 'reasoning,' but fails to document the urgency enum values (blocking/advisory), session_id purpose, or the nested object structures for evidence/options.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'packages your full reasoning state... and routes to a human for review,' specifying the exact resource (reasoning state including evidence, options, action) and distinguishing it from automated analysis siblings like investigate_question by emphasizing human routing.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use guidance ('Use when you hit genuine uncertainty that the system cannot evaluate'), clearly delineating the boundary between automated processing and human escalation. Lacks specific naming of sibling alternatives like get_escalation_recommendation or assess_answerability.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
expand_hit_contextExpand Hit ContextAInspect
Session-neighborhood expansion around promising retrieval hits. When you find a relevant chunk but the specific fact (name, date, amount) is in a nearby turn, use this to fetch ±N turns from the same session. Recovers facts like 'my parents', '$6', or 'Disney+' that are near but not in the retrieved chunk.
| Name | Required | Description | Default |
|---|---|---|---|
| mode | No | ||
| hit_ids | Yes | ||
| radius_turns | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
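An argument-validation sketch. The limits enforced here (at most 10 hit IDs, radius of at most 5 turns) mirror the constraint values the review quotes (maxItems: 10, maximum: 5) and should be treated as assumptions.

```python
def build_expand_args(hit_ids, radius_turns=None, mode=None):
    # hit_ids is required; the caps below are assumed from the
    # schema constraints quoted in the review.
    if not hit_ids:
        raise ValueError("hit_ids is required")
    if len(hit_ids) > 10:
        raise ValueError("at most 10 hit_ids")
    args = {"hit_ids": list(hit_ids)}
    if radius_turns is not None:
        if not 1 <= radius_turns <= 5:
            raise ValueError("radius_turns must be between 1 and 5")
        args["radius_turns"] = radius_turns
    if mode is not None:
        args["mode"] = mode
    return args

args = build_expand_args(["hit-7"], radius_turns=2)
```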
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It explains the expansion mechanic ('fetch ±N turns from the same session') and return value type ('Recovers facts'), but omits safety profile (read-only vs. destructive), rate limits, or idempotency characteristics that would be critical for an agent to know.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly constructed sentences with zero redundancy: definition, usage trigger/mechanism, and value examples. Front-loaded with the core concept ('Session-neighborhood expansion'). Every clause earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the description appropriately focuses on operational semantics rather than return structure. It adequately positions the tool within the retrieval workflow (post-hit expansion). Minor gap: could reference the constraint limits (max 5 turns) given zero schema descriptions.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring description compensation. It maps '±N turns' to radius_turns and 'promising retrieval hits' to hit_ids, and implies the 'mode' concept (turn/session). However, it fails to explain the enum values ('turn' vs 'session' vs 'window'), constraints (maxItems: 10, maximum: 5), or required vs optional parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a specific verb-noun phrase ('Session-neighborhood expansion') and clearly identifies the resource (retrieval hits). It distinguishes itself from initial retrieval siblings like 'query_memory' by specifying it operates on 'promising retrieval hits' (existing results) rather than raw queries.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit conditional trigger: 'When you find a relevant chunk but the specific fact... is in a nearby turn, use this.' This clearly signals the prerequisite state (having a hit with incomplete context). Lacks explicit naming of alternatives for initial search, though the conditionality implies the workflow sequence.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
fill_gapFill Knowledge GapBInspect
Fill a previously reported gap with new knowledge. Gap must have been reported by a different tenant for cross-tenant credit.
| Name | Required | Description | Default |
|---|---|---|---|
| content | Yes | ||
| evidence | Yes | ||
| gap_receipt_id | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
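A payload sketch for the three required parameters. The evidence item fields (source_id, source_type, excerpt) are taken from the review's reading of the schema and are assumptions, as is the example data.

```python
def build_fill_gap_args(gap_receipt_id, content, evidence):
    # All three parameters are required per the table; evidence is a list
    # of objects whose fields follow the review's reading of the schema.
    if not (gap_receipt_id and content and evidence):
        raise ValueError("gap_receipt_id, content, and evidence are required")
    return {"gap_receipt_id": gap_receipt_id, "content": content,
            "evidence": list(evidence)}

args = build_fill_gap_args(
    "gap-receipt-42",
    "The billing cutoff moved to the 25th of each month.",
    [{"source_id": "doc-9", "source_type": "policy_doc",
      "excerpt": "Cutoff is the 25th effective Q3."}],
)
```

Note the cross-tenant rule stated above: the receipt must come from a gap reported by a different tenant for credit to apply.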
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses the cross-tenant credit system and the tenant validation rule, which is valuable behavioral context. However, it omits whether filling a gap is reversible, if it triggers notifications, or how the evidence array is processed/validated.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of exactly two sentences with zero waste. The first sentence establishes the core purpose immediately; the second sentence provides the critical constraint. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (not shown but indicated), the description appropriately omits return value details. However, for a mutation tool with 0% schema coverage and no annotations, it lacks sufficient detail on parameter semantics and side effects to be fully complete, though the cross-tenant constraint is a crucial inclusion.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. It loosely maps 'gap' to gap_receipt_id and 'new knowledge' to content, but provides no guidance on the structure of the evidence array (complex objects with source_id, source_type, excerpt) or what constitutes valid content syntax.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the core action ('Fill') and resource ('previously reported gap'). It distinguishes this tool from siblings like 'get_knowledge_gaps' by specifying this is a write operation and uniquely mentioning the 'cross-tenant credit' business rule, which defines its specific scope.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides a clear constraint ('Gap must have been reported by a different tenant'), indicating a validation rule for invocation. However, it fails to specify what to do if the gap is from the same tenant (alternative action) or when to prefer this over 'submit_correction' or other knowledge-updating siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_active_alertsGet Active AlertsCInspect
Get active watch alerts across all watches for the tenant from the last 7 days.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
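The fixed 7-day lookback and 'active' filter the description implies can be illustrated locally; the alert dict shape is invented for this sketch, and the `limit` semantics (simple truncation) are assumed.

```python
from datetime import datetime, timedelta, timezone

def recent_active(alerts, now, days=7, limit=None):
    # Keep alerts that are still active and created within the lookback window.
    cutoff = now - timedelta(days=days)
    hits = [a for a in alerts
            if a["status"] == "active" and a["created_at"] >= cutoff]
    return hits[:limit] if limit is not None else hits

now = datetime(2024, 6, 15, tzinfo=timezone.utc)
alerts = [
    {"id": "a1", "status": "active", "created_at": now - timedelta(days=2)},
    {"id": "a2", "status": "active", "created_at": now - timedelta(days=9)},   # outside window
    {"id": "a3", "status": "resolved", "created_at": now - timedelta(days=1)},  # not active
]
fresh = recent_active(alerts, now)
```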
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It specifies the 7-day lookback window and 'active' status filter, but fails to disclose read-only status, performance characteristics, rate limits, or what constitutes a 'watch' in this domain.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the action. While appropriately concise, it is overly minimal given the lack of schema documentation and behavioral annotations.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (reducing the need for return value description), the tool suffers from zero schema coverage on inputs and no annotations. The description fails to compensate by documenting the 'limit' parameter or explaining domain-specific terms like 'watch' and 'tenant'.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains one parameter ('limit') with 0% description coverage, and the description makes no mention of this parameter or its function (pagination control). The description adds zero semantic value beyond the schema structure.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action (Get), resource (active watch alerts), and scope (across all watches for the tenant from the last 7 days). However, it does not explicitly differentiate this tool from siblings like 'get_signals_feed' or 'get_pressure_status'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description defines the temporal scope (last 7 days) but provides no guidance on when to use this tool versus alternatives such as 'get_signals_feed' or 'get_escalation_recommendation', nor does it mention prerequisites or constraints beyond the implicit time window.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_audit_trailGet Audit TrailCInspect
Read VaultCrux Memory Core import audit history and linked receipt hashes for a topic.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | ||
| topic | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| items | Yes | |
| topic | Yes |
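Since the output schema documents `items` and `topic`, consuming a response can be sketched as below; the fields inside each item (`receipt_hash`, `imported_at`) are invented for illustration.

```python
def summarize_audit_trail(response):
    # 'items' and 'topic' are documented output fields; the per-item
    # fields used here are assumptions for the sake of the example.
    hashes = [item["receipt_hash"] for item in response["items"]]
    return response["topic"], hashes

response = {
    "topic": "billing",
    "items": [
        {"receipt_hash": "sha256:ab12...", "imported_at": "2024-05-01"},
        {"receipt_hash": "sha256:cd34...", "imported_at": "2024-05-09"},
    ],
}
topic, hashes = summarize_audit_trail(response)
```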
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While 'Read' implies a safe, non-destructive operation, and 'linked receipt hashes' hints at return content, the description omits pagination behavior (despite the limit parameter), error handling for invalid topics, and whether results are ordered or filtered by default.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The single-sentence structure is appropriately compact and front-loaded with the action verb. However, the dense domain jargon ('VaultCrux Memory Core') crammed into one sentence reduces scannability compared to structured multi-sentence descriptions that separate purpose from parameter guidance.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the existence of an output schema (covering return values) and only two simple parameters, the description is minimally adequate. However, the lack of parameter documentation for 'limit' and the absence of usage guidelines leave gaps that should be addressed for a tool interfacing with audit logs.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to fully compensate. It implicitly clarifies 'topic' ('for a topic') but completely omits the 'limit' parameter, leaving its purpose (pagination vs. result truncation) and interaction with the output schema undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Read') and identifies a distinct domain resource ('VaultCrux Memory Core import audit history and linked receipt hashes'). The specificity of 'import audit history' distinguishes it from siblings like query_memory or get_decision_context, though it could more explicitly contrast with general memory query tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like query_memory or list_topics. In a server with many 'get_' and query tools, the absence of selection criteria forces the agent to guess based on naming conventions alone.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_causal_chainGet Causal ChainBInspect
Get the causal chain graph for a specific decision, showing how decisions, actions, and supersessions relate.
| Name | Required | Description | Default |
|---|---|---|---|
| decision_id | Yes |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
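A traversal sketch over a mocked graph response. The edge representation (`superseded_by` links on nodes) is an assumption, since the output schema exposes no field descriptions.

```python
def supersession_chain(graph, decision_id):
    # Follow 'superseded_by' edges from the starting decision to the
    # current one. The node/edge shape is invented for this illustration.
    chain = [decision_id]
    seen = {decision_id}
    current = decision_id
    while True:
        nxt = graph["nodes"][current].get("superseded_by")
        if nxt is None or nxt in seen:  # guard against cycles
            return chain
        chain.append(nxt)
        seen.add(nxt)
        current = nxt

graph = {"nodes": {
    "dec-1": {"superseded_by": "dec-2"},
    "dec-2": {"superseded_by": "dec-3"},
    "dec-3": {},
}}
chain = supersession_chain(graph, "dec-1")
```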
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses that the output is a graph structure showing causal relationships including supersessions, which adds domain context. However, it omits safety traits (read-only nature), error conditions, or performance characteristics expected for a retrieval tool.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence that front-loads the action verb 'Get'; every phrase contributes specific meaning about the resource and output structure. No extraneous text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has an output schema (covering return values) and only one parameter, the description is minimally adequate. However, with zero schema coverage and no annotations, the lack of explicit parameter documentation and operational guidance (errors, auth) creates notable gaps for an AI agent attempting to invoke the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. It loosely implies the decision_id parameter through the phrase 'for a specific decision' but fails to explicitly document what decision_id represents, its format, or how to obtain it. This meets minimum viability but leaves significant documentation gaps.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool retrieves a 'causal chain graph' and specifies the content (decisions, actions, supersessions) distinguishing it from siblings like get_audit_trail or get_decision_context. However, it doesn't explicitly contrast with get_correction_chain which also involves chains.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like get_audit_trail, get_decision_context, or get_correction_chain, nor does it mention prerequisites for the decision_id parameter.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_checkpointsGet CheckpointsAInspect
Retrieve decision checkpoints for a session. Returns the linked list of checkpoints in reverse chronological order. Use this to resume work from a prior checkpoint after session failure or handoff.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| session_id | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
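Since the output schema is undocumented, handling the response requires assumptions. The description does guarantee one thing: the checkpoint list arrives in reverse chronological order, so the resume point after a failure or handoff is the head of the list. A minimal sketch, assuming hypothetical field names `checkpoint_id` and `created_at`:

```python
def latest_checkpoint(checkpoints):
    """Pick the resume point from a reverse-chronological checkpoint list.

    get_checkpoints returns newest-first, so the head of the list (if any)
    is the checkpoint to resume from after a session failure or handoff.
    """
    return checkpoints[0] if checkpoints else None


# Hypothetical response rows; the field names are assumptions, since the
# output schema is undocumented.
rows = [
    {"checkpoint_id": "cp-3", "created_at": "2024-06-03T10:00:00Z"},
    {"checkpoint_id": "cp-2", "created_at": "2024-06-02T10:00:00Z"},
]
```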
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It successfully discloses return behavior: 'linked list of checkpoints in reverse chronological order'. However, lacks mention of read-only safety, rate limits, or pagination behavior despite zero annotation coverage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, front-loaded with action: the first covers purpose, the second the return format, and the third usage context. Zero redundancy; every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a retrieval tool with output schema (so return values needn't be detailed). However, gaps remain: with 0% parameter schema coverage, the description should have documented the two parameters explicitly, which it did not.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage for both 'session_id' and 'limit' parameters. Description only implicitly references 'session' without explaining parameter format, and completely omits the 'limit' parameter. With low schema coverage, the description fails to compensate adequately.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'Retrieve' with resource 'decision checkpoints' and scope 'for a session'. It clearly distinguishes from sibling tool 'checkpoint_decision_state' (which likely creates checkpoints) by emphasizing retrieval.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'to resume work from a prior checkpoint after session failure or handoff'. Provides clear contextual trigger but does not explicitly name alternative approaches (e.g., starting fresh vs. resuming).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_constraints (Get Constraints) [grade C]
List active organisational constraints, optionally filtered by type, status, or team.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| status | No | | |
| team_id | No | | |
| constraint_type | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
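All four parameters are optional, so a caller can build the arguments object by dropping unset filters rather than sending nulls. A sketch under that assumption (parameter names come from the table above; their semantics are undocumented):

```python
def constraint_args(constraint_type=None, status=None, team_id=None, limit=None):
    """Build the arguments payload for get_constraints.

    All four parameters are optional per the input table; None means
    'do not filter on this field', so unset filters are omitted entirely.
    """
    args = {
        "constraint_type": constraint_type,
        "status": status,
        "team_id": team_id,
        "limit": limit,
    }
    return {k: v for k, v in args.items() if v is not None}
```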
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden but only notes constraints are 'active' (ambiguous given the status enum includes expired/superseded). Omits pagination behavior, default limits, and safety profile entirely.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence is front-loaded and efficient, though excessively brief given the lack of schema documentation and the multiple siblings requiring differentiation.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Inadequate for 4-parameter tool with 0% schema coverage. Missing explanation of pagination (limit), return structure (despite output schema existing), and relationship to constraint lifecycle states.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Mentions 3 of 4 parameters (type, status, team) but completely omits 'limit'. With 0% schema description coverage, it fails to explain enum semantics for constraint_type or status values, though it correctly implies all parameters are optional.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb 'List' and resource 'organisational constraints', but fails to differentiate from siblings like 'check_constraints' or 'declare_constraint' which handle similar resources.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Mentions optional filtering but provides no guidance on when to use this listing tool versus 'check_constraints', 'update_constraint', or 'declare_constraint', nor any prerequisites.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_contradictions (Get Contradictions) [grade A]
Find conflicting information across the user's memory. Returns groups of artefacts that contradict each other on the same topic. Use after gathering evidence for an answer — if your evidence sources disagree, this reveals which version is correct (typically the most recent).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| items | Yes | |
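The description's recency heuristic ("typically the most recent") can be applied client-side once the contradiction groups come back. A sketch, assuming a hypothetical group shape with `artefacts` and `timestamp` fields, since the `items` schema is undocumented:

```python
def resolve_by_recency(group):
    """Pick the winning artefact from one contradiction group.

    Follows the description's heuristic that the most recent version is
    typically correct. ISO-8601 timestamps in a uniform format compare
    chronologically as plain strings.
    """
    return max(group["artefacts"], key=lambda a: a["timestamp"])


# Hypothetical group shape; the real `items` schema is undocumented.
group = {
    "topic": "home_city",
    "artefacts": [
        {"value": "Boston", "timestamp": "2024-01-10T00:00:00Z"},
        {"value": "Chicago", "timestamp": "2024-05-01T00:00:00Z"},
    ],
}
```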
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It discloses return structure ('groups of artefacts') and provides critical behavioral context about interpreting results ('reveals which version is correct'). Could improve by explicitly confirming read-only status.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tight sentences with zero waste: the first states the purpose, the second the return structure, and the third usage timing and interpretation. Information is front-loaded with the core function stated immediately.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists, return values are adequately handled. The conceptual coverage (finding contradictions, recency heuristic) is complete for the tool's complexity. However, the complete omission of the sole parameter documentation creates a practical gap in usability.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must compensate but fails entirely. The 'limit' parameter (integer 1-500) is completely undocumented in both schema and description, leaving the agent with no guidance on what gets limited (contradiction groups vs artefacts).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action (Find conflicting information/Returns groups) and resource (artefacts across memory, same topic). It distinguishes from siblings like check_claim or assess_coverage by focusing specifically on contradiction detection.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit temporal guidance ('Use after gathering evidence') and workflow logic ('if your evidence sources disagree'). Includes interpretive guidance for resolving conflicts ('typically the most recent'), helping the agent understand when and how to invoke the tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_correction_chain (Get Correction Chain) [grade A]
Trace how a fact or decision evolved over time. When you find a value (e.g. 'Rachel moved to Chicago'), call this to check if a more recent session supersedes it. Returns the full version chain with timestamps. ALWAYS use for 'current', 'now', 'most recent' questions before answering with the first value you find.
| Name | Required | Description | Default |
|---|---|---|---|
| decision_id | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
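The "ALWAYS use for 'current', 'now', 'most recent' questions" rule amounts to a pre-answer guard an agent can enforce mechanically. A minimal sketch; the trigger phrases come from the tool description, while "latest" is an added assumption:

```python
# Trigger phrases lifted from the tool description; "latest" is an
# added assumption, not listed there.
TRIGGERS = ("current", "now", "most recent", "latest")


def needs_correction_chain(question: str) -> bool:
    """Return True when a question asks for present-state facts, in which
    case get_correction_chain should run before answering with the first
    value found."""
    q = question.lower()
    return any(t in q for t in TRIGGERS)
```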
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses the return value structure ('full version chain with timestamps') and the core behavioral concept (tracking supersession of facts over time). It could be improved by explicitly stating read-only nature or any rate limiting, but the behavioral traits are well-covered.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste: purpose (sentence 1), usage context (sentence 2), return value (sentence 3), and critical usage constraint (sentence 4). Information is front-loaded with the core verb, and every sentence earns its place by adding distinct value beyond the structured fields.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's medium complexity (version tracking logic) and presence of an output schema, the description is complete. It explains the domain concept (correction chaining), provides invocation triggers, and describes return characteristics without needing to replicate the full output schema structure.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% for the single 'decision_id' parameter. The description implies the input context ('When you find a value') but does not explicitly map this to the decision_id parameter or describe its format/semantics. With only one parameter the inference is possible, but the description does not fully compensate for the lack of schema documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('Trace', 'check if... supersedes') and clearly identifies the resource (fact/decision evolution over time). It distinguishes from siblings like get_causal_chain or get_audit_trail by emphasizing temporal versioning and supersession rather than causality or action logging.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit when-to-use guidance ('When you find a value... call this') and specific trigger words requiring invocation ('ALWAYS use for 'current', 'now', 'most recent' questions'). It clearly states the alternative to avoid ('before answering with the first value you find'), giving the agent precise selection criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_decision_context (Get Decision Context) [grade C]
Retrieve agent session decisions from the CoreCrux Decision Plane, including decision IDs, outcomes, and cursor positions.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
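On the wire, invoking any of these tools is an MCP `tools/call` request. A sketch of the JSON-RPC 2.0 payload for this single-parameter tool, assuming standard MCP framing (the gateway handles transport and auth):

```python
import json


def decision_context_request(session_id: str, request_id: int = 1) -> str:
    """Serialise an MCP tools/call request for get_decision_context,
    using the standard JSON-RPC 2.0 framing that MCP defines."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "get_decision_context",
            "arguments": {"session_id": session_id},
        },
    })
```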
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. While it lists the types of data retrieved (IDs, outcomes, cursor positions), it fails to mention operational aspects such as whether the operation is read-only, pagination behavior, or potential error states. It does not disclose what 'cursor positions' implies for pagination or state management.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, front-loaded sentence that efficiently communicates the core action and specific return fields without redundancy. Every clause adds value, though it could slightly improve by explicitly referencing the input parameter to strengthen the connection between the action and its required argument.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema, the description appropriately focuses on the tool's purpose rather than exhaustively listing return values. However, with numerous similarly-named sibling tools and zero schema coverage for the single required parameter, the description lacks critical context needed for correct invocation and selection, leaving clear gaps in completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage for the required 'session_id' parameter. The description mentions 'agent session decisions' which implicitly relates to the parameter, but it does not explicitly describe what the session_id represents (e.g., 'the unique identifier for the agent session to query') or its format requirements beyond the schema's minLength.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses a specific verb ('Retrieve') and resource ('agent session decisions'), clearly identifying the tool's function. It further specifies the data returned ('decision IDs, outcomes, and cursor positions'), which hints at the tool's scope within the Decision Plane. However, it does not explicitly differentiate from the sibling tool 'get_decisions_on_stale_context'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like 'get_decisions_on_stale_context' or 'record_decision_context'. There are no stated prerequisites, exclusions, or conditions that would help an agent determine if this is the correct retrieval tool among its many siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_decisions_on_stale_context (Get Decisions on Stale Context) [grade C]
Find decisions in a session that may have been made on stale memory context.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
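Since the tool never defines 'stale', a caller must guess at the detection criteria. One plausible reading, flagging decisions recorded before the most recent memory update, can be sketched client-side; this is illustrative only, not the server's actual heuristic:

```python
def stale_decisions(decisions, last_memory_update):
    """Flag decisions recorded before the most recent memory update.

    One plausible reading of 'stale context' (the tool does not define it):
    such decisions may have missed newer facts. Uniform ISO-8601 strings
    compare chronologically as plain strings.
    """
    return [d for d in decisions if d["recorded_at"] < last_memory_update]


# Hypothetical decision rows; field names are assumptions.
decisions = [
    {"id": "d1", "recorded_at": "2024-06-01T00:00:00Z"},
    {"id": "d2", "recorded_at": "2024-06-05T00:00:00Z"},
]
```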
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. The phrase 'may have been made' hints at heuristic/detection behavior, but the description fails to explain what constitutes 'stale' context, detection criteria, or whether this is a read-only audit operation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The single-sentence description is efficiently structured and front-loaded with the action verb, but given the 0% schema coverage and lack of annotations, it is overly terse and leaves significant documentation gaps.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Although an output schema exists (reducing the need for return value documentation), the description inadequately explains the domain concept of 'stale memory context' and lacks guidance on interpreting results for this conceptual audit tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema has 0% description coverage. While the description mentions 'in a session,' implying the session_id parameter, it does not define the parameter's format, source, or semantics, offering only minimal compensation for the undocumented schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Find[s] decisions in a session that may have been made on stale memory context,' providing a specific verb, resource, and condition. However, it does not differentiate from siblings like get_decision_context or get_freshness_report, which may operate on similar concepts.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives (e.g., get_decision_context for general retrieval), nor does it specify prerequisites or conditions that would trigger its use.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_domain_changelog (Domain Changelog) [grade A]
Cross-artefact-type changelog for specified domains since a given timestamp. Returns constraints added/updated, knowledge changes, decisions recorded, and alerts raised/resolved. Use at session start to learn what changed in your domain since your last session.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Maximum entries to return (default 500) | |
| since | Yes | Changelog start timestamp (max 90 days ago) | |
| domains | Yes | Domains to check for changes | |
| include | No | Optional filter for artefact types |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
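The `since` parameter is capped at 90 days ago, so a caller resuming after a long gap should clamp the requested start before invoking. A minimal sketch of that clamp (the cap is documented; the rejection behavior for out-of-range values is not):

```python
from datetime import datetime, timedelta, timezone


def clamp_since(requested: datetime, now: datetime) -> datetime:
    """Clamp the changelog start to the documented 90-day maximum window,
    so the `since` argument never predates what the tool accepts."""
    floor = now - timedelta(days=90)
    return max(requested, floor)
```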
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses what data is returned (constraints added/updated, etc.) but omits behavioral traits like read-only safety, performance characteristics, or cost implications that an agent would need to know before calling.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: sentence 1 defines the operation, sentence 2 details the return payload, and sentence 3 provides usage timing. Information is front-loaded and appropriately sized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
While the description leverages the existing output schema (appropriately not duplicating return structure), it incompletely describes the tool's scope by omitting 'skills' from the list of artefact types, despite 'skills' being a valid enum value in the 'include' parameter.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite 100% schema coverage, the description adds valuable semantic context: it frames the 'since' parameter around 'session' boundaries and maps the return value categories to the 'include' filter options (constraints, knowledge, decisions, alerts), aiding agent comprehension beyond raw schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool as a 'Cross-artefact-type changelog' and specifies it covers constraints, knowledge, decisions, and alerts. The 'cross-artefact-type' phrasing implicitly distinguishes it from single-artefact siblings like get_constraints, though it does not explicitly compare against them.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
It provides explicit temporal guidance ('Use at session start') and context ('since your last session') for when to invoke the tool. However, it lacks guidance on when NOT to use it or explicit mentions of alternatives like get_constraints for single-artefact queries.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_enrichment_status (Get Enrichment Status) [grade C]
Check the status of submitted corrections (pending, corroborated, merged, retracted, expired).
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| status | No | | |
| correction_id | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
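The parameter set implies two query patterns that the description never spells out: look up a single correction by ID, or list corrections filtered by status. A sketch of dispatching between them; the patterns themselves are inferred, while the status values come from the description:

```python
# Status values enumerated in the tool description.
VALID_STATUSES = {"pending", "corroborated", "merged", "retracted", "expired"}


def enrichment_args(correction_id=None, status=None, limit=None):
    """Build arguments for get_enrichment_status.

    Two likely query patterns (inferred, not documented): look up a single
    correction by ID, or list corrections filtered by status.
    """
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(f"unknown status: {status}")
    if correction_id is not None:
        return {"correction_id": correction_id}
    filters = {"status": status, "limit": limit}
    return {k: v for k, v in filters.items() if v is not None}
```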
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the burden of disclosure. It valuably lists the five possible states (pending, corroborated, merged, retracted, expired), providing insight into the state machine. However, it omits safety information (read-only vs. mutation), error handling, and pagination behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no redundant phrases. However, given the complete lack of schema documentation and annotations, it may be overly terse—every word earns its place, but critical information is missing.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a 3-parameter tool with zero schema descriptions and no annotations, the description is incomplete. It does not explain the query pattern (specific ID vs. list), filtering capabilities, or safety profile, leaving significant gaps the agent must infer.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. It fails to do so, never explaining that 'correction_id' queries a specific record while 'limit' and 'status' enable filtered listing, nor how they interact. The parenthetical status list ambiguously references values without clarifying if these are input filters or return values.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Check') and resource ('status of submitted corrections'), and enumerates the specific possible states. However, it does not explicitly differentiate from siblings like 'get_correction_chain' or 'submit_correction' in terms of workflow position.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance is provided on when to use this tool versus alternatives, nor when to use the filtering parameters versus querying by specific ID. The description lacks prerequisites (e.g., 'use after submitting a correction').
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_escalation_recommendation (Get Escalation Recommendation) [grade B]
Get model routing recommendation for a query based on composite confidence and difficulty profile. Returns escalation advice: none, recommended, or required.
| Name | Required | Description | Default |
|---|---|---|---|
| query | Yes | | |
| session_id | No | | |
| current_model | No | | |
| query_confidence | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
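The three advice levels map naturally onto client-side routing. A sketch of acting on the response; the `advice` field name and the routing policy are assumptions, since the output schema is undocumented:

```python
def next_model(advice: str, current: str, stronger: str) -> str:
    """Act on the advice field returned by the tool: 'required' must
    switch, 'recommended' switches when a stronger model is available,
    and 'none' keeps the current model. The response field name and this
    policy are assumptions, not documented behavior."""
    if advice == "required":
        return stronger
    if advice == "recommended" and stronger != current:
        return stronger
    return current
```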
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, description discloses return values (escalation advice levels) but omits critical behavioral traits: whether the call is idempotent, if it logs/records the recommendation internally, latency expectations, or side effects on session state.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two efficiently structured sentences: first defines purpose and input logic, second specifies return values. No redundant or filler text; information density is high.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a recommendation tool with existing output schema (return values briefly noted), but incomplete regarding the 4 input parameters given zero schema documentation. Missing guidance on sibling tool relationships prevents full completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage. Description mentions 'composite confidence' (conceptually mapping to query_confidence) and 'difficulty profile' (implied from query content) but fails to document session_id, current_model, or explicit parameter semantics. Insufficient compensation for undocumented schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific function (model routing recommendation) and return values (none/recommended/required). Implicitly distinguishes from sibling 'escalate_with_context' by framing as advisory 'recommendation' rather than action, though explicit differentiation would strengthen this.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use versus alternatives like 'escalate_with_context' or 'assess_answerability'. Does not specify prerequisites (e.g., when query_confidence is required vs optional) or workflow integration.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_freshness_report (Get Freshness Report) [grade A]
Check how recent the stored knowledge is across topics. Returns staleness indicators per topic. Use this when answering time-sensitive questions to verify your evidence isn't outdated. Topics with stale data may have been superseded by newer conversations not yet retrieved.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| stale_after_days | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| items | Yes | |
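Because `stale_after_days` is undocumented, its most plausible reading is the age threshold beyond which a topic counts as stale. That inferred semantic can be sketched client-side (the `last_updated` field name is also an assumption):

```python
from datetime import datetime, timezone


def is_stale(last_updated: str, stale_after_days: int, now: datetime) -> bool:
    """Treat a topic as stale when its last update is older than the
    threshold. This mirrors the inferred meaning of `stale_after_days`,
    which is undocumented in both schema and description."""
    updated = datetime.fromisoformat(last_updated.replace("Z", "+00:00"))
    return (now - updated).days > stale_after_days
```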
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It successfully discloses that the tool returns staleness indicators per topic and importantly notes a limitation: 'Topics with stale data may have been superseded by newer conversations not yet retrieved.' This warns the agent about data completeness gaps beyond what any structured field indicates.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four tightly constructed sentences: purpose, return value, usage guidance, and behavioral limitation. Each earns its place with zero redundancy. The front-loading of core functionality (checking recency) followed by usage context is optimal for agent decision-making.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists, the description appropriately avoids detailing return values. However, with two completely undocumented parameters (0% schema coverage) and no explanation in the description, the definition has a significant gap that forces the agent to guess parameter semantics. Adequate but with clear gaps in input specification.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description must compensate but fails entirely. Neither 'limit' (integer 1-500) nor 'stale_after_days' (integer 1-3650) are mentioned or explained. While 'limit' might be inferable, 'stale_after_days' is critical to the tool's function and its purpose is completely undocumented in both schema and description.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool checks recency of stored knowledge across topics and returns staleness indicators. Specific verbs (Check) and resources (stored knowledge, topics) are present. It implicitly distinguishes from sibling decision-tools by focusing on knowledge freshness, though it doesn't explicitly name alternatives like get_decisions_on_stale_context.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'when answering time-sensitive questions to verify your evidence isn't outdated.' This provides clear temporal context for invocation. However, it lacks explicit guidance on when NOT to use this or specific alternatives to consider for non-time-sensitive queries.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
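Because neither parameter is described in the schema, a caller has to lean on the ranges quoted in the review above (limit 1-500, stale_after_days 1-3650). A minimal sketch of what an MCP `tools/call` request might look like, with a clamp helper to keep arguments inside those reviewed ranges; the concrete values are illustrative assumptions, not documented defaults:

```python
# Hypothetical call sketch for get_freshness_report. The valid ranges
# (limit 1-500, stale_after_days 1-3650) come from the review notes
# above, not from the published schema.

def clamp(value: int, lo: int, hi: int) -> int:
    """Keep an argument inside its reviewed range before sending."""
    return max(lo, min(hi, value))

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_freshness_report",
        "arguments": {
            "limit": clamp(25, 1, 500),
            "stale_after_days": clamp(90, 1, 3650),
        },
    },
}
```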
get_knowledge_gaps (Get Knowledge Gaps), Grade C
List gap receipts (coverage + enumeration) for the tenant, filterable by topic and recency.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| since_days | No | | |
| gap_subtype | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses that the tool retrieves gap types (coverage + enumeration) but fails to mention safety profile (though implied by 'List'), pagination behavior, or what distinguishes a 'receipt' from the gap itself.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, efficient sentence with no waste. However, the brevity contributes to the parameter mismatch (topic vs gap_subtype), and the description lacks the expansion needed given the zero schema documentation.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
While an output schema exists (reducing description burden for returns), the tool has 0% schema coverage and no annotations. The description fails to fully document the three parameters, particularly omitting limit and misidentifying gap_subtype as 'topic', leaving critical gaps in contextual completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring the description to compensate. It maps 'recency' to since_days and implicitly references gap_subtype values (coverage, enumeration), but erroneously mentions a 'topic' filter not present in the schema and completely omits the limit parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'List' and resource 'gap receipts', clarifying via parenthetical that gaps encompass 'coverage + enumeration'. However, 'gap receipts' is slightly jargon-heavy and the description does not explicitly differentiate from sibling tools like assess_coverage or fill_gap.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description mentions filtering capabilities but provides no guidance on when to use this tool versus siblings like assess_coverage, fill_gap, or enumerate_memory_facts. It also incorrectly implies a 'topic' filter that does not exist in the schema.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
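Given the mismatches the review flags, a defensive caller would validate arguments before sending. A sketch under stated assumptions: the gap_subtype values 'coverage' and 'enumeration' are inferred from the tool's own description, and there is no 'topic' parameter despite what that description implies.

```python
# Hypothetical argument builder for get_knowledge_gaps. The subtype
# values are inferred from the description ("coverage + enumeration");
# note that no 'topic' parameter exists in the schema.
VALID_SUBTYPES = {"coverage", "enumeration"}

def build_gap_args(gap_subtype: str, since_days: int = 30, limit: int = 20) -> dict:
    """Assemble a valid argument dict, rejecting unknown subtypes."""
    if gap_subtype not in VALID_SUBTYPES:
        raise ValueError(f"unknown gap_subtype: {gap_subtype!r}")
    return {"gap_subtype": gap_subtype, "since_days": since_days, "limit": limit}

args = build_gap_args("coverage", since_days=14)
```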
get_my_tasks (My Tasks), Grade B
Read-through to PlanCrux task graph. Returns tasks assigned to or relevant to the calling agent, filtered by status and priority. Includes stage progress, blockers, and linked MemoryCrux artefact counts.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | Max tasks to return (default 10) | |
| status | No | Filter by task status (e.g. incomplete, in_progress, testing) | |
| priority | No | Filter by priority (critical, high, medium, low) | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It implies read-only behavior via 'Read-through' and 'Returns', and adds valuable context about returned data (stage progress, blockers, MemoryCrux artefact counts). However, it lacks explicit safety declarations, rate limits, or error behaviors.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three short sentences efficiently structured: target system identification, core function with filtering, and output details. No significant waste, though 'Read-through' is slightly ambiguous and domain jargon (PlanCrux/MemoryCrux) is dense but necessary.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a 3-parameter retrieval tool. The description compensates for lack of annotations by mentioning specific return fields (stage progress, blockers). However, it fails to note that all parameters are optional (0 required), which would help the agent understand minimum invocation requirements.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, with all three parameters (limit, status, priority) fully documented in the schema. The description mentions 'filtered by status and priority' but adds no syntax details, format constraints, or examples beyond what the schema already provides. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies the action ('Read-through'/'Returns'), the resource (PlanCrux task graph), and the scope (tasks assigned to the calling agent). It distinguishes from siblings like `get_task_context` by emphasizing 'my tasks' relevance. However, 'Read-through' is slightly awkward phrasing that could be clearer.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit guidance on when to use this tool versus alternatives like `get_task_context` or when to prefer filtering via parameters versus post-processing. The description states what filtering is possible but not when to apply it.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
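Since this schema is fully documented, a call can be built directly from the listed enum examples. A sketch using those documented values; note that because all three parameters are optional, an empty argument dict is also a valid minimal invocation:

```python
# Sketch of a filtered get_my_tasks call using values the schema itself
# documents. 'limit' is omitted because the schema states a default of 10.
arguments = {
    "status": "in_progress",   # schema examples: incomplete, in_progress, testing
    "priority": "high",        # schema examples: critical, high, medium, low
}

minimal_call = {}  # all parameters optional, so this is also valid
```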
get_platform_capabilities (Get Platform Capabilities), Grade A
Machine-queryable manifest of all available MemoryCrux tools, required trust tiers, and credit costs. Returns structured data for agent-to-service evaluation without reading documentation. Free (0 credits) at all tiers — discovery drives adoption.
| Name | Required | Description | Default |
|---|---|---|---|
| category | No | | |
| min_trust_tier | No | | |
| max_credit_cost | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and successfully discloses critical behavioral traits: it returns 'structured data,' is 'Free (0 credits) at all tiers,' and covers specific metadata (trust tiers/costs). It could improve by mentioning caching or rate limiting behaviors.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three efficiently structured sentences with zero waste: first defines the resource, second explains the value proposition/return format, and third discloses cost. Information is front-loaded with the manifest definition.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the output schema exists and the tool has moderate complexity (3 optional params), the description adequately covers domain-specific concepts (trust tiers, credit costs) necessary for interpreting capabilities. It could explicitly note that parameters act as optional filters.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. While it mentions 'trust tiers' and 'credit costs' (corresponding to min_trust_tier/max_credit_cost parameters) and implies filtering capabilities, it does not explicitly map these concepts to the parameter names or explain their optional filtering behavior.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly identifies this as a 'Machine-queryable manifest' covering 'all available MemoryCrux tools, required trust tiers, and credit costs,' using specific verbs and resources that clearly distinguish it from operational siblings like check_claim or build_timeline.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear usage context ('agent-to-service evaluation without reading documentation') and implies its role in discovery, but lacks explicit 'when not to use' guidance or contrast with specific alternative tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
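The review maps 'trust tiers' and 'credit costs' onto min_trust_tier and max_credit_cost, so a filtering call can be sketched, though the mapping is inferred rather than documented. The category value below is an invented placeholder:

```python
# Hypothetical filter arguments for get_platform_capabilities. The
# parameter-to-concept mapping is inferred from the review above;
# "knowledge" is a placeholder category, not a documented value.
arguments = {
    "category": "knowledge",   # placeholder category name
    "max_credit_cost": 0,      # only free tools (this tool itself costs 0 credits)
}
```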
get_pressure_status (Get Pressure Status), Grade B
Get Engine knowledge pressure status for the tenant — indicates whether knowledge bases are under update pressure.
| Name | Required | Description | Default |
|---|---|---|---|
No parameters | |||
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries full burden for behavioral disclosure. While it explains what the status indicates (update pressure), it fails to mention if the operation is read-only, cached, rate-limited, or computationally expensive. The interpretation of 'pressure' is provided, but operational characteristics are missing.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of a single efficient sentence with an em-dash for clarification. It is front-loaded with the action ('Get Engine knowledge pressure status') and wastes no words. Every clause earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has zero parameters and an output schema exists, the description adequately explains the tool's purpose and return value semantics without needing to detail return structures. It is complete for a simple state-checking utility, though usage context would improve it.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema contains zero parameters. According to scoring rules, 0 parameters establishes a baseline score of 4. No parameter semantics are needed or provided.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool retrieves 'Engine knowledge pressure status' for the tenant and explains this indicates whether 'knowledge bases are under update pressure.' It effectively distinguishes from siblings like get_active_alerts or get_freshness_report by specifying 'pressure' and 'update pressure' semantics.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus similar monitoring tools like get_active_alerts, get_freshness_report, or get_knowledge_gaps. No prerequisites, triggers, or exclusions are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_relevant_context (Get Relevant Context), Grade A
Task-scoped context briefing. Returns a prioritised context payload shaped by your task description, ranked by risk-if-missed. Constraints and alerts rank above general knowledge. Use at the START of reasoning about a question to get the system's best assessment of what's relevant. Complements query_memory: this gives breadth, query_memory gives depth.
| Name | Required | Description | Default |
|---|---|---|---|
| token_budget | Yes | | |
| priority_signal | No | | |
| task_description | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses ranking behavior ('Constraints and alerts rank above general knowledge,' 'ranked by risk-if-missed'). It could be improved by explaining token_budget enforcement behavior (truncation vs error) since no annotations cover this.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Five sentences, zero waste. Front-loaded with the core purpose ('Task-scoped context briefing'), followed by return value description, behavioral specifics, usage timing, and sibling differentiation. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that an output schema exists, the description appropriately focuses on the conceptual model (prioritisation, risk-if-missed) rather than field-by-field output documentation. It adequately situates the tool within the broader toolset ecosystem, though input parameter gaps prevent a 5.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate. It only explicitly references 'task description' (task_description parameter). It fails to explain token_budget (critical constraint) or priority_signal (enum with three options), leaving two-thirds of the parameters undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with 'Task-scoped context briefing' and specifies it 'Returns a prioritised context payload shaped by your task description, ranked by risk-if-missed.' This provides a specific verb (returns/prioritises), resource (context payload), and unique ranking mechanism (risk-if-missed) that distinguishes it from siblings like query_memory.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states temporal positioning: 'Use at the START of reasoning about a question.' It also provides clear alternative guidance: 'Complements query_memory: this gives breadth, query_memory gives depth,' explicitly defining when to use which tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
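A minimal call therefore needs only the two required parameters. A sketch of what that might look like; priority_signal is deliberately omitted because its three enum values are undocumented (per the review above), and the task text and budget are illustrative:

```python
# Sketch of a minimal get_relevant_context call. Only the two required
# parameters are supplied; priority_signal is omitted because its enum
# values are undocumented.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "get_relevant_context",
        "arguments": {
            "task_description": "Summarise open constraints for the billing migration",
            "token_budget": 2000,
        },
    },
}
```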
get_signals_feed (Get Signals Feed), Grade C
Get the signals feed for the tenant from the WebCrux platform.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| since | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full responsibility for behavioral disclosure but offers none. It does not indicate whether the feed is real-time or batched, if results are cached, idempotency characteristics, or what happens when called without the optional time-based parameters.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single, front-loaded sentence with no filler words. However, its extreme brevity contributes to under-specification; while concise, it lacks the necessary information density for a tool with undocumented parameters.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 0% schema coverage for parameters and the presence of many similar sibling tools, the description is incomplete. It does not clarify the nature of 'signals' (security events, data changes, system metrics), explain the time-window query pattern implied by the 'since' parameter, or distinguish this feed from other event sources.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%—neither 'limit' nor 'since' have descriptions in the schema. The description text fails to compensate by explaining that 'limit' controls result pagination (1-100) or that 'since' filters for signals after a specific ISO 8601 timestamp. No parameter semantics are provided.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states a basic verb ('Get') and resource ('signals feed') with scope ('for the tenant from WebCrux'), but fails to define what constitutes a 'signal' in this context. Given numerous sibling tools like get_active_alerts, get_audit_trail, and get_causal_chain, the description does not differentiate what makes this feed distinct from other monitoring or event-retrieval tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives like get_active_alerts or get_relevant_context. It omits prerequisites, polling recommendations, or whether this should be used for real-time monitoring versus historical analysis.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
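The time-window pattern the review describes can be sketched as follows. Both assumptions come from the review above, not the schema: that 'since' is an ISO 8601 timestamp and that 'limit' caps out at 100.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical time-window call for get_signals_feed. That 'since' takes
# an ISO 8601 timestamp and 'limit' is capped at 100 are assumptions
# drawn from the review, not confirmed by the schema.
since = (datetime.now(timezone.utc) - timedelta(hours=24)).isoformat()

arguments = {"since": since, "limit": 100}
```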
get_task_context (Task Context), Grade A
Full task context: task metadata, stages with status and weight, active blockers, linked artefacts (constraints, decisions, knowledge), recent log entries, and pinned master plan version. Assembles the full picture from PlanCrux and MemoryCrux.
| Name | Required | Description | Default |
|---|---|---|---|
| task_id | Yes | PlanCrux task ID or title | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully indicates the tool aggregates data from two source systems (PlanCrux and MemoryCrux) and lists return components, but omits operational characteristics like caching behavior, data freshness guarantees, or whether the operation is read-only (though implied by 'get' prefix).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two dense sentences with zero waste. The first front-loads specific return value categories using a colon structure for scannability. The second provides essential provenance information (PlanCrux and MemoryCrux). Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the existence of an output schema, the description appropriately provides high-level categorical expectations (metadata, stages, blockers) without duplicating schema details. It adequately covers the complexity of cross-system aggregation (PlanCrux + MemoryCrux) for a retrieval tool, though mentioning idempotency or safety would improve it to a 5.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema has 100% description coverage for the single task_id parameter ('PlanCrux task ID or title'). The description adds no additional semantic information about the parameter (e.g., format examples, whether title is fuzzy-matched), meriting the baseline score of 3 for high schema coverage.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description precisely enumerates what 'task context' entails: metadata, stages with status/weight, active blockers, linked artefacts (constraints, decisions, knowledge), recent log entries, and pinned master plan version. This specificity distinguishes it from siblings like get_decision_context or get_my_tasks by defining the exact scope of returned information.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
While the description implies this tool provides comprehensive detail (suggesting use when 'full picture' is needed), it offers no explicit when-to-use guidance, prerequisites, or comparison to alternatives like get_my_tasks (listing) versus this detail view. No exclusion criteria or decision flow is provided.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_versioned_snapshot (Get Versioned Snapshot), Grade C
Get the latest versioned snapshot for a VaultCrux Memory Core topic at an optional timestamp.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | | |
| topic | Yes | | |
| timestamp | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| at | Yes | |
| items | Yes | |
| topic | Yes | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Fails to clarify the relationship between 'latest' and 'timestamp' (mutually exclusive or filtering?). Does not explain the 'limit' parameter's purpose (pagination vs. depth), creating ambiguity with the singular 'snapshot' noun. Implicitly read-only via 'Get' but lacks explicit safety disclosure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence front-loaded with the core action. Efficient length, though 'at an optional timestamp' creates slight logical awkwardness (suggesting specificity while declaring optionality).
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With 0% schema coverage, no annotations, and an output schema that exists but is undocumented here, the description leaves critical gaps. The unexplained 'limit' parameter and ambiguous temporal logic ('latest' vs. 'timestamp') leave the agent under-informed for a 3-parameter retrieval tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage, requiring heavy compensation. Description mentions 'topic' and 'timestamp' with minimal context (timestamp is 'optional'), but completely omits the 'limit' parameter despite its presence in the schema. Does not clarify timestamp format expectations beyond the regex in schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb-resource combination ('Get... versioned snapshot') and identifies the domain ('VaultCrux Memory Core topic'). Loses a point for failing to differentiate from siblings like get_checkpoints, get_decision_context, or query_memory in a crowded memory-management toolset.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Mentions 'optional timestamp' but provides no guidance on when to use it versus omitting it, nor when to prefer this over reconstruct_knowledge_state or get_checkpoints. Zero explicit guidance on prerequisites or alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
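The latest-versus-timestamp ambiguity the review flags is easiest to see in the two argument shapes a caller might try. Both are sketches: the topic name is invented, the timestamp format is an assumption (the schema's regex is not shown here), and 'limit' is omitted because its purpose is undocumented.

```python
# Two hypothetical argument shapes for get_versioned_snapshot: omit
# 'timestamp' for the latest snapshot, or supply one for a point-in-time
# read. Topic name and timestamp format are illustrative assumptions.
latest = {"topic": "billing-migration"}
point_in_time = {"topic": "billing-migration", "timestamp": "2024-06-01T00:00:00Z"}
```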
investigate_question (Investigate Question), Grade A
Composite server-side investigation tool. Pass a question and the server automatically: (1) detects intent (aggregation/temporal/ordering/knowledge-update/recall), (2) queries the entity index for structured facts, (3) builds a timeline for temporal questions, (4) retrieves memory chunks with the right scoring profile, (5) expands context around sparse hits, (6) derives counts/sums for aggregation, (7) assesses answerability, and (8) returns a recommendation. Use this as your FIRST tool for any non-trivial question — it does the multi-step investigation that would otherwise take 4-6 individual tool calls. The response includes structured facts, timeline, retrieved chunks, derived results, answerability assessment, and a recommendation for how to answer.
| Name | Required | Description | Default |
|---|---|---|---|
| question | Yes | ||
| question_date | No | ||
| scoring_profile | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
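Since the schema above leaves `question_date` and `scoring_profile` undocumented, a minimal sketch of a `tools/call` payload may help. The ISO-8601 date format and the `"recency"` profile value are assumptions; only the parameter names come from the schema.

```python
import json

# Hypothetical MCP tools/call payload for investigate_question.
# question_date format and the scoring_profile value are assumptions;
# only the argument names come from the schema above.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "investigate_question",
        "arguments": {
            "question": "When did the staging database last change owners?",
            "question_date": "2024-06-01",   # assumed ISO-8601 anchor date
            "scoring_profile": "recency",    # assumed enum value
        },
    },
}
print(json.dumps(request, indent=2))
```

The composite tool needs only the question; the two optional arguments refine temporal anchoring and retrieval weighting.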
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. It comprehensively details the 8-step internal process (intent detection, entity indexing, timeline building, etc.) and explains the return structure. However, it does not explicitly state operational traits like idempotency, safety, or side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two well-structured paragraphs: first enumerates the 8-step process, second provides usage guidance. Every sentence adds value—either explaining internal mechanics or guiding selection. No redundant text despite the complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given that the tool declares an output schema and replaces multiple sibling tools, the description adequately covers the composite nature and the response contents. However, the lack of parameter documentation for 'question_date' and the minimal explanation of 'scoring_profile' leave minor gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema coverage, the description must compensate. It implicitly references 'scoring_profile' in step 4 ('right scoring profile') but does not explain the enum values (balanced/recall/recency) or mention 'question_date' at all. Partial compensation for the schema gap.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool as a 'composite server-side investigation tool' and distinguishes it from siblings by stating it replaces '4-6 individual tool calls' that would otherwise be needed. It uses specific verbs (detects, queries, builds, retrieves) and identifies the resource (questions/entities).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states 'Use this as your FIRST tool for any non-trivial question' and clearly contrasts with alternatives by explaining it consolidates multi-step investigations. The guidance is prescriptive and contextual.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_external_services (List External Services) - Grade: A
List registered external services available to the calling agent. Returns service IDs, display names, allowed methods/paths, and rate limit state. Does not return credentials or credential references.
| Name | Required | Description | Default |
|---|---|---|---|
| status | No | Filter by service status |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
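A sketch of the single-parameter call, filtering by status. The value `"active"` is an assumption, since the schema does not enumerate the status values.

```python
import json

# Hypothetical call listing only active services. "active" is an assumed
# status value; the schema above does not show the enum.
request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {
        "name": "list_external_services",
        "arguments": {"status": "active"},
    },
}
print(json.dumps(request))
```

Omitting `arguments` entirely should list all services regardless of status, per the optional filter.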
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Discloses critical security behavior: 'Does not return credentials or credential references.' Also details return data structure. Lacks explicit read-only confirmation or rate limit behavior, but credential exclusion is high-value behavioral disclosure.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: purpose front-loaded, return values specified, and security constraint clearly stated. Appropriate length for tool complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Adequate for a single-parameter list operation with output schema present. Credential exclusion provides necessary security context. Lacks explicit read-only declaration (absent annotations), but 'List' verb and scope description provide sufficient completeness.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% ('Filter by service status'), establishing baseline 3. Description adds no explicit parameter details, but none needed given complete schema documentation of the single optional enum parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Specific verb 'List' with clear resource 'registered external services' and scope 'available to the calling agent'. Distinct from sibling 'register_external_service' (which creates) and 'request_credentialed_call' (which executes).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear context about agent scope and explicitly excludes credential returns, implying boundaries for use. However, lacks explicit 'when to use vs alternatives' guidance despite strong implicit differentiation from credential-related siblings.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_topics (List Topics) - Grade: B
List VaultCrux Memory Core topic groups with freshness metadata.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| items | Yes |
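A minimal sketch of a call. The schema documents neither the semantics of `limit` nor a default, so treating it as a maximum number of topic groups returned is an assumption.

```python
# Hypothetical list_topics call; the meaning of 'limit' (max item count)
# is assumed, not documented in the schema above.
request = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {"name": "list_topics", "arguments": {"limit": 20}},
}
print(request["params"])
```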
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, description carries full burden. It adds valuable context by specifying 'freshness metadata' is returned, hinting at data quality/timeliness aspects. However, omits pagination behavior, caching characteristics, and what 'freshness' specifically means.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence, front-loaded with the verb, no wasted words. However, the extreme brevity leaves information gaps (the undocumented parameter, missing usage context) rather than achieving efficient communication.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Mentions key output characteristic (freshness metadata) which aligns with output schema existence, but fails to document the sole input parameter. Adequate for a simple read operation but leaves critical operational questions unanswered.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0% (limit parameter is undocumented in schema), and description fails to compensate. No explanation of what 'limit' constrains (topics returned), default behavior, or recommended values despite being the sole parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clear verb ('List') and specific resource ('VaultCrux Memory Core topic groups'). Distinguishes content type (topic groups with freshness metadata) from siblings like enumerate_memory_facts or query_memory. Could improve by contrasting explicitly with siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No guidance provided on when to use versus alternatives like enumerate_memory_facts, get_freshness_report, or query_memory. No mention of prerequisites or constraints beyond the schema.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
log_progress (Log Progress) - Grade: A
Receipted write-through to PlanCrux's log endpoint. Appends a structured log entry to a task with optional evidence references and stage binding. Cannot change task or stage status (human-only), but records work done, findings, and blockers encountered.
| Name | Required | Description | Default |
|---|---|---|---|
| note | Yes | What was done | |
| task_id | Yes | PlanCrux task ID | |
| evidence | No | Evidence references for the log entry | |
| stage_id | No | Bind this log to a specific stage |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
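A sketch of a call binding a note to a stage with evidence. All IDs are placeholders, and the shape of an evidence item (`type`/`ref`) is hypothetical; the schema only says "Evidence references for the log entry".

```python
# Hypothetical log_progress payload. IDs are placeholders and the
# evidence item shape is an assumption.
request = {
    "jsonrpc": "2.0",
    "id": 4,
    "method": "tools/call",
    "params": {
        "name": "log_progress",
        "arguments": {
            "task_id": "task-123",   # placeholder PlanCrux task ID
            "note": "Reproduced the timeout; root cause is DNS caching.",
            "stage_id": "stage-2",   # placeholder stage ID
            "evidence": [{"type": "url", "ref": "https://example.com/trace"}],
        },
    },
}
print(request["params"]["arguments"]["note"])
```

Note that this only appends a log entry; per the description, task and stage status changes remain human-only.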
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Discloses key behavioral traits: 'Receipted write-through' implies acknowledgment/confirmation, 'appends' indicates non-destructive addition, and 'human-only' constraint clarifies permission boundaries. Missing minor details like idempotency or rate limits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste. Front-loaded with mechanism ('Receipted write-through'), followed by core action, then constraints/content types. Every clause earns its place; no redundant or filler text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 4 parameters with nested objects and existence of output schema, description provides sufficient context for correct invocation. Covers functionality, constraints, and content expectations. Since output schema exists, return values needn't be explained. Only minor gap is not mentioning the receipt/return structure.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 100% coverage, but description adds semantic value beyond technical definitions: explains that 'note' should contain 'work done, findings, and blockers' (content guidance), and clarifies 'evidence' and 'stage_id' are for 'optional evidence references and stage binding' (functional context).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verbs ('appends', 'records') with clear resource ('structured log entry to a task'). It explicitly distinguishes from siblings by clarifying 'Cannot change task or stage status (human-only)', ensuring the agent doesn't confuse this with task management or status update tools.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit negative constraints ('Cannot change task or stage status') that establish when NOT to use this tool. Implies appropriate use cases ('records work done, findings, and blockers'). Would be perfect if it named the specific tool for status updates, but the constraint is clearly stated.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
promote_skill (Promote Skill) - Grade: A
Promote a reviewed skill submission to Engine artifacts, making it retrievable via get_relevant_context. Only pending_review skills can be promoted. Returns the promoted artifact ID.
| Name | Required | Description | Default |
|---|---|---|---|
| visibility | No | Override target visibility | |
| review_notes | No | Notes from the reviewer | |
| submission_id | Yes | ID of the skill submission to promote |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
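A sketch of promoting a pending_review submission. The submission ID is a placeholder, and `"tenant"` is an assumed visibility value, since the schema does not enumerate the allowed visibilities.

```python
# Hypothetical promote_skill call; submission_id is a placeholder and
# the visibility value is an assumption.
request = {
    "jsonrpc": "2.0",
    "id": 5,
    "method": "tools/call",
    "params": {
        "name": "promote_skill",
        "arguments": {
            "submission_id": "sub-42",
            "visibility": "tenant",
            "review_notes": "Examples verified; approved for promotion.",
        },
    },
}
print(request["params"]["arguments"]["submission_id"])
```

Per the description, this fails unless the submission is in the pending_review state, and the response carries the promoted artifact ID.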
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so description carries full burden. Discloses critical state constraint (pending_review only), side effect (becomes retrievable via get_relevant_context), and return value (promoted artifact ID). Missing permission requirements and irreversibility details, but covers primary behavioral traits.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: action/side-effect, constraint, return value. Front-loaded with primary verb and resource. Every sentence earns its place with specific operational details.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Appropriate for a state-transition tool with 3 parameters and output schema. Covers essential workflow constraint (pending_review status) and references retrieval mechanism. Could strengthen by mentioning predecessor step (submit_skill) or rejection alternative (dismiss_skill) given the rich sibling tool ecosystem.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing baseline 3. Description implies submission_id via 'skill submission' but does not elaborate on parameter semantics, formats, or provide examples beyond what the schema already documents for visibility or review_notes.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description uses specific verb 'Promote' with resource 'skill submission' and scope 'to Engine artifacts'. Distinguishes from sibling tools by specifying the target state 'pending_review' and outcome 'making it retrievable via get_relevant_context', clearly differentiating it from submit_skill or dismiss_skill.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states constraint 'Only pending_review skills can be promoted', establishing when to use the tool. References sibling tool get_relevant_context as the retrieval mechanism. Lacks explicit contrast with alternatives like dismiss_skill for rejected submissions, though the state constraint implies correct usage.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
query_memory (Query Memory) - Grade: A
Search the user's conversation memory. Returns ranked results with content, source timestamps, and confidence scores. For KNOWLEDGE UPDATE questions ('current', 'now', 'most recent'): make two calls — one with scoring_profile='balanced' and one with scoring_profile='recency' — then use the value from the most recent source_timestamp. For COUNTING questions ('how many', 'total'): results may not be exhaustive — search with varied terms and enumerate explicitly before counting. If all results score below 0.3, reformulate with synonyms or specific entity names from the question.
| Name | Required | Description | Default |
|---|---|---|---|
| limit | No | ||
| query | Yes | ||
| topic | No | ||
| date_to | No | ||
| agent_id | No | ||
| date_from | No | ||
| date_range | No | ||
| scoring_profile | No | ||
| confidence_threshold | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| results | Yes |
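The KNOWLEDGE UPDATE recipe from the description can be sketched directly: issue two calls with different scoring profiles, merge the result lists, and keep the hit with the newest source timestamp. The result-row shape below (`content`, `source_timestamp`, `confidence`) is inferred from the description, and the sample rows are stand-ins.

```python
def build_calls(query: str) -> list[dict]:
    """The two query_memory calls the description recommends for
    'current value' questions: one balanced, one recency-weighted."""
    return [
        {
            "name": "query_memory",
            "arguments": {"query": query, "scoring_profile": profile},
        }
        for profile in ("balanced", "recency")
    ]

def newest(results: list[dict]) -> dict:
    # ISO-8601 timestamps compare correctly as strings
    return max(results, key=lambda r: r["source_timestamp"])

calls = build_calls("current on-call engineer")
merged = [  # stand-in for the two responses' merged result lists
    {"content": "Alice", "source_timestamp": "2024-03-01", "confidence": 0.8},
    {"content": "Bob", "source_timestamp": "2024-05-10", "confidence": 0.6},
]
print(newest(merged)["content"])  # the most recent source wins
```

The same scaffold extends to the low-confidence rule: if every row scores below 0.3, rebuild the calls with reformulated query terms.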
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and effectively discloses key behavioral traits: results are ranked with confidence scores, source timestamps enable recency evaluation, and result sets may be non-exhaustive for counting operations. It does not explicitly state read-only safety or idempotency, but covers the critical search semantics (ranking, confidence thresholds, exhaustiveness limits) necessary for correct usage.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Despite length, every sentence delivers actionable guidance. Structure is front-loaded with core purpose ('Search...Returns...'), followed by three distinct operational patterns (Knowledge Update, Counting, Low Confidence) without filler. No tautology or repetition of the tool name; instead uses the space for specific conditional logic ('make two calls', 'enumerate explicitly').
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description provides sophisticated usage patterns that compensate partially for the lack of schema descriptions, but leaves major gaps regarding parameter semantics (temporal filters, agent/topic scoping). Given the presence of an output schema, it appropriately avoids detailing return structures, focusing instead on invocation patterns. However, for a 9-parameter tool with complex date filtering options, the lack of parameter guidance renders it minimally viable rather than comprehensive.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate for all 9 parameters. While it excellently documents 'scoring_profile' (enumerating specific values and their use cases) and implicitly references 'confidence_threshold' (via the 0.3 threshold advice), it fails to explain 'date_from/date_to' vs 'date_range' redundancy, 'agent_id' filtering, 'topic' scoping, or 'limit' constraints. For a complex 9-parameter tool with temporal filtering and nested objects, this gap is significant.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with the specific verb 'Search' and resource 'user's conversation memory', immediately clarifying scope. It distinguishes from sibling tools like 'enumerate_memory_facts' or 'build_timeline' by emphasizing ranked retrieval with confidence scores and timestamps, indicating this is a relevance-ranked search interface rather than an enumeration or reconstruction tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit operational patterns for specific query types: KNOWLEDGE UPDATE questions require dual calls with different scoring_profile values ('balanced' and 'recency'), while COUNTING questions require explicit enumeration warnings due to non-exhaustive results. Also specifies a concrete confidence threshold (0.3) triggering query reformulation, giving the agent clear decision criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
reconstruct_knowledge_state (Reconstruct Knowledge State) - Grade: A
Reconstruct what the system knew at a specific point in time. Returns both current and superseded artefacts as of that timestamp. Use for temporal reasoning: 'what was true in January?' vs 'what is true now?' Compare two calls at different timestamps to see what changed.
| Name | Required | Description | Default |
|---|---|---|---|
| decision_id | Yes | ||
| at_timestamp | No | ||
| include_superseded | No | ||
| include_confidence_landscape | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
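The "compare two calls at different timestamps" pattern can be sketched as a diff over artefact-ID sets. The response shape and the artefact IDs are hypothetical; only `decision_id`, `at_timestamp`, and `include_superseded` come from the schema, and the ISO-8601 timestamp format is an assumption.

```python
def call_args(decision_id: str, ts: str) -> dict:
    """Arguments for one hypothetical reconstruct_knowledge_state call."""
    return {
        "name": "reconstruct_knowledge_state",
        "arguments": {
            "decision_id": decision_id,
            "at_timestamp": ts,         # assumed ISO-8601
            "include_superseded": True,
        },
    }

def changed(before: set[str], after: set[str]) -> dict:
    # Diff the artefact-ID sets returned by two snapshot calls.
    return {"added": after - before, "removed": before - after}

jan = {"fact-1", "fact-2"}   # stand-in artefact IDs as of January
now = {"fact-2", "fact-3"}   # stand-in artefact IDs as of today
print(changed(jan, now))     # fact-3 added, fact-1 removed
```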
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden. It discloses that the tool 'Returns both current and superseded artefacts' which explains the behavioral output. However, it lacks disclosure on performance implications of historical reconstruction, error conditions for invalid timestamps, or read-only safety guarantees.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three tightly constructed sentences with zero waste: purpose statement, behavioral disclosure, and usage guidelines. Front-loaded with the core action, flows logically from what it does to how to use it. Every sentence earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 4 parameters with 0% schema coverage and complex temporal logic, the description is incomplete regarding the required decision_id parameter and confidence landscape feature. However, it adequately covers the core temporal concept, and since an output schema exists, the lack of detailed return value explanation is acceptable.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 0%, requiring heavy description compensation. The description implicitly explains 'at_timestamp' ('at that timestamp') and 'include_superseded' ('superseded artefacts'), adding crucial semantic meaning. However, it fails to explain the required 'decision_id' parameter or 'include_confidence_landscape', leaving half the parameters undocumented.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description uses specific verb 'reconstruct' with clear resource 'what the system knew at a specific point in time.' It effectively distinguishes from siblings like get_versioned_snapshot or checkpoint_decision_state by emphasizing temporal reasoning capabilities and reconstruction of historical knowledge states.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit usage patterns: 'what was true in January?' vs 'what is true now?' and 'Compare two calls at different timestamps to see what changed.' This gives clear temporal reasoning context. Could be improved by explicitly naming alternative tools for current-state queries.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
record_decision_context (Record Decision Context) - Grade: C
Record a decision context event in the CoreCrux Decision Plane. This is a mutation operation.
| Name | Required | Description | Default |
|---|---|---|---|
| context | Yes | ||
| agent_id | No | ||
| session_id | Yes | ||
| decision_id | Yes | ||
| occurred_at | No |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
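Because the `context` object is free-form (additionalProperties), a sketch can only illustrate one plausible shape: every key inside `context` below is hypothetical, all IDs are placeholders, and the ISO-8601 `occurred_at` format is an assumption.

```python
# Hypothetical record_decision_context payload. The context keys are
# invented for illustration; the server accepts arbitrary properties.
request = {
    "jsonrpc": "2.0",
    "id": 8,
    "method": "tools/call",
    "params": {
        "name": "record_decision_context",
        "arguments": {
            "decision_id": "dec-7",     # placeholder
            "session_id": "sess-19",    # placeholder
            "context": {
                "options_considered": ["postgres", "sqlite"],
                "chosen": "postgres",
                "rationale": "needs concurrent writers",
            },
            "occurred_at": "2024-06-02T14:00:00Z",  # assumed ISO-8601
        },
    },
}
print(sorted(request["params"]["arguments"]))
```

Remember this is a mutation: each call appends a new event rather than updating an existing one.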
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It identifies the operation as a 'mutation,' but fails to disclose side effects, persistence guarantees, idempotency characteristics, or what constitutes a valid 'decision context event' beyond the operation type.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is extremely concise at two sentences and 15 words. The first sentence efficiently captures the core purpose, while the second sentence ('This is a mutation operation') provides necessary behavioral context given the absence of annotations. However, extreme brevity becomes a liability given the tool's complexity.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Despite having an output schema (excusing return value documentation), the description is inadequate for a 5-parameter tool with nested objects and no schema descriptions. It omits critical context such as the relationship between entities (session/decision/context), validation rules, and what constitutes a valid context payload.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0% description coverage and contains a complex nested object structure (the 'context' parameter with additionalProperties). The description completely fails to compensate for this gap, providing no explanation of what data belongs in the free-form context object, how session_id relates to decision_id, or the purpose of optional parameters like agent_id and occurred_at.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states the specific action (Record), resource (decision context event), and target system (CoreCrux Decision Plane). However, it fails to explicitly differentiate from similar sibling tools like 'checkpoint_decision_state' or 'log_progress', leaving ambiguity about when to use this specific recording mechanism versus alternatives.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides no guidance on when to use this tool versus alternatives, prerequisites for invocation, or conditions where it should be avoided. It does not reference the sibling retrieval tool 'get_decision_context' or explain the recording lifecycle.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
register_external_service (Register External Service) - Grade: A
Register an external service and store its credential via Vault Transit. Human-only (admin or owner role). The credential is encrypted immediately on receipt and never stored in plaintext. Returns the service registration record without the credential.
| Name | Required | Description | Default |
|---|---|---|---|
| base_url | Yes | Service base URL (must be HTTPS) | |
| auth_type | Yes | Authentication type | |
| rate_limit | No | Rate limits for this service | |
| service_id | Yes | Tenant-unique identifier for the service | |
| display_name | Yes | Human-readable service name | |
| allowed_paths | No | Allowed path patterns (default: *) | |
| auth_injection | No | How to inject the credential into requests | |
| allowed_methods | No | Allowed HTTP methods (default: GET, POST) | |
| credential_value | Yes | The credential (encrypted immediately, never stored plaintext) |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
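A sketch of a registration payload. The `auth_injection` and `rate_limit` object shapes and the `"bearer"` auth type are assumptions (the schema names these fields without detailing them); the HTTPS requirement on `base_url` does come from the schema.

```python
# Hypothetical register_external_service payload. Nested object shapes
# are assumed; the credential is a dummy and is encrypted on receipt.
request = {
    "jsonrpc": "2.0",
    "id": 9,
    "method": "tools/call",
    "params": {
        "name": "register_external_service",
        "arguments": {
            "service_id": "billing-api",
            "display_name": "Billing API",
            "base_url": "https://billing.example.com",  # must be HTTPS
            "auth_type": "bearer",                      # assumed enum value
            "credential_value": "sk-REDACTED",          # dummy credential
            "allowed_methods": ["GET"],                 # tighter than default
            "allowed_paths": ["/v1/invoices/*"],
            "auth_injection": {"header": "Authorization", "prefix": "Bearer "},
            "rate_limit": {"requests_per_minute": 60},
        },
    },
}
assert request["params"]["arguments"]["base_url"].startswith("https://")
```

Per the description this is a human-only operation, and the returned registration record omits the credential.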
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full disclosure burden excellently: it specifies Vault Transit encryption, security guarantees ('encrypted immediately...never stored in plaintext'), authorization requirements, and return value behavior (record returned without credential).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences, zero waste: opens with action, follows with authorization constraint, security details, and return behavior. Every clause delivers unique value beyond the schema.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Complete for a sensitive credential-management tool: covers purpose, security model, authorization, and return value redaction. Output schema exists, so detailed return structure doesn't need description, though the credential exclusion is helpfully noted.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
While the schema has 100% coverage, the description adds critical semantic context for the credential_value parameter (encryption guarantees and security handling) that raw schema descriptions cannot convey.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('Register an external service') and mechanism ('store its credential via Vault Transit'), distinguishing it from siblings like list_external_services (read vs. write) and request_credentialed_call (usage vs. registration).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit authorization constraint ('Human-only (admin or owner role)') that clearly restricts when the tool should be invoked, but does not explicitly name alternative tools for non-admin users or automated contexts.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
request_credentialed_callCredentialed External CallAInspect
Proxied external API call with server-side credential injection. VaultCrux retrieves the credential from Vault Transit, injects it per the service's auth template, makes the call, and returns the response. The credential never appears in your context window. Request and response are receipted and audit-logged.
| Name | Required | Description | Default |
|---|---|---|---|
| path | Yes | Request path (e.g. /v1/chat/completions) | |
| method | Yes | HTTP method | |
| headers | No | Additional headers (not auth) | |
| payload | No | Request body for POST/PUT | |
| service_id | Yes | Registered service ID | |
| session_id | No | Session ID for audit trail | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and excels by disclosing: (1) the credential retrieval mechanism ('VaultCrux retrieves from Vault Transit'), (2) security isolation ('credential never appears in your context window'), and (3) side effects ('receipted and audit-logged'). This provides complete behavioral context beyond the structured schema.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, zero waste. The first establishes core function, the second explains the technical mechanism (Vault injection), and the third discloses security guarantees and audit properties. Information density is high with no redundant phrases.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the presence of an output schema (covering return values) and the description's thorough coverage of the security model, credential isolation guarantees, and audit side effects, the description is complete for a credential-proxy tool. The critical security and behavioral contexts are fully addressed despite zero annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
While the schema has 100% description coverage (establishing a baseline of 3), the description adds meaningful context by explaining how service_id relates to the Vault retrieval flow and how session_id connects to the audit trail mentioned. This elevates understanding of the parameter roles within the security architecture.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description opens with a specific verb-noun construction ('Proxied external API call') and immediately clarifies the unique mechanism ('server-side credential injection'). This clearly distinguishes it from sibling tools like register_external_service or list_external_services, which handle service lifecycle rather than proxied execution with credentials.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies prerequisites by referencing 'the service's auth template' and 'Registered service ID,' signaling that a service must be pre-registered. However, it does not explicitly name sibling alternatives (e.g., 'use register_external_service to configure new services before calling this') or explicit exclusion criteria.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
retract_skillRetract SkillAInspect
Retract a previously promoted skill. Sets the Engine artifact's living status to 'retracted', removing it from future retrieval results. Use when a skill is found to be incorrect or outdated.
| Name | Required | Description | Default |
|---|---|---|---|
| notes | No | Additional notes | |
| reason | Yes | Reason for retraction | |
| submission_id | Yes | ID of the skill submission to retract | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It successfully discloses the mechanism ('sets living status to retracted') and consequence ('removing it from future retrieval results'), but fails to address reversibility, authorization requirements, or side effects for this state-changing operation.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: first defines the action, second explains the technical mechanism, third states usage conditions. Information is front-loaded and appropriately sized.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 100% schema coverage and the existence of an output schema, the description adequately covers the business logic. However, for a destructive, status-changing tool with no annotations, it should explicitly address safety characteristics (e.g., reversibility) to be considered complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description mentions 'previously promoted skill' which maps to submission_id conceptually, but adds no syntax, format, or semantic details beyond what the schema already provides for the parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Retract') and resource ('previously promoted skill'), and distinguishes it from sibling tools like submit_skill by specifying it operates on promoted skills. However, it does not explicitly contrast with dismiss_skill, which appears to be a related sibling.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit guidance on when to use the tool ('when a skill is found to be incorrect or outdated'). Lacks explicit 'when not to use' guidance or named alternatives (e.g., when to use dismiss_skill instead), preventing a score of 5.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
session_debriefSession DebriefAInspect
Structured session-end reflection. Routes discoveries to appropriate capture tools (suggest_constraint, submit_skill, flag_for_review). Produces a receipted debrief record. Call before closing any session longer than 10 minutes.
| Name | Required | Description | Default |
|---|---|---|---|
| session_id | Yes | Session ID for the debrief | |
| discoveries | No | Discoveries made during the session | |
| suggested_actions | No | Actions to route from session discoveries | |
| assumptions_validated | No | Assumptions that were validated during the session | |
| assumptions_invalidated | No | Assumptions that were invalidated during the session | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses key behavioral traits: routing to downstream tools and producing a 'receipted' record (write operation). However, it omits safety details such as idempotency, partial failure handling, and whether routing calls are synchronous or transactional.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences with zero waste. Front-loaded with core function ('Structured session-end reflection'), followed by integration points, output artifact, and usage timing. Every sentence earns its place; no redundant or filler text.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given complex nested parameters (arrays of discovery/action objects) and presence of output schema, the description provides sufficient workflow context without replicating schema details. Mentions receipt generation covering the output aspect. Could improve by noting idempotency or validation behavior for the session_id parameter.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, providing detailed field descriptions (e.g., 'Actions to route from session discoveries'). The description adds workflow context implying how discoveries relate to suggested_actions, but does not augment individual parameter semantics beyond what the schema already documents. Baseline 3 appropriate for high-coverage schemas.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Specific verb+resource combination ('session-end reflection', 'routes discoveries', 'produces receipted debrief record'). Explicitly names sibling tools it integrates with (suggest_constraint, submit_skill, flag_for_review), clearly distinguishing it from those capture tools by identifying itself as the router/reflection mechanism.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit temporal trigger ('Call before closing any session longer than 10 minutes') and indicates workflow context (session-end). Lacks explicit negative guidance (when not to call) or alternative tools for shorter sessions, but the positive guidance is specific and actionable.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
submit_correctionSubmit CorrectionAInspect
Submit a correction for a knowledge item with evidence chain. The original item is never mutated — a versioned enrichment layer is created.
| Name | Required | Description | Default |
|---|---|---|---|
| evidence | Yes | | |
| correction_type | Yes | | |
| original_item_id | Yes | | |
| corrected_content | Yes | | |
| parent_receipt_id | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
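Since the schema leaves these parameters undescribed, a correction payload can only be sketched. The shape below follows the description's "evidence chain" framing; the correction_type value and the evidence item structure are guesses, as the enum and nesting are undocumented:

```python
# Hypothetical correction payload. Per the description the original item is
# never mutated; the server layers a versioned enrichment on top of it.
correction = {
    "original_item_id": "ki_1209",                      # hypothetical item ID
    "correction_type": "factual",                       # assumed enum value
    "corrected_content": "The retention window is 90 days, not 30.",
    "evidence": [                                       # evidence chain: ordered supporting items
        {"source": "receipt:rcpt_77", "note": "observed in billing export"},
    ],
}
```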
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It successfully communicates the non-destructive nature of the operation ('original item is never mutated') and the versioning behavior ('versioned enrichment layer'), which is critical for agent decision-making.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of exactly two high-value sentences with zero redundancy. The first establishes purpose and primary input requirement; the second discloses critical behavioral characteristics. Every word earns its place.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has an output schema (per context signals), the description appropriately omits return value details. However, with 5 parameters including a complex nested evidence array and 0% schema coverage, the description falls short: it should explain parameter relationships or enum semantics to enable correct invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring the description to compensate significantly. While it mentions 'evidence chain' (mapping to the evidence parameter), it fails to explain the four other parameters: correction_type enum values, original_item_id semantics, parent_receipt_id purpose, or corrected_content formatting.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool submits corrections for knowledge items using an evidence chain, specifying both the action and required input method. It distinguishes itself from direct mutation tools by mentioning the 'versioned enrichment layer' mechanism, though it doesn't explicitly contrast with specific siblings like update_constraint.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies prerequisites by mentioning 'evidence chain,' suggesting evidence is required. However, it lacks explicit guidance on when to use this versus alternatives (like update_constraint), or when corrections are preferred over other knowledge management operations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
submit_skillSubmit SkillAInspect
Submit a procedural workflow skill discovered during work. Pro+ private skills auto-approve; Starter skills enter a review queue. ATAM injection scanning runs automatically — quarantined skills cannot be promoted. Returns submission ID, approval status, and scan results.
| Name | Required | Description | Default |
|---|---|---|---|
| title | Yes | Short title summarising the skill | |
| run_id | No | AgentCrux run ID | |
| content | Yes | Full procedural skill content (markdown) | |
| agent_role | No | Role of the submitting agent | |
| session_id | No | Session ID for provenance tracking | |
| skill_domains | No | Knowledge domains this skill applies to | |
| discovery_context | No | How/where the skill was discovered | |
| target_visibility | No | Visibility scope for the skill | private |
| skill_tool_references | No | Tool names this skill references | |
| supersedes_artifact_id | No | Artifact ID this skill replaces | |
| skill_trigger_description | No | When this skill should be activated | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
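A submission's arguments might be assembled as below. Only title and content are required; target_visibility defaults to "private" per the table. The helper and example skill are hypothetical:

```python
# Illustrative submission arguments for submit_skill. Everything beyond
# title/content is optional provenance or scoping metadata.
def build_skill_submission(title: str, content: str, **optional) -> dict:
    args = {"title": title, "content": content,
            "target_visibility": optional.pop("target_visibility", "private")}
    args.update(optional)
    return args

submission = build_skill_submission(
    "Retry transient 429s with jitter",                 # hypothetical skill
    "# Skill\n1. Catch 429 responses.\n2. Back off with jitter.",
    skill_domains=["http", "resilience"],
)
```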
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and discloses significant behavioral traits: ATAM injection scanning, tier-based approval differences, quarantine consequences, and return value structure (submission ID, status, scan results). Minor gap on idempotency or error handling specifics.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four dense sentences with zero waste: (1) purpose definition, (2) approval tier behavior, (3) security scanning and promotion constraints, (4) return values. Information is front-loaded with the action verb and every clause delivers necessary behavioral or workflow context.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given 11 parameters and a complex approval workflow, the description adequately covers the submission lifecycle, security implications, and output structure. It compensates well for missing annotations. Minor enhancement would be explicit mapping to the skill lifecycle (submit → assess → promote/dismiss/retract).
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, establishing a baseline of 3. The description adds context that 'content' contains procedural workflow markdown and implies 'discovery_context' relates to work discovery, but does not elaborate on parameter formats, valid ranges, or constraints beyond the schema definitions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the core action ('Submit a procedural workflow skill') and resource type ('discovered during work'). It effectively distinguishes from sibling tools by mentioning the promotion quarantine constraint, positioning this as the initial submission entry point before using promote_skill.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear behavioral guidance on approval flows (Pro+ auto-approve vs Starter review queue) and notes the quarantine constraint that blocks promotion. However, it could explicitly state 'use this for initial submission before promote_skill' to fully clarify the lifecycle sequence against siblings like dismiss_skill and retract_skill.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
suggest_constraintSuggest ConstraintAInspect
Propose an organisational constraint discovered during work for human review. Agents can suggest boundaries, policies, or context flags they discover — humans decide whether to promote them to active constraints. Low barrier (1 credit); authority gate is on promotion, not suggestion.
| Name | Required | Description | Default |
|---|---|---|---|
| scope | No | | |
| evidence | No | | |
| severity | No | | |
| assertion | Yes | | |
| confidence | No | | |
| session_id | No | | |
| constraint_type | Yes | | |
| discovery_context | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
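With only assertion and constraint_type required, a suggestion payload can be sketched as below. The constraint_type value is an assumption (the enum is not documented), and the helper itself is illustrative:

```python
# Sketch of a suggest_constraint payload. Only assertion and constraint_type
# are required; severity, confidence, etc. are optional context.
def build_suggestion(assertion: str, constraint_type: str, **optional) -> dict:
    if not assertion.strip():
        raise ValueError("assertion must be non-empty")
    return {"assertion": assertion, "constraint_type": constraint_type, **optional}

suggestion = build_suggestion(
    "Never deploy to prod on Fridays after 15:00",       # hypothetical boundary
    "policy",                                            # assumed enum value
    severity="medium",
    discovery_context="observed a failed Friday deploy rollback",
)
```

Remember the authority gate sits on promotion, not suggestion: a human still decides whether this becomes an active constraint.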
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full disclosure burden and succeeds in explaining cost ('1 credit'), permission model ('authority gate is on promotion'), and the draft/pending nature of suggestions. However, it omits whether suggestions can be retracted, if duplicates are handled, or the exact lifecycle state created.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three efficient clauses progress logically: purpose (propose constraint), workflow (human promotion authority), and constraints/cost (1 credit, authority gate). Every sentence earns its place with zero redundancy or tautology.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description adequately covers the conceptual model and approval workflow for an 8-parameter tool with nested objects. However, given 0% schema coverage and complex parameters (nested scope/evidence objects, enums), the lack of parameter semantic guidance creates a significant operational gap, though the output schema reduces the need for return value explanation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, requiring significant description compensation. While it maps 'boundaries, policies, or context flags' to the constraint_type enum (partially), it fails to explain the 7 other parameters including required fields 'assertion' (what format?), 'scope' (nested object structure), 'evidence' (what constitutes valid evidence?), and 'discovery_context.'
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'Propose[s] an organisational constraint discovered during work for human review,' using a specific verb-resource pair. It effectively distinguishes from sibling tools like 'declare_constraint' by emphasizing that 'humans decide whether to promote them to active constraints,' clarifying this is a suggestion mechanism rather than direct creation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly defines the workflow boundary: 'authority gate is on promotion, not suggestion' with 'Low barrier (1 credit).' This establishes when to use this tool (discovering potential constraints during work) versus alternatives like declare_constraint (when authority exists to create active constraints immediately). The human-in-the-loop requirement is clearly stated.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
update_constraintUpdate ConstraintBInspect
Update an existing constraint. Content changes create a new version (append-only). Status-only changes update in place. This is a mutation operation.
| Name | Required | Description | Default |
|---|---|---|---|
| scope | No | | |
| status | No | | |
| evidence | No | | |
| severity | No | | |
| assertion | No | | |
| expires_at | No | | |
| constraint_id | Yes | | |
| assertion_structured | No | | |
| review_interval_days | No | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
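The versioning rule in the description — content changes are append-only, status-only changes update in place — can be sketched as a classifier over the supplied fields. The split of fields into "content" vs "status" is inferred from the parameter table, not confirmed by the server docs:

```python
# Sketch of update_constraint's documented versioning rule: any change beyond
# status creates a new version (append-only); a status-only change is in place.
def classify_update(changes: dict) -> str:
    touched = set(changes) - {"constraint_id", "status"}
    return "new_version" if touched else "in_place"
```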
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations provided, so the description carries the full burden. It effectively discloses the critical versioning behavior (append-only content changes vs. in-place status updates) and explicitly warns 'This is a mutation operation'. It does not, however, cover error handling or auth requirements.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences with zero waste: purpose statement, versioning logic, and mutation warning. Front-loaded and efficiently structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Inadequate for the complexity: 9 undocumented parameters with nested objects and enums require description compensation that isn't provided. Output schema exists (per context signals), but input parameter semantics are insufficiently covered for safe invocation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema has 0% description coverage across 9 complex parameters including nested objects. Description only provides high-level categorical mapping ('Content' vs 'Status') without explaining specific parameters, enums (active/suspended/superseded/expired), or nested object structures (scope, evidence).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
States specific action (Update) and resource (constraint) clearly. Distinguishes operation type through versioning behavior (append-only vs in-place), though could better differentiate from sibling 'declare_constraint' explicitly.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides internal usage guidance distinguishing content changes from status-only changes. However, lacks explicit comparison to siblings (e.g., when to use 'declare_constraint' vs 'update_constraint') and prerequisites like constraint lookup.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
verify_before_actingVerify Before ActingAInspect
Pre-action and pre-conclusion verification gate. Checks Shield policy, org constraints, watch alerts, knowledge pressure, and memory freshness. Returns a combined verdict: proceed, warn, require_approval, or block. Use before committing to an answer when the stakes are high or when your evidence is thin — it catches constraint conflicts and stale-context risks that query_memory alone won't surface.
| Name | Required | Description | Default |
|---|---|---|---|
| team_id | No | | |
| metadata | No | | |
| tool_name | Yes | | |
| is_mutation | No | | |
| publisher_id | No | | |
| server_digest | No | | |
| target_resources | No | | |
| action_description | Yes | | |
Output Schema
| Name | Required | Description |
|---|---|---|
| No output parameters | | |
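The four documented verdicts (proceed, warn, require_approval, block) lend themselves to a simple dispatch on the caller's side. The verdict strings come from the description; the handler mapping below is illustrative:

```python
# Sketch of handling verify_before_acting's combined verdict. The four
# verdict strings are documented; the mapped actions are assumptions.
def handle_verdict(verdict: str) -> str:
    actions = {
        "proceed": "continue with the action",
        "warn": "continue, but surface the warning to the user",
        "require_approval": "pause and request human sign-off",
        "block": "abort the action",
    }
    try:
        return actions[verdict]
    except KeyError:
        raise ValueError(f"unknown verdict: {verdict}") from None
```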
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden and discloses the verdict taxonomy (proceed, warn, require_approval, or block) and the specific subsystems interrogated (Shield, constraints, alerts, etc.). It could improve by explicitly stating whether this is a read-only check or if it modifies server state, though 'pre-action verification' implies advisory behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Every sentence earns its place: the first establishes identity and function, the second enumerates specific checks and return values, and the third provides usage context and differentiation. No redundant or filler text; information density is high without being verbose.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex orchestration tool with 8 parameters and numerous siblings, the description adequately covers the behavioral contract and decision criteria without needing to exhaustively detail return values (since an output schema exists). The only material gap is the lack of parameter documentation, which prevents a perfect score given the high parameter count and zero schema coverage.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Given 0% schema description coverage across 8 parameters, the description fails to compensate by explaining critical inputs like is_mutation, target_resources, server_digest, or metadata. While the description implies an action is being verified, it does not document what action_description should contain or how tool_name relates to the verification scope.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly defines the tool as a 'Pre-action and pre-conclusion verification gate' with specific checks against Shield policy, org constraints, watch alerts, knowledge pressure, and memory freshness. It effectively distinguishes itself from siblings by contrasting with query_memory, explicitly stating it catches risks that query_memory alone won't surface.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit contextual guidance: 'Use before committing to an answer when the stakes are high or when your evidence is thin.' It identifies a specific alternative (query_memory) and explains the unique value proposition (catches constraint conflicts and stale-context risks), giving clear criteria for when to invoke this tool versus simpler memory queries.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail — every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control — enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management — store and rotate API keys and OAuth tokens in one place
Change alerts — get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption — public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics — see which tools are being used most, helping you prioritize development and documentation
Direct user feedback — users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
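The three failure causes listed above map roughly onto distinct symptoms when you probe the URL yourself. A rough triage sketch in Python, using only the standard library (the status-code mapping is a heuristic assumption, not Glama's actual health-check logic):

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

def classify(code: int) -> str:
    # Heuristic mapping from HTTP status to likely cause; an assumption
    # for illustration, not how Glama actually classifies health.
    if code in (401, 403):
        return "credentials missing or invalid"
    if code in (404, 410):
        return "URL is likely wrong"
    if code >= 500:
        return "server-side outage"
    return "reachable; check transport/protocol configuration"

def triage(url: str) -> str:
    # Probe the connector URL and classify the outcome.
    try:
        with urlopen(url, timeout=10) as resp:
            return classify(resp.status)
    except HTTPError as e:
        return classify(e.code)
    except URLError:
        return "cannot connect: outage, DNS failure, or wrong host"
```

A connection-level failure (`URLError`) points at an outage or a wrong host, while an HTTP-level error narrows the cause to credentials or a bad path.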
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.