mcp-revenue-empire — Japan public-data ledgers

Server Details

Tamper-evident daily time-series ledgers of Japanese public data: subsidies, public comments (e-Gov), research grants (JST), public bids (kkj), regulatory sanctions (FSA), and licensed-entity registries (FSA). Provides search, full history/timeline, recent changes, and hash-chain verification. No auth required for reads.
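
The listing does not say how the hash chain is constructed. As a minimal sketch of the general technique, assuming each ledger entry stores a SHA-256 digest over its own payload plus the previous entry's digest, verification could look like the following. The field names `payload`, `prev_hash`, and `hash`, the `GENESIS` value, and the helper names are all hypothetical and not taken from this server.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder meaning "no previous entry"


def entry_hash(payload: dict, prev_hash: str) -> str:
    """Digest over the previous digest plus the canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Demo helper: link payloads in order into a hash chain."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = entry_hash(payload, prev)
        chain.append({"payload": payload, "prev_hash": prev, "hash": digest})
        prev = digest
    return chain


def verify_chain(entries: list) -> Optional[int]:
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(entries):
        if entry["prev_hash"] != prev:
            return i  # link to the previous entry is broken
        if entry["hash"] != entry_hash(entry["payload"], prev):
            return i  # payload no longer matches its recorded digest
        prev = entry["hash"]
    return None
```

Because every digest depends on the one before it, editing any historical entry invalidates that entry and, if its digest is recomputed, every later link as well.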

Status: Healthy
Last Tested: 2026-07-30 07:32
Transport: Streamable HTTP

Tool Definition Quality

B (3.1/5.0)

Tool Descriptions: B

Average 3.5/5 across 147 of 147 tools scored. Lowest: 2.3/5.

Server Coherence: A

Disambiguation: 4/5

Most tools are clearly distinguished by domain prefixes (e.g., bid_watch, grant_watch) and specific action verbs. The many similarly structured watch tools could still be confused with one another, although their descriptions state each tool's exact purpose.

Naming Consistency: 5/5

Every tool follows a consistent `domain_subdomain_action` pattern with underscores, e.g., `agent_audit_query`, `bid_watch_search`. Even long names like `commerce_catalog_agent_readiness_score` adhere to this structure.
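
A naming convention like this can be checked mechanically. The sketch below is an illustration only, not the scorer's actual method: it assumes "consistent" means lowercase segments joined by single underscores, with at least three segments. The regular expression and the function name `nonconforming` are hypothetical.

```python
import re

# Assumed convention: lowercase alphanumeric segments joined by single
# underscores, at least three segments (domain, subdomain, action).
TOOL_NAME = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+){2,}$")


def nonconforming(names):
    """Return the names that do not match the assumed convention."""
    return [n for n in names if not TOOL_NAME.fullmatch(n)]


names = [
    "agent_audit_query",
    "bid_watch_search",
    "commerce_catalog_agent_readiness_score",
]
print(nonconforming(names))  # prints []
```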

Tool Count: 2/5

With 147 tools, the server is far too broad, covering weather, carbon estimates, domain intel, and more—well beyond its stated 'Japan public-data ledgers' scope. This sheer volume overwhelms agents and dilutes focus.

Completeness: 3/5

The server offers many read-only tools for Japanese public data (bids, grants, licenses, etc.), but lacks create/update/delete operations for those domains. Additionally, numerous unrelated tools (e.g., carbon estimates, weather) feel tacked on, leaving gaps in core coverage.

Available Tools

147 tools

agent_audit_query (grade C)

Query agent actions with filters

Parameters

Name	Required	Description	Default
`to`	No
`from`	No
`limit`	No
`agentId`	No
`riskMin`	No
`sessionId`	No
`actionType`	No
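
Since the listing gives only parameter names, here is a hedged sketch of how a client might assemble a call. It builds a JSON-RPC 2.0 `tools/call` request, the method MCP uses for tool invocation. The argument values, the ISO 8601 timestamp format, and the helper name `tool_call` are assumptions; the server does not document the expected formats.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator


def tool_call(name, arguments):
    """Build an MCP `tools/call` request, dropping arguments left as None."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {
            "name": name,
            "arguments": {k: v for k, v in arguments.items() if v is not None},
        },
    }


request = tool_call(
    "agent_audit_query",
    {
        "agentId": "agent-123",          # hypothetical ID
        "from": "2026-07-01T00:00:00Z",  # assumed ISO 8601; format is undocumented
        "to": "2026-07-30T00:00:00Z",
        "limit": 50,
        "riskMin": None,                 # optional filter left unset
    },
)
print(json.dumps(request, indent=2))
```

Omitting unset optional filters, rather than sending nulls, avoids guessing how the server treats null values.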

Tool Definition Quality

C (2/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full responsibility for behavioral disclosure. It states 'Query' (implying read-only) and 'with filters' but gives no details on pagination, limits, error handling, or what happens with invalid parameters. This is insufficient for an agent to understand the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely short (5 words). That brevity is not a virtue here: it omits details an agent needs to invoke the tool effectively, so the description does not earn its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 parameters, no schema descriptions, no output schema, no annotations), the description is severely incomplete. It neglects to describe return format, parameter formats, or any constraints, making it nearly useless for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not explain any of the 7 parameters (to, from, limit, agentId, riskMin, sessionId, actionType). The phrase 'with filters' is too vague to add meaning, leaving the agent to guess parameter purposes from names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
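
The listing does not define how "schema description coverage" is computed. One plausible reading, sketched here as an assumption, is the fraction of top-level input properties that carry a non-empty `description`. The schemas below are hypothetical reconstructions: only the parameter names come from the listing, and the sample description text is invented for illustration.

```python
def description_coverage(input_schema):
    """Fraction of top-level properties with a non-empty description."""
    props = input_schema.get("properties", {})
    if not props:
        return 1.0  # nothing to document
    described = sum(1 for p in props.values() if p.get("description", "").strip())
    return described / len(props)


PARAMS = ["to", "from", "limit", "agentId", "riskMin", "sessionId", "actionType"]

# Hypothetical schema with names only, as the listing shows them.
bare = {"type": "object", "properties": {name: {} for name in PARAMS}}

# Same schema with one parameter documented (description text is assumed).
documented = {"type": "object", "properties": dict(bare["properties"])}
documented["properties"]["limit"] = {
    "type": "integer",
    "description": "Maximum number of actions to return (assumed meaning).",
}

print(description_coverage(bare))        # prints 0.0
print(description_coverage(documented))  # 1 of 7 properties described
```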

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Query agent actions with filters' identifies the verb (query) and resource (agent actions), and implies filtering. However, it lacks specificity about what exactly constitutes 'agent actions' and does not clearly differentiate from sibling tools like agent_audit_record or agent_audit_report, though the verb 'query' suggests a list operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. There is no mention of context, prerequisites, or exclusions, leaving the agent to infer usage from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_audit_record (grade C)

Record an agent action for audit and compliance

Parameters

Name	Required	Description	Default
`input`	No
`output`	No
`agentId`	Yes
`metadata`	No
`sessionId`	No
`actionName`	Yes
`actionType`	Yes

Tool Definition Quality

C (2.4/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, and description fails to disclose side effects, permission requirements, or behavioral traits such as idempotency or rate limits. The description only states the action without behavioral context.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence with 8 words is concise, but it sacrifices valuable information. It earns its place only minimally; the brevity leads to ambiguity.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, no annotations, and no parameter explanations, the description is inadequate for a tool recording audit data. Missing details on record structure, required fields, and expected behavior.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage, and the description adds no meaning to any of the 7 parameters. The agent must infer parameter semantics solely from names and types.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Record an agent action for audit and compliance,' which communicates a write operation distinct from query/report siblings. However, 'agent action' is vague and could be more specific.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like agent_audit_query or agent_audit_report. No context on prerequisites or expected use cases.

agent_audit_report (grade C)

Generate audit report (json/markdown/soc2 format)

Parameters

Name	Required	Default
`to`	Yes
`from`	Yes
`format`	No	markdown
`agentId`	Yes
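
The table above, together with the description's list of formats, is enough to sketch client-side argument checking before a call. This is an illustration under stated assumptions: the required fields, the `markdown` default, and the three format names come from the listing, while the function name `report_arguments` and the date strings in the usage line are hypothetical, since the expected date format is not documented.

```python
REQUIRED = ("agentId", "from", "to")
FORMATS = ("json", "markdown", "soc2")


def report_arguments(agent_id, start, end, fmt="markdown"):
    """Assemble arguments for agent_audit_report, rejecting obviously bad input."""
    if fmt not in FORMATS:
        raise ValueError(f"format must be one of {FORMATS}, got {fmt!r}")
    args = {"agentId": agent_id, "from": start, "to": end, "format": fmt}
    missing = [key for key in REQUIRED if not args[key]]
    if missing:
        raise ValueError(f"missing required arguments: {missing}")
    return args


# Date strings are placeholders; the server's expected format is unknown.
print(report_arguments("agent-123", "2026-07-01", "2026-07-30"))
```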

Tool Definition Quality

C (2.9/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden but only states the basic action. It does not disclose behavioral traits such as whether the operation is read-only or destructive, what happens with the report (e.g., returned or stored), or any auth requirements. The agent cannot infer safety or side effects.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no unnecessary words. It conveys the core action and available formats efficiently. However, it could benefit from slight restructuring (e.g., listing required parameters) without becoming verbose.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema and annotations, the description is incomplete. It does not explain what the audit report contains, the purpose of agentId and date range, or any output format details. The tool generates a report, but an agent would need to infer return type and behavior from the name alone.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description would need to explain the parameters itself, and it does not. It mentions the format options but says nothing about agentId, from, or to (e.g., what agentId represents, or the expected date-time format for the range). The default value and enum for format are already in the schema, so repeating them adds no value.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the action ('generate audit report') and lists the available formats (json, markdown, soc2), which matches the input schema enum. It clearly distinguishes from sibling tools like 'agent_audit_query' (queries existing audits) and 'agent_audit_record' (records an event), as this tool generates a report.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. Sibling tools exist (e.g., agent_audit_query, agent_audit_record) but are not mentioned or contrasted. The description does not specify prerequisites, such as the need for a prior audit record or the scope of the report.

agent_captcha_solve (grade B)

Solve a CAPTCHA for a domain you own or have explicit permission to access

Parameters

Name	Required	Description	Default
`type`	Yes
`action`	No
`pageUrl`	Yes
`siteKey`	No
`question`	No
`imageBase64`	No

Tool Definition Quality

B (3.3/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It only mentions permission requirements but omits details like return format, error handling, rate limits, or destructive potential. This is a significant gap for an agent deciding to invoke the tool.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single efficient sentence that front-loads the main action and constraint. However, it could benefit from additional structured detail without losing conciseness.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, no behavioral details, no parameter descriptions, and 6 parameters including optional ones, the description is severely incomplete. An agent would struggle to understand how to properly provide inputs like 'imageBase64' or which 'type' to choose for a given CAPTCHA.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 6 parameters with 0% coverage in descriptions. The description adds no explanation of parameter meanings, such as the enum values for 'type' (e.g., differences between recaptcha_v2 and image_text). Despite the self-explanatory names, the lack of guidance forces the agent to guess correct usage.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool solves a CAPTCHA for owned/permitted domains. It uses the specific verb 'solve' and resource 'CAPTCHA', and the permission constraint distinguishes it from related tools like agent_captcha_verify_domain.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description 'for a domain you own or have explicit permission to access' provides clear context on when to use the tool (i.e., when authorized). It implies exclusions for unauthorized domains, though no explicit alternatives or 'when not to use' guidance is given.

agent_captcha_verify_domain (grade A)

Verify ownership of a domain before using CAPTCHA solving

Parameters

Name	Required	Description	Default
`domain`	Yes
`method`	Yes

Tool Definition Quality

A (3.7/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, but description indicates a non-destructive verification operation. Lacks details on side effects or failure behavior.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded, no unnecessary words. Efficient communication of purpose.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, no annotations, and 0% parameter descriptions, description is too minimal. Missing behavioral and parameter details.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, description adds no meaning to parameters. It does not explain 'domain' or 'method', relying solely on the schema.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool verifies domain ownership and positions it as a prerequisite for CAPTCHA solving. Distinguishes from sibling agent_captcha_solve.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage context by stating 'before using CAPTCHA solving', but does not explicitly mention alternatives or when not to use.

agent_identity_activities (grade B)

List activity records for an identity, newest first (owner only)

Parameters

Name	Required	Description	Default
`limit`	No
`identityId`	Yes

Tool Definition Quality

B (3.3/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It mentions ordering ('newest first') and access restriction ('owner only'), but fails to disclose pagination behavior, what 'activity records' contain, error handling, or rate limits. This is insufficient for a tool with no annotation support.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one sentence of 10 words, front-loaded and efficient. However, it omits necessary parameter details, so conciseness comes at the cost of completeness.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with two parameters and no output schema, the description covers core purpose, ordering, and access restriction. However, it lacks parameter documentation and output shape, leaving gaps for an agent to use it correctly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, yet the description does not explain the 'limit' parameter (pagination, default) or explicitly map 'identityId' to the identity. It adds minimal value beyond the schema field names.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('List'), the resource ('activity records for an identity'), and key properties ('newest first', 'owner only'). It effectively distinguishes from siblings like agent_identity_record (likely single record) and agent_identity_lookup (likely search).

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for listing activities of an identity, but does not provide explicit guidance on when to use it versus alternatives, nor does it mention exclusions or prerequisites beyond 'owner only'.

agent_identity_badge (grade C)

Get the issuer-signed badge and signed fields for an identity

Parameters

Name	Required	Description	Default
`identityId`	Yes

Tool Definition Quality

C (2.8/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, and the description fails to disclose behavioral traits such as being read-only, auth requirements, or rate limits. As a get operation, this is insufficient.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence with no wasted words. However, it could convey more useful information without increasing length.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Even for a simple tool (one parameter, no output schema, no annotations), the description lacks necessary details: what a 'badge' or 'signed fields' are, how to provide the identityId, and what the return value looks like.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description does not add any meaning to the single parameter 'identityId'. It neither explains the format nor provides examples.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get') and the resource ('issuer-signed badge and signed fields for an identity'), distinguishing it from sibling tools like agent_identity_lookup or agent_identity_record.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description does not mention when-not-to-use or provide context for selecting among identity-related siblings.

agent_identity_lookup (grade C)

Look up an identity. Returns signatureValid (issuer+integrity only, NOT an authenticity/safety signal) and a disclaimer.

Parameters

Name	Required	Description	Default
`identityId`	Yes

Tool Definition Quality

C (2.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that signatureValid is only a signature integrity signal, not an authenticity/safety check, and mentions a disclaimer. It adds moderate context beyond the name, but lacks details on error handling or side effects.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, with a single sentence for the primary action and a clear note about the return values. No wasted words, and the most important information is front-loaded.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple lookup tool with one parameter and no output schema, the description covers the basic purpose and the key nuance of the return value. However, it omits the structure of the response, potential errors, and the format of identityId, leaving some gaps.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, yet the description provides no information about the identityId parameter (e.g., format, type beyond schema). It fails to compensate for the low schema coverage, leaving the agent with no guidance on how to provide the identity.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Look up an identity' and specifies the return values (signatureValid and disclaimer), providing a clear verb-resource pairing. However, it does not differentiate from sibling tools like agent_identity_record, so it falls short of a 5.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The note about signatureValid not being an authenticity signal provides an indirect caveat but no positive selection criteria or comparison to other identity tools.

agent_identity_record (grade B)

Append a hash-chained activity record (owner only). Optional provenance (repo/version/config) is self-reported.

Parameters

Name	Required	Description
`content`	No
`identityId`	Yes
`provenance`	No	Self-reported origin of the activity (NOT verified)
`activityType`	Yes
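
To make "append a hash-chained activity record (owner only)" concrete, here is a minimal in-memory sketch. It is not the server's implementation: the SHA-256 digest, the `prevHash` and `hash` fields, the token-based owner check, and the class name `ActivityLog` are all assumptions. Only the four parameter names and the unverified status of `provenance` come from the listing.

```python
import hashlib
import json


class ActivityLog:
    """Append-only log for one identity; each record commits to the previous one."""

    def __init__(self, identity_id, owner_token):
        self.identity_id = identity_id
        self._owner_token = owner_token  # stand-in for whatever "owner only" means
        self.records = []

    def append(self, token, activity_type, content=None, provenance=None):
        if token != self._owner_token:
            raise PermissionError("only the identity owner may append")
        prev = self.records[-1]["hash"] if self.records else ""
        body = {
            "identityId": self.identity_id,
            "activityType": activity_type,
            "content": content,
            # Stored as given: provenance is self-reported and NOT verified.
            "provenance": provenance,
            "prevHash": prev,
        }
        canonical = json.dumps(body, sort_keys=True).encode("utf-8")
        record = {**body, "hash": hashlib.sha256(canonical).hexdigest()}
        self.records.append(record)
        return record


log = ActivityLog("identity-1", owner_token="secret")  # hypothetical values
first = log.append("secret", "deploy")
second = log.append("secret", "run", provenance={"repo": "example/repo"})
print(second["prevHash"] == first["hash"])  # prints True
```

Because each record's digest covers `prevHash`, rewriting an earlier record would change every digest after it, which is what makes the log tamper-evident rather than tamper-proof.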

Tool Definition Quality

B (3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description discloses key traits: hash-chaining implies immutability and append-only behavior, 'owner only' scopes access, and provenance is self-reported and unverified. This provides good behavioral context beyond a simple 'append record'.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, concise but not optimized for quick scanning. It front-loads the core action but omits important context such as required parameters and record structure.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters (2 required), no output schema, and complex nested objects, the description is insufficient. It doesn't explain the content field, return value, error conditions, or how 'hash-chained' affects usage.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 25% (provenance described). The description adds little beyond restating provenance as optional self-reported. The content, identityId, and activityType parameters remain undocumented in both schema and description, leaving ambiguity.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (append) and resource (hash-chained activity record), and specifies 'owner only' which distinguishes it from tools like agent_identity_register or agent_audit_record.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The phrase 'owner only' implies restricted scope but doesn't explain when not to use it or what alternatives exist for other users.

agent_identity_register (grade A)

Register an agent and get a unique identity ID + issuer-signed badge. agent_name/metadata are self-reported and unverified.

Parameters

Name	Required	Description	Default
`metadata`	No
`agentName`	Yes
`publicKey`	No

Tool Definition Quality

A (3.5/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It correctly discloses that agent_name and metadata are self-reported and unverified. However, it omits other behavioral aspects such as whether the tool is idempotent, what happens on duplicate agentName, or response structure.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no unnecessary words. The key information is front-loaded: the action and output, then the caveat about data verification.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of annotations, output schema, and the presence of 3 parameters (one required, with nested object), the description is insufficient. It does not specify the return format, parameter roles for publicKey, or any preconditions/errors.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must clarify each parameter. It only mentions agentName and metadata (noting they are self-reported) but omits the publicKey parameter entirely. It does not explain the metadata object structure or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (register an agent) and the result (unique identity ID + issuer-signed badge). It distinguishes from siblings like agent_identity_lookup by focusing on creation. The note about self-reported and unverified data adds specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (to register an agent) but does not explicitly state when not to use or mention alternatives. Given the sibling tools, an agent might benefit from knowing that agent_identity_lookup is for retrieval, but this is not stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_memory_delete (grade C)

Delete a memory or all memories in a namespace

Parameters

Name	Required	Description	Default
`key`	No
`agentId`	Yes
`namespace`	No
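
Since `key` is optional and the description mentions deleting "all memories in a namespace", a caller may want a client-side guard against accidental bulk deletes. The sketch below assumes that omitting `key` is what triggers the namespace-wide delete; the listing does not confirm this, and the `allow_bulk` flag and example values are hypothetical.

```python
def build_delete_arguments(agent_id, key=None, namespace=None, allow_bulk=False):
    """Build arguments for agent_memory_delete with a guard on bulk deletes."""
    if not agent_id:
        raise ValueError("agentId is required")
    if key is None and not allow_bulk:
        # Assumption: a call without `key` clears the whole namespace.
        raise ValueError("refusing a namespace-wide delete without allow_bulk=True")
    arguments = {"agentId": agent_id}
    if key is not None:
        arguments["key"] = key
    if namespace is not None:
        arguments["namespace"] = namespace
    return arguments


# Delete one memory (hypothetical key name).
single = build_delete_arguments("agent-123", key="session/last-query")
# Clear a namespace, with an explicit opt-in.
bulk = build_delete_arguments("agent-123", namespace="scratch", allow_bulk=True)
print(single)
print(bulk)
```

The guard lives entirely on the client side; it does not change what the server does with either argument set.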

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It only states the action and does not disclose whether deletion is irreversible, what permissions are needed, or what happens on success or failure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The single sentence is concise but lacks necessary detail. It could be restructured to include parameter hints without adding much length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is incomplete for a delete operation with 3 parameters and no output schema or annotations. It is missing usage details (how to delete a single memory versus all memories in a namespace) and return behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% (no parameter descriptions), and the description does not explain any parameters. An agent cannot infer the role of key, agentId, or namespace beyond their names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb 'Delete' and resource 'memory or all memories in a namespace', distinguishing it from siblings like get, search, store. Purpose is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage for deleting memories but provides no guidance on when to delete single vs all, no alternatives, no prerequisites or context about namespace usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_memory_get (grade C)

Retrieve a stored memory by key

Parameters

Name	Required	Default
`key`	Yes
`agentId`	Yes
`namespace`	No	default

Tool Definition Quality

C2.4/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description does not disclose behavioral traits such as what happens if the key doesn't exist, whether it is read-only, or any side effects. This is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single short sentence, which is concise but lacks necessary information. It could include more context without becoming verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With three parameters, no output schema, and no annotations, the description is highly incomplete. It does not explain what the return value is, the nature of the memory, or any constraints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, meaning no parameter descriptions exist. The tool description does not explain the meaning or usage of the three parameters (agentId, key, namespace). It adds no value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Retrieve a stored memory by key', which uses a specific verb and resource. It distinguishes from sibling tools like agent_memory_store, agent_memory_delete, and agent_memory_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The agent must infer that it is for retrieving a specific memory by key, but there is no mention of when to use search or other siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_memory_search (grade C)

Search memories by prefix, tags, or type

Parameters

Name	Required	Default
`tags`	No
`type`	No
`limit`	No
`agentId`	Yes
`keyPrefix`	No
`namespace`	No	default
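
The three filters in the description map to the `keyPrefix`, `tags`, and `type` parameters, and all of them are optional. One way to assemble the arguments is to drop any filter that was not supplied, as in the sketch below. It assumes `tags` is a list of strings and `limit` is an integer, neither of which the listing confirms, and all values shown are placeholders.

```python
def build_search_arguments(agent_id, key_prefix=None, tags=None,
                           memory_type=None, limit=None, namespace=None):
    """Build arguments for agent_memory_search, omitting unset filters."""
    if not agent_id:
        raise ValueError("agentId is required")
    optional = {
        "keyPrefix": key_prefix,
        "tags": tags,          # assumed to be a list of strings
        "type": memory_type,
        "limit": limit,        # assumed to be an integer cap on results
        "namespace": namespace,
    }
    arguments = {"agentId": agent_id}
    # Only send filters the caller actually set.
    arguments.update({name: value for name, value in optional.items() if value is not None})
    return arguments


search_args = build_search_arguments("agent-123", key_prefix="session/", limit=10)
print(search_args)
```

Leaving `namespace` unset relies on the default shown in the table (`default`).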

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description carries the full burden of disclosing behavior. It does not state that the operation is read-only, whether it requires specific permissions, or any side effects. The agent cannot infer safety or cost from this description.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no wasted words. It immediately tells the agent what the tool does (search) and the resource (memories) followed by the key filters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters and no output schema, the description is insufficient. It does not explain return values, pagination via limit, or the role of namespace. The agent lacks context to craft a complete call (e.g., required agentId is not mentioned in description, only in schema).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaning by naming the search criteria (prefix, tags, type), which map to parameters keyPrefix, tags, and type. However, it omits agentId, limit, and namespace, leaving 3 of 6 parameters unexplained. With 0% schema description coverage, the description partially compensates but is incomplete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool searches memories by prefix, tags, or type, clearly identifying the action (search) and resource (memories). It distinguishes from siblings like agent_memory_get (single memory retrieval) and agent_memory_store (storage), but could be more explicit about the resource being agent-specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool over alternatives. It does not mention when to use search vs get, nor does it specify prerequisites or excluded scenarios (e.g., exact match versus semantic search).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_memory_store (grade B)

Store a memory for an AI agent (key-value, with TTL and metadata)

Parameters

Name	Required	Description	Default
`key`	Yes
`value`	Yes	Any JSON value
`agentId`	Yes	Agent identifier
`metadata`	No
`namespace`	No		default
`ttlSeconds`	No
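
The table states that `value` accepts "Any JSON value", so a caller can check serializability before sending. The sketch below does that with the standard `json` module. The positive-integer check on `ttlSeconds` is an assumption, as are the example key and value; the listing does not say whether storing to an existing key overwrites it.

```python
import json


def build_store_arguments(agent_id, key, value, ttl_seconds=None,
                          metadata=None, namespace=None):
    """Build arguments for agent_memory_store after basic local checks."""
    if not agent_id or not key:
        raise ValueError("agentId and key are required")
    # json.dumps raises TypeError for values that are not JSON-serializable.
    json.dumps(value)
    arguments = {"agentId": agent_id, "key": key, "value": value}
    if ttl_seconds is not None:
        # Assumption: TTL must be a positive whole number of seconds.
        if not isinstance(ttl_seconds, int) or ttl_seconds <= 0:
            raise ValueError("ttlSeconds must be a positive integer")
        arguments["ttlSeconds"] = ttl_seconds
    if metadata is not None:
        arguments["metadata"] = metadata
    if namespace is not None:
        arguments["namespace"] = namespace
    return arguments


store_args = build_store_arguments(
    "agent-123", "session/last-query", {"q": "subsidies"}, ttl_seconds=3600
)
print(store_args)
```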

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It mentions 'key-value, with TTL and metadata', but omits critical details such as whether storing overwrites existing keys (upsert behavior), what is returned on success, or required permissions. The agent lacks clarity on side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence of 12 words, front-loaded with the action and resource, parenthetical adds key details. No redundancy, every phrase earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 6 parameters, no output schema, and no annotations, the description is too spare. It does not explain return values, error conditions, namespace behavior, or idempotency. The agent cannot reliably infer how to invoke the tool or what to expect from it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 33%, only defining agentId and value. The description adds context for 'key', 'value', 'TTL' (ttlSeconds), and 'metadata', partially compensating for missing parameter descriptions. However, 'namespace' and 'agentId' remain unexplained beyond the schema, and the description does not provide explicit parameter mappings.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Store a memory for an AI agent' with verb 'store' and resource 'memory'. It distinguishes from sibling tools like agent_memory_delete, agent_memory_get, and agent_memory_search by implying creation/update versus retrieval or deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., agent_memory_search for retrieval, agent_memory_delete for deletion). The description does not specify prerequisites, best practices, or context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_proxy_fetch (grade C)

Fetch a URL via a rotating proxy (region/type selectable). robots.txt enforced.

Parameters

Name	Required	Description	Default
`url`	Yes
`body`	No
`type`	No
`method`	No
`region`	No
`headers`	No
`sessionId`	No
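
With seven parameters and only `url` required, the arguments for a fetch might be assembled as in the sketch below. Several points are assumptions rather than documented behavior: that only absolute `http`/`https` URLs are accepted, that `headers` is a mapping of header names to values, and that `sessionId` takes an ID obtained from agent_proxy_session. The default `method` is not stated, so it is only sent when the caller sets it.

```python
from urllib.parse import urlsplit


def build_fetch_arguments(url, method=None, headers=None, body=None,
                          region=None, proxy_type=None, session_id=None):
    """Build arguments for agent_proxy_fetch, omitting unset options."""
    parts = urlsplit(url)
    # Assumption: the proxy only handles absolute http(s) URLs.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("url must be an absolute http(s) URL")
    optional = {
        "method": method,
        "headers": headers,      # assumed: dict of header name -> value
        "body": body,
        "region": region,        # allowed values are not documented
        "type": proxy_type,      # allowed values are not documented
        "sessionId": session_id, # assumed: ID from agent_proxy_session
    }
    arguments = {"url": url}
    arguments.update({name: value for name, value in optional.items() if value is not None})
    return arguments


fetch_args = build_fetch_arguments("https://example.com/page", method="GET")
print(fetch_args)
```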

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description should disclose behavioral traits. It mentions rotating proxy and robots.txt enforcement but omits crucial details like redirect handling, default method, rate limiting, error handling, and session usage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The two-sentence description is short but lacks structure. It is not wasteful but fails to convey necessary information efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 parameters, no annotations, no output schema, 0% schema description), the description is completely inadequate. It does not explain return values, behavior, errors, or parameter relationships.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description only vaguely mentions region and type being selectable, without explaining any of the 7 parameters (e.g., url, body, headers, sessionId). The description adds no value beyond the raw schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches a URL via a rotating proxy with selectable region/type and enforces robots.txt. This is a specific verb+resource combination that distinguishes it from sibling tools like agent_proxy_session.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., agent_webhook_poll), nor any prerequisites or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_proxy_session (grade C)

Create a sticky proxy session (same IP for multiple requests)

Parameters

Name	Required	Description	Default
`type`	No
`region`	No
`ttlSeconds`	No

Tool Definition Quality

C2.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose all behavioral traits. It only mentions stickiness and IP reuse, but omits details on session lifecycle, error conditions, or impact of parameters like ttlSeconds (e.g., what happens on expiry).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, highly concise sentence. It front-loads the core action and differentiator. However, it sacrifices completeness for brevity, which slightly lowers the score.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of output schema, annotations, and parameter descriptions, the tool description is critically incomplete. It does not explain return values (e.g., session ID, IP address), parameter details, or behavioral context needed for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning the description must explain each parameter. It does not address type (e.g., residential vs. datacenter), region (format or allowed values), or ttlSeconds (beyond a numeric default). The agent gains no additional meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Create' and the resource 'sticky proxy session', and specifies the key benefit 'same IP for multiple requests'. This effectively distinguishes it from sibling tools like agent_proxy_fetch, which likely provides a single-use proxy.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for scenarios needing persistent IP, but it does not explicitly state when to use this tool versus alternatives (e.g., agent_proxy_fetch), nor does it mention any prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_tempmail_create (grade C)

Create a temporary email address (auto-expires)

Parameters

Name	Required	Description	Default
`ttlSeconds`	No
`preferredPrefix`	No

Tool Definition Quality

C2.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavior. It only mentions auto-expiry and omits details such as the default TTL, rate limits, and what happens on expiration.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with no waste. It could be slightly more informative without losing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description does not mention the return value, the output format, or any effect beyond creation, which leaves it incomplete for a create tool with no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description says nothing about parameters. It does not explain ttlSeconds or preferredPrefix, which is essential because the schema lacks descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (create) and resource (temporary email address) with auto-expiration note. It distinguishes from siblings like agent_tempmail_get, agent_tempmail_list, and agent_tempmail_wait.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives. The description lacks context for appropriate usage scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_tempmail_get (grade C)

Get full message content with extracted verification links/codes

Parameters

Name	Required	Description	Default
`mailboxId`	Yes
`messageId`	Yes
`includeRaw`	No

Tool Definition Quality

C2.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description must fully communicate behavioral traits. It mentions that the tool returns 'full message content' and 'extracted verification links/codes', suggesting parsing. However, it fails to disclose whether the operation is read-only, if authentication is required, rate limit implications, or any side effects. This lack of transparency is a significant gap for a tool that retrieves content.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that conveys the core purpose without extraneous words. However, it could be improved by adding parameter details without losing conciseness. It earns its place but is slightly under-informative given the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of annotations and output schema, the description is incomplete. It does not mention return format, error conditions, or relationship to sibling tools beyond implicit differentiation. The tool is straightforward, but the description leaves significant gaps in operational context, especially regarding parameter semantics and behavioral expectations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not explain any of the three parameters (mailboxId, messageId, includeRaw). The description only hints at the output content but adds no meaning to the input schema. The agent receives no guidance on what values are valid, the role of includeRaw, or the format of IDs, which is critically insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and the resource 'full message content', and specifies the extraction of verification links/codes, which differentiates it from sibling tools like agent_tempmail_list (which likely retrieves metadata) and agent_tempmail_wait (which waits for messages). This makes the tool's purpose highly specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, such as when to choose get over list or wait. It does not mention prerequisites, use cases, or conditions for using optional parameters like includeRaw, leaving the agent to infer usage without context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_tempmail_list (grade C)

List received messages in a mailbox

Parameters

Name	Required	Description	Default
`after`	No
`limit`	No
`mailboxId`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, and the description only says 'list'. It doesn't disclose whether this is a read-only operation, pagination behavior, or any side effects. Very little behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

At one sentence the description is extremely concise, but it omits critical information and sacrifices completeness for brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 parameters and no output schema, the description is too sparse. It doesn't explain the return format, default behavior (e.g., limit=50), or how after filters messages. Missing essential details for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the tool description does not explain any of the three parameters (mailboxId, after, limit). This leaves the agent uninformed about what each field means and how the optional fields affect the result.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List received messages in a mailbox', which is a specific verb+resource. It does not explicitly differentiate itself from siblings like agent_tempmail_create and agent_tempmail_get, but the distinction is still clear from the wording.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like agent_tempmail_get or agent_tempmail_wait. No context on prerequisites or limitations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_tempmail_wait (grade C)

Wait for an incoming message (long polling, max 60s)

Parameters

Name	Required	Description	Default
`mailboxId`	Yes
`fromContains`	No
`timeoutSeconds`	No
`subjectContains`	No
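
The description caps the long poll at 60 seconds, so a client can clamp `timeoutSeconds` before sending rather than rely on the server to reject or truncate it. The sketch below shows one way to do that. The 60-second ceiling comes from the description; the lower bound of 1 second, the substring semantics implied by `fromContains` and `subjectContains`, and all example values are assumptions.

```python
MAX_WAIT_SECONDS = 60  # from the tool description: "long polling, max 60s"


def build_wait_arguments(mailbox_id, timeout_seconds=None,
                         from_contains=None, subject_contains=None):
    """Build arguments for agent_tempmail_wait with a clamped timeout."""
    if not mailbox_id:
        raise ValueError("mailboxId is required")
    arguments = {"mailboxId": mailbox_id}
    if timeout_seconds is not None:
        # Clamp into [1, 60]; the lower bound is an assumption.
        arguments["timeoutSeconds"] = max(1, min(int(timeout_seconds), MAX_WAIT_SECONDS))
    if from_contains is not None:
        arguments["fromContains"] = from_contains
    if subject_contains is not None:
        arguments["subjectContains"] = subject_contains
    return arguments


wait_args = build_wait_arguments("mbx-1", timeout_seconds=120, subject_contains="verify")
print(wait_args)
```

Since the listing does not say what the tool returns on timeout, a caller would still need to handle an empty or error result after the wait.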

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. While it mentions long polling and a 60s timeout, it omits critical details like what happens on timeout (returns null/error?), whether the operation is cancellable, and if it consumes messages or leaves them intact.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently conveys the core function. However, it omits enough detail that the conciseness comes at the cost of completeness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations, no output schema, and four undocumented parameters, the description is far from complete. An agent lacks critical information about return values, error handling, and proper parameter usage to invoke this tool correctly without external knowledge.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description adds no meaning to the four parameters. Key aspects like default timeout (30s vs the noted 60s max), filtering semantics (fromContains, subjectContains), and the required mailboxId's format or source are left entirely to the agent to infer from schema names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Wait') and resource ('incoming message') with an important qualifier ('long polling, max 60s'). It implies a blocking operation distinct from sibling tools like agent_tempmail_list or agent_tempmail_get, which are likely immediate/retrieval operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is given on when to use this tool versus other tempmail operations or alternatives. The agent receives no context about prerequisites, polling strategies, or whether to prefer synchronous waiting vs manual polling.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_trust_batch (grade C)

Get trust scores for multiple subjects in one call (max 100)

Parameters

Name	Required	Description	Default
`subjects`	Yes

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It only mentions a max 100 constraint but lacks details on idempotency, side effects, rate limits, or authentication needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very short and front-loaded, which is good for conciseness, but it is under-specified: the brevity comes at the expense of detail an agent needs.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a batch operation with nested parameters and no output schema, the description is insufficient. It does not cover error handling, output format, or parameter semantics.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and the description does not explain what 'subjects' are, the meaning of 'type' and 'value', or expected formats. It adds no value beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
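
The 100-subject cap can be handled on the client side by splitting a longer list into several calls. A minimal sketch, assuming each subject is an object with `type` and `value` fields (the review mentions these names, but the exact schema and the allowed `type` strings are not shown here, so `"domain"` and the helper `chunk_subjects` are hypothetical):

```python
from typing import Iterator

MAX_SUBJECTS_PER_CALL = 100  # limit stated in the tool description


def chunk_subjects(subjects: list[dict],
                   size: int = MAX_SUBJECTS_PER_CALL) -> Iterator[dict]:
    """Yield one argument object per agent_trust_batch call."""
    if size < 1 or size > MAX_SUBJECTS_PER_CALL:
        raise ValueError("size must be between 1 and 100")
    for start in range(0, len(subjects), size):
        yield {"subjects": subjects[start:start + size]}


# Assumed item shape; the "domain" type string is illustrative only.
subjects = [{"type": "domain", "value": f"example-{i}.test"} for i in range(250)]
batches = list(chunk_subjects(subjects))
print([len(b["subjects"]) for b in batches])  # [100, 100, 50]
```

Each yielded object would be passed as the arguments of one tool call. How the server reports partial failures inside a batch is not documented, so the sketch does not attempt to handle them.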

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and resource 'trust scores' with a specific constraint 'multiple subjects in one call (max 100)'. This distinguishes it from single-subject tools like 'agent_trust_score'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies batch usage but does not explicitly state when to use it versus alternatives like 'agent_trust_score'. No when-not or explicit alternative guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_trust_feedback (grade C)

Submit feedback about an agent/wallet (positive or negative)

Parameters

Name	Required	Description	Default
`rating`	Yes
`category`	Yes
`evidence`	No
`subjectType`	Yes
`subjectValue`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description only states that feedback is submitted. It fails to disclose behavioral traits such as idempotency, authorization needs, rate limits, or whether feedback can be modified or deleted.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, which is concise, but it sacrifices valuable information. It is not optimally structured for clarity; the phrase 'positive or negative' could be replaced with specific parameter details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 5 parameters (4 required), including a nested evidence object and no output schema. The description is far too minimal—it omits input format, return values, and behavior, making it inadequate for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description adds almost no parameter context. It mentions 'positive or negative' but does not map this to the rating parameter (integer -2 to 2). No explanation of subjectType, category, or evidence.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
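
Because the description does not tie 'positive or negative' to the `rating` range, a caller may want to validate the value before sending it. A minimal sketch, assuming the integer range -2 to 2 noted in the review; the helper name, the `"wallet"` subject type string, and the `"reliability"` category are hypothetical placeholders, since the allowed values are not documented:

```python
from typing import Optional


def build_feedback_args(subject_type: str, subject_value: str, rating: int,
                        category: str, evidence: Optional[dict] = None) -> dict:
    """Build an agent_trust_feedback argument object, rejecting bad ratings."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(rating, bool) or not isinstance(rating, int):
        raise ValueError("rating must be an integer")
    if not -2 <= rating <= 2:
        raise ValueError("rating must be between -2 and 2")
    args = {
        "subjectType": subject_type,
        "subjectValue": subject_value,
        "rating": rating,
        "category": category,
    }
    if evidence is not None:  # optional nested object, shape not documented
        args["evidence"] = evidence
    return args


# Placeholder values; a negative rating is assumed to mean negative feedback.
args = build_feedback_args("wallet", "0xabc", -1, "reliability")
print(args["rating"])  # -1
```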

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (submit feedback) and the resource (agent/wallet) and mentions positive/negative feedback. However, it lacks differentiation from sibling tools like agent_trust_batch (batch score lookup) and agent_trust_score (single score lookup).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description does not mention batching or scoring, leaving the agent without decision context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_trust_score (grade B)

Get trust score for a wallet, agent card URL, or domain

Parameters

Name	Required	Description	Default
`subjectType`	Yes
`subjectValue`	Yes
`includeDetails`	No

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description lacks behavioral details such as authentication requirements, rate limits, and response structure, all of which matter even for a read operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with verb, no wasted words. However, more structure could improve readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without output schema or annotations, the description leaves gaps: no definition of trust score, no explanation of includeDetails, and no indication of response format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, and the description only lists subject types without explaining subjectValue or includeDetails, failing to add meaningful context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Get trust score' and lists three specific subject types (wallet, agent card URL, domain), distinguishing it from related tools such as the batch query (agent_trust_batch) and feedback (agent_trust_feedback) tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage for single trust score queries but gives no explicit guidance on when to choose this tool over siblings like agent_trust_batch or agent_trust_feedback.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_webhook_create (grade C)

Create a webhook endpoint that relays requests to your agent

Parameters

Name	Required	Description	Default
`agentId`	No
`pushUrl`	No
`ttlSeconds`	No
`description`	No
`deliveryMode`	Yes
`transformRules`	No

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must fully disclose behavioral traits, but it only states the creation action. It fails to mention any side effects, return values, authentication needs, or rate limits, leaving significant gaps in understanding the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise at one sentence, but it is too sparse. It could be more structured while maintaining conciseness, such as mentioning key parameters or what the tool returns.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (six parameters, nested objects, no output schema), the description covers only the high-level action. It lacks details about parameter usage, return values, and operational context, making it inadequate for an agent to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no information about the six parameters, which have 0% schema coverage. Even the enum 'deliveryMode' is not explained. The agent must rely on parameter names and enum values alone, which is insufficient for correct invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool creates a webhook endpoint for relaying requests to an agent, using a specific verb and identifying the resource. It does not explicitly differentiate itself from sibling webhook tools such as listing or polling, although the verb 'create' is enough to imply the distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, nor on prerequisites or the order of operations (e.g., an endpoint must be created before it can be polled). The description lacks any context for appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_webhook_list_requests (grade C)

List requests received by a webhook endpoint

Parameters

Name	Required	Description	Default
`limit`	No
`offset`	No
`endpointId`	Yes

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description should disclose behavioral traits, but it only states the basic function. Missing details like read-only nature, pagination behavior, error handling, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very brief (7 words), but it omits essential information about parameters and usage, making it too short for adequate understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 3 parameters, no output schema, and no annotations, the description fails to provide enough context about result format, ordering, default behavior, or error cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must explain parameters, but it does not mention endpointId, limit, or offset at all, leaving the agent to infer their meaning from the parameter and tool names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
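
Since `limit` and `offset` are undocumented, the usual reading is page-based listing. A minimal sketch under that assumption: the tool is assumed to return a plain list of request objects and a short page is assumed to mark the end. `call_tool` is a hypothetical stand-in for whatever MCP client function is in use, and the in-memory `fake_call_tool` exists only so the example runs offline:

```python
from typing import Callable, Iterator


def iter_requests(call_tool: Callable[[str, dict], list], endpoint_id: str,
                  page_size: int = 50) -> Iterator[dict]:
    """Yield every stored request by walking limit/offset pages."""
    offset = 0
    while True:
        page = call_tool("agent_webhook_list_requests",
                         {"endpointId": endpoint_id,
                          "limit": page_size,
                          "offset": offset})
        yield from page
        if len(page) < page_size:  # a short page is taken to mean the end
            return
        offset += len(page)


# In-memory stand-in for the server (not the real API).
STORED = [{"id": i} for i in range(120)]
CALLS = []


def fake_call_tool(name: str, args: dict) -> list:
    CALLS.append(args)
    return STORED[args["offset"]:args["offset"] + args["limit"]]


requests_seen = list(iter_requests(fake_call_tool, "ep_demo"))
print(len(requests_seen), len(CALLS))  # 120 3
```

The real response shape, default page size, and ordering are not stated on this page, so the loop should be adapted once they are known.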

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list) and resource (requests received by a webhook endpoint), distinguishing it from sibling tools like agent_webhook_create, agent_webhook_poll, and agent_webhook_replay.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide any guidance on when to use this tool versus alternatives (e.g., agent_webhook_poll) or any usage prerequisites or limitations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_webhook_poll (grade C)

Poll for new webhook requests (long polling, max 60s)

Parameters

Name	Required	Description	Default
`after`	No
`limit`	No
`timeout`	No
`endpointId`	Yes

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose behavior. It mentions 'long polling, max 60s', indicating blocking behavior, but omits details like what happens on timeout, whether it is read-only, or the nature of returned data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, front-loading the action and key behavior (long polling, max 60s). It avoids verbosity but at the cost of missing parameter and usage details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given four parameters, no output schema, and no annotations, the description is incomplete. It does not explain the return format, how to use the 'after' parameter for pagination, or the implications of the timeout, leaving agents to guess critical usage details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must explain parameters. It fails to clarify the meaning of 'after', 'limit', 'timeout', or 'endpointId', providing no semantic value beyond the schema's type and default values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
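
One plausible reading of `after` and `timeout` is a cursor-driven long-poll loop. A minimal sketch under several assumptions that the page does not confirm: `after` takes the id of the last request seen, each request carries an `id` field, an empty result means the poll timed out, and `timeout` is in seconds. `call_tool` and `fake_call_tool` are hypothetical stand-ins for a real MCP client:

```python
from typing import Callable

MAX_TIMEOUT_S = 60  # ceiling stated in the tool description


def drain_new_requests(call_tool: Callable[[str, dict], list], endpoint_id: str,
                       max_polls: int = 3, timeout_s: int = 30) -> list:
    """Long-poll repeatedly, advancing the cursor after each batch."""
    timeout_s = max(1, min(timeout_s, MAX_TIMEOUT_S))  # clamp to the stated max
    after = None
    received = []
    for _ in range(max_polls):
        args = {"endpointId": endpoint_id, "timeout": timeout_s}
        if after is not None:
            args["after"] = after
        batch = call_tool("agent_webhook_poll", args)
        if not batch:
            break  # assumed: empty result means the poll timed out
        received.extend(batch)
        after = batch[-1]["id"]  # assumed cursor field
    return received


# In-memory stand-in that returns at most two requests newer than the cursor.
STORE = [{"id": i} for i in range(1, 6)]
SEEN_ARGS = []


def fake_call_tool(name: str, args: dict) -> list:
    SEEN_ARGS.append(dict(args))
    after = args.get("after", 0)
    return [r for r in STORE if r["id"] > after][:2]


got = drain_new_requests(fake_call_tool, "ep_demo", max_polls=5, timeout_s=600)
print([r["id"] for r in got])  # [1, 2, 3, 4, 5]
```

Under these assumptions the loop stops on the first empty batch, which is why four calls are made for five stored requests.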

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'poll' and resource 'webhook requests', with 'long polling, max 60s' distinguishing it from sibling tools like agent_webhook_list_requests. However, it does not elaborate on the concept of 'new' or how polling differs from a one-time fetch.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like agent_webhook_list_requests. The description does not mention prerequisites, exclusions, or appropriate contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_webhook_replay (grade C)

Replay a stored webhook request

Parameters

Name	Required	Description	Default
`toUrl`	No
`requestId`	Yes
`endpointId`	Yes

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose side effects. 'Replay' implies resending the webhook, but it doesn't state whether it modifies state or has other consequences.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

While the description is a single sentence, it is too minimal and does not earn its place by adding value beyond the name. It could include more context without sacrificing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given three parameters and no output schema or annotations, the description is insufficient. It omits essential details like what the 'toUrl' parameter does and what happens after replay.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description provides no explanations for the three parameters (endpointId, requestId, toUrl), leaving the agent without guidance on how to populate them.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Replay a stored webhook request' uses a specific verb and resource, clearly differentiating it from sibling tools like agent_webhook_create and agent_webhook_list_requests.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, such as when to replay vs. create or poll a webhook.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

alerts_create_watch (grade A)

Create a saved watch: when a NEW matching event appears in a ledger, a notification is pushed to your destination. Filters: ledger (optional), keyword and/or entity (case-insensitive title substring). At least one filter is required. Backfill never fires.

Parameters

Name	Required	Description
`entity`	No	Additional case-insensitive substring (e.g. company name)
`ledger`	No	Ledger key, e.g. 'sanction' (omit for all ledgers)
`keyword`	No	Case-insensitive substring of the item title
`destinationType`	Yes
`destinationTarget`	Yes	Webhook URL / relay endpoint / email address

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses key behaviors: only watches for NEW events, does not backfill, and pushes notifications. However, it does not mention idempotency, error handling, or authorization needs, leaving some gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (three short sentences plus a filter list) and front-loads the purpose. Every sentence adds information without redundancy. Room for minor improvement in structure (e.g., bullet points for filters).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description omits the return value (e.g., watch ID) and error scenarios. Since no output schema is provided, the description should cover what the tool returns. This gap makes it incomplete for an agent to fully understand the tool's behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds value beyond the schema by stating that keyword and entity are case-insensitive substrings, and that at least one filter (keyword or entity) is required even though the schema only requires destinationType and destinationTarget. It does not elaborate on the destination parameters; the schema describes destinationTarget but leaves destinationType undocumented (80% coverage overall).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
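
The filter rule can be made concrete with a small matcher. A minimal sketch, assuming (as this review does) that "at least one filter" means `keyword` or `entity`, and that when both are given both must appear in the title; the server's actual matching logic is not shown on this page, and `matches_watch` is a hypothetical helper:

```python
from typing import Optional


def matches_watch(title: str, keyword: Optional[str] = None,
                  entity: Optional[str] = None) -> bool:
    """Return True if a ledger item title satisfies the watch filters."""
    needles = [n for n in (keyword, entity) if n]
    if not needles:
        raise ValueError("at least one of keyword or entity is required")
    haystack = title.casefold()  # casefold gives case-insensitive comparison
    # Assumption: keyword and entity combine with AND.
    return all(n.casefold() in haystack for n in needles)


title = "Sanction against ACME Bank"
print(matches_watch(title, keyword="sanction", entity="acme"))  # True
print(matches_watch(title, keyword="grant"))                    # False
```

The optional `ledger` filter is left out because it selects a ledger rather than matching the title.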

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a saved watch that triggers notifications on new matching events. It uses a specific verb ('Create') and resource ('saved watch'), and distinguishes from sibling watch tools (e.g., alerts_list_watches, alerts_delete_watch).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides filter requirements (at least one of keyword/entity) and clarifies that backfill never fires. It implicitly guides when to use (for new event notifications) but does not explicitly mention alternatives like alerts_list_watches for viewing existing watches.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

alerts_delete_watch (grade A)

Deactivate a saved watch by id (soft delete; stops future notifications).

Parameters

Name	Required	Description	Default
`watchId`	Yes

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description explicitly states 'soft delete' and 'stops future notifications', providing key behavioral insight beyond what annotations would. No annotations provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence front-loads core purpose and key side effect (soft delete, stops notifications). No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a one-parameter tool with no output schema, description adequately covers purpose, effect, and soft-delete behavior. Could mention return value or error cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, description adds meaning by clarifying 'by id'. However, it could be more explicit about the parameter format or type.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Tool name and description clearly state it deactivates a watch by ID (soft delete). Distinguishes from sibling tools like alerts_create_watch and alerts_list_watches.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage for deactivating watches, but no explicit guidance on when to use vs. alternatives or when not to use (e.g., if watch already inactive).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

alerts_list_watches (grade A)

List the calling user's saved watches.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states 'List' implying a read operation, but does not mention authentication requirements, rate limits, pagination behavior, or the nature of the returned data (e.g., whether it includes details or just IDs).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence that conveys the core function with no extraneous words. It is appropriately short for the simplicity of the tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of an output schema, the description would benefit from explaining what the returned watches contain (e.g., watch ID, query, timestamp). The current description is minimal and leaves the agent guessing about the response format, making it slightly incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters (100% coverage), so the description does not need to add parameter meaning. The description is consistent with the schema, and no additional semantic value is required.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List' and the resource 'the calling user's saved watches', distinguishing it from sibling tools like alerts_create_watch and alerts_delete_watch. However, it does not specify what a 'watch' represents in this context, leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when you need to list the user's saved watches, but it provides no explicit guidance on when to use this tool versus alternatives (e.g., creating or deleting watches) or any constraints or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_watch_get (grade C)

Get a bid notice detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must fully disclose behavior. It states returns but does not confirm read-only nature, error handling, or whether it has side effects. Lacks detail on what happens with invalid IDs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Very concise at two sentences, front-loading the main purpose. Could be slightly more detailed without harming conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema or annotations, description should provide more context on return structure and usage. Only mentions two fields, missing full account of what the detail and timeline include.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and description adds no meaning for the single parameter 'itemId'. It does not explain what itemId represents or how to obtain it, leaving the agent guessing.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it retrieves a bid notice detail and its full event timeline, specifying returned fields. This distinguishes from siblings like bid_watch_search and bid_watch_timeline, though not explicitly naming them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as bid_watch_search or bid_watch_timeline. No context on prerequisites or typical use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_watch_recent_changes (B)

Recent appearance / deadline-move / close / cancel / award events across all bid notices since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`entity`	No
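
Because the description only implies that `since` is an ISO8601 timestamp, a caller has to pick a concrete format. The following is a minimal sketch of building the request, assuming the server is reached through the standard MCP JSON-RPC `tools/call` method and accepts a UTC timestamp with an explicit offset. The helper name `build_recent_changes_call` and the example `limit` value are hypothetical; the accepted timestamp precision is not documented.

```python
import json
from datetime import datetime, timezone


def build_recent_changes_call(since, limit=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for bid_watch_recent_changes."""
    if since.tzinfo is None:
        # A naive datetime is ambiguous, so refuse it rather than guess a zone.
        raise ValueError("since must be timezone-aware")
    arguments = {
        # Assumption: the server accepts UTC with second precision.
        "since": since.astimezone(timezone.utc).isoformat(timespec="seconds"),
    }
    if limit is not None:
        arguments["limit"] = limit  # optional; its default is undocumented
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "bid_watch_recent_changes", "arguments": arguments},
    }


request = build_recent_changes_call(
    datetime(2026, 7, 29, 0, 0, tzinfo=timezone.utc), limit=20
)
print(json.dumps(request, indent=2))
```

The optional `entity` filter is left out because neither the schema nor the description says what it matches.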

Tool Definition Quality

B (3/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It explains the scope (all bid notices) and output fields (firstSeenAt, ledgerVerified), but lacks disclosure of side effects, security requirements, or performance characteristics. Adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two succinct sentences: first defines purpose and scope, second describes output fields. No redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without output schema or annotations, and with only 1 of 3 parameters explained, the description is insufficient. The return structure is partially described but not fully, and differences from numerous sibling tools are not addressed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. Description only mentions the 'since' parameter implicitly (ISO8601 timestamp) and ignores 'limit' and 'entity'. The 'entity' parameter is unexplained, and no details on default or format are provided beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool returns recent bid notice events (appearance, deadline-move, close, cancel, award) since an ISO8601 timestamp, which differentiates it from other bid watch tools. However, it doesn't explicitly contrast with siblings like bid_watch_search or bid_watch_timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description only implies use for recent changes but doesn't address filtering, prerequisites, or comparison to other watch tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_watch_search (B)

Search Japanese public-procurement bid notices (kkj.go.jp). Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No
`since`	No
`entity`	No	Procuring entity (partial match)
`status`	No
`bidType`	No	Bid type: open competitive bidding (一般競争入札), designated bidding (指名), negotiated contract (随意), etc.

Tool Definition Quality

B (3.2/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description must disclose behavioral traits. It only states that it searches and returns hits with two fields. It does not mention whether it is read-only, any authentication requirements, rate limits, pagination behavior, or side effects. This is insufficient for a search tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise (two short sentences) and front-loads the source and result fields. It could be slightly more structured, for example by grouping parameters or providing an example, but it is not bloated.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema and annotations, the description should provide more completeness about the return format, pagination, error handling, and limitations. It only mentions two fields in the response. The complexity of the tool (6 parameters, many siblings) warrants more detail.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 33% (only 'entity' and 'bidType' have descriptions in the schema). The tool description adds no additional meaning to parameters like 'query', 'since', 'limit', or 'status'. It does not explain how parameters interact or provide examples. The description does not compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches Japanese public-procurement bid notices from a specific source (kkj.go.jp) and mentions that results include 'firstSeenAt' and 'ledgerVerified'. This differentiates it from sibling tools like 'bid_watch_get' (retrieval) and 'bid_watch_recent_changes' (list changes).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives. While the purpose is clear, there is no guidance on scenarios where other tools (e.g., 'bid_watch_get') would be more appropriate. Usage context is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_watch_timeline (A)

Time-ordered events only for a bid notice (the differentiator: when it appeared, deadline moved, closed, was cancelled or awarded). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A (3.5/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so description bears full burden. It mentions return fields (firstSeenAt, ledgerVerified) but omits behavioral traits like read-only status, auth requirements, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences are efficient, but the second sentence listing fields is somewhat sparse. No front-loading issues; every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given simple schema (1 param) and no output schema, description explains purpose and return fields but lacks details on parameter semantics, ordering, or additional usage guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema coverage, the description must explain parameters. It does not mention the required 'itemId' parameter at all, offering no guidance on its purpose or format.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides 'time-ordered events only for a bid notice', listing specific event types. It distinguishes from siblings like bid_watch_get by emphasizing the chronological, historical nature.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It defines when to use (for timeline of events) and implies not for current state, but lacks explicit alternatives or when-not conditions. Context is clear but no exclusions stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_watch_verify_ledger (B)

Verify the hash-chain integrity of a bid notice (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
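
The listing does not say how the hash chain is constructed, so the following is only a generic sketch of hash-chain verification, not the server's actual scheme. It assumes SHA-256, canonical JSON payloads, and that each event stores `prevHash` and `hash`; those field names, the `GENESIS` value, the `checked` key, and treating `brokenAt` as an event index are all hypothetical. Only `chainValid` and `brokenAt` are names taken from the description.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed "previous hash" of the first event


def event_hash(prev_hash, payload):
    """Hash the previous hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(events, payload):
    """Append an event linked to the hash of the one before it."""
    prev = events[-1]["hash"] if events else GENESIS
    events.append(
        {"prevHash": prev, "payload": payload, "hash": event_hash(prev, payload)}
    )


def verify_chain(events):
    """Walk the chain and report the first event whose link or hash is wrong."""
    prev = GENESIS
    for index, event in enumerate(events):
        expected = event_hash(prev, event["payload"])
        if event["prevHash"] != prev or event["hash"] != expected:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(events)}


ledger = []
append_event(ledger, {"type": "appeared", "at": "2026-07-01T00:00:00Z"})
append_event(ledger, {"type": "deadline_moved", "at": "2026-07-10T00:00:00Z"})
intact = verify_chain(ledger)

ledger[0]["payload"]["at"] = "2026-06-01T00:00:00Z"  # simulate tampering
tampered = verify_chain(ledger)
print(intact, tampered)
```

Editing any stored payload changes its recomputed hash, so the check fails at the first altered event and every later link becomes untrustworthy.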

Tool Definition Quality

B (3.4/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It lists return fields but does not disclose whether the tool is read-only, destructive, or requires authentication. The behavior is implied as a read operation, but not stated explicitly.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences: the first front-loads the purpose and the second lists the return fields. Every word earns its place; no filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers the purpose and return fields. However, it lacks parameter documentation and behavioral context, making it minimally viable but with gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and the description does not mention the only parameter 'itemId' at all. It does not explain its format, meaning, or constraints, providing no added value over the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies hash-chain integrity of a bid notice for tamper detection, which is a specific verb-resource combination. It distinguishes from sibling bid_watch tools (get, recent_changes, search, timeline) and other verify_ledger tools by specifying 'bid notice'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: use this tool when you need to verify the integrity of a bid notice. However, it does not explicitly exclude scenarios or mention alternatives among the many bid_watch tools, so it lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_estimate_compute_emissions (A)

Estimate electricity CO2e (kg) from energy use (kWh) and a regional grid-intensity factor (defaults to the IEA world average). Pure compute; price 0.0 (free).

Parameters

Name	Required	Description	Default
`kWh`	Yes	Electricity consumed in kWh (>= 0)
`region`	No	Grid region (default: global).
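
The computation described is energy use multiplied by a grid-intensity factor. A minimal sketch follows; the factor values and the `example_region` key are placeholders chosen for illustration, not the server's embedded factors or the actual IEA figure, and the function name is hypothetical.

```python
# Placeholder grid-intensity factors in kg CO2e per kWh.
# These are illustrative values only, NOT the server's data.
GRID_INTENSITY = {"global": 0.5, "example_region": 0.4}


def estimate_electricity_co2e(kwh, region="global"):
    """Return estimated kg CO2e for `kwh` of electricity in `region`."""
    if kwh < 0:
        raise ValueError("kWh must be >= 0")  # mirrors the schema constraint
    if region not in GRID_INTENSITY:
        raise ValueError(f"unknown region: {region!r}")
    return kwh * GRID_INTENSITY[region]


print(estimate_electricity_co2e(1200))
```

How the real tool reacts to a negative `kWh` or an unknown region is not stated; raising an error here is just one reasonable choice.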

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool is 'pure compute' (side-effect-free, idempotent) and 'price 0.0 (free)', which are key behavioral traits. It does not detail what happens on invalid input (e.g., negative kWh) or specify rate limits, but for a simple estimation tool, the core behavior is adequately communicated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: one sentence and a short clause totaling about 25 words. It front-loads the action and output ('Estimate electricity CO2e (kg)') with no extraneous information. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given there is no output schema, the description should fully explain the return value. It indicates the output is in 'CO2e (kg)' but does not specify the structure (e.g., a single number vs. an object), nor does it mention error handling or constraints beyond the schema. For a simple tool this may be sufficient, but more detail on the output format would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds meaningful context: it explains that the region parameter provides a 'regional grid-intensity factor' and clarifies that the default is the 'IEA world average'. This enhances understanding beyond the schema's enum list and default mention.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Estimate' and the resource 'electricity CO2e (kg)' from 'energy use (kWh) and a regional grid-intensity factor'. It differentiates from similar carbon tools (e.g., carbon_estimate_emission_factor) by specifying the computation of emissions rather than returning a factor, and it mentions the default factor source (IEA world average). The phrase 'pure compute; price 0.0 (free)' further clarifies the nature of the tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when you have kWh data and want to compute emissions for electricity, and it mentions the default regional factor. It explicitly states 'pure compute; price 0.0 (free)', which is a strong usage signal (no cost, no side effects). However, it does not explicitly state when not to use it or directly contrast it with siblings like carbon_estimate_emission_factor, which could provide clearer guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_estimate_emission_factor (A)

Look up the published emission factor (value, unit, category) for a named activity key (e.g. shipping_air, electricity_global, travel_car, gasoline, beef). Pure compute; price 0.0 (free).

Parameters

Name	Required	Description	Default
`activity`	Yes	Canonical activity key (e.g. shipping_air, travel_car, gasoline)

Tool Definition Quality

A (4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses that the tool is 'pure compute' and free, but omits details like error handling, caching, or authorization requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with zero waste. The first sentence immediately conveys the exact purpose, and the second adds a cost/behavior note.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple lookup tool with one parameter and no output schema, the description adequately covers what the tool returns (value, unit, category) and provides examples. However, it lacks details on error states or response format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (the only parameter 'activity' has a description). The description adds no additional meaning beyond the schema's examples, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Look up' and the resource 'published emission factor (value, unit, category)' for a specific activity key, differentiating it from sibling tools like carbon_estimate_compute_emissions which compute full emissions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates it is a pure compute, free operation, implying safe and cost-free usage. It clearly implies when to use (looking up a single factor) but does not explicitly exclude or compare to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_estimate_offset_estimate (A)

Estimate voluntary-market offset cost (USD) and tree- / forest-year equivalents for a given kg CO2e, using published constants. Pure compute; price 0.0 (free).

Parameters

Name	Required	Description	Default
`kgCO2e`	Yes	Emissions to offset, in kg CO2e (>= 0)
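
The "published constants" are not listed, so the sketch below only shows the shape of such a calculation. The price per tonne and the sequestration rate per tree-year are placeholders, and the output keys `costUsd` and `treeYears` are hypothetical; the forest-year equivalent is omitted because its definition is not given.

```python
# Placeholder constants for illustration only, NOT the server's values.
USD_PER_TONNE_CO2E = 10.0   # assumed voluntary-market price
KG_CO2E_PER_TREE_YEAR = 20.0  # assumed uptake of one tree over one year


def estimate_offset(kg_co2e):
    """Return an offset cost in USD and a tree-year equivalent."""
    if kg_co2e < 0:
        raise ValueError("kgCO2e must be >= 0")  # mirrors the schema constraint
    return {
        "costUsd": kg_co2e / 1000.0 * USD_PER_TONNE_CO2E,  # kg -> tonnes
        "treeYears": kg_co2e / KG_CO2E_PER_TREE_YEAR,
    }


print(estimate_offset(2000))
```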

Tool Definition Quality

A (4.1/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, but the description states 'Pure compute; price 0.0 (free)', indicating no side effects and statelessness. This adds transparency about the tool's behavior, though it could mention that no external calls are made.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences that front-load the main purpose and key behavioral trait (free pure compute). Every word adds value, no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter, no output schema tool, the description provides sufficient context: input is kg CO2e, output is USD cost and tree/forest-year equivalents, and the computation is based on published constants. Nothing is missing for an agent to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already documents the single parameter kgCO2e with description. The description adds context about output (USD, tree-year equivalents) but does not add meaning to the parameter beyond the schema. Baseline 3 applies due to 100% schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool estimates voluntary-market offset cost (USD) and tree/forest-year equivalents for a given kg CO2e, using published constants. It differentiates from sibling carbon_estimate tools (e.g., compute_emissions) by focusing on offset estimation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for offset cost estimation but does not explicitly state when to use or when not to use, nor does it provide alternatives. The pure compute nature is noted, but guidance on when to prefer this over other carbon tools is lacking.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_estimate_shipping_emissions (A)

Estimate freight CO2e (kg) from weight, distance and transport mode using embedded GLEC/DEFRA-order emission factors. Pure compute; price 0.0 (free).

Parameters

Name	Required	Description
`mode`	Yes	Freight transport mode.
`weightKg`	Yes	Shipment weight in kilograms (>= 0)
`distanceKm`	Yes	Transport distance in kilometres (>= 0)
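
Freight factors are conventionally expressed per tonne-kilometre, so the inputs need a kg-to-tonne conversion before multiplying. The sketch below assumes that convention; the mode names and factor values are placeholders (the schema's allowed `mode` values are not shown in this listing), and the function name is hypothetical.

```python
# Placeholder factors in kg CO2e per tonne-km, for illustration only.
# Neither the mode keys nor the values come from the server.
FREIGHT_FACTORS = {"road": 0.1, "sea": 0.01, "air": 0.6}


def estimate_freight_co2e(weight_kg, distance_km, mode):
    """Return estimated kg CO2e for moving `weight_kg` over `distance_km`."""
    if weight_kg < 0 or distance_km < 0:
        raise ValueError("weightKg and distanceKm must be >= 0")
    if mode not in FREIGHT_FACTORS:
        raise ValueError(f"unknown mode: {mode!r}")
    tonne_km = (weight_kg / 1000.0) * distance_km  # kg -> tonnes, then x km
    return tonne_km * FREIGHT_FACTORS[mode]


print(estimate_freight_co2e(500, 1000, "road"))
```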

Tool Definition Quality

A (3.8/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It states 'Pure compute; price 0.0 (free)' suggesting no side effects, but does not disclose return format, synchronization, or potential limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no filler. Purpose and pricing are front-loaded. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple compute tool with 3 parameters, the description covers purpose, inputs, and pricing. Lacks explicit output format, but output is implied as CO2e in kg.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds the output unit 'kg CO2e' but does not elaborate on each parameter beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool estimates freight CO2e from weight, distance, and mode using standard emission factors. It specifies 'freight' which distinguishes it from sibling tools like carbon_estimate_travel_emissions. The verb 'estimate' and resource 'freight CO2e' are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for freight shipping emissions but does not explicitly compare to sibling tools like carbon_estimate_compute_emissions. It mentions being free, but lacks when-not or alternative guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_estimate_travel_emissions (A)

Estimate passenger-travel CO2e (kg) from distance and travel mode using embedded per-passenger-km factors. Pure compute; price 0.0 (free).

Parameters

Name	Required	Description	Default
`mode`	Yes	Passenger travel mode.
`distanceKm`	Yes	Travel distance in kilometres (>= 0)
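
Since the description does not say whether `distanceKm` is one-way or round-trip, a caller may want to make that choice explicit before invoking the tool. The sketch below assumes the tool treats the distance as given; the `round_trip` flag, the mode names, and the factor values are hypothetical placeholders rather than anything the server defines.

```python
# Placeholder factors in kg CO2e per passenger-km, for illustration only.
PER_PASSENGER_KM = {"car": 0.2, "rail": 0.04}


def estimate_travel_co2e(distance_km, mode, round_trip=False):
    """Return estimated kg CO2e for one passenger travelling `distance_km`."""
    if distance_km < 0:
        raise ValueError("distanceKm must be >= 0")
    if mode not in PER_PASSENGER_KM:
        raise ValueError(f"unknown mode: {mode!r}")
    # Double the one-way distance explicitly instead of relying on the tool.
    total_km = distance_km * (2 if round_trip else 1)
    return total_km * PER_PASSENGER_KM[mode]


print(estimate_travel_co2e(300, "rail", round_trip=True))
```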

Tool Definition Quality

A (3.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, description says 'pure compute' and 'free', but doesn't disclose limits (e.g., distance bounds, mode assumptions) or whether results are cached. Adequate but not detailed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, direct and efficient. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks output description (likely a number), and doesn't specify if distance is one-way or round trip, or handling of mixed modes. Adequate for simple tool but not complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so description adds minimal value beyond naming inputs. Mentions 'embedded per-passenger-km factors' but doesn't elaborate on factor sources.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it estimates CO2e (kg) for passenger travel from distance and mode, using embedded factors. It distinguishes from siblings like carbon_estimate_shipping_emissions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions it's pure compute and free, implying it's a lightweight calculation. No explicit when-to-use vs alternatives, but sibling names provide context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

commerce_catalog_agent_readiness_score (A)

Agentic-commerce readiness score (0-100) for how well a product is structured for autonomous agents, with a transparent rationale, from a url, raw html, or inline product. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`url`	No	Product page URL to fetch (one of url / html / product)
`html`	No	Raw page HTML to parse (one of url / html / product)
`product`	No	Inline normalized product object (one of url / html / product)
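
All three parameters are optional in the schema, yet each is described as "one of url / html / product". A client can enforce that before calling the tool. The sketch below assumes exactly one input must be supplied, which the listing suggests but does not confirm; the helper name and the example URL are hypothetical.

```python
def build_readiness_arguments(url=None, html=None, product=None):
    """Return tool arguments, requiring exactly one of url / html / product."""
    candidates = {"url": url, "html": html, "product": product}
    provided = {key: value for key, value in candidates.items() if value is not None}
    if len(provided) != 1:
        # Assumption: the server expects a single input mode per call.
        raise ValueError("provide exactly one of url, html, or product")
    return provided


arguments = build_readiness_arguments(url="https://example.com/products/1")
print(arguments)
```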

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It explicitly states 'Read-only' (non-destructive) and 'price 0.0 (free)', which are key behavioral traits. However, it does not disclose rate limits, authentication requirements, or other constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one well-structured sentence followed by a short read-only/pricing clause. It front-loads the core purpose (score 0-100 with rationale) and then lists input options efficiently. No redundant or verbose phrasing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (score computation from product data), the description covers inputs and basic behavior (read-only, free, transparent rationale). However, no output schema exists, and the description does not detail the output format beyond 'score with rationale'. This is acceptable for a score tool but could be slightly more explicit.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (all three parameters have descriptions). The description merely restates the parameters ('from a url, raw html, or inline product') without adding new semantics. Baseline 3 is appropriate as schema already documents parameters adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('score') and resource ('product readiness for autonomous agents'), specifying the output range (0-100) and input sources (url, html, inline product). It distinguishes from sibling tools like commerce_catalog_availability_check and commerce_catalog_price_compare, which address different aspects of product data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists the three input modes (url, raw html, inline product) and notes it is read-only and free. While it doesn't contrast with alternatives, the sibling tools serve distinct purposes (availability, validation, price, extract), making the usage context clear without needing explicit exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

commerce_catalog_availability_check A

Resolve product stock availability to a coarse signal (in_stock / out_of_stock / limited / preorder / unknown), from a url, raw html, or inline product. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description
`url`	No	Product page URL to fetch (one of url / html / product)
`html`	No	Raw page HTML to parse (one of url / html / product)
`product`	No	Inline normalized product object (one of url / html / product)
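
To make the coarse signal concrete, here is a minimal sketch of how Schema.org availability values might be folded into the five documented outputs. The mapping table is an assumption for illustration (in particular, treating `SoldOut` as `out_of_stock` and `PreSale` as `preorder`); the tool's real mapping is not documented here.

```python
from typing import Optional

# Hypothetical mapping from Schema.org ItemAvailability names (lowercased)
# to the coarse signals listed in the tool description.
COARSE_BY_SCHEMA_ORG = {
    "instock": "in_stock",
    "outofstock": "out_of_stock",
    "soldout": "out_of_stock",
    "limitedavailability": "limited",
    "preorder": "preorder",
    "presale": "preorder",
}


def coarse_availability(raw: Optional[str]) -> str:
    """Map a value such as 'https://schema.org/InStock' to a coarse signal."""
    if not raw:
        return "unknown"
    # Keep only the last path segment so both URL and bare forms work.
    token = raw.strip().rstrip("/").rsplit("/", 1)[-1].lower()
    return COARSE_BY_SCHEMA_ORG.get(token, "unknown")
```

Anything not in the table falls back to `unknown`, which mirrors the fifth value in the description.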

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description adds value by stating it is read-only and free, but does not disclose other behavioral traits like error handling or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences efficiently convey the core function, output, and cost. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description specifies the output coarse signal values, and is sufficient for a simple read-only tool. Minor gap: no mention of error handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter described. The description summarizes the three mutually exclusive options, adding little beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool resolves stock availability into a coarse signal (in_stock/out_of_stock/limited/preorder/unknown), distinguishing it from sibling commerce tools which focus on extraction, validation, or price comparison.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies the input modes (url, html, product) but does not provide explicit when-to-use or alternative guidance. The context is clear but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

commerce_catalog_catalog_validate A

Validate product-feed completeness: which required fields are present or missing and a completeness score, from a url, raw html, or an inline product object. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description
`url`	No	Product page URL to fetch (one of url / html / product)
`html`	No	Raw page HTML to parse (one of url / html / product)
`product`	No	Inline normalized product object (one of url / html / product)
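
A completeness check of this kind can be sketched as follows. The list of required fields and the 0-100 percentage scale are assumptions made for the example; the description does not say which fields the tool treats as required or how it scales the score.

```python
from typing import Any, Dict, Mapping

# Hypothetical required-field list; the tool's real list is not documented.
REQUIRED_FIELDS = ("name", "price", "currency", "availability")


def validate_product(product: Mapping[str, Any]) -> Dict[str, Any]:
    """Report which required fields are present or missing, plus a score."""
    present = [f for f in REQUIRED_FIELDS if product.get(f) not in (None, "", [])]
    missing = [f for f in REQUIRED_FIELDS if f not in present]
    # Assumed scale: percentage of required fields that are filled in.
    completeness = round(100 * len(present) / len(REQUIRED_FIELDS))
    return {"present": present, "missing": missing, "completeness": completeness}
```

For example, a product with a name, price, and currency but no availability would be reported with one missing field and a score of 75 under these assumptions.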

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It states 'Read-only; price 0.0 (free)', which adds transparency about side effects and cost. However, it lacks details about response format, limits, or what happens with invalid inputs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: first delivers purpose and output, second states read-only and cost. No fluff, front-loaded, and efficient for an agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite parameter coverage, the description only names the outputs (present and missing required fields, a completeness score). It does not say which fields count as required, what range the score uses, how errors are reported, or how the input sources differ. For a validation tool in a large catalog, agents would benefit from more context on return values and behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the three parameters. The description briefly mentions 'url, raw html, or an inline product object' but adds no additional meaning beyond the schema's descriptions. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates product-feed completeness, reporting which required fields are present or missing along with a completeness score, from three input sources (url, html, product). This specific verb+resource combination distinguishes it from sibling tools like product_extract or availability_check.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for validating completeness and specifies input options but does not explicitly guide when to use this tool over alternatives like commerce_catalog_product_extract or commerce_catalog_availability_check. No exclusions or when-not advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

commerce_catalog_price_compare A

Compare a set of offers and return the cheapest, most expensive, spread and per-offer ranking. Requires a non-empty offers array. Pure; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`offers`	Yes	Non-empty list of offers to compare.
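
The comparison the description outlines (cheapest, most expensive, spread, per-offer ranking) can be sketched in a few lines. The offer shape used here, a dict with a numeric `price` in a single shared currency, is an assumption; the schema above only says the list must be non-empty.

```python
from typing import Any, Dict, List


def compare_offers(offers: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Rank offers by price and summarize the range.

    Assumption: every offer has a numeric 'price' in the same currency.
    """
    if not offers:
        raise ValueError("offers must be a non-empty list")
    ranked = sorted(offers, key=lambda offer: offer["price"])
    cheapest, most_expensive = ranked[0], ranked[-1]
    return {
        "cheapest": cheapest,
        "most_expensive": most_expensive,
        "spread": most_expensive["price"] - cheapest["price"],
        "ranking": [{"rank": i + 1, **offer} for i, offer in enumerate(ranked)],
    }
```

Because the function only reads its input and returns a new structure, it matches the "Pure" label in the description: the same offers always produce the same result.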

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the burden. It declares 'Pure; price 0.0 (free)', indicating no side effects and no cost. It also notes the non-empty requirement. It could add error handling details, but the provided info is helpful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences: the first states the function, the second the precondition, and the third purity and cost. Every sentence earns its place with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the return values (cheapest, most expensive, spread, per-offer ranking) despite no output schema. It covers the tool's core function but omits error handling details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are already described. The description adds the constraint that the array must be non-empty and explains the result, but does not add significant meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The tool's verb 'compare' and resource 'offers' are specific. It clearly states it returns cheapest, most expensive, spread, and ranking. No sibling tool performs price comparison, distinguishing it effectively.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly requires a non-empty offers array, providing a clear precondition. However, it does not mention when to avoid using the tool or suggest alternatives, which would improve guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

commerce_catalog_product_extract A

Extract a normalized product (name, price, currency, availability, brand, GTIN, ...) from Schema.org / JSON-LD markup. Provide a url to fetch or raw html. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`url`	No	Product page URL to fetch (one of url / html)
`html`	No	Raw page HTML to parse (one of url / html)
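
As a rough illustration of extraction from JSON-LD markup, the sketch below collects `application/ld+json` script blocks with the standard-library HTML parser and normalizes the first top-level `Product` node it finds. The output field names and the choice to read only the first offer are assumptions; the sketch also ignores `@graph` containers and microdata, which a real extractor would likely handle.

```python
import json
from html.parser import HTMLParser
from typing import Any, Dict, List, Optional


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._buffer: List[str] = []
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside, self._buffer = True, []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            self.blocks.append("".join(self._buffer))


def extract_product(html: str) -> Optional[Dict[str, Any]]:
    """Return a normalized product dict, or None if no Product node exists."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the page
        for node in data if isinstance(data, list) else [data]:
            if not (isinstance(node, dict) and node.get("@type") == "Product"):
                continue
            offer = node.get("offers") or {}
            if isinstance(offer, list):
                offer = offer[0] if offer else {}
            if not isinstance(offer, dict):
                offer = {}
            brand = node.get("brand")
            return {
                "name": node.get("name"),
                "price": offer.get("price"),
                "currency": offer.get("priceCurrency"),
                "availability": offer.get("availability"),
                "brand": brand.get("name") if isinstance(brand, dict) else brand,
                "gtin": node.get("gtin13") or node.get("gtin"),
            }
    return None
```

Returning `None` when no markup is found is one possible answer to the gap the review notes; the tool's actual behavior in that case is not stated.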

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes the tool as read-only and free, and specifies it works on Schema.org/JSON-LD markup. No annotations present, so description carries the burden. Could mention what happens if no markup is found, but core behaviors are disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences with no fluff. Front-loaded with purpose and key details (input type, cost). Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks details on output format or response structure (no output schema). While it lists example fields, it doesn't specify whether extraction returns all fields or only those present, or behavior on errors (e.g., missing markup). Adequate for a simple tool but incomplete for robust usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the two parameters (url, html). Description adds minimal extra context: 'Provide a url to fetch or raw html' reiterates the schema's 'one of' constraint. No additional format or constraint details beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool extracts normalized product data (name, price, currency, etc.) from Schema.org/JSON-LD markup. Verb 'extract' and resource 'product' are specific. Distinguishes from sibling commerce tools which deal with availability, validation, or price comparison.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to provide a URL to fetch or raw HTML, indicating how to invoke. Declares read-only and free pricing. However, lacks guidance on when to use versus alternatives or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

company_registry_company_profile A

Fetch a company profile (name, status, incorporation / dissolution dates, type, address) by jurisdiction + company number via OpenCorporates. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`jurisdiction`	Yes	Jurisdiction code (e.g. gb, us_de)
`companyNumber`	Yes	Registry company number
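
For orientation, a call to this tool over MCP is a JSON-RPC `tools/call` request whose `arguments` carry the two required parameters. The sketch below only builds the payload; the helper name and the company number `00000000` are placeholders, and sending the request depends on the server's endpoint, which is not shown on this page.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP JSON-RPC 2.0 request that invokes one tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


# Placeholder values: 'gb' comes from the parameter examples above,
# the company number is made up for illustration.
request = tools_call_request(
    1,
    "company_registry_company_profile",
    {"jurisdiction": "gb", "companyNumber": "00000000"},
)
print(json.dumps(request, indent=2))
```

The same helper would work for `company_registry_officers`, which takes the same two arguments.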

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It states 'Read-only; price 0.0 (free)', which covers safety and cost, but lacks details on rate limits, data freshness, supported jurisdictions, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one sentence plus a short status note ('Read-only; price 0.0 (free)') and efficiently conveys the core purpose, required inputs, and key attributes (read-only, free). No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description includes a list of returned fields (name, status, dates, type, address) and the data source (OpenCorporates), which is adequate for a simple fetch tool. However, it could mention expected response format or limitations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description merely reiterates 'jurisdiction' and 'company number' without adding new semantic meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Fetch', the resource 'company profile', and lists specific fields (name, status, dates, type, address). It distinguishes itself from sibling tools like `company_registry_search_company` by requiring a jurisdiction and company number.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates usage when a jurisdiction and company number are known, but does not explicitly compare to alternatives like `company_registry_search_company` for unknown numbers or `company_registry_jurisdiction_info` for jurisdiction data.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

company_registry_jurisdiction_info A

Look up metadata for a jurisdiction code (name, registry details). Pure compute; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`code`	Yes	Jurisdiction code (e.g. gb, us_de)

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses that the tool is pure compute and free, implying no side effects. It does not detail return format or errors, but for a simple lookup this is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence plus a brief note on pricing. It is front-loaded with the core purpose and wastes no words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a one-parameter lookup without an output schema, the description sufficiently covers purpose, input, and behavioral traits (free/compute). It could mention potential error cases but is otherwise complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description of the 'code' parameter. The description adds context about outputs ('name, registry details') beyond the schema, aiding understanding of what the tool returns.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool looks up metadata for a jurisdiction code, with specific outputs such as 'name, registry details'. It distinguishes itself from sibling tools like company_registry_company_profile, which focus on individual companies rather than jurisdiction metadata.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly tells when to use this tool (when needing jurisdiction metadata) and notes it is 'Pure compute; price 0.0 (free)', indicating no cost. It lacks explicit alternatives but is clear enough for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

company_registry_officers A

List a company's officers (name, position, start / end dates, current flag) by jurisdiction + company number via OpenCorporates. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`jurisdiction`	Yes	Jurisdiction code (e.g. gb, us_de)
`companyNumber`	Yes	Registry company number

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden for behavioral disclosure. It states 'Read-only' and price 0.0, which is useful. However, it omits details like rate limits, pagination, or error handling. The read-only hint is helpful but incomplete for a full behavioral picture.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one sentence with a parenthetical list of output fields, followed by a short status note. It is concise, front-loaded, and contains no redundant information. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple listing tool with 2 parameters and no output schema, the description covers purpose, input, output fields, source, and cost/read-only status. Missing details like pagination or response structure are minor gaps; the listed output fields partly compensate for the lack of output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description mentions 'by jurisdiction + company number', matching the schema but adding no extra semantics such as format examples or allowed values. Baseline 3 is appropriate given full schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists company officers, specifies output fields (name, position, dates, current flag), and required inputs (jurisdiction + company number). It names the data source (OpenCorporates) and is distinct from sibling tools like company_registry_company_profile (profile) or company_registry_search_company (search).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description notes it is read-only and free, which helps usage decisions. However, it does not explicitly state when to use vs. alternatives like company_registry_company_profile for detailed company info or when not to use (e.g., no company number). The sibling tools' purposes are different, so the context is reasonably clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

company_registry_search_company A

Search for companies by name (optionally scoped to a jurisdiction) via the OpenCorporates open API. Returns candidate companies for disambiguation. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description
`name`	Yes	Company name to search (partial match)
`limit`	No	Max candidates to return
`jurisdiction`	No	Optional jurisdiction code (e.g. gb, us_de)

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavior. It states the tool is read-only and free, but does not mention rate limits, pagination, or response structure. The description is adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences, front-loaded with the verb and resource, and contains no redundant information. Every word is purposeful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Though there is no output schema, the description mentions that it returns candidate companies for disambiguation, which provides enough context for a simple search tool. It lacks details on result fields but is adequate given low complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already documents all parameters clearly. The description adds context ('optionally scoped to a jurisdiction') but no new semantics beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (Search), resource (companies), and scoping (by name, optionally within a jurisdiction). It distinguishes itself from sibling tools like company_registry_company_profile by taking a name rather than a company number and by returning candidates for disambiguation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The phrase 'Returns candidate companies for disambiguation' implies this tool is a first step before getting full profiles, but it does not explicitly mention when to avoid it or list alternatives. The context is clear enough for typical use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

company_registry_validate_number A

Validate a company-registration number against the expected format for its jurisdiction. Pure compute; returns valid flag, normalized value and reason. Price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`jurisdiction`	Yes	Jurisdiction code (e.g. gb, us_de, au)
`companyNumber`	Yes	Company number to validate
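
A format-only validator with the three documented outputs (valid flag, normalized value, reason) might look like the sketch below. The pattern table is purely illustrative: the single `gb` pattern is a simplified assumption, not an authoritative registry rule, and the normalization steps (strip spaces and hyphens, uppercase) are guesses at what "normalized" means here.

```python
import re
from typing import Any, Dict

# Illustrative patterns only; real registries have more prefixes and rules.
FORMATS = {
    "gb": re.compile(r"(?:\d{8}|[A-Z]{2}\d{6})"),
}


def validate_number(jurisdiction: str, company_number: str) -> Dict[str, Any]:
    """Check a company number against the assumed format for a jurisdiction."""
    normalized = re.sub(r"[\s-]", "", company_number).upper()
    pattern = FORMATS.get(jurisdiction.lower())
    if pattern is None:
        return {"valid": False, "normalized": normalized,
                "reason": "unknown jurisdiction"}
    if pattern.fullmatch(normalized):
        return {"valid": True, "normalized": normalized,
                "reason": "matches expected format"}
    return {"valid": False, "normalized": normalized,
            "reason": "does not match expected format"}
```

A check like this says nothing about whether the company exists; confirming existence would take a registry lookup such as company_registry_company_profile.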

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully convey behavioral traits. It states 'Pure compute' to indicate no side effects, and lists the return values (valid flag, normalized value, reason) and pricing (free). This provides sufficient transparency for a validation tool, though it could mention that it only checks format, not registry existence.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three short sentences deliver the purpose, behavioral note, return summary, and pricing. Every sentence adds value, and the most critical information (what it does) is front-loaded. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 required params, no output schema, no nested objects), the description adequately covers the core task, output summary, and pricing. It does not detail potential error cases or behavior for unknown jurisdictions, but for a pure validation tool with a clear return structure, it is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema documents both parameters with examples. The description does not add additional semantics beyond the schema; it mentions jurisdiction and company number implicitly but provides no new information about parameter format, constraints, or meaning. A score of 3 is appropriate as the description is not required to repeat schema details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool's purpose: validating a company-registration number against expected format for its jurisdiction. It uses specific verbs ('validate') and resources ('company-registration number'), and clearly distinguishes from sibling tools like company_registry_company_profile and company_registry_search_company, which fetch or search data rather than validate format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies this tool is for format validation only ('Pure compute'), but does not explicitly state when to use it versus alternatives. It does not mention that this tool does not check the existence of the company or provide additional data, which would help differentiate it from siblings. The lack of explicit 'when to use' or 'when not to use' guidance keeps this at a 3.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_authenticity_ai_likelihood (grade A)

Heuristic likelihood (0-100) that a passage of text is AI-generated, from lexical-diversity and burstiness signals with a transparent rationale. Pure, no network; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`text`	Yes	Text passage to score
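
The server's actual scoring formula is not published here, so the following is only a minimal sketch of how lexical-diversity and burstiness signals could be folded into a 0-100 score with a rationale. The function name `ai_likelihood`, the equal 50/50 weighting, and the output keys `score` and `rationale` are all assumptions made for illustration.

```python
import re
from statistics import mean, pstdev


def ai_likelihood(text: str) -> dict:
    """Hypothetical scorer: low lexical diversity and uniform sentence lengths raise the score."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        return {"score": 0, "rationale": ["no scorable text"]}

    # Lexical diversity: unique words / total words (type-token ratio).
    ttr = len(set(words)) / len(words)

    # Burstiness: coefficient of variation of sentence lengths, in words.
    lengths = [len(s.split()) for s in sentences]
    burstiness = pstdev(lengths) / mean(lengths)

    # Assumed weighting: each signal contributes between 0 and 50 points.
    diversity_points = round(50 * (1 - ttr))
    uniformity_points = round(50 * max(0.0, 1 - burstiness))
    return {
        "score": diversity_points + uniformity_points,
        "rationale": [
            f"type-token ratio {ttr:.2f} -> {diversity_points} points",
            f"sentence-length variation {burstiness:.2f} -> {uniformity_points} points",
        ],
    }
```

Returning every contribution alongside the total is what makes such a heuristic auditable; the number alone should not be read as proof of authorship.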

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It mentions the tool uses lexical-diversity and burstiness signals, provides a transparent rationale, and is free with no network calls. However, it does not describe the return format, idempotency, or any potential side effects, which are needed for a tool with no output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: roughly 25 words across two sentences. It front-loads the core purpose and key constraints (free, no network), with no unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of an output schema, the description should clarify the return value. It mentions a 'likelihood (0-100)' and 'transparent rationale', but does not explicitly state the output structure (e.g., object with score and explanation). The tool's simplicity mitigates this, but completeness could be improved.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for the single 'text' parameter, with a clear description 'Text passage to score'. The description reinforces this but does not add significant new semantic information beyond what the schema provides, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: assessing the heuristic likelihood (0-100) that a text passage is AI-generated, using lexical-diversity and burstiness signals. It is distinct from sibling content_authenticity tools that deal with C2PA, domain reputation, provenance, and watermark detection, which are for different media types and methods.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is suitable for scoring text passages and is free with no network calls, but it does not explicitly state when to use this tool over alternatives or when not to use it. No contrast with other tools is provided, leaving the agent to infer usage from purpose alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_authenticity_c2pa_inspect (grade A)

Detect a C2PA / Content Credentials manifest by scanning the raw image bytes for JUMBF / C2PA markers and lifting any human-readable claim generator. Provide an image url. Read-only; one HTTPS GET; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	Image URL to inspect
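
As a rough illustration of marker scanning, the sketch below searches already-downloaded image bytes for the JUMBF box types `jumb` and `jumd` and for the `c2pa` label. The function name `find_c2pa_markers` and the output shape are hypothetical, and this is a presence check only: it does not parse the box structure, extract the claim generator, or validate any signature.

```python
def find_c2pa_markers(data: bytes) -> dict:
    """Report byte offsets of JUMBF / C2PA markers found in raw image bytes."""
    markers = {
        "jumbf_superbox": b"jumb",     # JUMBF superbox type
        "jumbf_description": b"jumd",  # JUMBF description box type
        "c2pa_label": b"c2pa",         # label used by C2PA manifest stores
    }
    offsets = {}
    for name, signature in markers.items():
        position = data.find(signature)
        if position >= 0:
            offsets[name] = position

    # Assumed rule: require both a JUMBF superbox and the C2PA label.
    has_c2pa = "jumbf_superbox" in offsets and "c2pa_label" in offsets
    return {"has_c2pa": has_c2pa, "offsets": offsets}
```

A plain byte search can produce false positives when these four-byte sequences occur by chance in compressed image data, so a positive result is best treated as a hint that a manifest may be present.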

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses read-only behavior, that it performs one HTTPS GET, and that it is free. This is good transparency for a simple tool, though it does not mention failure modes or output format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: each sentence serves a distinct purpose (action, instruction, behavioral context). No redundancy, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description hints at the output ('lifting any human-readable claim generator') but does not fully specify return value structure. It covers main aspects of input and behavior, making it mostly complete for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the parameter. The description adds 'Provide an image url,' which repeats the schema description without additional semantics. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool detects a C2PA/Content Credentials manifest by scanning raw image bytes for JUMBF markers and lifting the claim generator. It distinguishes from sibling tools like watermark_detect or ai_likelihood by specifying the technical method and output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing to check for C2PA credentials on an image URL but does not explicitly state when to use this tool versus alternatives, nor does it provide exclusion criteria. Usage is implied but not guided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_authenticity_domain_reputation (grade A)

Transparent heuristic reputation score (0-100) for a domain combining age, TLS validity, DNS / MX and SPF / DMARC signals via free HTTPS (DoH / RDAP / CT). Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`domain`	Yes	Domain name to score

Tool Definition Quality

A3.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description discloses it is read-only and free, and outlines the methodology (age, TLS, DNS/MX/SPF/DMARC via DoH/RDAP/CT), providing good transparency for a simple tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence plus a short read-only/pricing phrase efficiently captures purpose, methodology, and cost with no waste. All information is front-loaded and relevant.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description mentions the score range (0-100) which partially covers return values. It is mostly complete for a simple tool, though exact response structure is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already describes the domain parameter. The tool description adds context about the signals used, which adds some meaning, but does not elaborate beyond the schema's 'Domain name to score.'

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it provides a heuristic reputation score (0-100) for a domain based on multiple signals. However, it does not explicitly differentiate from the sibling tool domain_intel_reputation, leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like domain_intel_reputation or other domain tools. The description only explains functionality, not selection criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_authenticity_provenance_check (grade A)

Combined media-provenance verdict for an image url: C2PA presence plus the hosting domain reputation, with rationale. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	Image URL to check

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so the description carries the full burden. It clearly states the tool is read-only and free (price 0.0), and mentions that rationale is provided. This adequately discloses key behavioral traits for a read-only, non-destructive tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with two additional short statements, all front-loaded. Every element adds value: the action, the components, the read-only and cost attributes. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers the core functionality, cost, and safety. It does not detail the response format, but the mention of 'rationale' provides some expectation. Overall sufficient for the complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a single parameter described as 'Image URL to check'. The description adds context by specifying 'image url' and 'media-provenance verdict', but the schema already provides the essential meaning. The description adds marginal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool produces a combined media-provenance verdict for an image URL, combining C2PA presence and domain reputation with rationale. This clearly distinguishes it from sibling tools like content_authenticity_c2pa_inspect and content_authenticity_domain_reputation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates the tool is read-only and free, but does not explicitly state when to use this combined check versus individual sibling tools. However, the presence of sibling tools and the description's emphasis on 'combined' implies the appropriate use case.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_authenticity_watermark_detect (grade A)

Best-effort watermark and provenance-marker detection from the raw image bytes (e.g. C2PA / IPTC / XMP markers). Provide an image url. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	Image URL to inspect

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses best-effort behavior and read-only nature, but does not detail failure modes, rate limits, or authentication needs. More context on what happens when markers are not found would be helpful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, using three short sentences to convey purpose, required input, and safety/cost. Every sentence adds value, and the key info is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers purpose, required input, and behavioral traits (best-effort, read-only, free). It lacks a brief note on the likely return format (e.g., list of detected markers), but overall it is quite complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a single parameter 'url' described as 'Image URL to inspect'. The description adds context about 'raw image bytes', providing slight additional meaning. Baseline is 3 since schema already covers the parameter well.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: best-effort detection of watermark and provenance markers (C2PA/IPTC/XMP) from an image URL. It distinguishes itself from sibling tools like content_authenticity_c2pa_inspect by being broader and lighter.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies that the user should provide an image URL and notes the tool is read-only and free. While it implies context for use, it does not explicitly state when to choose this tool over alternatives like c2pa_inspect or ai_likelihood.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_intel_dns_lookup (grade A)

Resolve A / AAAA / MX / TXT / NS / CNAME records for a domain via DNS-over-HTTPS (Cloudflare 1.1.1.1 JSON). Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`types`	No	Subset of record types to resolve (default: all).
`domain`	Yes	Domain name (bare host; URLs/trailing dots tolerated)
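
For orientation, here is a small sketch of how the inputs might be normalized and turned into Cloudflare DNS-over-HTTPS JSON requests. Cloudflare's JSON API accepts `name` and `type` query parameters with an `accept: application/dns-json` header; the helper names, the choice of the `cloudflare-dns.com` hostname, and the exact normalization rules are assumptions, not the server's confirmed behavior.

```python
from urllib.parse import urlencode, urlsplit

DOH_ENDPOINT = "https://cloudflare-dns.com/dns-query"
ALL_TYPES = ("A", "AAAA", "MX", "TXT", "NS", "CNAME")


def normalize_domain(value: str) -> str:
    """Reduce a URL, or a host with a trailing dot, to a bare lowercase host."""
    value = value.strip()
    if "://" in value:
        host = urlsplit(value).hostname or ""
    else:
        host = value.split("/")[0]
    return host.rstrip(".").lower()


def build_doh_requests(domain: str, types=ALL_TYPES) -> list:
    """Return one (url, headers) pair per requested record type."""
    host = normalize_domain(domain)
    headers = {"accept": "application/dns-json"}
    return [
        (f"{DOH_ENDPOINT}?{urlencode({'name': host, 'type': record_type})}", headers)
        for record_type in types
    ]
```

Each pair can be passed to any HTTP client; the sketch stops short of sending the request so that it stays runnable offline. Inputs that carry a port but no scheme (for example `example.com:443`) are not handled here.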

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses read-only nature and free pricing, plus the backend (Cloudflare 1.1.1.1 JSON). Lacks details on rate limits, error handling, or response structure. Adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: one sentence plus a short phrase. All information is front-loaded and necessary.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema; description does not explain return format or behavior for missing records. Simple tool, but could mention response structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already provides 100% coverage with descriptions for both parameters. The tool description restates record types but adds no new semantic value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it resolves specific DNS record types (A, AAAA, MX, TXT, NS, CNAME) for a domain via DNS-over-HTTPS. Differentiates from sibling domain intel tools like whois, email auth, reputation, TLS by focusing on DNS records.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. Mentions read-only and free, but does not compare to sibling tools or provide context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_intel_email_auth_check (grade A)

Parse SPF / DMARC / DKIM presence and policy from TXT records (via DoH). SPF all-qualifier (strict/softfail/neutral/pass), DMARC p= policy + pct + rua, and best-effort DKIM selector probing. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`domain`	Yes	Domain name to check
`dkimSelectors`	No	Optional DKIM selectors to probe (default: common selectors).
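
The parsing step can be sketched with a few pure functions that operate on TXT record strings fetched elsewhere. The function names and output keys below are hypothetical; the label mapping follows the description's strict/softfail/neutral/pass wording, the `pct` default of 100 comes from the DMARC specification, and DKIM keys are published at `<selector>._domainkey.<domain>`.

```python
SPF_ALL_LABELS = {
    "-all": "strict",
    "~all": "softfail",
    "?all": "neutral",
    "+all": "pass",
    "all": "pass",  # a bare "all" defaults to the "+" qualifier
}


def parse_spf(txt: str) -> dict:
    """Classify the `all` mechanism of an SPF record."""
    terms = txt.lower().split()
    if not terms or terms[0] != "v=spf1":
        return {"present": False, "all": None}
    for term in terms[1:]:
        if term in SPF_ALL_LABELS:
            return {"present": True, "all": SPF_ALL_LABELS[term]}
    return {"present": True, "all": None}


def parse_dmarc(txt: str) -> dict:
    """Extract the p=, pct= and rua= tags from a DMARC record."""
    if not txt.lower().startswith("v=dmarc1"):
        return {"present": False}
    tags = {}
    for part in txt.split(";"):
        key, _, value = part.partition("=")
        if key.strip():
            tags[key.strip().lower()] = value.strip()
    return {
        "present": True,
        "policy": tags.get("p"),
        "pct": int(tags.get("pct", "100")),
        "rua": tags["rua"].split(",") if tags.get("rua") else [],
    }


def dkim_record_name(selector: str, domain: str) -> str:
    """DNS name to query for a given DKIM selector."""
    return f"{selector}._domainkey.{domain}"
```

Because DKIM selectors cannot be enumerated from DNS, probing a list of common selectors can only confirm presence; a miss does not show that DKIM is absent.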

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses read-only behavior, cost, and 'best-effort DKIM selector probing' as a limitation. It does not detail error handling or response format, but the information provided is adequate for safe invocation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences, front-loaded with the main action, then detailing extraction specifics, then behavioral/cost info. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 params, no output schema), the description sufficiently explains what is extracted. It lacks output format details, but that is acceptable without a schema. Error handling is not mentioned, but the tool is straightforward.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters, and the schema itself documents the default for dkimSelectors ('default: common selectors'). The description adds that DKIM selector probing is best-effort, which is context the schema does not carry.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Parse SPF / DMARC / DKIM presence and policy from TXT records (via DoH)', specifying the exact verb and resource. It details the extracted policies (SPF all-qualifier, DMARC p/pct/rua, DKIM selectors), distinguishing it from sibling tools like domain_intel_dns_lookup which is a generic DNS lookup.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes 'Read-only; price 0.0 (free)', providing clear usage context. It lacks explicit guidance on when to use this tool over alternatives like domain_intel_dns_lookup, but the specific focus on email authentication implicitly directs the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_intel_reputation (grade A)

Transparent heuristic reputation score (0-100) combining domain age, TLS validity and email-auth strictness (SPF/DMARC) plus an MX signal. Every contribution is returned in rationale (no black box). Informational only, not a security guarantee. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`domain`	Yes	Domain name to score
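
One way to keep a composite score free of black boxes is to record every contribution as it is added. The sketch below does that for the four signals named in the description; the function name, the point weights, and the output keys are invented for illustration and are not the server's actual values.

```python
def reputation_score(age_days, tls_valid, spf_all, dmarc_policy, has_mx) -> dict:
    """Hypothetical transparent score: the total is the sum of the listed contributions."""
    rationale = []

    def add(points, reason):
        rationale.append({"points": points, "reason": reason})

    # Assumed weights; the maximum total is 30 + 25 + 15 + 20 + 10 = 100.
    add(min(30, max(0, age_days) // 30), f"domain age: {age_days} days")
    add(25 if tls_valid else 0, f"TLS certificate valid: {tls_valid}")
    add({"strict": 15, "softfail": 10}.get(spf_all, 0), f"SPF all-qualifier: {spf_all}")
    add({"reject": 20, "quarantine": 12, "none": 4}.get(dmarc_policy, 0),
        f"DMARC policy: {dmarc_policy}")
    add(10 if has_mx else 0, f"MX records present: {has_mx}")

    return {"score": sum(item["points"] for item in rationale), "rationale": rationale}
```

Since the score is computed from the rationale list itself, the two cannot drift apart, which is the property the description emphasizes.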

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description discloses that the tool is read-only, free, returns a heuristic score with a rationale, and is not a security guarantee. This covers key behavioral traits, though it lacks details on rate limits or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (four sentences), front-loaded with the core purpose and components, and every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given one parameter and no output schema, the description explains the score range (0-100), components, and that rationale is included. It is mostly complete but could benefit from specifying the output format (e.g., JSON structure).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter 'domain' has a schema description 'Domain name to score'. The tool description reinforces this but adds no additional format, validation, or examples. With 100% schema coverage, a baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it computes a heuristic reputation score (0-100) combining domain age, TLS validity, email-auth strictness, and MX signal. It distinguishes from sibling tools like domain_intel_dns_lookup by being a composite score rather than a raw lookup.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'Informational only, not a security guarantee' which provides a caution, but does not explicitly state when to use this tool over alternatives like domain_intel_email_auth_check or content_authenticity_domain_reputation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_intel_tls_cert_info (grade A)

TLS certificate issuer / validity / SAN summary sourced from public Certificate-Transparency logs (crt.sh JSON). Picks the freshest leaf and reports currentlyValid + daysUntilExpiry. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`domain`	Yes	Domain name to inspect
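
The selection step can be sketched against entries shaped like crt.sh JSON rows, which carry `issuer_name`, `not_before`, `not_after`, and a newline-separated `name_value`. Treating "freshest" as the latest `not_before` is an assumption, as are the function name `summarize_certs` and the `issuer` and `san` output keys; `currentlyValid` and `daysUntilExpiry` are the field names given in the description.

```python
from datetime import datetime, timezone


def _parse_utc(timestamp: str) -> datetime:
    # Assumes timestamps such as "2025-05-01T00:00:00", interpreted as UTC.
    return datetime.fromisoformat(timestamp).replace(tzinfo=timezone.utc)


def summarize_certs(entries, now):
    """Pick the entry with the latest not_before and summarize its validity."""
    if not entries:
        return None
    leaf = max(entries, key=lambda entry: _parse_utc(entry["not_before"]))
    not_before = _parse_utc(leaf["not_before"])
    not_after = _parse_utc(leaf["not_after"])
    return {
        "issuer": leaf["issuer_name"],
        "san": sorted(set(leaf["name_value"].split("\n"))),
        "currentlyValid": not_before <= now <= not_after,
        "daysUntilExpiry": (not_after - now).days,
    }
```

Certificate-Transparency logs show what was issued, not what a server currently presents, so this summary can differ from the certificate seen in a live TLS handshake.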

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses read-only nature and zero cost, which is helpful for safe invocation. It mentions the source and selection logic (freshest leaf). However, it does not cover potential error conditions (e.g., domain not found), rate limits, or authentication requirements, leaving gaps for the agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences with no filler. The first establishes core purpose and source, the second adds the selection logic and specific output fields, and the third states the read-only nature and cost. Information is front-loaded and every phrase earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the essential aspects: source, selection logic, key output fields, and safety/cost. However, without an output schema, it does not fully describe the return structure (e.g., format of SAN summary, data types). A slightly more detailed output description would elevate completeness to a 5.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for the single parameter 'domain', which is described as 'Domain name to inspect'. The tool description does not add additional semantics beyond the schema, such as format expectations or validation rules. A baseline score of 3 is appropriate since the schema already suffices.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides TLS certificate issuer, validity, and SAN summary from public Certificate-Transparency logs. It specifies the data source (crt.sh JSON) and key output fields (currentlyValid, daysUntilExpiry). This distinctly differentiates it from sibling domain intelligence tools like DNS lookup or whois summary.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for TLS certificate inspection but does not explicitly state when to use it versus alternatives. There are no usage exclusions or comparisons to other domain intel tools. While the context is clear, explicit guidance on when not to use it or which sibling to choose instead is missing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_intel_whois_summary (grade A)

Domain age / registrar / expiry summary via RDAP over HTTPS (no legacy port-43 WHOIS). Returns registeredAt, expiresAt, ageDays, registrar, status and nameservers. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`domain`	Yes	Domain name to look up
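
To show where the returned fields could come from, the sketch below reduces an RDAP domain response (the JSON structure defined for RDAP, with `events`, `entities`, `status`, and `nameservers`) to the six fields listed in the description. The function name `summarize_rdap` and the handling of missing data are assumptions; fetching the RDAP document is left out so the example runs offline.

```python
from datetime import datetime, timezone


def summarize_rdap(rdap: dict, now: datetime) -> dict:
    """Reduce an RDAP domain object to an age / registrar / expiry summary."""
    events = {e["eventAction"]: e["eventDate"] for e in rdap.get("events", [])}
    registered = events.get("registration")

    registrar = None
    for entity in rdap.get("entities", []):
        if "registrar" in entity.get("roles", []):
            # vcardArray has the form ["vcard", [[name, params, type, value], ...]].
            for prop in entity.get("vcardArray", ["vcard", []])[1]:
                if prop[0] == "fn":
                    registrar = prop[3]

    age_days = None
    if registered:
        registered_dt = datetime.fromisoformat(registered.replace("Z", "+00:00"))
        age_days = (now - registered_dt).days

    return {
        "registeredAt": registered,
        "expiresAt": events.get("expiration"),
        "ageDays": age_days,
        "registrar": registrar,
        "status": rdap.get("status", []),
        "nameservers": [ns["ldhName"].lower() for ns in rdap.get("nameservers", [])],
    }
```

Registries differ in which events and entities they publish, so any of these fields may legitimately come back empty.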

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the burden. It states the tool is read-only and free, and mentions the protocol (RDAP over HTTPS). However, it lacks details on rate limits, authentication requirements, or what happens if the domain does not exist or is invalid.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: three short sentences that front-load the protocol, key outputs, and cost. Every word adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description lists expected return fields, making it fairly complete for a simple lookup tool. It could mention error responses or domain validation, but overall provides sufficient context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (one parameter 'domain' described as 'Domain name to look up'). The description adds no additional meaning beyond the schema's existing description, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides a domain WHOIS summary via RDAP, listing specific fields like registeredAt, expiresAt, ageDays, registrar, status, and nameservers. It distinguishes itself from sibling tools like domain_intel_dns_lookup by focusing on RDAP/WHOIS data rather than DNS or email auth.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates this is for WHOIS summary lookups but does not explicitly mention when to use it over other domain tools (e.g., DNS lookup, reputation). No guidance on when not to use or alternatives is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

entity_search (grade A)

Search a company/person name across ALL ledgers (sanctions, licenses, recalls, pharma, bids, subsidies, grants, public comments, ordinances, ToS, real-estate, land-price). Returns hits grouped per ledger, each with matchedField, a summary, detailUrl and ledgerVerified (hash-chain integrity). ledgerVerified proves the records were not altered after they were recorded here — NOT the truth of the underlying data.

Parameters

Name	Required	Description	Default
`limit`	No	Max hits per ledger (1-100)
`query`	Yes	Entity name (company / person), partial match
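
The `ledgerVerified` flag refers to hash-chain integrity. The listing does not document the chain format, so the following is only a generic sketch of the technique: each entry stores a SHA-256 over the previous entry's hash plus its own payload, and verification recomputes the chain from the start. The field names (`payload`, `prevHash`, `hash`), the all-zero genesis value, and the canonical-JSON encoding are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry


def entry_hash(prev_hash, payload):
    # Canonical JSON (sorted keys, no extra whitespace) so that the same
    # payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain, payload):
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prevHash": prev,
                  "hash": entry_hash(prev, payload)})


def verify(chain):
    """Return True only if every link and every entry hash recomputes."""
    prev = GENESIS
    for entry in chain:
        if entry["prevHash"] != prev:
            return False
        if entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


ledger = []
append(ledger, {"entity": "Example KK", "event": "appeared"})
append(ledger, {"entity": "Example KK", "event": "updated"})
intact = verify(ledger)

# Editing an earlier record after the fact breaks verification.
ledger[0]["payload"]["event"] = "removed"
tampered = verify(ledger)
```

As the description stresses, a passing check of this kind shows only that recorded entries were not altered afterwards; it says nothing about whether the underlying data was true when recorded.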

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations, so description carries full burden. Discloses output structure (grouped per ledger, matchedField, summary, detailUrl) and defines ledgerVerified integrity meaning. Lacks mention of rate limits or pagination but covers key behavioral aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no filler. The first sets scope and purpose, the second explains the output, and the third defines what ledgerVerified does and does not prove. Perfectly front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple search tool with 2 params and no output schema. Covers output grouping and ledgerVerified concept. Could expand on matchedField and summary, but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so description adds little beyond schema. Repeats 'partial match' and limit range. No additional semantic explanation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it searches company/person names across ALL listed ledgers, distinguishing it from per-ledger watch siblings. The verb 'search' and resource 'entity across all ledgers' is specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly provides context for cross-ledger search vs per-ledger watches, but does not explicitly state when to use or not use alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

freshness_list (grade A)

List the calling user's freshness webhook subscriptions (the stored secret is never returned).

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A (4.1/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full transparency burden. It discloses that the secret is never returned, which is a key behavioral trait. It does not mention pagination or errors, but given that the tool takes no parameters and has simple output, this is sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is front-loaded with the action and includes a critical constraint about the secret. Every word is valuable and there is no unnecessary verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, so the description should help the agent understand the return format. It states it lists subscriptions but does not describe what fields are included (e.g., subscription IDs, topics, etc.). This lack of return structure detail reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has no parameters and 100% schema coverage. The description adds no parameter information beyond the schema, but that is acceptable since none are needed. Baseline for 0 parameters is 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists the calling user's webhook subscriptions, using a specific verb and resource. It distinguishes from siblings (freshness_subscribe, freshness_unsubscribe) and adds a notable detail about the secret not being returned.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (to list subscriptions) but does not explicitly state alternatives or when not to use. For a simple list tool, this is adequate but lacks formal guidance on tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

freshness_subscribe (grade A)

Subscribe a machine/AI pipeline to ledger-change notifications: when a matching record changes, a signed "stale" event (with an F-037 receipt proving the change) is POSTed to your callback_url so your RAG/index can re-index. Filters: ledger (optional), entity and/or topic (case-insensitive title substring). At least one filter is required. Body is HMAC-signed with your secret (X-Receipt-Signature). Backfill never fires. Price 0.0.

Parameters

Name	Required	Description
`topic`	No	Additional case-insensitive substring
`entity`	No	Case-insensitive substring of the item title
`ledger`	No	Ledger key, e.g. 'sanction' (omit for all ledgers)
`secret`	Yes	Shared secret used to HMAC-sign the POST body
`callbackUrl`	Yes	HTTPS endpoint the stale event is POSTed to
`jurisdiction`	No	Jurisdiction code (default 'jp')
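
On the receiving side, the signed POST can be checked before the event is trusted. The description names only the header (`X-Receipt-Signature`) and says the body is HMAC-signed with the shared secret; the digest algorithm (SHA-256), the hex encoding, and signing over the raw body bytes are assumptions in this sketch, as are the function names.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    # Assumed scheme: hex-encoded HMAC-SHA256 over the raw request body.
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def is_valid_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Compare the X-Receipt-Signature header to a locally computed HMAC."""
    expected = sign_body(secret, body)
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of a forged signature happened to match.
    return hmac.compare_digest(expected.encode("utf-8"),
                               header_value.strip().encode("utf-8"))


# Hypothetical event body and secret, for illustration only.
secret = "s3cret"
body = b'{"event":"stale","ledger":"sanction"}'
header = sign_body(secret, body)

accepted = is_valid_signature(secret, body, header)
rejected = is_valid_signature(secret, body + b" ", header)
```

Verification should run against the exact bytes received, before any JSON parsing or re-serialisation, since either can change the byte sequence that was signed.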

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description details the signed POST event, HMAC signing, filter behavior, and constraints (no backfill). It provides sufficient transparency for a subscription tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded with purpose. It packs many details into a single paragraph, but may benefit from bullet points for clarity. Still, every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no output schema and no annotations, the description is comprehensive: covers trigger, security, filters, and cost. No significant gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaning beyond the schema: explains filter behavior (case-insensitive substring), that at least one filter is required, and the purpose of the secret. It does not elaborate on the jurisdiction default.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool subscribes a pipeline to ledger-change notifications for re-indexing, using a specific verb and resource. It distinguishes from siblings 'freshness_list' and 'freshness_unsubscribe' by focusing on subscription creation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explains what the subscription is for (receiving a notification when a matching record changes) and specifies the required filters. It mentions that backfill never fires and that the price is 0.0, but lacks explicit exclusions or direct comparisons to alternatives such as polling.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

freshness_unsubscribe (grade A)

Deactivate a freshness webhook subscription by id (soft delete; stops future deliveries).

Parameters

Name	Required	Description	Default
`subscriptionId`	Yes

Tool Definition Quality

A (3.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses 'soft delete' and 'stops future deliveries' but does not mention required permissions, reversibility, error cases, or response format. With no annotations, more detail would improve transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with action and resource, no extraneous words. All information is relevant and efficiently presented.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 required param, no output schema), the description covers key aspects (action, effect, parameter role). Could mention typical response or error conditions but is largely sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%; description only adds 'by id' to indicate the parameter purpose. Does not specify format, constraints, or how to obtain the ID. Minimal added value over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action ('Deactivate'), the resource ('freshness webhook subscription'), and the mode ('soft delete; stops future deliveries'). Distinguishes from siblings like freshness_subscribe (create) and freshness_list (list).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies use when you have a subscription ID and want to stop deliveries. Lacks explicit when-not or alternatives, but the sibling tools are distinct enough that the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_intel_bbox_center (grade A)

Compute the center point plus width/height (km) of a geographic bounding box. Pure math; price 0.0 (free).

Parameters

Name	Required	Description
`maxLat`	Yes	Bounding-box maximum latitude
`maxLon`	Yes	Bounding-box maximum longitude
`minLat`	Yes	Bounding-box minimum latitude
`minLon`	Yes	Bounding-box minimum longitude
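
The "pure math" involved can be sketched as follows. The listing does not say how the tool measures width, so this version makes explicit assumptions: the center is the arithmetic midpoint, height is the meridian arc between the two latitudes, width is the arc along the parallel at the center latitude, and the Earth is a sphere of mean radius 6371.0088 km. Boxes that cross the antimeridian are not handled, and the return keys are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption


def bbox_center(min_lat, min_lon, max_lat, max_lon):
    """Center point plus approximate width/height (km) of a bounding box."""
    if min_lat > max_lat or min_lon > max_lon:
        raise ValueError("min values must not exceed max values")

    center_lat = (min_lat + max_lat) / 2
    center_lon = (min_lon + max_lon) / 2

    # Height: arc length along a meridian between the two latitudes.
    height_km = math.radians(max_lat - min_lat) * EARTH_RADIUS_KM
    # Width: arc length along the parallel at the center latitude, which
    # shrinks with cos(latitude) towards the poles.
    width_km = (math.radians(max_lon - min_lon) * EARTH_RADIUS_KM
                * math.cos(math.radians(center_lat)))

    return {"centerLat": center_lat, "centerLon": center_lon,
            "widthKm": width_km, "heightKm": height_km}


# A 2-degree box centered on the equator, where width equals height.
box = bbox_center(-1.0, -1.0, 1.0, 1.0)
```

Measuring width at the center latitude is a compromise: the top and bottom edges of a box away from the equator have different lengths, so a single width figure is necessarily approximate.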

Tool Definition Quality

A (3.7/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. States it is pure math and free, but does not disclose edge-case behavior (e.g., invalid bounding box) or precision details. Not contradictory, but incomplete.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is two short sentences, direct and to the point. No extraneous information. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the input schema is complete, the tool has no output schema, and the description names the outputs (center point, width and height in km) without specifying the structure of the return value, such as its field names. Simplicity partially compensates, but the missing output details reduce completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so each parameter is already well-documented in the schema. The tool description adds no additional parameter semantics beyond the schema, placing it at baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool computes the center point plus width/height of a geographic bounding box, using specific verbs and resource. Distinguishes from siblings like geo_intel_distance (distance between points) and geo_intel_geocode (address to coordinates).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage through the description ('Pure math; price 0.0 (free)') but does not explicitly state when to use this tool versus alternatives like geo_intel_distance or geo_intel_geocode. The agent must infer this from sibling tool names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_intel_distance (grade A)

Great-circle (haversine) distance between two lat/lon coordinates, in kilometres and miles. Pure math; price 0.0 (free).

Parameters

Name	Required	Description
`lat1`	Yes	First point latitude
`lat2`	Yes	Second point latitude
`lon1`	Yes	First point longitude
`lon2`	Yes	Second point longitude
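
For reference, the haversine formula named in the description can be sketched in a few lines. The Earth radius constant (mean radius 6371.0088 km) and the return keys are assumptions; the tool may use a different radius and therefore return slightly different figures.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption
KM_PER_MILE = 1.609344       # exact by definition of the international mile


def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in km and miles."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    km = 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
    return {"km": km, "miles": km / KM_PER_MILE}


# Tokyo to Osaka, using approximate city-center coordinates.
tokyo_osaka = haversine(35.6762, 139.6503, 34.6937, 135.5023)
```

Haversine treats the Earth as a sphere, so results can differ from ellipsoidal (geodesic) distances by a fraction of a percent.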

Tool Definition Quality

A (3.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description discloses it is 'pure math' and 'free', signaling no side effects or cost. This is useful beyond the input schema, though it does not mention error handling or coordinate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences covering the algorithm, inputs, outputs, and pricing. No wasted words; front-loaded with core functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description implies output in both km and miles but does not specify the return format (single value vs. object). Without an output schema, more explicit structure would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter described (e.g., 'First point latitude'). The description adds no additional meaning, sticking to baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes great-circle distance between two coordinates and outputs both kilometres and miles. It distinguishes itself from sibling geo_intel tools which are for geocoding, timezone, etc., though no explicit comparison is made.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like geo_intel_geocode. It only describes what it does, leaving the agent to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_intel_geocode (grade A)

Forward-geocode a place / address query to coordinates via the free OpenStreetMap Nominatim API. Returns ranked results with lat/lon and display name. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`limit`	No	Max results to return
`query`	Yes	Place name or address to geocode
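
To make the upstream call concrete, the sketch below builds a request URL for Nominatim's public `/search` endpoint and maps its rows to a lat/lon/display-name shape. How this server actually calls Nominatim is not documented; the helper names, the default `limit` of 5, and the `displayName` output key are assumptions. No network request is made here.

```python
from urllib.parse import urlencode

NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(query: str, limit: int = 5) -> str:
    """Build a forward-geocoding URL; the default limit is an assumption."""
    if not query.strip():
        raise ValueError("query must not be empty")
    params = {"q": query, "format": "jsonv2", "limit": limit}
    return f"{NOMINATIM_SEARCH}?{urlencode(params)}"


def to_results(rows):
    # Nominatim returns lat/lon as strings and a display_name per row;
    # the order of rows is kept as returned.
    return [
        {"lat": float(r["lat"]), "lon": float(r["lon"]),
         "displayName": r["display_name"]}
        for r in rows
    ]


url = build_search_url("Tokyo Station", limit=3)

# Hypothetical response row, trimmed to the members used above.
results = to_results([
    {"lat": "35.6812", "lon": "139.7671", "display_name": "Tokyo Station, Japan"}
])
```

A real client of the public Nominatim instance is also expected to send an identifying `User-Agent` and keep its request rate low, which is the kind of rate-limit detail the review notes is missing from the description.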

Tool Definition Quality

A (4.2/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. Notes it is read-only and free, and describes output as ranked results with lat/lon and display name. But does not disclose rate limits, accuracy, or other potential behavioral traits of the external API.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no fluff. The first states the action, the second describes the output, and the third gives key traits (read-only, free). Every word adds value; front-loaded with the main purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains return values (ranked results, lat/lon, display name). Mentions free read-only nature and API source. Could mention error handling or usage limits, but overall complete for a geocoding tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with descriptions. Description adds value by specifying that results are 'ranked' and include 'display name', supplementing the schema's minimal descriptions. This helps the agent understand output beyond static schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it forward-geocodes a place/address query to coordinates, distinguishing from reverse geocoding (geo_intel_reverse_geocode) and other geo tools. Specific verb 'Forward-geocode' and resource 'coordinates' are clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions it uses the free OpenStreetMap Nominatim API and is read-only, providing context. However, lacks explicit guidance on when to use vs alternatives like geo_intel_reverse_geocode or geo_intel_distance. Still, the context is helpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_intel_reverse_geocode (grade A)

Reverse-geocode a lat/lon coordinate to the nearest address / place via OpenStreetMap Nominatim. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)

Tool Definition Quality

A (4.0/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses read-only nature and free cost, which are helpful beyond missing annotations. Does not mention rate limits, data source usage policies, or potential limitations, leaving some behavioral traits opaque.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise, front-loaded sentences. Every word adds value: action, resource, data source, safety, cost. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter tool with no output schema, the description covers purpose, data source, and cost. Missing only a hint about return value format, but overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for lat and lon already provided. The description adds no extra meaning beyond what the schema offers.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it reverse-geocodes a lat/lon coordinate to nearest address/place via OpenStreetMap Nominatim. Verb 'Reverse-geocode' and resource are specific, and it naturally distinguishes from sibling geo_intel_geocode.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Purpose implies usage for reverse geocoding, and notes it is read-only and free. However, it does not explicitly state when to use vs alternatives like geo_intel_geocode or provide exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_intel_timezone (grade A)

Resolve the IANA timezone, abbreviation and current UTC offset for a lat/lon coordinate via the free Open-Meteo API. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)
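
The three outputs named in the description map naturally onto metadata that Open-Meteo returns alongside a forecast (`timezone`, `timezone_abbreviation`, `utc_offset_seconds`). The sketch below shows one way to validate the inputs against the schema ranges and shape such a response; the helper names, the output keys, and the `+HH:MM` offset format are assumptions, and the sample response is hypothetical.

```python
def check_coordinate(lat: float, lon: float) -> None:
    # Ranges are the ones stated in the parameter table.
    if not -90 <= lat <= 90:
        raise ValueError("lat must be within -90..90")
    if not -180 <= lon <= 180:
        raise ValueError("lon must be within -180..180")


def format_utc_offset(seconds: int) -> str:
    """Render an offset in seconds as +HH:MM or -HH:MM."""
    sign = "-" if seconds < 0 else "+"
    hours, remainder = divmod(abs(seconds), 3600)
    return f"{sign}{hours:02d}:{remainder // 60:02d}"


def summarize_timezone(response: dict) -> dict:
    # Output keys are hypothetical; input keys follow Open-Meteo's metadata.
    return {
        "timezone": response["timezone"],
        "abbreviation": response["timezone_abbreviation"],
        "utcOffset": format_utc_offset(response["utc_offset_seconds"]),
    }


check_coordinate(35.6762, 139.6503)
tz = summarize_timezone({
    "timezone": "Asia/Tokyo",
    "timezone_abbreviation": "JST",
    "utc_offset_seconds": 32400,
})
```

Because the offset is the current one, zones that observe daylight saving time return different values depending on when the call is made.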

Tool Definition Quality

A (3.8/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided. Description discloses read-only behavior and free pricing, but does not mention error handling, rate limits, or data freshness. Minimal transparency beyond what is stated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence plus a short read-only/price note efficiently convey purpose, input, source, and attributes. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 2 parameters and no output schema, description names the output fields (IANA timezone, abbreviation, offset). Lacks return format or error details, but adequate for basic use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers both parameters with descriptions (100% coverage). Description adds context about the API source but does not enhance parameter semantics beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Resolve the IANA timezone, abbreviation and current UTC offset for a lat/lon coordinate', specifying the action and resource. Distinct from sibling tools like geo_intel_distance or geo_intel_geocode.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Indicates it is 'Read-only' and 'free', but does not provide explicit guidance on when to use or alternative tools. Implies usage context but lacks exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grant_watch_get (grade C)

Get a grant call detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C (2.9/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description must fully convey behavioral traits. It states that the tool returns firstSeenAt and ledgerVerified, but does not confirm read-only nature, side effects, authorization needs, or rate limits. The description is minimal and leaves ambiguity.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences with no wasted words. It conveys the essential purpose efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema and one parameter, but the description only mentions two return fields. It lacks explanation of what a grant call is, the event timeline structure, or any additional details that an agent might need. The description is too sparse for full context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% and the description does not explain the itemId parameter. No format, example, or context is provided, leaving the agent without guidance on what value to supply.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'Get a grant call detail plus full event timeline' with a specific verb and resource. It distinguishes itself from sibling tools like grant_watch_search or grant_watch_timeline by combining detail and timeline in one operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Does not mention prerequisites, conditions, or exclusions. Relies solely on the tool name and context from sibling names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grant_watch_recent_changes (C)

Recent appearance / deadline-move / close / close-early events across all grant calls since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`field`	No
`limit`	No
`since`	Yes
`funder`	No
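
The only parameter the description explains is `since`, an ISO8601 timestamp. A minimal sketch of assembling the arguments follows, assuming `limit` is an integer and `funder` and `field` are plain strings (the schema shown here documents none of them). The function name `recent_changes_arguments` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def recent_changes_arguments(days_back, limit=None, funder=None, field=None, now=None):
    """Build arguments for grant_watch_recent_changes, omitting unset optionals."""
    now = now or datetime.now(timezone.utc)
    # Drop microseconds so the timestamp stays short and readable.
    since = (now - timedelta(days=days_back)).replace(microsecond=0)
    arguments = {"since": since.isoformat()}  # required parameter
    optional = {"limit": limit, "funder": funder, "field": field}
    arguments.update({key: value for key, value in optional.items() if value is not None})
    return arguments


# Fixed clock so the example is reproducible.
fixed_now = datetime(2026, 7, 30, 7, 32, tzinfo=timezone.utc)
print(recent_changes_arguments(7, limit=20, funder="JST", now=fixed_now))
# {'since': '2026-07-23T07:32:00+00:00', 'limit': 20, 'funder': 'JST'}
```

Whether the server prefers a `Z` suffix over the `+00:00` offset is not stated; both are valid ISO8601.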

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must fully disclose behavior. It mentions output includes firstSeenAt and ledgerVerified but lacks details on rate limits, pagination, error states, or authorization requirements. The brief description offers only basic transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, which is concise but omits crucial details. It could be slightly expanded to cover parameters without becoming verbose. The structure is acceptable but not optimal given the missing information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, no output schema, and no annotations, the description is incomplete. It does not explain the purpose of all parameters, the full output structure, or usage contexts. The agent lacks sufficient information to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, meaning no parameter descriptions exist in the schema. The description only hints at the 'since' parameter ('since the given ISO8601 timestamp') and completely ignores 'field', 'limit', and 'funder'. This fails to provide essential semantic meaning for the agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns recent events across all grant calls filtered by timestamp. It mentions specific event types (appearance, deadline-move, close, close-early). However, it does not differentiate from sibling tools like grant_watch_timeline or grant_watch_search, which may have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives such as grant_watch_search or grant_watch_timeline. There is no mention of prerequisites, context, or exclusions, leaving the agent without direction for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grant_watch_search (B)

Search Japanese research-grant calls-for-proposals. Each hit includes firstSeenAt and ledgerVerified (hash-chain integrity).

ParametersJSON Schema

Name	Required	Description
`field`	No	Research field
`limit`	No
`query`	No
`since`	No
`funder`	No	Funding agency (JST, AMED, NEDO, etc.)
`status`	No
`amountMin`	No

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so the description bears full burden. It discloses that results include firstSeenAt and ledgerVerified fields, offering some transparency. However, it lacks details on pagination, sorting, error states, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact (two sentences) and front-loaded with the purpose. However, it uses space to mention return fields, which could be replaced by more practically helpful details about usage.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 parameters, no output schema, no annotations), the description is incomplete. It omits how to use parameters, response structure beyond two fields, and any constraints or prerequisites.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 29% schema description coverage, the description does not compensate. It fails to explain any of the 7 parameters (e.g., query, limit, status) or their roles, forcing reliance on the sparse schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
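
The 29% coverage figure is consistent with counting described parameters over total parameters: 2 of 7 here. The page does not state its formula, so the sketch below is an assumption that happens to reproduce both this figure and the 67% reported for landprice_watch_search. The function name `description_coverage` is hypothetical, and the description strings are English glosses of the schema text.

```python
def description_coverage(descriptions: dict) -> float:
    """Percentage of parameters that carry a non-empty schema description."""
    described = sum(1 for text in descriptions.values() if text and text.strip())
    return 100 * described / len(descriptions)


# Parameter -> schema description (None where the table cell is empty).
grant_watch_search = {
    "field": "Research field",
    "limit": None,
    "query": None,
    "since": None,
    "funder": "Funding agency (JST, AMED, NEDO, etc.)",
    "status": None,
    "amountMin": None,
}
landprice_watch_search = {
    "year": "Fiscal year",
    "limit": None,
    "query": "Partial match on municipality name, location, or land use",
    "areaCode": "5-digit prefecture+municipality code",
    "registry": None,
    "prefectureCode": "JIS X 0401 prefecture code",
}

print(round(description_coverage(grant_watch_search)))      # 29
print(round(description_coverage(landprice_watch_search)))  # 67
```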

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches Japanese research-grant calls-for-proposals, distinguishing it from sibling watch-search tools for other domains (e.g., bid_watch_search, landprice_watch_search). The verb 'search' and specific resource are explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., grant_watch_get, grant_watch_timeline). No conditions or exclusions provided, leaving the agent to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grant_watch_timeline (B)

Time-ordered events only for a grant call (the differentiator: when it opened, deadline moved, closed, or closed early). Includes firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description does not disclose behavioral traits beyond listing included fields (firstSeenAt, ledgerVerified). Lacks context on side effects, permissions, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences front-load the purpose and add relevant details without extraneous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite low complexity (single param), the description omits parameter explanation and full return structure. The agent cannot determine how to specify itemId or what the full response looks like.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 1 parameter (itemId) with 0% description coverage. The description does not explain what itemId represents, leaving the agent without necessary context to invoke the tool correctly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it provides time-ordered events for a grant call, with specific examples (opened, deadline moved, closed). The phrase 'the differentiator' distinguishes it from sibling tools like grant_watch_get and grant_watch_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies use for timeline events only, but lacks explicit when-to-use vs alternatives. No mention of when not to use or comparisons with other watch tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grant_watch_verify_ledger (B)

Verify the hash-chain integrity of a grant call (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`itemId`	Yes
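
To make the return fields concrete, here is a minimal sketch of hash-chain verification that yields `chainValid`, `brokenAt`, and a checked-event count. It assumes SHA-256 over the previous hash concatenated with canonical JSON, and an all-zero genesis value. The server's actual hashing scheme, event layout, and field names such as `prevHash` and `checked` are not documented on this page and are hypothetical here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prevHash of the first event


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical form of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    """Append an event whose hash commits to everything before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prevHash": prev_hash, "hash": event_hash(prev_hash, payload)})


def verify_chain(chain: list) -> dict:
    """Recompute every link; report the first index that fails, if any."""
    prev_hash = GENESIS
    for index, event in enumerate(chain):
        expected = event_hash(prev_hash, event["payload"])
        if event["prevHash"] != prev_hash or event["hash"] != expected:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev_hash = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(chain)}


chain = []
append_event(chain, {"type": "appearance", "deadline": "2026-09-01"})
append_event(chain, {"type": "deadline-move", "deadline": "2026-09-15"})
append_event(chain, {"type": "close"})
print(verify_chain(chain))  # {'chainValid': True, 'brokenAt': None, 'checked': 3}

chain[1]["payload"]["deadline"] = "2026-10-01"  # simulate tampering
print(verify_chain(chain))  # {'chainValid': False, 'brokenAt': 1, 'checked': 2}
```

Because each hash commits to the previous one, editing any past event invalidates that link, which is what makes the ledger tamper-evident.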

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided. The description lists return fields but does not disclose whether the operation is read-only, required permissions, or any side effects. As a verification tool, it likely has no destructive impact, but that is not stated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the purpose, then a list of return fields. It is efficient, though returning both chainValid and ledgerVerified looks partly redundant. No unnecessary text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple verification tool with one parameter and no output schema, the description covers the purpose and return fields. However, it lacks context on what constitutes a 'grant call' and when to use it. It is minimally complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description does not mention the 'itemId' parameter or explain what it represents for the grant call. The schema only shows the parameter type and requirement, but the description adds no semantic context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Verify' and the resource 'hash-chain integrity of a grant call', with tamper detection. It distinguishes itself from sibling verify_ledger tools by specifying 'grant call'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for verifying ledger integrity but does not explicitly state when to use this tool versus other verify_ledger tools or other grant_watch tools. No alternatives or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyb_report (A)

Turn a company/entity name into a structured English KYB / due-diligence report for foreign-inbound buyers. Aggregates cross-ledger hits (administrative sanctions, licenses, public bids, recalls, etc.) via entity_search, summarizes them (counts + deterministic risk_flags), and attaches an F-037 provenance receipt to EACH hit so the screening is auditable. No model-invented facts: every asserted fact originates from a ledger hit and carries a receipt. Informational only; not legal advice. Price 0.0.

ParametersJSON Schema

Name	Required	Description	Default
`query`	Yes	Company / entity name (partial match across all ledgers)
`jurisdiction`	No	Jurisdiction code (default 'jp')

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, description fully discloses behavior: it aggregates cross-ledger hits via entity_search, summarizes with counts and risk_flags, attaches F-037 provenance receipts, and asserts no invented facts. This provides comprehensive transparency beyond basic purpose.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with the core purpose and efficiently conveys key details. While slightly verbose, each sentence adds value (aggregation method, risk flags, receipts, disclaimers). Structure is logical and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, description clearly explains the output structure: a structured English report with aggregated hits, counts, risk_flags, and provenance receipts. It also covers input constraints (partial match) and limitations (no invented facts), making it complete for a report-generation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (both parameters documented). The schema itself states that 'query' is a partial match across all ledgers and that 'jurisdiction' defaults to 'jp'; the description adds context on how the query is used (cross-ledger aggregation via entity_search) rather than new parameter syntax.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool converts a company/entity name into a structured English KYB/due-diligence report for foreign-inbound buyers. It specifies the verb 'turn into,' the resource (report), and the context (foreign-inbound buyers), distinguishing it from lower-level search tools like entity_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description provides context (for foreign-inbound buyers) and includes a disclaimer that it's informational only, but does not explicitly state when not to use this tool or mention alternative tools. It implies usage for due-diligence screening but lacks exclusionary guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

landprice_watch_get (C)

Get a 地価公示 (official land price publication) standard-land record detail plus its full event timeline (price revisions, reissues, vanish events). Returns firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C2.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description mentions it returns data and lists specific fields, but without annotations, it fails to disclose behavioral traits like idempotency, safety, or authentication requirements. The tool appears read-only, but this is not explicitly stated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief with two sentences, efficiently conveying the core functionality. It avoids redundancy and front-loads the main action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description is minimally adequate. It specifies returned fields but omits details on error handling, pagination, or authentication, which may be needed for an agent to use it reliably.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter 'itemId' has no schema description (0% coverage) and the description adds no additional meaning. While the name suggests an identifier, the description does not clarify its format or source, leaving agents to infer.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it retrieves a standard-land record detail and its full event timeline, and specifies returned fields. However, it does not differentiate from sibling watch_get tools (e.g., bid_watch_get, grant_watch_get), relying on the tool name to imply the domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like landprice_watch_search or landprice_watch_timeline. The description lacks contextual cues for appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

landprice_watch_recent_changes (B)

List landprice events observed after the given ISO8601 timestamp. Optional prefectureCode filter.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`since`	Yes
`prefectureCode`	No

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It adds minimal behavioral context beyond the schema (e.g., no mention of pagination, ordering, rate limits, or what constitutes an 'event').

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise, front-loaded sentences with no waste. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (3 parameters, no output schema), the description covers the main purpose but lacks details on return format or the meaning of 'events'. It is adequate but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It explains 'since' (ISO8601 timestamp) and 'prefectureCode' (optional filter), but does not explain 'limit', leaving its default and maximum undocumented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists landprice events after a timestamp, with an optional prefecture code filter. The verb 'List' and specific resource 'landprice events' make the purpose unambiguous, and it differentiates from sibling tools like 'get' or 'search'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description does not mention when to prefer 'recent_changes' over 'search' or 'timeline'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

landprice_watch_search (A)

Search Japanese 地価公示 (official land price publication; MLIT 国土数値情報, National Land Numerical Information, dataset L01) standard-land snapshots. Each hit is one standard point for a given year, ledgered for tamper-detection. Returns ledgerVerified per hit.

ParametersJSON Schema

Name	Required	Description
`year`	No	Fiscal year (e.g. 2026)
`limit`	No
`query`	No	Partial match on municipality name, location, or current land use
`areaCode`	No	5-digit prefecture+municipality code (e.g. "13101")
`registry`	No
`prefectureCode`	No	JIS X 0401 prefecture code (e.g. "13")
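
The schema describes `areaCode` as a 5-digit prefecture+municipality code and `prefectureCode` as a 2-digit JIS X 0401 code, so the first two digits of an area code should be a valid prefecture code (01 to 47). A small validation sketch under that reading follows; the function name `split_area_code` is hypothetical, and whether the server validates input this way is not documented.

```python
import re


def split_area_code(area_code: str) -> tuple:
    """Split a 5-digit area code into (prefecture code, municipality part)."""
    if not re.fullmatch(r"[0-9]{5}", area_code):
        raise ValueError(f"areaCode must be exactly 5 ASCII digits: {area_code!r}")
    prefecture_code = area_code[:2]
    # JIS X 0401 assigns prefecture codes 01 through 47.
    if not 1 <= int(prefecture_code) <= 47:
        raise ValueError(f"unknown prefecture code: {prefecture_code!r}")
    return prefecture_code, area_code[2:]


print(split_area_code("13101"))  # ('13', '101')
```

Keeping the codes as strings preserves leading zeros, which matter for prefectures 01 to 09.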

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses ledger verification and tamper-detection, but with no annotations, it omits auth, rate limits, or destructive potential. Searches are read-only, but this is not explicit.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences efficiently convey purpose and a key behavioral trait. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a search tool, but no output schema means description should clarify return fields beyond ledgerVerified. Could mention pagination or error conditions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 67%; description adds no new parameter details beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool searches Japanese land price snapshots from a specific dataset (MLIT L01), specifies ledger verification, and distinguishes from other watch tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus sibling watch tools like bid_watch_search or grant_watch_search. No mention of prerequisites or scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

landprice_watch_timeline (B)

Timeline view of one standard-land point — all observation events with diff, observedAt and chain-hash entries. ledgerVerified is computed end-to-end against IDENTITY_SIGNING_SECRET.

ParametersJSON Schema

Name	Required	Description	Default
`itemId`	Yes
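
The reference to IDENTITY_SIGNING_SECRET suggests that `ledgerVerified` depends on a server-side key, so a client could not recompute it from the timeline alone. One plausible reading is a keyed chain using HMAC-SHA256, sketched below. This is an assumption, not the server's documented scheme: the entry layout, the `mac` field, and the function names are all hypothetical.

```python
import hashlib
import hmac
import json
import os


def sign_entry(secret: bytes, prev_mac: str, event: dict) -> str:
    """MAC the previous entry's MAC together with the canonical event."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret, (prev_mac + canonical).encode("utf-8"), hashlib.sha256).hexdigest()


def ledger_verified(secret: bytes, entries: list) -> bool:
    """True only if every entry's MAC matches when recomputed with the secret."""
    prev_mac = ""
    for entry in entries:
        expected = sign_entry(secret, prev_mac, entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True


# Demo secret only; a real deployment would never fall back to a default.
secret = os.environ.get("IDENTITY_SIGNING_SECRET", "demo-secret").encode("utf-8")
events = [
    {"observedAt": "2026-03-20T00:00:00+00:00", "diff": {"price": [None, 1200000]}},
    {"observedAt": "2026-04-02T00:00:00+00:00", "diff": {"price": [1200000, 1250000]}},
]
entries, prev = [], ""
for event in events:
    prev = sign_entry(secret, prev, event)
    entries.append({"event": event, "mac": prev})

print(ledger_verified(secret, entries))          # True
print(ledger_verified(b"wrong-secret", entries))  # False
```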

Tool Definition Quality

B3.0/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description bears full burden. It discloses that ledgerVerified is computed against IDENTITY_SIGNING_SECRET, adding a behavioral detail. However, it does not state whether the operation is read-only, requires authentication, or has rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences without redundancy. Could potentially incorporate parameter semantics, but the description is well-structured and front-loaded with core functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers output fields (diff, observedAt, chain-hash, ledger verification) but fails to explain the required input. This gap limits the agent's ability to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'itemId' (required string) has no description in schema (0% coverage). The tool description does not explain its meaning or format, leaving the agent without guidance on how to specify the land point.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it provides a timeline view of a single standard-land point, listing observation events with specific fields (diff, observedAt, chain-hash). This distinguishes it from sibling tools like landprice_watch_get or landprice_watch_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives such as landprice_watch_get, recent_changes, or search. The description implies historical context but does not specify exclusions or preferred scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

landprice_watch_verify_ledger (A)

Verify the hash-chain integrity of a landprice record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses the return fields (chainValid, brokenAt, etc.) and mentions 'tamper detection', indicating a read-only operation. However, it omits details on error behavior (e.g., invalid itemId), rate limits, authorization needs, or any side effects. The disclosure is adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long: the first states the core purpose, the second lists key return fields. It is front-loaded, concise, and contains no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema) and the presence of many sibling verify_ledger tools, the description sufficiently covers the purpose and return values. It lacks details on error handling or input validation, but for a verification tool, the provided info is mostly complete. The absence of an output schema makes the listing of return fields valuable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% (itemId has no description). The description compensates partially by stating the tool verifies a 'landprice record', implying that itemId identifies that record. It does not explicitly describe the parameter format or constraints, but the context is sufficient for basic understanding. The compensation is moderate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Verify') and resource ('hash-chain integrity of a landprice record'), clearly distinguishing it from sibling verify_ledger tools which operate on different watch types (e.g., bid_watch_verify_ledger, grant_watch_verify_ledger). The action and scope are unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states the tool is for tamper detection of landprice records, implying when to use it (to verify integrity). However, it does not explicitly contrast with alternative tools or provide 'when not to use' guidance. The sibling tools are domain-specific, so the name and description offer implicit differentiation but lack explicit use-case boundaries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

license_watch_get (grade C)

Get a license registration detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C (2.6/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description must disclose behavioral traits. It only mentions return fields but not whether the operation is read-only, idempotent, or any side effects. Minimal transparency for a tool that likely performs a safe get.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences, concise and front-loaded with the main action. No wasted words, though some detail could be added without harming conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter, the description is minimal. It does not explain the timeline's scope, the response format, or any prerequisites. Given the lack of output schema, more clarity would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter itemId has no description in the schema (0% coverage), and the tool description does not explain it either. The agent must infer that itemId identifies a license registration; nothing states this, and no format or source for the value is given.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it retrieves a license registration detail and full timeline, mentioning specific return fields (firstSeenAt, ledgerVerified). However, it does not distinguish it from similar siblings like license_watch_timeline or license_watch_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives among the many license_watch_* siblings. The description only states what it does, not when or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

license_watch_recent_changes (grade A)

Recent appearance / revoked / suspended events across all license ledgers since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`registry`	No
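As a rough illustration of how a client might call this tool, the sketch below builds an MCP `tools/call` request with `since` formatted as an ISO8601 timestamp. The helper name `recent_changes_request` is hypothetical, and the argument types (an integer `limit`, a string `registry` such as the `fsa-kinyushohin` value shown in the license_watch_search schema) are assumptions, since the schema here documents none of them.

```python
import json
from datetime import datetime, timezone


def recent_changes_request(since, limit=None, registry=None, request_id=1):
    """Build an MCP tools/call request for license_watch_recent_changes.

    Assumption: `limit` is an integer and `registry` a string; the schema
    on this page gives no descriptions for either.
    """
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    # Normalize to UTC and emit an ISO8601 string with second precision.
    arguments = {"since": since.astimezone(timezone.utc).isoformat(timespec="seconds")}
    # Optional parameters are only sent when the caller sets them.
    if limit is not None:
        arguments["limit"] = limit
    if registry is not None:
        arguments["registry"] = registry
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "license_watch_recent_changes", "arguments": arguments},
    }


since = datetime(2026, 7, 1, tzinfo=timezone.utc)
print(json.dumps(recent_changes_request(since, limit=20, registry="fsa-kinyushohin"), indent=2))
```

Whether the server accepts a `+00:00` offset, a trailing `Z`, or both is not stated in the description, so the exact timestamp form may need adjusting.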

Tool Definition Quality

A (3.7/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so the description must cover behavioral traits. It describes output fields (firstSeenAt, ledgerVerified) but omits side effects, authentication, rate limits, or whether it is read-only. Adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with key information. No unnecessary words. Efficiently conveys purpose and key point about output fields.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 3 parameters (one required), no output schema, and no annotations, the description covers the main functionality and time filter. However, it lacks explanation for 'registry' and 'limit', and could better differentiate from sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description clarifies the 'since' parameter as ISO8601 timestamp, but does not explain 'limit' or 'registry'. Adds meaning to one of three parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists recent events (appearance/revoked/suspended) across all license ledgers, with a temporal filter. It distinguishes from siblings like license_watch_get and license_watch_search by focusing on recent changes and the scope 'across all license ledgers'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for recent changes since a timestamp, but lacks explicit when-to-use vs alternatives. No direct comparison to sibling tools like license_watch_timeline or other watch recent_changes tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

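license_watch_search (grade A)

Search Japanese license / registration ledgers (FSA menkyo, i.e. licenses: 金融商品取引業者 (financial instruments business operators), 預金取扱金融機関 (deposit-taking financial institutions) …). Each hit includes firstSeenAt and ledgerVerified.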

Parameters

Name	Required	Description
`limit`	No
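`query`	No	Business name or registration number; partial match (業者名・登録番号・部分一致)
`since`	No
`status`	No
`licensor`	No	Licensing authority, e.g. Director-General of the Kanto Local Finance Bureau (関東財務局長) or Prime Minister (Financial Services Agency) (内閣総理大臣（金融庁）)
`registry`	No	Registry type (fsa-kinyushohin / fsa-ginkou, etc.)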

Tool Definition Quality

A (3.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It mentions output fields (firstSeenAt, ledgerVerified) but omits behavioral traits like read-only nature, pagination, rate limits, or authentication requirements. Adequate for a simple search but not detailed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with primary action and crucial registry context. Every sentence adds value with zero fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given six parameters (none required), no output schema, and only partial schema coverage, the description is insufficient. It fails to detail important aspects like default limit, since behavior, status interpretation, or result structure beyond two fields. A more complete description would address these gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (3 of 6 params have descriptions). The tool description adds no parameter-specific meaning beyond what the schema provides. It neither compensates for undocumented parameters nor clarifies param usage (e.g., query syntax, since format). Baseline score due to partial coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool's function: 'Search Japanese license / registration ledgers' and lists exact registry categories (FSA menkyo: 金融商品取引業者, 預金取扱金融機関 …). It distinguishes from sibling watch tools like license_watch_get by emphasizing search and from other domain search tools by targeting Japanese financial licenses.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for searching multiple registries but provides no explicit guidance on when to use this tool versus alternatives (e.g., license_watch_get or search tools for other jurisdictions). No exclusion criteria or context for when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

license_watch_timeline (grade A)

Time-ordered events only for a license registration (the differentiator: when it appeared, when it was revoked / expired / suspended). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A (3.5/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It mentions included fields (firstSeenAt, ledgerVerified) but does not state if the operation is read-only, safe, or any side effects. It lacks information on authentication, rate limits, or pagination behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences: one stating the purpose with a parenthetical differentiator, and one naming the included fields. It is concise and front-loaded, though a little more structure would help without making it verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has one parameter and no output schema. The description mentions it returns time-ordered events and includes two fields, but does not describe the full output structure, required permissions, or prerequisites. It is adequate but has clear gaps in providing a complete picture.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning parameter descriptions are empty. The description mentions 'for a license registration' but does not explicitly map itemId to the license registration ID or describe its format. This is insufficient for a single parameter tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides time-ordered events for a license registration, listing specific event types (appearance, revocation, expiry, suspension). It distinguishes itself from sibling tools like license_watch_get and license_watch_recent_changes by focusing on the timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly calls out the 'differentiator'—it is for timeline events, not current state or changes. This gives clear context for when to use this tool, though it does not provide explicit when-not or alternative tool names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

license_watch_verify_ledger (grade A)

Verify the hash-chain integrity of a license registration (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
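To make the returned fields concrete, here is a minimal sketch of hash-chain verification that reports `chainValid`, `brokenAt`, and a checked-event count. The server's actual hashing scheme is not documented here, so everything else is an assumption for illustration: SHA-256 over the previous hash plus a canonical JSON payload, the `prevHash` / `hash` / `payload` event layout, the all-zero genesis hash, and the sample events.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash preceding the first event


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash one event together with the hash of the event before it."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Build a chained event list from raw payloads (demo helper)."""
    events, prev = [], GENESIS
    for payload in payloads:
        digest = event_hash(prev, payload)
        events.append({"payload": payload, "prevHash": prev, "hash": digest})
        prev = digest
    return events


def verify_chain(events: list) -> dict:
    """Recompute every link and report the first index that does not match."""
    prev = GENESIS
    for index, event in enumerate(events):
        expected = event_hash(prev, event["payload"])
        if event["prevHash"] != prev or event["hash"] != expected:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(events)}


# Made-up sample events, not real ledger data.
events = build_chain([
    {"type": "appearance", "at": "2026-01-05"},
    {"type": "suspended", "at": "2026-03-12"},
    {"type": "revoked", "at": "2026-06-01"},
])
print(verify_chain(events))

events[1]["payload"]["type"] = "appearance"  # simulate tampering with one event
print(verify_chain(events))
```

Because each hash covers the previous one, editing any stored event invalidates that link, which is what lets a verifier point at a specific `brokenAt` position.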

Tool Definition Quality

A (3.5/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It lists return fields (chainValid, brokenAt, etc.) which adds value, but does not disclose side effects, authentication needs, rate limits, or other behavioral traits like whether it's a read-only operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences: the first states the core purpose, the second lists the return fields. It is efficient and front-loaded, with no wasted words. However, an explicit explanation of the parameter would give it more structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description lists return fields, which is helpful. However, the lack of parameter explanation and behavioral context leaves gaps. For a single-param tool, it is moderately complete but not fully self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter itemId has no description in the schema (0% coverage). The description does not explain what itemId represents (e.g., license ID), failing to add meaning beyond the raw schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Verify' and the resource 'hash-chain integrity of a license registration', with a specific purpose of tamper detection. It distinguishes from sibling verify_ledger tools by specifying 'license registration' and from other license_watch tools by focusing on verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for tamper detection, but provides no explicit guidance on when to use this tool versus alternatives (e.g., license_watch_get) or when not to use it. There are no exclusions or prerequisites mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onchain_risk_address_risk (grade A)

Heuristic address risk score (0-100) from on-chain reads (code, balance, nonce) plus bundled OFAC-style sanction and flagged lists, over public EVM JSON-RPC. Every contribution is returned in rationale. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`chain`	No	EVM chain (default: ethereum).
`address`	Yes	EVM address (0x-hex)
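The description's "every contribution is returned in rationale" can be pictured with a small additive scorer. This is only a sketch of the general pattern: the signal names, point weights, and output field names below are invented for illustration and are not the tool's actual heuristics.

```python
def address_risk(signals: dict) -> dict:
    """Combine signals into a clamped 0-100 score plus a per-signal rationale.

    All weights and signal names are hypothetical.
    """
    rationale = []

    def add(points: int, reason: str) -> None:
        rationale.append({"points": points, "reason": reason})

    is_contract = bool(signals.get("isContract"))
    if signals.get("sanctioned"):
        add(100, "address is on the bundled sanction list")
    if signals.get("flagged"):
        add(40, "address is on the bundled flagged list")
    if is_contract:
        add(10, "address has deployed code")
    if not is_contract and signals.get("nonce", 0) == 0:
        add(15, "account has never sent a transaction")
    if signals.get("balanceWei", 0) == 0:
        add(5, "zero balance")

    # Sum the contributions and clamp to the advertised 0-100 range.
    raw = sum(item["points"] for item in rationale)
    return {"score": max(0, min(100, raw)), "rationale": rationale}


print(address_risk({"isContract": True, "nonce": 1, "balanceWei": 0}))
```

Returning each contribution alongside the total lets a caller audit why a score came out high rather than trusting a single number.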

Tool Definition Quality

A (4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses read-only behavior, free cost, data sources (code, balance, nonce, bundled lists), and output nature (rationale included). It effectively communicates key behavioral traits beyond what the schema provides.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences contain all essential information: purpose, input, output, features, and constraints. There is no extraneous content, and the score range and data sources are front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema, the description sufficiently explains the return value (score with rationale) and sources. It could specify the exact output structure (e.g., JSON fields), but the current level allows an agent to understand what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds no new parameter details beyond enum and string descriptions. However, it provides context about how the address is used (on-chain reads) and the role of chain default, which modestly enhances understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a heuristic address risk score (0-100) from on-chain reads and sanction/flagged lists with rationale. It distinguishes from sibling tools like onchain_risk_sanctions_screen and onchain_risk_token_safety by focusing on a composite risk score for an address.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for assessing address risk but does not explicitly state when to use this over alternatives like onchain_risk_sanctions_screen or onchain_risk_approval_risk. No when-not-to-use guidance is provided, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onchain_risk_approval_risk (grade A)

ERC-20 approval exposure: scans Approval logs granted by the address and flags unlimited / active allowances, via eth_getLogs over public EVM JSON-RPC. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`chain`	No	EVM chain (default: ethereum).
`address`	Yes	EVM address (0x-hex)
`fromBlock`	No	Optional start block for the log scan (default: a recent window).
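For readers unfamiliar with the underlying scan, the sketch below builds an `eth_getLogs` filter for ERC-20 `Approval` events granted by one owner and classifies the logged allowance value. The topic hash is the standard keccak-256 of `Approval(address,address,uint256)`; the helper names and the three labels (`revoked`, `unlimited`, `limited`) are assumptions, not the tool's documented output.

```python
# keccak256("Approval(address,address,uint256)"), the standard ERC-20 event topic.
APPROVAL_TOPIC = "0x8c5be1e5ebec7d5bd14f71427d1e84f3dd0314c0f7b2291e5b200ac8c7c3b925"
MAX_UINT256 = 2**256 - 1


def address_topic(address: str) -> str:
    """Left-pad a 20-byte address to the 32-byte form used for indexed topics."""
    hex_part = address.lower().removeprefix("0x")
    if len(hex_part) != 40:
        raise ValueError("expected a 20-byte hex address")
    int(hex_part, 16)  # raises ValueError on non-hex characters
    return "0x" + hex_part.rjust(64, "0")


def approval_filter(owner: str, from_block: int) -> dict:
    """Filter object for eth_getLogs: Approval events where `owner` is the grantor."""
    return {
        "fromBlock": hex(from_block),
        "toBlock": "latest",
        "topics": [APPROVAL_TOPIC, address_topic(owner)],
    }


def classify_allowance(log_data: str) -> str:
    """Label the uint256 allowance carried in an Approval log's data field."""
    value = int(log_data, 16)
    if value == 0:
        return "revoked"
    if value == MAX_UINT256:
        return "unlimited"
    return "limited"


print(approval_filter("0x" + "ab" * 20, 18_000_000))
print(classify_allowance("0x" + "f" * 64))
```

A log only records the allowance at approval time; deciding whether an allowance is still active would need the latest log per token and spender or an `allowance(owner, spender)` call, and the description does not say which approach the tool takes.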

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses read-only nature, free pricing, and the underlying JSON-RPC method. This provides good transparency about behavior, though it does not mention rate limits or output format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: one sentence plus a short read-only and pricing note, covering purpose, method, input, and cost. Every word is necessary, and nothing is redundant.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool, complete schema coverage, and no output schema, the description provides sufficient context. It could be enhanced by clarifying what 'unlimited / active allowances' means, but overall it is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaning beyond the schema by explaining that the address parameter is the grantor of approvals and that the tool scans Approval logs. This contextualizes the parameters effectively, especially since schema coverage is 100%.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool scans ERC-20 approval logs from an address and flags unlimited/active allowances. It mentions the method (eth_getLogs) and distinguishes itself from sibling onchain risk tools by focusing specifically on approval exposure.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for use (checking approval allowances) but does not explicitly state when to use this tool versus alternatives like onchain_risk_address_risk or token_safety. However, the domain-specific language ('ERC-20 approval exposure') makes the intended use case evident.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onchain_risk_contract_verify (grade A)

Contract / bytecode summary: whether the address is a contract, bytecode size and any embedded CBOR metadata, via public EVM JSON-RPC and block-explorer / Sourcify HTTPS. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`chain`	No	EVM chain (default: ethereum).
`address`	Yes	EVM address (0x-hex)
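The "embedded CBOR metadata" mentioned above refers to the trailer that the Solidity compiler appends to deployed bytecode: a CBOR-encoded map followed by a two-byte big-endian length. The sketch below locates that trailer in a hex string such as an `eth_getCode` result. It is a heuristic under stated assumptions: it does not decode the CBOR, and the function names and output field names (`isContract`, `bytecodeSize`, `metadataBytes`) are hypothetical.

```python
def split_cbor_metadata(bytecode_hex: str):
    """Split runtime bytecode into (executable part, CBOR metadata or None)."""
    code = bytes.fromhex(bytecode_hex.removeprefix("0x"))
    if len(code) < 2:
        return code, None
    # The last two bytes hold the length of the CBOR blob that precedes them.
    cbor_len = int.from_bytes(code[-2:], "big")
    if cbor_len == 0 or cbor_len + 2 > len(code):
        return code, None
    metadata = code[-(cbor_len + 2):-2]
    # CBOR major type 5 (map) lives in the top three bits of the first byte.
    if metadata[0] >> 5 != 5:
        return code, None
    return code[: -(cbor_len + 2)], metadata


def summarize(bytecode_hex: str) -> dict:
    """Report whether code exists, its size in bytes, and the metadata size."""
    total = len(bytes.fromhex(bytecode_hex.removeprefix("0x")))
    _, metadata = split_cbor_metadata(bytecode_hex)
    return {
        "isContract": total > 0,
        "bytecodeSize": total,
        "metadataBytes": None if metadata is None else len(metadata),
    }


# Synthetic bytecode: 5 opcode bytes + CBOR map {"x": 1} + 2-byte length.
meta = bytes([0xA1, 0x61, 0x78, 0x01])
sample = bytes.fromhex("6080604052") + meta + len(meta).to_bytes(2, "big")
print(summarize("0x" + sample.hex()))
print(summarize("0x"))  # an externally owned account returns empty code
```

Bytecode produced by other compilers, or with metadata disabled, may have no such trailer, so a `None` result does not by itself mean the contract is unverified.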

Tool Definition Quality

A (4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given no annotations, the description discloses it is read-only, free, and uses external public data sources (EVM JSON-RPC, block-explorer, Sourcify). It does not mention potential rate limits or authentication needs, but provides sufficient context for a simple read operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose, then details on outputs, data sources, and pricing. Every sentence adds value; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description explains expected return values (contract status, bytecode size, CBOR metadata) and data sources. It could include caveats about data freshness or error handling, but overall is adequate for a straightforward tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters (chain, address). The description does not add new parameter-level meaning beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks if an address is a contract, provides bytecode size and CBOR metadata. It distinguishes from sibling onchain risk tools (e.g., address_risk, token_safety) by focusing on contract verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for contract/bytecode analysis but does not explicitly state when to use this tool vs alternatives (e.g., onchain_risk_address_risk). No guidance on when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onchain_risk_sanctions_screen (grade B)

Screen an address against the bundled OFAC-style sanction list; returns the sanctioned flag, any matches and the list size. Pure list lookup. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`chain`	No	EVM chain (default: ethereum).
`address`	Yes	EVM address (0x-hex)
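A "pure list lookup" of this kind can be sketched in a few lines. The function name `screen_address`, the output field names, and the sample list are assumptions for illustration; the actual bundled list and response shape are not documented here.

```python
def screen_address(address: str, sanction_list: list) -> dict:
    """Case-insensitive membership check against a bundled address list."""
    # EVM addresses differ only by checksum casing, so compare in lowercase.
    normalized = address.strip().lower()
    listed = {entry.strip().lower() for entry in sanction_list}
    matches = [normalized] if normalized in listed else []
    return {"sanctioned": bool(matches), "matches": matches, "listSize": len(listed)}


# Made-up placeholder entries, not real sanctioned addresses.
bundled = ["0x" + "AA" * 20, "0x" + "bb" * 20]
print(screen_address("0x" + "aa" * 20, bundled))
print(screen_address("0x" + "cc" * 20, bundled))
```

No network call is involved, which is consistent with the description's "pure list lookup" wording; the result is only as current as the bundled list.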

Tool Definition Quality

B (3.4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the burden and correctly states the tool is read-only and free (price 0.0). It also describes the output (sanctioned flag, matches, list size) and calls it a 'pure list lookup'. However, it omits potential details like response structure, rate limits, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, front-loaded with the key action, with no wasted words. Each adds value: the first describes purpose and output, the second clarifies that it is a pure list lookup, and the third states that it is read-only and free.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers input (address, chain via schema) and output (sanctioned flag, matches, list size). It lacks details on error cases or format of matches, but for a simple lookup, this is largely sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description does not need to add parameter details. It does add context about the sanction list being 'bundled OFAC-style', but does not elaborate on the chain or address parameters beyond what the schema provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool screens an address against an OFAC-style sanction list, returning a flag, matches, and list size. It specifies it is a pure list lookup, read-only, and free. However, it does not explicitly differentiate from sibling sanctions tools like sanctions_screen_check_address or sanctions_screen_entity, though the context of 'bundled OFAC-style list' provides some distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for simple, free sanctions checks but provides no explicit guidance on when to use this tool versus alternatives. It lacks exclusions or mentions of when a more comprehensive tool (e.g., sanctions_screen_entity) would be needed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onchain_risk_token_safety (grade A)

ERC-20 token safety summary via on-chain eth_call reads: name / symbol / decimals / totalSupply, owner and ownership-renounced signal. Public EVM JSON-RPC. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`chain`	No	EVM chain (default: ethereum).
`address`	Yes	EVM address (0x-hex)
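The `eth_call` reads named in the description can be sketched as follows. The four-byte selectors are the standard ones for the ERC-20 getters and for an Ownable-style `owner()`; the helper names are hypothetical, and treating a zero-address owner as the "ownership-renounced signal" is an assumption about how the tool derives it.

```python
# Standard 4-byte function selectors.
SELECTORS = {
    "name": "0x06fdde03",
    "symbol": "0x95d89b41",
    "decimals": "0x313ce567",
    "totalSupply": "0x18160ddd",
    "owner": "0x8da5cb5b",  # Ownable-style owner(); not part of ERC-20 itself
}
ZERO_ADDRESS = "0x" + "0" * 40


def eth_call_request(token: str, field: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC eth_call request for one read-only getter."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_call",
        "params": [{"to": token, "data": SELECTORS[field]}, "latest"],
    }


def decode_uint(result: str) -> int:
    """Decode a single ABI-encoded uint (e.g. decimals or totalSupply)."""
    return int(result, 16)


def decode_address(result: str) -> str:
    """Decode a single ABI-encoded address: the low 20 bytes of a 32-byte word."""
    word = result.lower().removeprefix("0x").rjust(64, "0")
    return "0x" + word[-40:]


def ownership_renounced(owner_result: str) -> bool:
    """Assumed signal: owner() returning the zero address."""
    return decode_address(owner_result) == ZERO_ADDRESS


print(eth_call_request("0x" + "11" * 20, "decimals"))
print(decode_uint("0x" + "0" * 62 + "12"))
print(ownership_renounced("0x" + "0" * 64))
```

A contract without an `owner()` function typically reverts or returns empty data, so a real client would need to handle that case before decoding; the description does not say how the tool reports it.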

Tool Definition Quality

A (3.8/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description declares the tool as 'Read-only' and 'price 0.0 (free)', but without annotations, it carries the burden of behavioral disclosure. It does not mention rate limits, auth requirements, or what happens if the address is not an ERC-20 token.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three sentences front-loaded with the tool's purpose, using specific verbs and resource references. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately lists the fields returned (name, symbol, decimals, totalSupply, owner, ownership-renounced signal). It also notes it's read-only and free. Missing error handling or environmental context, but reasonably complete for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds context about the data returned (name, symbol, etc.) but does not enhance parameter semantics beyond what the schema already provides for chain and address.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides an ERC-20 token safety summary via on-chain eth_call reads, listing specific fields (name, symbol, decimals, totalSupply, owner, ownership-renounced signal). It distinguishes itself from sibling tools like onchain_risk_address_risk and onchain_risk_contract_verify by focusing specifically on token safety metrics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for checking token safety but does not explicitly state when to use this tool versus alternatives (e.g., onchain_risk_contract_verify for contract code verification). No when-not-to-use or exclusion criteria are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ordinance_watch_get (B)

Get an ordinance detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It only mentions that the tool returns 'firstSeenAt' and 'ledgerVerified', and fails to disclose other important behavioral traits such as whether the operation is read-only, idempotent, requires authentication, or has any side effects. Whether the call mutates any state is not indicated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at two sentences, with the primary action front-loaded. However, it sacrifices completeness by omitting parameter explanations and usage context. It is not wasteful but could be improved by including critical details without adding length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple input schema and no output schema, the description provides the tool's basic purpose and two return fields. However, it does not fully cover what constitutes 'ordinance detail' or clarify the parameter's role. An agent might struggle to use the tool correctly without additional inference.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema coverage is 0% and the description does not explain the meaning of 'itemId'. The agent must infer that 'itemId' is the ordinance identifier, but no confirmation or format details are given. The description adds no value beyond the schema's parameter list.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves an ordinance detail plus a full event timeline, using a specific verb ('Get') and resource. It distinguishes itself from sibling tools like 'ordinance_watch_timeline' by indicating it provides both detail and timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. For example, it does not clarify that 'ordinance_watch_timeline' should be used if only the timeline is needed, or that this tool is the primary choice for full details. No exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ordinance_watch_recent_changes (B)

Recent appearance / amendment / repeal events across all ordinances since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`issuerCode`	No
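
Because the schema leaves `since` and `limit` undescribed, a caller has to guess at the argument shape. The sketch below shows one plausible way to build an MCP `tools/call` request for this tool. The function name is hypothetical, and the UTC `Z`-suffixed timestamp format and integer `limit` are assumptions; the listing only says the timestamp is ISO8601.

```python
from datetime import datetime, timedelta, timezone


def recent_changes_request(now: datetime, days_back: int, limit=None, request_id: int = 1) -> dict:
    """Build a JSON-RPC tools/call body asking for changes in the last N days.

    `now` must be timezone-aware; it is converted to UTC before formatting.
    """
    since = (now - timedelta(days=days_back)).astimezone(timezone.utc)
    arguments = {"since": since.strftime("%Y-%m-%dT%H:%M:%SZ")}  # assumed format
    if limit is not None:
        arguments["limit"] = limit  # assumed to be an integer cap on results
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "ordinance_watch_recent_changes", "arguments": arguments},
    }


request = recent_changes_request(
    datetime(2026, 7, 30, 7, 32, tzinfo=timezone.utc), days_back=7, limit=20
)
```

Omitting `limit` when it is unset leaves the server's default in force, whatever that default is.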

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses return fields (firstSeenAt, ledgerVerified) and event types, indicating a read-only operation. However, it does not mention pagination, rate limits, or any side effects. Without annotations, this is adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise two-sentence description: first states the action, second lists key fields. No redundant information, easy to scan.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description partially explains the output (fields) but omits details on pagination, default limit behavior, or the full response shape. With no output schema, more detail would be beneficial for accurate use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%—the description alludes to 'since' (ISO8601 timestamp) but does not name or explain the 'limit' or 'issuerCode' parameters. Parameter meaning is largely left to the schema, which provides minimal context beyond defaults and format.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool returns recent appearance, amendment, and repeal events across ordinances, specifying an ISO8601 timestamp filter and included fields. This distinguishes it from sibling watch tools for other domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like ordinance_watch_get, ordinance_watch_search, or ordinance_watch_timeline. The description implies broad monitoring ('across all ordinances') but lacks direct comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ordinance_watch_search (A)

Search Japanese national laws / ordinances (e-Gov law search, 法令検索, v2; Stage 1 covers the national level only; municipal ordinances, 自治体例規, ship in a follow-up). Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	Law name / law number / abbreviation (partial match)
`since`	No
`status`	No
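`issuerCode`	No	Local government code per JIS X 0401/0402 (not needed for national-level laws)
`jurisdiction`	No	'国' (national; Stage 1 only)
`ordinanceType`	No	法律 (act) / 政令 (cabinet order) / 省令 (ministerial ordinance) / 勅令 (imperial ordinance) / 規則 (rule) / 憲法 (constitution) …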
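
All seven parameters are optional, so a caller mostly needs to avoid sending empty values. The sketch below assembles an arguments object for a Stage 1 (national-level) search. The function name is hypothetical, and sending `jurisdiction` explicitly is an assumption, since the listing does not say whether it defaults to '国'.

```python
def search_arguments(query=None, ordinance_type=None, since=None, limit=None) -> dict:
    """Build arguments for ordinance_watch_search, dropping unset options."""
    candidates = {
        "query": query,                   # partial match on name, number, or abbreviation
        "ordinanceType": ordinance_type,  # e.g. "法律" or "政令"
        "since": since,                   # undocumented in the schema; passed through as given
        "limit": limit,                   # undocumented in the schema; passed through as given
        "jurisdiction": "国",             # assumed explicit value for the national level
    }
    return {key: value for key, value in candidates.items() if value is not None}


arguments = search_arguments(query="民法", ordinance_type="法律")
```

`issuerCode` and `status` are left out here: the former is not needed at the national level, and the latter has no documented values.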

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full burden. It mentions output fields (firstSeenAt, ledgerVerified) but does not disclose read-only nature, rate limits, authentication, or any side effects. The tool appears safe but transparency is limited.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences that front-load the core action and scope, with no extraneous words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no output schema, and no annotations, the description should cover result structure and usage more. It mentions hit fields but omits pagination, ordering, or total count. Completeness is adequate but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 57%, and the main description adds no parameter-specific guidance beyond what the schema already provides (e.g., query, issuerCode). It does not explain parameters like limit, since, or status further.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches Japanese national laws/ordinances via e-Gov, identifies the specific data source, and notes scope limitation to national level. It distinguishes from sibling watch tools (get, recent_changes) and entity_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for national-level law search and mentions a follow-up for local ordinances, providing context. However, it does not explicitly state when not to use or list alternative tools, though the purpose is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ordinance_watch_timeline (B)

Time-ordered events only for an ordinance (the differentiator: when it appeared / was amended / was repealed). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3.0/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the transparency burden. It mentions output fields (firstSeenAt, ledgerVerified) but does not disclose whether the tool is read-only, requires authentication, or has rate limits. The phrase 'only for an ordinance' hints at scope but is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences that convey the core purpose and differentiator efficiently. It front-loads key information, though the phrase 'the differentiator' is slightly redundant.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and one undocumented parameter, the description should explain input format and output structure more thoroughly. It mentions two output fields but omits format, data types, and how to retrieve itemId.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter, itemId, has no description in the schema (0% coverage) and the description does not explain what it represents or how to obtain it. The tool adds no semantic meaning beyond the schema definition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns time-ordered events for an ordinance, explicitly distinguishing it from siblings by noting 'the differentiator: when it appeared / was amended / was repealed.' This provides a specific verb and resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for viewing chronological changes, but does not explicitly contrast with other ordinance_watch tools (e.g., get, search, recent_changes). The mention of 'differentiator' helps but lacks a clear when/when-not statement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ordinance_watch_verify_ledger (A)

Verify the hash-chain integrity of an ordinance record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
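
To make the returned fields concrete, here is a minimal sketch of hash-chain verification that yields `chainValid`, `brokenAt`, and a checked-event count. The event layout (`payload`, `prevHash`, `hash`), the SHA-256 construction, and the all-zero genesis value are assumptions for illustration; the listing does not document how the server actually builds its ledger.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed previous-hash value for the first event


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Create a well-formed chain, used here only to produce sample data."""
    events, prev = [], GENESIS
    for payload in payloads:
        digest = event_hash(prev, payload)
        events.append({"payload": payload, "prevHash": prev, "hash": digest})
        prev = digest
    return events


def verify_chain(events: list) -> dict:
    """Walk the chain in order and report the first event that fails to link."""
    prev = GENESIS
    for index, event in enumerate(events):
        expected = event_hash(prev, event["payload"])
        if event["prevHash"] != prev or event["hash"] != expected:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(events)}


chain = build_chain([{"type": "appeared"}, {"type": "amended"}, {"type": "repealed"}])
intact = verify_chain(chain)

chain[1]["payload"]["type"] = "edited-after-the-fact"  # simulate tampering
tampered = verify_chain(chain)
```

Editing any stored payload changes its recomputed hash, so the walk stops at the first altered event; that is the sense in which such a ledger is tamper-evident rather than tamper-proof.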

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given no annotations, the description carries the full burden. It discloses the operation (hash-chain verification), return fields, and purpose (tamper detection). It does not mention side effects; verification is inherently read-only, but an explicit read-only guarantee is missing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences front-load the purpose and list returns. No wasted words, highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers what the tool does and its output, but lacks parameter explanation and usage guidance. For a simple one-param tool, it is adequate but has clear gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter itemId is not explained in the description. With 0% schema coverage, the description should clarify its role (e.g., the ordinance record ID). It adds no value beyond the parameter name.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies hash-chain integrity of an ordinance record for tamper detection, and lists return fields. This distinguishes it from sibling tools like get, search, timeline, and other verify_ledger tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for tamper detection but does not explicitly say when to use this vs alternatives (e.g., ordinance_watch_get for current data). With many similar verify_ledger tools, guidance on context would help.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pharma_watch_get (C)

Get a pharmaceutical record detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided; the description only mentions that the tool returns firstSeenAt and ledgerVerified. It does not disclose whether the operation is read-only or idempotent, or whether it has side effects, so behavioral traits beyond the minimum go uncovered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences that state the purpose and return fields. Adequate but minimal; it could benefit from slight expansion on context without becoming verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple get tool with one parameter and no output schema, the description provides basic purpose but omits details like error handling, default behavior, or relation to other pharma_watch tools. Incomplete for confident use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and description adds no meaning to the itemId parameter. It does not explain what itemId represents (e.g., pharma record ID from search). The parameter is entirely opaque.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it retrieves a pharmaceutical record detail plus full event timeline, with specific return fields. It distinguishes from siblings like pharma_watch_search or pharma_watch_timeline by combining detail and timeline in one call.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this vs. alternatives such as pharma_watch_timeline or pharma_watch_search. No exclusions or context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pharma_watch_recent_changes (A)

Recent approval / NHI-listed / price-revised events across all pharmaceutical records since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`category`	No

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions returned fields but does not disclose read-only nature, authentication needs, rate limits, or pagination behavior. Minimal behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with clear, front-loaded information. Every sentence adds value without fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description mentions returned fields, which is helpful given no output schema. However, it lacks details on pagination, result ordering, category values, and limit semantics. Adequate but with clear gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description only indirectly explains the 'since' parameter by mentioning the ISO8601 timestamp. The 'limit' and 'category' parameters are not explained, leaving their meaning unclear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns 'Recent approval / NHI-listed / price-revised events' across pharmaceutical records, with a temporal filter. It distinguishes from sibling tools like pharma_watch_get (single record) and pharma_watch_search (search) by focusing on recent changes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates usage when you need events after a given ISO8601 timestamp. It implies a temporal query context but does not explicitly state when not to use it or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pharma_watch_search (A)

Search Japanese pharmaceutical approvals (PMDA) and NHI-listed drugs (MHLW yakka). Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	Brand name / ingredient name (partial match)
`since`	No
`status`	No
`category`	No	PMDA review field (e.g. 第１) / MHLW segment (e.g. 内用薬, oral drugs)
`applicant`	No	Marketing authorization holder / manufacturer

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It mentions that each hit includes 'firstSeenAt and ledgerVerified', adding some transparency about output fields. However, it does not disclose other behavioral traits like read-only nature, pagination, or rate limits, leaving gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short, front-loaded sentences that effectively communicate the purpose and key output fields. It has no unnecessary words and is appropriately concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters and no output schema, the description is too brief. It does not address parameter usage, default values, or pagination, leaving the agent with insufficient context for effective invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50%, meaning several parameters (limit, since, status) lack descriptions in the schema. The tool description does not compensate by explaining these parameters, so it adds no semantic value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches 'Japanese pharmaceutical approvals (PMDA) and NHI-listed drugs (MHLW yakka)', which is a specific verb and resource. It distinguishes from siblings like pharma_watch_get, pharma_watch_recent_changes, etc., by indicating it is a search function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives. The usage is implied by the tool name and the description, but no guidance on when not to use or specific scenarios is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pharma_watch_timeline (A)

Time-ordered events only for a pharma record (the differentiator: when it was approved / NHI-listed / price-revised). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses key traits: returns time-ordered events, includes firstSeenAt and ledgerVerified. However, it does not mention pagination, date range, or response format, which are gaps for a tool without output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no filler. Every phrase adds meaning: defines scope, differentiates, and lists included fields. Ideal conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool (1 param, no output schema), the description covers purpose and core inclusions. But completeness suffers from missing return structure and usage constraints. Adequate but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so description must compensate. It says 'for a pharma record', implying itemId is the record ID, but does not provide format, examples, or constraints. Adds minimal value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool provides 'time-ordered events only for a pharma record' and lists specific event types (approved, NHI-listed, price-revised). This distinguishes it from sibling tools like pharma_watch_get (single record) and pharma_watch_search (filtered list). The differentiator is explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for timeline events but does not explicitly state when to use this tool versus alternatives like pharma_watch_recent_changes or pharma_watch_verify_ledger. No exclusions or clear context are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pharma_watch_verify_ledger (A)

Verify the hash-chain integrity of a pharma record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Since no annotations are provided, the description carries full burden for behavioral disclosure. It explicitly lists return fields (chainValid, brokenAt, checked event count, firstSeenAt, ledgerVerified), giving clear output expectations. However, it does not confirm non-destructive behavior or mention any side effects, authentication needs, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences: one stating the primary action and purpose, the other listing return fields. Every sentence adds value, and there is no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple verification tool with one parameter and no output schema, the description covers the core purpose and return values adequately. However, it omits parameter details and usage context, which slightly reduces completeness relative to the tool's simplicity and the presence of many similar siblings.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 0%, the description must add parameter meaning. It does not describe the lone parameter 'itemId' beyond the schema type. Although the tool name and description imply it is the pharma record identifier, no explicit clarification is given, which is a notable gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies hash-chain integrity for tamper detection on a pharma record, using a specific verb 'verify' and a specific resource. The 'pharma_' prefix distinguishes it from other domain-specific verify_ledger siblings like bid_watch_verify_ledger or grant_watch_verify_ledger.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. While the name implies verification, there is no mention of prerequisites, exclusions, or conditions for use. The context among siblings like pharma_watch_get or pharma_watch_search is not addressed, leaving the agent to infer usage from the name.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

price_oracle_convert (A)

Convert an amount between any two supported currencies (crypto or fiat), routing each leg through its USD value. Returns the rate and converted amount. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`to`	Yes	Target currency (crypto ticker or fiat ISO code)
`from`	Yes	Source currency (crypto ticker or fiat ISO code)
`amount`	Yes	Amount to convert
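
To make "routing each leg through its USD value" concrete, here is a minimal sketch of cross-rate conversion. The `USD_VALUE` table holds invented figures rather than live data, and `convert`, `source`, and `target` are hypothetical names (`from` is a Python keyword, so the tool's `from`/`to` cannot be used directly as argument names).

```python
from decimal import Decimal

# Hypothetical USD value of one unit of each currency (illustrative, not live data).
USD_VALUE = {
    "USD": Decimal("1"),
    "EUR": Decimal("1.25"),
    "JPY": Decimal("0.008"),
    "BTC": Decimal("60000"),
}


def convert(amount, source: str, target: str) -> dict:
    """Convert by pricing both legs in USD and dividing one by the other."""
    try:
        source_usd = USD_VALUE[source.upper()]
        target_usd = USD_VALUE[target.upper()]
    except KeyError as exc:
        raise ValueError(f"unsupported currency: {exc.args[0]}") from None
    rate = source_usd / target_usd  # target units per one source unit
    return {"rate": rate, "converted": Decimal(str(amount)) * rate}


btc_to_eur = convert(2, "BTC", "EUR")
jpy_to_usd = convert(1000, "JPY", "USD")
```

Pricing both legs in USD means any supported pair can be converted without a direct quote for that pair, at the cost of compounding the error of two quotes.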

Tool Definition Quality

A 4.5/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description carries the burden. It states the tool is read-only and free, and explains the conversion routing via USD. This is good, but could add more details like rate limits or precision.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise with three short sentences. It is front-loaded with the primary action and adds essential details (return values, cost) with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 3 parameters and no output schema, the description covers purpose, return format ('rate and converted amount'), cost, and read-only nature. It is complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the routing method ('each leg through its USD value'), which is not in the schema, enhancing parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Convert') and resource ('amount between any two supported currencies'), clearly distinguishing it from sibling tools like price_oracle_crypto_price or price_oracle_fx_rate which handle single currency pairs or rates.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when converting between any two currencies, but does not explicitly state when not to use it or mention alternatives. It provides clear context but lacks explicit exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

price_oracle_crypto_price (A)

Live crypto spot price via the CoinGecko public API. Returns the price of symbol in the requested fiat quote. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`quote`	No	Fiat quote currency (ISO code; default: USD)
`symbol`	Yes	Crypto ticker (e.g. BTC, ETH)

Tool Definition Quality

A 4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses 'read-only' and free usage, and states it uses CoinGecko public API. Lacks details on rate limits or error handling, but acceptable for a simple price tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, front-loaded with purpose. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description hints at returning 'price of symbol in fiat'. Missing details on response format, but among sibling tools, the purpose is clear and actionable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. Description adds context (live, via CoinGecko) but does not add significant meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'live crypto spot price via CoinGecko' and specifies it returns price of a symbol in fiat quote. Among siblings like price_oracle_convert and price_oracle_fx_rate, this uniquely targets crypto spot prices.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions 'read-only' and 'price 0.0 (free)', implying it's free to use, but does not explicitly guide when to use this versus sibling tools like price_oracle_price_history for historical data.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

price_oracle_fx_rate (A)

Fiat FX rate via the Frankfurter / ECB reference API. Returns the to units per one from unit plus the reference date. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`to`	Yes	Quote fiat currency (ISO code)
`from`	Yes	Base fiat currency (ISO code)

Tool Definition Quality

A 4.5/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description discloses read-only, free cost, and return structure (units per one from + date). Sufficient for a simple tool, though rate limits or supported currencies not mentioned.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences (21 words) plus a short read-only/cost note, no fluff. Front-loaded with core function, then key properties.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complete for this simple tool: describes source, return value, read-only, free. No output schema needed; description covers return content.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds meaning beyond schema by clarifying the exchange rate direction ('to units per one from unit'), which is not explicit in parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns fiat FX rates via Frankfurter/ECB API, distinguishing it from sibling tools like price_oracle_crypto_price for crypto rates.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Indicates read-only and free nature, and the source API gives implicit context for fiat rates. No explicit when-not-to-use, but sibling names provide clear differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

price_oracle_price_history (A)

OHLC price history for a crypto symbol via the CoinGecko public API, with a min / max / change summary. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`days`	No	Look-back window in days (default: 7).
`symbol`	Yes	Crypto ticker (e.g. BTC, ETH)
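
The description does not say how the min / max / change summary is derived. One plausible reading, sketched below, takes the lowest low and highest high in the window and measures change from the first open to the last close. The `Candle` type, `summarize` function, `changePct` key, and the sample figures are all assumptions for illustration.

```python
from typing import NamedTuple


class Candle(NamedTuple):
    """One OHLC bar (hypothetical shape; the real response format is undocumented)."""
    day: str
    open: float
    high: float
    low: float
    close: float


def summarize(candles: list) -> dict:
    """Summarize a window of candles as min, max, and percentage change."""
    if not candles:
        raise ValueError("need at least one candle")
    first_open = candles[0].open
    last_close = candles[-1].close
    return {
        "min": min(c.low for c in candles),
        "max": max(c.high for c in candles),
        # Assumed definition: first open to last close, in percent.
        "changePct": (last_close - first_open) / first_open * 100,
    }


# Invented three-day window, oldest first.
candles = [
    Candle("2026-07-01", 100.0, 110.0, 95.0, 105.0),
    Candle("2026-07-02", 105.0, 120.0, 101.0, 118.0),
    Candle("2026-07-03", 118.0, 119.0, 108.0, 110.0),
]
summary = summarize(candles)
```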

Tool Definition Quality

A 3.8/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses read-only and free, which is good for a tool with no annotations. However, it omits details such as rate limits, output format, and pagination. With no annotations, the full disclosure burden falls on the description.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, direct and informative. No filler words. Every sentence adds value (purpose, data source, summary, cost).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description hints at return values (OHLC, min/max/change). Covers purpose, behavior, and cost. Adequate for a simple two-parameter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with descriptions. Description adds minimal value beyond stating the output includes OHLC and summary. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it provides OHLC price history for a crypto symbol, with a summary. Distinct from siblings like price_oracle_crypto_price (current price) and price_oracle_convert (conversion).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly suggests use for historical price data, but no explicit when/when-not or comparison to siblings. Mentions 'free' and 'CoinGecko public API', but lacks alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

price_oracle_stablecoin_peg (A)

Stablecoin peg check: signed deviation (basis points) of the current price from the 1 USD target, with a banded status. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`symbol`	Yes	Stablecoin ticker.
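
A signed deviation in basis points is the relative distance from the target multiplied by 10,000. The sketch below shows that arithmetic with an invented banding scheme: the listing does not define the bands, so the thresholds (25 and 100 bps), the status labels, and the names `peg_check` and `deviationBps` are assumptions.

```python
def peg_check(price_usd: float, target: float = 1.0) -> dict:
    """Return the signed deviation from the peg in basis points plus a status band."""
    deviation_bps = (price_usd - target) / target * 10_000
    magnitude = abs(deviation_bps)
    # Band thresholds and labels are illustrative only; the tool's own bands are undocumented.
    if magnitude <= 25:
        status = "on_peg"
    elif magnitude <= 100:
        status = "drifting"
    else:
        status = "depegged"
    return {"deviationBps": round(deviation_bps, 2), "status": status}


slightly_low = peg_check(0.9985)  # 15 bps below the target
above = peg_check(1.006)          # 60 bps above the target
broken = peg_check(0.97)          # 300 bps below the target
```

Keeping the sign lets a caller tell a discount from a premium, while the band is computed from the magnitude alone.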

Tool Definition Quality

A 4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses read-only nature, free pricing, and the output type (signed deviation, banded status). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One succinct sentence plus a short note. No wasted words, front-loaded with key action and output.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple check tool with one parameter and no output schema, the description adequately explains the output (deviation, status) and constraints (read-only, free). What is missing is a brief definition of the status bands.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter 'symbol', which the schema describes as 'Stablecoin ticker.' alongside an enum list. The description adds little parameter meaning beyond that. The baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks stablecoin peg, returning signed deviation in basis points and banded status. It distinguishes from sibling tools like price_oracle_crypto_price and price_oracle_fx_rate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it's read-only and free, but does not specify when to use vs alternatives like price_oracle_crypto_price or the watch tools. No explicit exclusions or context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pubcom_watch_get (C)

Get a public-comment notice detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C 2.9/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must disclose all behavioral traits. It only states what the tool does, not side effects, required permissions, or whether it's read-only. How it differs from its siblings is implied but not stated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, efficient and front-loaded. Could be improved by separating purpose from return details, but overall concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple get-by-id tool, the description provides some return fields but omits the full structure. Without output schema, it should describe the response object shape. Also unclear if the timeline is identical to pubcom_watch_timeline's output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not explain the required parameter 'itemId'. No information about its format, meaning, or allowed values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and resource 'public-comment notice detail' plus 'full event timeline', and mentions specific return fields. It effectively distinguishes from sibling tools like pubcom_watch_search (searches) and pubcom_watch_timeline (separate timeline tool).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like pubcom_watch_search, pubcom_watch_timeline, or pubcom_watch_recent_changes. No prerequisites or context given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pubcom_watch_recent_changes (A)

Recent appearance / deadline-move / close / result-published events across all notices since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`agency`	No
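
Since none of these parameters is documented, the following is only a guess at the semantics a caller might expect: `since` as an inclusive ISO8601 cutoff, `agency` as an exact-match filter, and `limit` as a cap on newest-first results. The functions `parse_iso8601` and `recent_changes`, the event shape, the agency codes, and the default limit of 50 are all hypothetical.

```python
from datetime import datetime, timezone


def parse_iso8601(value: str) -> datetime:
    """Parse an ISO8601 timestamp into an aware datetime."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assume UTC when no offset is given
    return parsed


def recent_changes(events: list, since: str, agency=None, limit: int = 50) -> list:
    """Return events at or after `since`, optionally for one agency, newest first."""
    cutoff = parse_iso8601(since)
    hits = [
        event for event in events
        if parse_iso8601(event["at"]) >= cutoff
        and (agency is None or event["agency"] == agency)
    ]
    hits.sort(key=lambda event: parse_iso8601(event["at"]), reverse=True)
    return hits[:limit]


# Invented events; note the mix of "Z" and "+09:00" (JST) offsets.
events = [
    {"itemId": "n-1", "type": "appeared", "agency": "MOF", "at": "2026-07-01T00:00:00Z"},
    {"itemId": "n-2", "type": "deadline-moved", "agency": "METI", "at": "2026-07-10T09:00:00+09:00"},
    {"itemId": "n-1", "type": "closed", "agency": "MOF", "at": "2026-07-20T12:00:00Z"},
]
all_recent = recent_changes(events, "2026-07-05T00:00:00Z")
meti_only = recent_changes(events, "2026-07-05T00:00:00Z", agency="METI")
```

Comparing aware datetimes rather than raw strings matters here because timestamps with different UTC offsets do not sort correctly as text.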

Tool Definition Quality

A 3.8/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses that the tool returns events with fields 'firstSeenAt' and 'ledgerVerified', but does not mention if it is read-only, rate limits, pagination, or whether it modifies state. Basic behavior is clear but not exhaustive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with event types and structure. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description adequately outlines what the tool returns for its purpose, but lacks detail on the 'agency' parameter and does not mention ordering or pagination. Without annotations or output schema, more behavioral context would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage. The description explains only 'since' (an ISO8601 timestamp); the 'limit' and 'agency' parameters are undocumented in both schema and description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'recent appearance / deadline-move / close / result-published events across all notices since a given timestamp'. The action (listing recent changes) and the resource (notices) are specific, and the scope 'all notices' distinguishes it from siblings like pubcom_watch_get (single notice) and pubcom_watch_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for polling recent changes to all notices. It does not explicitly state when not to use or compare to alternatives, but the scope 'across all notices' gives clear context on its applicability.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pubcom_watch_search (B)

Search e-Gov public-comment notices. Each hit includes firstSeenAt and ledgerVerified (hash-chain integrity).

Parameters

Name	Required	Description
`limit`	No
`query`	No
`since`	No
`agency`	No	Responsible ministry or administrative agency (所管府省・行政機関)
`status`	No
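
For orientation, an MCP client invokes a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for this tool, sending only the filters that are set, since all five parameters are optional. The helper names and the argument values are hypothetical, and whether `since` here expects an ISO8601 timestamp is an assumption carried over from pubcom_watch_recent_changes.

```python
import json


def search_arguments(query=None, agency=None, status=None, since=None, limit=None) -> dict:
    """Collect only the filters the caller actually set."""
    candidates = {
        "query": query,
        "agency": agency,
        "status": status,
        "since": since,
        "limit": limit,
    }
    return {key: value for key, value in candidates.items() if value is not None}


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Illustrative values only; valid formats for each filter are not documented.
payload = build_tool_call(
    1,
    "pubcom_watch_search",
    search_arguments(query="data protection", since="2026-07-01T00:00:00Z", limit=10),
)
```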

Tool Definition Quality

B 3.2/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context by stating each hit includes firstSeenAt and ledgerVerified, indicating hash-chain integrity. However, with no annotations, it fails to disclose read-only nature, authentication needs, rate limits, or whether the operation is destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at two sentences, front-loading the core purpose and key fields. While efficient, it could include more detail without sacrificing brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description only partially explains return values (firstSeenAt, ledgerVerified). It omits other fields, pagination, handling of 5 parameters (0 required), and how to combine filters. Incomplete for a search tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'agency' has a description). The description does not elaborate on any parameters, leaving the agent to infer from parameter names alone. No additional semantics beyond the schema are provided.
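
The 20% figure is consistent with counting parameters that carry a non-empty `description` and dividing by the total. The sketch below reproduces that arithmetic; the scorer's actual method is not published, so `description_coverage`, the treatment of an empty schema, and the `type` values are assumptions. Only the parameter names and the one description come from the table above.

```python
def description_coverage(input_schema: dict) -> float:
    """Percentage of top-level properties that have a non-empty description."""
    properties = input_schema.get("properties", {})
    if not properties:
        return 100.0  # assumption: nothing to describe counts as fully covered
    described = sum(
        1 for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties) * 100


# Parameter names from the table; the types are guesses for illustration.
schema = {
    "type": "object",
    "properties": {
        "limit": {"type": "integer"},
        "query": {"type": "string"},
        "since": {"type": "string"},
        "agency": {"type": "string", "description": "所管府省・行政機関"},
        "status": {"type": "string"},
    },
}
coverage = description_coverage(schema)
```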

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states this tool searches e-Gov public-comment notices, a specific verb+resource combination. It distinguishes itself from sibling watch_search tools (e.g., bid_watch_search, grant_watch_search) by targeting a unique domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like pubcom_watch_get or pubcom_watch_recent_changes. The description does not mention context, prerequisites, or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pubcom_watch_timeline (B)

Time-ordered events only for a notice (the differentiator: when it opened, deadline moved, closed, or result was published). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B 3.3/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions included fields (firstSeenAt, ledgerVerified) but does not disclose return format, pagination, or confirm read-only nature. Minimal behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two short sentences. It front-loads the main purpose and adds a detail about included fields. Could be slightly more structured but efficient for its length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description provides basic understanding of output (time-ordered events) and mentions two fields. However, it does not fully describe the output structure or event types, leaving some ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and the description does not explain the itemId parameter. The description implies itemId identifies a notice but does not specify its format or provide examples. No additional meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns time-ordered events for a notice, listing specific event types (opened, deadline moved, closed, result published). It distinguishes from sibling tools like pubcom_watch_get by focusing on timeline events.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly tells when to use it: when needing time-ordered events for a notice. It mentions 'the differentiator' which helps distinguish from other pubcom tools. However, it lacks explicit when-not-to-use or alternative tool references.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pubcom_watch_verify_ledger (B)

Verify the hash-chain integrity of a notice (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B 3.4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must fully disclose behavior. It explains return fields but does not state if the tool is read-only or has side effects. The description implies a verification action with no state mutation, but this is not explicit.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no fluff. Lists return fields efficiently. Every sentence is informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple tool with one parameter and no output schema. Covers purpose and return fields, but lacks parameter definition and any usage context or alternatives.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'itemId' is not explained. The description mentions 'notice' but does not clarify that itemId is the notice ID. Schema coverage is 0%, so description should compensate, but it does not.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies hash-chain integrity of a notice for tamper detection, and lists return fields. This is specific and distinguishes from sibling verify_ledger tools for different domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives such as other verify_ledger tools or other tamper-detection methods. Usage is only implied through the description's mention of 'notice' tamper detection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realestate_watch_get (grade B)

Get a real-estate transaction record detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
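
As a rough illustration of how a client would invoke this tool, the sketch below builds the JSON-RPC 2.0 request body that MCP uses for `tools/call`. The helper name `build_tool_call` and the `itemId` value are hypothetical; the page does not document the ID format, and the transport details (session setup over Streamable HTTP, the `initialize` handshake) are left out.

```python
import json
from itertools import count

_request_ids = count(1)  # simple monotonically increasing request IDs


def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Return a JSON-RPC 2.0 request body for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "example-item-id" is a placeholder; real IDs would come from a search result.
request = build_tool_call("realestate_watch_get", {"itemId": "example-item-id"})
print(json.dumps(request, indent=2))
```

An `itemId` would presumably be obtained from `realestate_watch_search` first, although the description does not say so.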

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description carries the full burden. It names the return fields but does not disclose auth requirements, rate limits, error cases, or read-only status. Safe retrieval can be inferred but is not stated explicitly.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, efficient, front-loaded with action and result, no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple get operation, but missing parameter description and contextual details like error handling or prerequisites. With no output schema, the description could do more to set expectations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%; description does not explain the itemId parameter beyond the schema name. The parameter's role is implied but not clarified, which is insufficient for a tool with zero schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it gets a real-estate transaction record detail plus full event timeline, and mentions specific return fields. This distinguishes it from siblings like realestate_watch_search, realestate_watch_timeline, and other watch_get tools for different entities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. It implies use for a specific record by ID, but does not differentiate from realestate_watch_timeline or realestate_watch_verify_ledger. No when-not-to or prerequisite information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realestate_watch_recent_changes (grade B)

Recent appearance / revised events across all real-estate records since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`areaCode`	No
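
Because `since` must be an ISO8601 timestamp, a caller has to build it explicitly. The following is a minimal sketch, not the server's documented contract: the helper name is hypothetical, `limit` is assumed to cap the number of returned events, and `areaCode` is assumed to be the same JIS X 0401 prefecture code used by `realestate_watch_search`. Whether the server prefers a `+00:00` offset or a `Z` suffix is also unknown.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def recent_changes_args(now: datetime, lookback_hours: int,
                        limit: Optional[int] = None,
                        area_code: Optional[str] = None) -> dict:
    """Build arguments for realestate_watch_recent_changes (assumed semantics)."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    since = now.astimezone(timezone.utc) - timedelta(hours=lookback_hours)
    args = {"since": since.isoformat(timespec="seconds")}
    # Optional parameters are only sent when the caller sets them.
    if limit is not None:
        args["limit"] = limit
    if area_code is not None:
        args["areaCode"] = area_code
    return args


# A fixed clock keeps the example reproducible.
fixed_now = datetime(2026, 7, 30, 7, 32, tzinfo=timezone.utc)
args = recent_changes_args(fixed_now, lookback_hours=24, limit=50, area_code="13")
```

Passing a timezone-aware value avoids ambiguity about whether the cutoff is in UTC or Japan Standard Time.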

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It only describes the output fields and the required input, omitting information about rate limits, authentication, mutability, or any side effects. This is insufficient for a read tool with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long: the first defines purpose and required input, the second specifies output fields. It is compact, front-loaded, and every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the schema has 3 parameters with no descriptions and no output schema, the description is incomplete. It does not explain what 'appearance / revised events' means, the behavior of 'limit' and 'areaCode', or the format of 'firstSeenAt' and 'ledgerVerified'. The agent lacks details to invoke the tool correctly without additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, and the description only implicitly references the 'since' parameter. It does not explain the 'limit' or 'areaCode' parameters, their defaults, or how they affect results. The phrase 'since the given ISO8601 timestamp' adds minimal value beyond the schema field name.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'Recent appearance / revised events' for 'all real-estate records' with a required ISO8601 timestamp. It distinguishes from sibling tools like realestate_watch_get, realestate_watch_search, etc., by specifying the scope and event type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates using the tool with a 'since' timestamp to get recent changes, which is clear. However, it does not provide explicit guidance on when not to use it or mention alternative tools among the many realestate_watch_* siblings, leaving the agent to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realestate_watch_search (grade B)

Search Japanese real-estate transaction prices (MLIT reinfolib XIT001). Each hit is a single trade snapshot, ledgered for tamper-detection. Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	District name / property type / prefecture name (partial match)
`period`	No	"YYYY-QN" (e.g. "2024-Q1")
`areaCode`	No	JIS X 0401 prefecture code (e.g. "13")
`priceType`	No	"transaction" (Stage 1)
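
The `period` and `areaCode` formats in the table above lend themselves to a client-side check before calling the tool. This is a sketch under stated assumptions: the helper `validate_search_args` is hypothetical, the quarter is assumed to run from 1 to 4, and the area code is assumed to be a two-digit JIS X 0401 prefecture code in the range 01 to 47. How the server itself treats malformed values is not documented.

```python
import re

PERIOD_RE = re.compile(r"(\d{4})-Q([1-4])")          # e.g. "2024-Q1"
AREA_RE = re.compile(r"0[1-9]|[1-3][0-9]|4[0-7]")    # JIS X 0401: 01..47


def validate_search_args(args: dict) -> dict:
    """Return a copy of args after checking period and areaCode formats."""
    cleaned = dict(args)
    period = cleaned.get("period")
    if period is not None and not PERIOD_RE.fullmatch(period):
        raise ValueError(f"period must look like 'YYYY-QN', got {period!r}")
    area = cleaned.get("areaCode")
    if area is not None and not AREA_RE.fullmatch(area):
        raise ValueError(f"areaCode must be a code from '01' to '47', got {area!r}")
    return cleaned


# "13" is the example prefecture code given in the schema.
checked = validate_search_args(
    {"period": "2024-Q1", "areaCode": "13", "priceType": "transaction"}
)
```

Rejecting bad input locally saves a round trip and gives the agent a clearer error than an empty result set would.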

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that results are ledgered for tamper-detection and include firstSeenAt and ledgerVerified fields. However, it does not explain mutation, pagination, or rate limits, and there are no annotations to supplement.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences), front-loads the purpose, and adds relevant details without superfluous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the data source and hit structure, but omits usage context, pagination, and result interpretation. For a search tool with 5 parameters, more detail would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 80% (four of five parameters; `limit` is undescribed), but the description adds no meaning to any parameter (e.g., how to use query, period, or areaCode). It could have explained how parameters combine or what the default behavior is.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches Japanese real-estate transaction prices from a specific dataset, and mentions key features (ledgered, timestamps). However, it does not explicitly differentiate from sibling tools like realestate_watch_get or realestate_watch_recent_changes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description does not mention prerequisites, constraints, or typical use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realestate_watch_timeline (grade C)

Time-ordered events only for a real-estate record (the differentiator: when it appeared / was revised). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It mentions included fields (firstSeenAt, ledgerVerified) and that events are time-ordered, but not the sort direction, pagination, error conditions, or side effects. For a tool that likely returns a list, this is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief and front-loads the differentiator, but the first sentence is somewhat awkward and could be clearer. It wastes no words, although this brevity comes at the expense of completeness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity (1 param, no output schema, no annotations), the description does not adequately equip the agent. Missing details on response structure, limits, or prerequisites make it incomplete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter itemId has no description in the schema (0% coverage). The description does not explain what itemId represents or its format, leaving the agent without guidance on how to populate the required field.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns time-ordered events for a real-estate record and distinguishes it by focusing on when the record appeared or was revised. This differentiates it from sibling tools like get, recent_changes, search, and verify_ledger.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for historical timeline queries but does not explicitly state when to use this tool versus alternatives like realestate_watch_recent_changes or realestate_watch_get. No when-not-to-use or prerequisite guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
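
The kind of "use X instead of Y when Z" guidance this review asks for can be written down as a small routing rule. The sketch below is an interpretation of the five realestate_watch_* descriptions, not guidance published by the server; the function name and the `want` labels are hypothetical.

```python
from typing import Optional


def pick_realestate_tool(item_id: Optional[str] = None,
                         since: Optional[str] = None,
                         want: str = "detail") -> str:
    """Pick a realestate_watch_* tool from what the agent already knows."""
    if item_id is not None:
        # A known record: choose by what the agent needs about it.
        by_goal = {
            "detail": "realestate_watch_get",            # record plus timeline
            "history": "realestate_watch_timeline",      # events only
            "integrity": "realestate_watch_verify_ledger",
        }
        if want not in by_goal:
            raise ValueError(f"unknown goal: {want!r}")
        return by_goal[want]
    if since is not None:
        # No specific record, but a cutoff time: scan changes across records.
        return "realestate_watch_recent_changes"
    # Nothing known yet: start with a search to discover item IDs.
    return "realestate_watch_search"


choice = pick_realestate_tool(item_id="example-item-id", want="history")
```

Stating rules like these in each tool description would let an agent make the same choice without external routing code.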

realestate_watch_verify_ledger (grade A)

Verify the hash-chain integrity of a real-estate record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It describes the operation as verification and lists return fields, indicating a read-only check. However, it does not explicitly state no side effects or mention any destructive potential, which is acceptable but could be more precise.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the first clearly states the action and the second lists return values. No redundant information; all text is purposeful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers purpose, action, and return fields. It could add a note about read-only nature, but overall complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It mentions the record being verified, which implies itemId identifies the real-estate record. For a single parameter, this is adequate but could explicitly state the parameter's role.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool verifies hash-chain integrity of a real-estate record for tamper detection. It specifies the resource (real-estate ledger) and the action (verify), differentiating it from sibling verify_ledger tools for other domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for tamper detection on real-estate records but does not explicitly say when to use this tool and when to prefer an alternative. Given the domain-specific name and clear purpose, the context is sufficient, though no exclusions are listed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_watch_get (grade B)

Get a recall detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It implies a read operation by stating 'get' and listing return fields, but does not explicitly confirm read-only behavior, authorization needs, or side effects. Provides minimal behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise two-sentence description that front-loads the action and return details. No unnecessary words, achieving high density of useful information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description is adequate but not comprehensive. It identifies key return fields but lacks details on response structure, potential errors, or how to obtain itemId. Given sibling tool variety, more context would improve agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The sole parameter 'itemId' is not explained, leaving its purpose and format entirely unspecified. The description adds zero semantic value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves a recall detail and full event timeline, specifying returned fields (firstSeenAt, ledgerVerified). It distinctively identifies the resource (recall) and action (get), differentiating it from sibling tools like recall_watch_search or recall_watch_timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., recall_watch_search, recall_watch_timeline). The description does not mention prerequisites, exclusions, or use cases, leaving the agent to infer from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_watch_recent_changes (grade B)

Recent appearance / severity-escalated events across all recalls since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`agency`	No

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are absent, so the description must fully convey behavior. It mentions that each item includes 'firstSeenAt and ledgerVerified' but does not state read-only status, side effects, rate limits, or pagination behavior. Minimal transparency beyond basic return fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: the first front-loads the core purpose and the second adds a key detail about returned fields. No redundant words; every part contributes.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 3 parameters, no output schema, and no annotations, the description is too brief. It does not cover pagination, ordering, error conditions, or complete return structure, leaving the agent with significant unknowns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description clarifies that 'since' should be an ISO8601 timestamp, but does not explain 'limit' or 'agency'. Only one of three parameters is partially documented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it shows 'Recent appearance / severity-escalated events across all recalls' since a given timestamp, which specifies the verb (watch), resource (recalls), and scope. It distinguishes from sibling tools like recall_watch_search and recall_watch_timeline by focusing on recent changes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives such as recall_watch_search or recall_watch_timeline. The description implies usage for recent events but does not set boundaries or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_watch_search (grade C)

Search Japanese product / food recall notices (consumer-affairs-agency aggregator). Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	Product name / business operator name (partial match)
`since`	No
`agency`	No	Competent authority (e.g. Consumer Affairs Agency)
`status`	No
`recallClass`	No	Recall class (refund/collection, recall order, safety advisory, etc.)

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden for behavioral disclosure. It only mentions that hits include firstSeenAt and ledgerVerified, but does not cover mutability, authorization needs, rate limits, or side effects. This is insufficient for a search tool with no annotation support.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, concise and front-loaded with purpose. No redundant information, but could be slightly more structured. Still efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters, no output schema, and no annotations, the description is too minimal. It mentions only two output fields, leaving users unsure about the full response structure or pagination behavior. Completeness is inadequate for a search tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (three of six parameters have descriptions). The description adds no parameter information beyond what is in the schema. For example, it does not clarify the meaning of 'since', 'limit', or 'status'. With moderate coverage, the description should compensate but does not.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches Japanese product/food recall notices from a consumer-affairs-agency aggregator. It specifies the scope (Japanese) and type (recalls), and mentions output fields (firstSeenAt, ledgerVerified), distinguishing it from sibling tools like recall_watch_get or recall_watch_timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives such as recall_watch_get or recall_watch_timeline. The description only states what it does, not when it is appropriate. Sibling tools exist but criteria for choosing between them are unaddressed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_watch_timeline (grade A)

Time-ordered events only for a recall (the differentiator: when it appeared, when severity escalated, when it was completed). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description must fully disclose behavior. It identifies the tool as a read operation (time-ordered events) and mentions included fields (firstSeenAt, ledgerVerified), but does not describe any limitations, pagination, authentication needs, or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The two-sentence description is concise and includes key differentiators. However, it could be slightly more structured by separating the purpose from the field list.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple single-parameter tool with no output schema and many siblings, the description provides enough context to understand the tool's purpose and differentiate it. However, it lacks details on output structure beyond two fields, which may leave the agent uncertain about what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has one required parameter (itemId) with 0% description coverage. The description does not explain what itemId represents or specify any format/constraints. Given the low coverage, the description should compensate but fails to add value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns time-ordered events for a recall and highlights the differentiator: when it appeared, when severity escalated, and when it was completed. This distinguishes it from sibling tools like recall_watch_get (single entity details) and recall_watch_search (list of recalls).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing the event timeline for a recall, but does not explicitly state when not to use it or name alternatives. No exclusion criteria or comparison to other recall tools is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_watch_verify_ledger (grade B)

Verify the hash-chain integrity of a recall record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description carries full behavioral disclosure burden. It mentions return fields (chainValid, brokenAt, etc.) but does not discuss side effects, idempotency, permissions, or whether it is read-only. The return list is helpful but insufficient for full transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence followed by a list of return fields. It is front-loaded and free of fluff. Every part contributes to understanding the tool, though the return field list could be formatted more compactly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given a simple tool (1 param, no output schema, no annotations), the description covers purpose and return values adequately. However, it does not explain 'hash-chain integrity' or how to interpret 'brokenAt', leaving some gaps for a complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter 'itemId' has no schema description (0% coverage), and the tool description only implies it identifies a recall record. No format, source, or example is given, so the description adds minimal meaning beyond the parameter name.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Verify the hash-chain integrity of a recall record (tamper detection).' It uses a specific verb ('verify') and resource ('recall record'), and the parenthetical distinguishes it from other verify_ledger tools by specifying the record type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., other verify_ledger tools, recall_watch_get, or recall_watch_search). The description does not mention prerequisites, when not to use it, or context for choosing it over siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screen_by_country (A)

Summarize sanction programs and entity counts associated with a country (name or ISO-3166 code) across the consolidated lists. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`country`	Yes	Country name or ISO-3166 alpha-2 code

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations; description mentions read-only and free price, but lacks behavioral details like rate limits or error handling. Adequate for a simple lookup.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with action and resource, no wasted words. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema; description lacks details on return format. Adequate for a summary tool but could specify fields or structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage for the single parameter 'country'; description adds no new semantic value beyond what schema states (country name or ISO code). Baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'summarize' and resource 'sanction programs and entity counts' for a country. Distinguishes from sibling tools like sanctions_screen_entity by specifying country-level scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage via name and description, but no explicit when-to-use or when-not-to-use guidance compared to alternatives like sanctions_screen_entity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screen_check_address (A)

Screen a physical or crypto address string against address entries in the consolidated sanctions lists. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`address`	Yes	Physical or crypto address to screen

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description discloses read-only behavior and price (free), adding value. It does not detail error handling or output format, but for a simple screening tool this is acceptable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the first states the core action, the second adds behavioral traits. Every word earns its place; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple with one parameter and no output schema. The description covers the 'what' and behavioral traits but does not mention the form of the result (e.g., list of matches or boolean). Minor gap but largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add meaning beyond the schema's parameter description; it merely restates the purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool screens a physical or crypto address against consolidated sanctions lists. The name and description uniquely identify this address-specific screening tool among siblings like sanctions_screen_entity and sanctions_screen_by_country.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for address screening but does not explicitly exclude alternatives or provide when-not-to-use guidance. The sibling context shows other sanctions screening tools, so the intended use case is clear but not fully delineated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screen_entity (A)

Screen an entity name against public consolidated sanctions lists (OFAC SDN / UN / EU), fetched at request time. Returns scored fuzzy matches with programs and countries. Informational only, not legal advice. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`name`	Yes	Entity / person / vessel name to screen
`limit`	No	Max matches to return
`types`	No	Restrict to sanction subject types.
`minScore`	No	Minimum match score 0-100 (default heuristic threshold)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It confirms the tool is read-only, free, and fetches data at request time. However, it does not disclose operational constraints (e.g., rate limits, caching behavior), the return format structure, or error handling for invalid inputs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact (four sentences) and front-loaded with the core purpose. Every sentence adds value: screening action, data sources, output type, disclaimers, pricing. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers return format ('scored fuzzy matches with programs and countries') and data freshness. It lacks detail on the structure of individual matches, which might be inferred from parameter semantics. Overall, sufficient for a straightforward screening tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all four parameters. The description adds minimal extra value beyond the schema—e.g., it notes 'scored fuzzy matches' but does not explain how minScore interacts with that scoring. Baseline score of 3 is appropriate.
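The interaction between `minScore`, `limit`, and "scored fuzzy matches" can be illustrated with a small Python sketch. This is not the server's algorithm, which is undocumented: the scorer here is the standard library's `difflib.SequenceMatcher`, and the default threshold of 80 and default limit of 10 are placeholder assumptions standing in for the "default heuristic threshold".

```python
from difflib import SequenceMatcher


def match_score(query: str, candidate: str) -> int:
    """Return a 0-100 similarity score, ignoring case (illustrative scorer)."""
    ratio = SequenceMatcher(None, query.casefold(), candidate.casefold()).ratio()
    return round(ratio * 100)


def screen(name: str, entries: list, min_score: int = 80, limit: int = 10) -> list:
    """Score every entry, drop those below min_score, keep the best `limit`."""
    scored = [{"name": entry, "score": match_score(name, entry)} for entry in entries]
    hits = [hit for hit in scored if hit["score"] >= min_score]
    hits.sort(key=lambda hit: hit["score"], reverse=True)
    return hits[:limit]
```

In this sketch the threshold is applied before the limit, so lowering `minScore` widens the candidate pool while `limit` only truncates the ranked result. Whether the real tool orders the two steps the same way is not stated in its description.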

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Screen') and resource ('entity name against public consolidated sanctions lists'), clearly distinguishing from siblings like sanctions_screen_by_country or sanctions_screen_check_address. It also mentions the output type ('scored fuzzy matches').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states the tool is 'read-only' and 'informational only, not legal advice,' but does not explicitly guide when to use this tool vs. other sanctions-related tools (e.g., sanctions_screen_by_country, sanctions_screen_list_programs). Usage context is implied but no alternative tools are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screen_list_programs (A)

List sanction programs known across the consolidated lists, with per-program entity counts. Optionally filter by source. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`source`	No	Optional source id filter (e.g. ofac / un / eu)

Tool Definition Quality

A4.0/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Declares read-only and free cost, but lacks details on pagination or rate limits. With no annotations, description covers key behavioral traits adequately.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no waste, front-loading the main purpose and optional filter.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main functionality and mentions entity counts in output, but no mention of pagination or result limits. Adequate given tool simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema describes the only parameter fully, and the description adds no new semantic information beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists sanction programs with entity counts, distinguishing it from related sanctions tools that focus on entities, addresses, or sources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage for listing programs with counts, but no explicit when-to-use or alternatives among sibling tools like sanctions_screen_list_sources or sanctions_screen_entity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screen_list_sources (A)

List the sanctions data sources queried (OFAC SDN / UN / EU) with metadata. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description correctly declares read-only and free nature. It adds behavioral context beyond the empty input schema, assuring the agent of no side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences with no wasted words. Efficiently conveys purpose, examples, and cost.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description adequately covers the tool's output (list of sources with metadata). No further details needed for this simple query.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist, so schema coverage is 100%. The description adds no parameter details, but that is acceptable as none are needed; baseline 4 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists sanctions data sources (OFAC SDN, UN, EU) with metadata. It uses a specific verb 'List' and distinguishes from sibling screening tools like sanctions_screen_entity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description notes it is read-only and free, implying safe usage. It does not explicitly state when to use vs. alternatives, but purpose is self-evident for retrieving available list sources.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanction_watch_get (C)

Get a sanction detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description must carry the full burden of behavioral disclosure. It only mentions the function and returned fields, but does not address whether the tool modifies data, requires authentication, has rate limits, or any side effects. This is minimal for a get operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, which is concise, but it omits critical information about the parameter and behavioral context. It is under-specified given the tool's role.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, no parameter descriptions, and no annotations, the description is incomplete. It highlights two returned fields but does not describe the full response structure or clarify event timeline details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% (no description for itemId). The description does not explain what itemId represents or how to obtain it. No additional meaning is added beyond the schema structure.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves a sanction detail plus full event timeline, and explicitly mentions two returned fields (firstSeenAt, ledgerVerified). However, it does not distinguish this from sibling tools like sanction_watch_timeline, which might return only the timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like sanction_watch_search or sanction_watch_timeline. The description does not mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanction_watch_recent_changes (A)

Recent appearance / lift events across all sanctions since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`regulator`	No

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description mentions what each item includes (firstSeenAt, ledgerVerified), giving some insight into the output. However, with no annotations, it fails to disclose behaviors like pagination, rate limits, or read-only nature. It is adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first states the action and scope, second adds output details. No wasted words, well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 3 parameters and no output schema, the description is too brief. It omits parameter purposes, output structure beyond two fields, and any usage context. For a tool of this complexity, more detail is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage. The description only implicitly references the 'since' parameter by mentioning ISO8601 timestamp. It does not explain 'limit' or 'regulator', leaving the agent to infer from names. This adds little beyond schema.
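Since the schema leaves `since`, `limit`, and `regulator` undocumented, a caller has to guess at their shapes. The Python sketch below builds an MCP `tools/call` JSON-RPC request for this tool. The JSON-RPC envelope follows the MCP convention; everything about the arguments beyond "`since` is an ISO8601 timestamp" is an assumption, including that `limit` is an integer, that `regulator` is a string, and that the server accepts a UTC offset written as `+00:00`.

```python
from datetime import datetime, timezone


def recent_changes_request(since: datetime, limit=None, regulator=None,
                           request_id: int = 1) -> dict:
    """Build a tools/call request body for sanction_watch_recent_changes."""
    if since.tzinfo is None:
        # A naive datetime is ambiguous, so refuse it rather than guess a zone.
        raise ValueError("since must be timezone-aware")
    arguments = {"since": since.astimezone(timezone.utc).isoformat()}
    # Optional arguments are omitted entirely when not supplied.
    if limit is not None:
        arguments["limit"] = limit
    if regulator is not None:
        arguments["regulator"] = regulator
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "sanction_watch_recent_changes", "arguments": arguments},
    }
```

Normalizing to UTC before serializing avoids sending a local-time string whose offset the server might interpret differently; this is a defensive choice, not a documented requirement.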

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns recent appearance/lift events across all sanctions since a given timestamp. It specifies the resource (sanctions) and the action (listing recent changes), which distinguishes it from sibling tools like sanction_watch_search or sanction_watch_get.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like sanction_watch_search. The description implies it is for recent changes but gives no when-not-to-use guidance and names no alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanction_watch_search (B)

Search Japanese administrative sanctions (FSA jirei archive). Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	Name of the sanctioned party (partial match)
`since`	No
`status`	No
`regulator`	No	Sanctioning authority (e.g. FSA)
`sanctionType`	No	Business improvement order / business suspension, etc. (partial match)

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description must disclose behavioral traits. It only mentions return fields (firstSeenAt, ledgerVerified) but does not indicate whether the tool is read-only, destructive, or any other behavioral characteristics like authentication needs or data freshness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences that immediately convey the tool's purpose and key output details. It is front-loaded and contains no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters and no output schema, the description lacks sufficient context about how to construct queries, handle pagination, or interpret results. It only mentions two fields, leaving agents with incomplete guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 50% coverage (descriptions for query, regulator, sanctionType). The description adds no parameter information beyond what the schema provides, nor does it explain how to combine parameters or interpret their values.
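The 50% figure can be reproduced with a short Python sketch that counts how many schema properties carry a non-empty `description`. This is one plausible reading of "coverage", not the scorer's documented method. The property types in the example schema are assumptions, and the descriptions are abbreviated English glosses; only the parameter names and which of them have descriptions come from the table above.

```python
def description_coverage(input_schema: dict) -> float:
    """Fraction of properties that have a non-empty description."""
    properties = input_schema.get("properties", {})
    if not properties:
        # With nothing to describe, treat the schema as fully covered.
        return 1.0
    described = sum(
        1 for spec in properties.values() if spec.get("description", "").strip()
    )
    return described / len(properties)


# Hypothetical reconstruction of the sanction_watch_search input schema.
search_schema = {
    "type": "object",
    "properties": {
        "limit": {"type": "integer"},
        "query": {"type": "string", "description": "Sanctioned party name"},
        "since": {"type": "string"},
        "status": {"type": "string"},
        "regulator": {"type": "string", "description": "Sanctioning authority"},
        "sanctionType": {"type": "string", "description": "Type of sanction"},
    },
}

coverage = description_coverage(search_schema)  # 3 of 6 properties described
```

Treating an empty schema as fully covered matches how parameterless tools on this page are reported as having 100% coverage.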

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches 'Japanese administrative sanctions (FSA jirei archive)' with a specific verb and resource, and it distinguishes from sibling tools like sanction_watch_get and sanction_watch_timeline by focusing on search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for searching administrative sanctions but provides no explicit guidance on when to use it versus other siblings (e.g., sanction_watch_get for single records, sanction_watch_timeline for time-ordered events). Usage context is implied but not clarified.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanction_watch_timeline (B)

Time-ordered events only for a sanction (the differentiator: when it appeared and when it was lifted). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It mentions key fields (firstSeenAt, ledgerVerified) but does not disclose whether the tool is read-only, the number of events returned, pagination, or any behavioral traits beyond the basic purpose. The term 'time-ordered events' is vague.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, efficient and front-loaded with the core purpose. However, it could be more structured to separate purpose from field listing. Still, it wastes no words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple inputs (one required param, no output schema), the description is minimally adequate. It explains the tool's focus and key fields but lacks details on event structure, order, or limits, which would be helpful for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter (itemId) has 0% schema description coverage. The description does not explain what itemId represents (e.g., sanction ID), assuming domain knowledge. The schema already defines it as a string without further context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns 'time-ordered events only for a sanction' and highlights the differentiator ('when it appeared and when it was lifted'). It also mentions included fields (firstSeenAt, ledgerVerified), making the tool's unique value distinct from siblings like sanction_watch_get or sanction_watch_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies this tool is for timeline events ('time-ordered events only'), but it does not explicitly state when to use this over alternatives like sanction_watch_get or sanction_watch_recent_changes. No when-not-to-use or alternative guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanction_watch_verify_ledger (A)

Verify the hash-chain integrity of a sanction record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes return fields (chainValid, brokenAt, etc.) indicating it checks chain integrity and reports issues. However, no mention of side effects or permissions. Since no annotations exist, description should explicitly state it is read-only.
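Besides stating "read-only" in the description text, an MCP tool definition can carry an `annotations` object with hints such as `readOnlyHint` and `idempotentHint`. The Python sketch below shows what such a definition might look like for this tool. It is hypothetical: the server publishes no annotations, the `itemId` description is invented for illustration, and the hint values assume that verification neither writes to the ledger nor changes results on repeat calls.

```python
# Hypothetical tool definition; only the name and the first description
# sentence come from the listing above.
tool_definition = {
    "name": "sanction_watch_verify_ledger",
    "description": (
        "Verify the hash-chain integrity of a sanction record (tamper detection). "
        "Read-only."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "itemId": {
                "type": "string",
                # Assumed wording; the real schema has no description.
                "description": "Identifier of the sanction record to verify",
            }
        },
        "required": ["itemId"],
    },
    "annotations": {
        "readOnlyHint": True,    # assumed: the call does not modify any data
        "idempotentHint": True,  # assumed: repeating the call has no extra effect
    },
}


def is_declared_read_only(tool: dict) -> bool:
    """True only when the tool explicitly sets readOnlyHint to True."""
    return tool.get("annotations", {}).get("readOnlyHint") is True
```

A client could use a check like `is_declared_read_only` to decide whether a call needs user confirmation; a tool with no annotations is treated here as not declared read-only.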

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence with a clear purpose followed by a structured list of return fields. No unnecessary words, front-loaded with action and resource.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose and return values adequately for a simple tool. Minor gaps: no description of itemId, no error conditions or prerequisites. But overall sufficient for basic usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and description does not explain the sole parameter 'itemId'. It only lists return fields but provides no description of what the parameter represents or its expected format.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it verifies hash-chain integrity of a sanction record for tamper detection, listing return fields. It distinguishes from sibling verify_ledger tools by specifying 'sanction'. Verb and resource are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Does not mention prerequisites or when not to use it, and does not reference other verification tools. Agent must infer context from siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subsidy_watch_get (B)

Get a subsidy program detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`programId`	Yes
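
A minimal sketch of how a client might invoke this tool, assuming the standard MCP `tools/call` JSON-RPC request shape. The helper name `build_tool_call` and the `programId` value are hypothetical placeholders; the listing does not document the identifier format, so a real value would presumably come from a prior `subsidy_watch_search` hit.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` JSON-RPC 2.0 request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "example-program-id" is a made-up placeholder, not a real ledger id.
request = build_tool_call("subsidy_watch_get", {"programId": "example-program-id"})
print(json.dumps(request, indent=2))
```

The same helper applies to the other single-parameter tools in this listing by swapping the tool name and the argument key (`programId` or `itemId`).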

Tool Definition Quality

B (3.2/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, but the description hints at read-only operation and specifies returned fields. Lacks disclosure on permissions or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise front-loaded sentences with no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers basic purpose and return fields, but missing parameter description and output structure detail; adequate for a simple get but incomplete given no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Parameter programId is not described at all, despite 0% schema coverage; description fails to explain its purpose or format.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it retrieves a subsidy program detail and full event timeline, distinguishing it from sibling tools like search, changes, or verify.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like subsidy_watch_timeline or subsidy_watch_search.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subsidy_watch_recent_changes (C)

Recent appearance / change / close events across all programs since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`category`	No
`issuerCode`	No
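
Since the description only documents `since` as an ISO8601 timestamp, the following is a hedged sketch of assembling the arguments. It assumes that `limit` is an integer cap on the number of returned events and that the optional filters are plain strings; neither is confirmed by the listing. The helper name `recent_changes_args` is hypothetical, and whether the server prefers a `Z` suffix over a `+00:00` offset is unknown.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def recent_changes_args(
    now: datetime,
    days_back: int,
    limit: Optional[int] = None,        # assumed: maximum number of events
    category: Optional[str] = None,     # assumed: plain string filter
    issuer_code: Optional[str] = None,  # assumed: plain string filter
) -> dict:
    """Assemble arguments for subsidy_watch_recent_changes."""
    since = (now - timedelta(days=days_back)).astimezone(timezone.utc)
    args = {"since": since.isoformat(timespec="seconds")}
    # Only send optional filters that the caller actually set.
    if limit is not None:
        args["limit"] = limit
    if category is not None:
        args["category"] = category
    if issuer_code is not None:
        args["issuerCode"] = issuer_code
    return args


args = recent_changes_args(
    datetime(2026, 7, 30, 7, 32, tzinfo=timezone.utc), days_back=7, limit=20
)
print(args)  # {'since': '2026-07-23T07:32:00+00:00', 'limit': 20}
```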

Tool Definition Quality

C (2.7/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description bears full burden. It mentions return fields (firstSeenAt, ledgerVerified) and the 'since' parameter, but fails to disclose pagination behavior, ordering, limits, or data freshness. The 'limit' parameter is not explained, and the tool's mutation/destructive potential is unclear.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief: two sentences covering purpose and the 'since' constraint plus return fields. It avoids fluff and is front-loaded. However, it could be slightly improved by succinctly noting optional filters without adding bloat.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters, no output schema, and no annotations, the description is notably incomplete. It omits key behavioral details like pagination, ordering, and filter semantics, leaving the agent with insufficient information for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must add meaning. It explains 'since' (ISO8601 format) but ignores 'limit', 'category', and 'issuerCode'. The phrase 'across all programs' contradicts the existence of filter parameters, creating ambiguity. Overall, most parameters remain unexplained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns 'recent appearance/change/close events across all programs' using a timestamp, which clearly indicates the tool's purpose. However, it does not differentiate from siblings like subsidy_watch_search or subsidy_watch_get; the mention of 'across all programs' implies a broad scope but lacks explicit distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., subsidy_watch_get, subsidy_watch_search). The description only implies a broad query use case but offers no explicit when-to-use or when-not-to-use instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subsidy_watch_search (C)

Search the current state of subsidy programs. Each hit includes firstSeenAt and ledgerVerified (hash-chain integrity).

Parameters

Name	Required	Description
`limit`	No
`query`	No
`status`	No
`category`	No
`amountMin`	No
`issuerCode`	No	JIS X 0401/0402 local government code

Tool Definition Quality

C (2.4/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses that results include firstSeenAt and ledgerVerified, but omits other behavioral traits: it does not state that it is read-only, whether it requires authentication, rate limits, pagination behavior, or any side effects. The transparency is minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise—two sentences with no filler. The first sentence states the core function, the second adds a valuable detail about result fields. However, it sacrifices completeness for brevity, omitting parameter and usage context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters, no output schema, and no annotations, the description is severely incomplete. It does not specify output structure beyond two fields, filtering capabilities, default behaviors, pagination, or any constraints. It fails to address the complexity of the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only issuerCode has a description), yet the tool description adds no parameter information. It does not explain the purpose of query, status, category, amountMin, or limit. The agent must rely on parameter names alone, which may be insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches the current state of subsidy programs with a specific verb and resource. It mentions result fields (firstSeenAt, ledgerVerified) which adds specificity. However, it does not explicitly differentiate from sibling tools like subsidy_watch_get or subsidy_watch_recent_changes, though the names imply distinct purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., subsidy_watch_get for single records, subsidy_watch_recent_changes for updates). No conditions, prerequisites, or exclusion criteria are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subsidy_watch_timeline (B)

Time-ordered events only for a program (the differentiator: when it appeared, changed, closed). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`programId`	Yes

Tool Definition Quality

B (3.2/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description bears full burden. Does not disclose read-only/destructive nature, authentication needs, or data freshness. Only mentions included fields (firstSeenAt, ledgerVerified) but no behavioral traits beyond that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no redundancy, front-loading key information and efficiently using parentheticals.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a one-parameter tool with no output schema, but lacks details on expected return format, pagination, or error handling. Could be more complete given no annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter 'programId' lacks any description in schema or text. The description adds no meaning (e.g., format, example, or how to obtain it), leaving agents without context despite low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides 'time-ordered events only for a program' and specifies the differentiator ('when it appeared, changed, closed'), distinguishing it from sibling watch_timeline tools and other subsidy watch tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like subsidy_watch_get or subsidy_watch_recent_changes. The 'differentiator' hint is present but not actionable without comparison to siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subsidy_watch_verify_ledger (B)

Verify the hash-chain integrity of a program (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`programId`	Yes
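
To illustrate what hash-chain verification generally involves, here is a minimal sketch. It assumes a SHA-256 chain in which each event hashes its predecessor's hash together with a canonical JSON payload. The server's actual hashing scheme is not documented in this listing, so the event layout (`payload`, `prevHash`, `hash`), the genesis value, and the `checked` key are illustrative assumptions; only the names `chainValid` and `brokenAt` come from the tool description.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous event"


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    """Append an event linked to the current tail of the chain."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"payload": payload, "prevHash": prev,
                  "hash": event_hash(prev, payload)})


def verify_chain(chain: list) -> dict:
    """Recompute every link and report the first index that fails."""
    prev = GENESIS
    for index, event in enumerate(chain):
        recomputed = event_hash(prev, event["payload"])
        if event["prevHash"] != prev or event["hash"] != recomputed:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(chain)}


chain: list = []
for status in ("appeared", "changed", "closed"):
    append_event(chain, {"status": status})
print(verify_chain(chain))

chain[1]["payload"]["status"] = "edited"  # simulate tampering
print(verify_chain(chain))
```

In this sketch, editing any stored payload invalidates that event's recomputed hash, which is why the check can point to a specific broken position rather than only returning a boolean.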

Tool Definition Quality

B (3.3/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description bears full burden for behavioral disclosure. It explains the verification process and lists return fields, but does not disclose side effects, authorization needs, or safety profile. The return value details add some transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with a clear action followed by a list of return fields. It is concise and front-loaded, with no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple verification tool, the description covers purpose and output but lacks usage context, parameter guidance, and alternative tool mentions. It is minimally adequate but missing important details for effective tool selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter 'programId' has 0% schema description coverage. The description only mentions 'program' without explaining what constitutes a valid programId or how to obtain it. It adds minimal meaning beyond the schema's type declaration.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies hash-chain integrity for tamper detection of a program, specifying the return fields. This is a specific verb-resource combination that distinguishes it from sibling tools by domain (subsidy watch).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like subsidy_watch_get or other verify_ledger tools. The description does not mention context, prerequisites, or usage scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

temporal_query (A)

Reconstruct what a ledger item (or items matching a query) officially said AS OF a past date T, by replaying the kept event history. Returns each item's point-in-time state plus an F-037 provenance receipt (observed_at = the state's effective time). existence:false with a first_seen receipt when the item did not yet exist at T; latest state with as_of_clamped:false when T is in the future. Read-only; price 0.0.

Parameters

Name	Required	Description
`as_of`	Yes	ISO-8601 point in time T.
`query`	No	Title substring to reconstruct every matching item (capped at 25). Use instead of item_id.
`ledger`	Yes	Ledger key (e.g. 'sanction', 'license', 'subsidy', 'recall', 'pharma', 'bid', 'grant', 'pubcom', 'ordinance', 'tos', 'realestate', 'landprice').
`item_id`	No	Stable item id (single-item reconstruction).
`jurisdiction`	No	Jurisdiction code (default 'jp').
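
The "replay the kept event history" idea can be sketched as follows. This is an illustration under stated assumptions, not the server's implementation: the event shape (`at`, `changes`) and the helper name `state_as_of` are hypothetical, and the provenance receipt is reduced to a bare `observed_at` or `first_seen` timestamp. The `as_of_clamped` flag is left out because its exact semantics are not clear from the description alone.

```python
from datetime import datetime


def state_as_of(events: list, as_of: str) -> dict:
    """Fold events with a timestamp <= as_of into a point-in-time state."""
    t = datetime.fromisoformat(as_of)
    ordered = sorted(events, key=lambda e: datetime.fromisoformat(e["at"]))
    state = None
    observed_at = None
    for event in ordered:
        if datetime.fromisoformat(event["at"]) > t:
            break  # everything after T is ignored
        state = {**(state or {}), **event["changes"]}
        observed_at = event["at"]
    if state is None:
        # The item did not exist yet at T; report when it was first seen.
        first_seen = ordered[0]["at"] if ordered else None
        return {"existence": False, "first_seen": first_seen, "state": None}
    return {"existence": True, "state": state, "observed_at": observed_at}


# Hypothetical event history for one ledger item.
history = [
    {"at": "2025-01-10T00:00:00+00:00", "changes": {"title": "X", "status": "open"}},
    {"at": "2025-03-01T00:00:00+00:00", "changes": {"status": "closed"}},
]
print(state_as_of(history, "2025-02-01T00:00:00+00:00"))
print(state_as_of(history, "2024-12-01T00:00:00+00:00"))
```

When T lies after the last event, the loop consumes the whole history, so the sketch returns the latest state, which is consistent with the future-date behavior the description mentions.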

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully covers behavioral traits: read-only, price 0.0, return format including provenance receipt, handling of future dates (as_of_clamped:false) and non-existence (existence:false). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3-4 sentences) and front-loaded with the main purpose. Every sentence adds necessary detail without waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, so description must explain return values. It clearly describes the state, receipt, and edge cases. This makes it complete for an agent to understand tool behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3. The description adds value by explaining capping on query results (25), interaction between query and item_id, and the effect of as_of on receipts. This extra context justifies 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reconstructs the state of ledger items at a past date, using specific verbs like 'reconstruct' and 'replay'. It distinguishes itself from siblings (no other temporal query tool) and is specific about the resource (ledger items) and action (point-in-time reconstruction).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (for historical reconstruction) and includes edge cases (future T, non-existence). While it doesn't explicitly exclude other tools, the unique functionality of temporal query makes alternatives unnecessary in context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tos_watch_get (C)

Get a ToS snapshot detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C (2.4/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided. The description only mentions return fields (firstSeenAt and ledgerVerified) but does not disclose read-only behavior, authentication needs, or any side effects. For read operations this is a moderate gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very short (two sentences) and front-loaded with purpose, but it omits critical information about parameters and usage. It is concise but not sufficiently informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, no annotations, and a single undocumented parameter, the description is severely incomplete. It fails to explain what a ToS snapshot is, what the event timeline contains, or how to interpret the return fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has one required parameter 'itemId' with no description in schema (0% coverage). The tool description does not explain what itemId represents (e.g., the ID of the ToS snapshot), leaving the agent without necessary semantic context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Get' and identifies the resource as 'ToS snapshot detail plus full event timeline', and mentions return fields. It distinguishes from sibling watch tools by specifying it returns both snapshot detail and timeline, though it could be more explicit about differentiation from tos_watch_timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like tos_watch_search or tos_watch_timeline. The description does not state prerequisites or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tos_watch_recent_changes (C)

Recent revised events across all SaaS ToS documents since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`vendor`	No

Tool Definition Quality

C (2.7/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description partially discloses output fields (firstSeenAt, ledgerVerified) but omits behavioral traits like read-only nature, side effects, or rate limits. Minimal transparency beyond schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first states purpose concisely, second describes output fields. No redundant information, efficiently front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 3 parameters, no output schema, and no annotations, the description is too brief. It omits parameter details, pagination hints, and clarification of output fields, leaving gaps for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description only implicitly references the 'since' parameter but does not explain 'limit' or 'vendor'. No additional meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool retrieves recent revised events across SaaS ToS documents since a timestamp, which is clear and distinguishes from sibling tools like tos_watch_get or tos_watch_search. However, it lacks an explicit verb (e.g., 'retrieve'), relying on context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., tos_watch_search, tos_watch_timeline). The description does not say when not to use it or when an alternative is preferable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tos_watch_search (C)

Search Japanese / English-language SaaS Terms of Service snapshots (Stripe / Anthropic / AWS / Google Cloud / GitHub …). Stage 1 covers 'terms' docs. Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	Title / excerpt from the start of the body; partial match
`vendor`	No	'stripe' / 'anthropic' / 'aws' / 'gcp' / 'github' …
`docType`	No

Tool Definition Quality

C (2.5/5.0)

Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description says 'Stage 1 covers terms docs', but the input schema's docType enum also includes privacy, pricing, and sla, which creates a contradiction. There are no annotations to clarify safety or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loads purpose. Efficient but could structure parameter info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks clarity on the docType limitation contradiction, query language, and output format. Without output schema, more detail on return fields would help.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description does not add parameter details beyond the schema. With 50% schema coverage, it fails to compensate for missing descriptions on limit and docType.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it searches SaaS Terms of Service snapshots with examples. Mentions 'Stage 1 covers terms docs', indicating scope. However, it doesn't explicitly distinguish from sibling tools like tos_watch_get.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool instead of siblings like tos_watch_get or tos_watch_recent_changes. No when-not-to-use or alternative recommendations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tos_watch_timeline (A)

Time-ordered events only for a ToS document (the differentiator: when it appeared and each revision since). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A (3.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It mentions 'time-ordered' and 'includes firstSeenAt and ledgerVerified', implying a read operation, but does not explicitly state read-only behavior or side effects. The behavioral traits are partially disclosed but not comprehensively.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences that front-load the core purpose and include key differentiators. Every word adds value without unnecessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description partially describes return fields (firstSeenAt, ledgerVerified) but lacks full structure (e.g., event types, timestamps). The tool's complexity is low (one parameter), so completeness is adequate but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% (no description for 'itemId'), and the tool description does not explain the parameter beyond context ('ToS document'). No format or usage hints are given, so the description fails to compensate for the lack of schema documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool's purpose: retrieving time-ordered events for a ToS document, with firstSeenAt and ledgerVerified included in the result. This distinguishes it from sibling tools like tos_watch_get (current state) and tos_watch_search (search). The word 'timeline' in the name implies retrieval, and the 'differentiator' parenthetical adds specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for timeline queries ('Time-ordered events only') but does not explicitly state when to choose this over siblings like tos_watch_get or tos_watch_search. No exclusions or alternatives are mentioned, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tos_watch_verify_ledger (B)

Verify the hash-chain integrity of a ToS document (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
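
The listing does not say how the ledger is hashed, so the following is only a minimal sketch of hash-chain verification in general. The event fields (`prevHash`, `hash`, `payload`), the SHA-256 choice, the all-zero genesis value, and the helper names are assumptions for illustration; only the result keys `chainValid`, `brokenAt`, and the checked count come from the tool description.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder used as prevHash of the first event


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def verify_chain(events: list) -> dict:
    """Walk the chain in order and report the index of the first broken link, if any."""
    prev = GENESIS
    for i, ev in enumerate(events):
        if ev["prevHash"] != prev or ev["hash"] != event_hash(prev, ev["payload"]):
            return {"chainValid": False, "brokenAt": i, "checked": i + 1}
        prev = ev["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(events)}


def build_chain(payloads: list) -> list:
    """Demo helper (hypothetical): produce a valid chain from raw payloads."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = event_hash(prev, payload)
        chain.append({"prevHash": prev, "hash": digest, "payload": payload})
        prev = digest
    return chain


chain = build_chain([{"v": 1}, {"v": 2}, {"v": 3}])
print(verify_chain(chain))
chain[1]["payload"]["v"] = 99  # tamper with the middle event
print(verify_chain(chain))
```

Because each hash covers the previous one, editing any stored event invalidates that link, which is what makes a ledger of this shape tamper-evident.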

Tool Definition Quality

B 3.2/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description bears the full burden. It indicates a read-only verification (tamper detection) and lists return fields, but does not disclose safety traits (e.g., no side effects, idempotency) or required permissions. It does not contradict any annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise (one sentence plus a list of return fields) and action-focused. It omits parameter information, which is a minor structural flaw, but the brevity is otherwise justified.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-parameter tool with no annotations and no output schema, the description should at least define the parameter and clarify prerequisites (e.g., existing watch). It partially covers return values but lacks contextual details for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description fails to explain the sole required parameter 'itemId'. The parameter's purpose and expected format are not mentioned, leaving the agent without guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool's function: verifying hash-chain integrity (tamper detection) for a ToS document. It lists specific return fields and distinguishes itself from sibling verify_ledger tools by specifying 'ToS document'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage context is implied (use when verifying a ToS document's ledger integrity), but no explicit when-to-use, when-not-to-use, or alternatives are given. Prerequisites like needing an existing watch or the meaning of itemId are not addressed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

verify_receipt (A)

Verify a provenance receipt (F-037): recomputes the HMAC signature, checks the intra-receipt chain linkage, and — when the receipt carries an external anchor reference — confirms it against the published F-028 anchor. Returns {valid:true, anchor:"verified"|"pending"|"unverified"} or {valid:false, reason:"signature_mismatch"|"chain_link_broken"|"anchor_mismatch"|"malformed"}. Tamper-evident; price 0.0 (free).

Parameters

Name	Required	Description	Default
`receipt`	Yes	A receipt JSON object as produced by any provenance-emitting tool.
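
The receipt format is not published in this listing, so the sketch below only illustrates the first two checks the description names: recomputing an HMAC and walking an intra-receipt chain. The receipt layout (`entries`, `prev`, `data`, `signature`), HMAC-SHA256, the shared key, and the helper names are all assumptions. The external anchor lookup is not modelled, so a valid receipt is reported with `anchor: "unverified"`.

```python
import hashlib
import hmac
import json


def canonical(obj) -> bytes:
    """Deterministic JSON encoding so signer and verifier hash identical bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign(key: bytes, entries: list) -> str:
    return hmac.new(key, canonical(entries), hashlib.sha256).hexdigest()


def verify_receipt(key: bytes, receipt: dict) -> dict:
    entries = receipt.get("entries")
    signature = receipt.get("signature")
    if (not isinstance(entries, list) or not isinstance(signature, str)
            or not all(isinstance(e, dict) for e in entries)):
        return {"valid": False, "reason": "malformed"}
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(sign(key, entries), signature):
        return {"valid": False, "reason": "signature_mismatch"}
    prev = ""
    for entry in entries:
        if entry.get("prev") != prev:
            return {"valid": False, "reason": "chain_link_broken"}
        prev = hashlib.sha256(canonical(entry)).hexdigest()
    return {"valid": True, "anchor": "unverified"}  # anchor check omitted in this sketch


def make_receipt(key: bytes, payloads: list) -> dict:
    """Demo helper (hypothetical): build a correctly linked and signed receipt."""
    entries, prev = [], ""
    for payload in payloads:
        entry = {"prev": prev, "data": payload}
        entries.append(entry)
        prev = hashlib.sha256(canonical(entry)).hexdigest()
    return {"entries": entries, "signature": sign(key, entries)}


key = b"demo-key"
receipt = make_receipt(key, ["a", "b"])
print(verify_receipt(key, receipt))
receipt["entries"][1]["data"] = "tampered"
print(verify_receipt(key, receipt))
```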

Tool Definition Quality

A 3.8/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the verification steps (recompute HMAC, check chain linkage, confirm anchor) and the possible return values. The mentions of 'tamper-evident' and 'price 0.0 (free)' add context. However, it does not explicitly state read-only behavior or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences and front-loads the core purpose. It is concise with no redundant information, and every clause adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a verification tool with one parameter and no output schema, the description provides sufficient detail on the verification process, possible return reasons, and pricing. It lacks error handling details beyond the listed reasons but is otherwise complete for an agent to use effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for the single parameter 'receipt', which is described as a JSON object. The tool description does not add additional semantic meaning beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies a provenance receipt (F-037) by recomputing HMAC, checking chain linkage, and optionally confirming anchor. It specifies the return format and distinguishes from siblings by referencing specific standards and return values.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide any guidance on when to use this tool versus alternatives like agent_audit_query or content_authenticity_provenance_check. No usage context, prerequisites, or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_risk_air_quality (A)

Air-quality snapshot (PM2.5, PM10, US / European AQI and a coarse category) for a lat/lon via the free Open-Meteo air-quality API. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)
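
How the server queries Open-Meteo is not shown here. As a rough sketch, the upstream request for these four metrics could be built as below; the endpoint and the `current` variable names follow Open-Meteo's public air-quality API as best understood, while the function name and the exact variable selection are assumptions. The range checks mirror the bounds in the parameter table.

```python
from urllib.parse import urlencode

# Open-Meteo air-quality endpoint (no API key needed for non-commercial use).
AIR_QUALITY_URL = "https://air-quality-api.open-meteo.com/v1/air-quality"


def air_quality_url(lat: float, lon: float) -> str:
    """Validate the coordinates, then build the request URL (no network call)."""
    if not -90 <= lat <= 90:
        raise ValueError("lat must be within -90..90")
    if not -180 <= lon <= 180:
        raise ValueError("lon must be within -180..180")
    query = urlencode({
        "latitude": lat,
        "longitude": lon,
        "current": "pm2_5,pm10,us_aqi,european_aqi",
    })
    return f"{AIR_QUALITY_URL}?{query}"


print(air_quality_url(35.68, 139.76))  # Tokyo
```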

Tool Definition Quality

A 3.8/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses read-only and free nature, but with no annotations, it omits details like API key requirements, error handling, output format, or data recency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences front-loading purpose and key traits (read-only, free), no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers core data fields and source, but lacks details on response structure, error behavior, or rate limits; acceptable for a simple read tool without output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers both lat/lon with descriptions; the description adds 'for a lat/lon' but no extra meaning beyond existing schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides an 'Air-quality snapshot' with specific metrics (PM2.5, PM10, US/European AQI, coarse category) for a lat/lon, identifying it as a read-only data retrieval tool distinct from sibling weather tools like forecast or heat index.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this over alternatives like weather_risk_current_weather; it implies air quality monitoring but lacks when-not-to-use or alternative comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_risk_current_weather (A)

Current weather (temperature, apparent temperature, humidity, precipitation, wind, weather code) for a lat/lon via the free Open-Meteo API. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)
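
For orientation, an MCP client invokes a tool like this one with a JSON-RPC `tools/call` request. The sketch below only builds that payload; the helper name and the request id are illustrative, and sending it over Streamable HTTP (session setup, headers) is left out.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


request = tools_call_request(
    1, "weather_risk_current_weather", {"lat": 35.68, "lon": 139.76}
)
print(json.dumps(request, indent=2))
```

The same helper applies to every tool on this page; only `name` and `arguments` change.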

Tool Definition Quality

A 3.8/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses that the tool is read-only and uses the free Open-Meteo API, but does not cover rate limits, error handling, or return format. This is adequate but lacks depth.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences that front-load the key purpose and include all essential information: what data is returned, input format, source, and cost. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 parameters, no output schema, no annotations), the description is fairly complete. It covers input, output variables, source, and cost. Minor gaps include error handling or return structure, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (both lat and lon have descriptions). The tool description does not add further meaning beyond the schema; it lists return variables but not parameter details. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns current weather data (temperature, apparent temperature, humidity, precipitation, wind, weather code) for a given lat/lon. It names a specific resource ('current weather'), and its name 'weather_risk_current_weather' distinguishes it from siblings like 'weather_risk_forecast' and 'weather_risk_air_quality'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for current weather but does not explicitly state when to use this tool versus siblings (e.g., forecast, heat index). It mentions read-only and free, but lacks explicit guidance on context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_risk_forecast (A)

Daily weather forecast (temp max/min, precipitation probability, wind, weather code) for up to N days for a lat/lon via the free Open-Meteo API. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)
`days`	No	Forecast horizon in days (bounded)
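
The listing says `days` is bounded but gives neither the bound nor the default. The sketch below shows one way such a parameter could be clamped before building the upstream Open-Meteo request. `MAX_DAYS = 16` and `DEFAULT_DAYS = 7` mirror Open-Meteo's own documented limits as best understood and are assumptions about this tool; the function name and variable selection are also illustrative.

```python
from typing import Optional
from urllib.parse import urlencode

FORECAST_URL = "https://api.open-meteo.com/v1/forecast"
MAX_DAYS = 16     # assumption: Open-Meteo's ceiling; the tool's own bound is not stated
DEFAULT_DAYS = 7  # assumption: the tool's default is not stated


def forecast_url(lat: float, lon: float, days: Optional[int] = None) -> str:
    """Clamp the horizon into 1..MAX_DAYS and build the request URL (no network call)."""
    horizon = DEFAULT_DAYS if days is None else max(1, min(int(days), MAX_DAYS))
    query = urlencode({
        "latitude": lat,
        "longitude": lon,
        "daily": ",".join([
            "temperature_2m_max",
            "temperature_2m_min",
            "precipitation_probability_max",
            "wind_speed_10m_max",
            "weather_code",
        ]),
        "forecast_days": horizon,
        "timezone": "auto",  # daily aggregates need a timezone to define day boundaries
    })
    return f"{FORECAST_URL}?{query}"


print(forecast_url(35.68, 139.76, days=3))
```

Clamping rather than rejecting out-of-range values is a design choice; a server could equally return a validation error.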

Tool Definition Quality

A 4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description mentions read-only and free nature, but lacks details on rate limits, pagination, or error handling. Without annotations, the description carries full burden and provides only basic behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with key information: what tool does, data included, source, and cost. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main points: data fields, source, read-only, free. Does not specify default for optional days parameter or error handling, but adequate for a simple forecast tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds little beyond stating 'for up to N days' which reinforces the days parameter. No additional semantics for lat/lon or days beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool provides a daily weather forecast for a lat/lon, listing specific data fields (temp max/min, precipitation probability, wind, weather code). This distinguishes it from siblings like weather_risk_current_weather or weather_risk_heat_index.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies usage for daily forecasts via Open-Meteo API and notes it's read-only and free. However, it does not explicitly contrast with siblings or state when to use alternatives, though the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_risk_heat_index (A)

Compute the heat index (feels-like temperature) and risk category from air temperature (Celsius) and relative humidity. Pure compute; price 0.0 (free).

Parameters

Name	Required	Description	Default
`tempC`	Yes	Air temperature in degrees Celsius
`humidityPct`	Yes	Relative humidity percent (0-100)
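
The listing does not state which formula the tool uses. One common choice is the US National Weather Service procedure (a simple estimate, then the Rothfusz regression with its two humidity adjustments), sketched below with Celsius in and out. The function names, category labels, and the Fahrenheit category boundaries (80, 90, 103, 125) are assumptions based on the NWS chart, not confirmed behavior of this tool.

```python
import math


def heat_index_c(temp_c: float, humidity_pct: float) -> float:
    """Heat index in Celsius via the NWS procedure (computed internally in Fahrenheit)."""
    if not 0 <= humidity_pct <= 100:
        raise ValueError("humidityPct must be within 0-100")
    t = temp_c * 9 / 5 + 32
    rh = humidity_pct
    # Simple estimate first; NWS keeps it when its average with T is below 80 F.
    hi = 0.5 * (t + 61.0 + (t - 68.0) * 1.2 + rh * 0.094)
    if (hi + t) / 2 >= 80:
        # Rothfusz regression for warmer conditions.
        hi = (-42.379 + 2.04901523 * t + 10.14333127 * rh
              - 0.22475541 * t * rh - 0.00683783 * t * t
              - 0.05481717 * rh * rh + 0.00122874 * t * t * rh
              + 0.00085282 * t * rh * rh - 0.00000199 * t * t * rh * rh)
        if rh < 13 and 80 <= t <= 112:
            hi -= ((13 - rh) / 4) * math.sqrt((17 - abs(t - 95)) / 17)
        elif rh > 85 and 80 <= t <= 87:
            hi += ((rh - 85) / 10) * ((87 - t) / 5)
    return (hi - 32) * 5 / 9


def risk_category(heat_index_celsius: float) -> str:
    """Map a heat index to an NWS-style category (labels are illustrative)."""
    hi_f = heat_index_celsius * 9 / 5 + 32
    if hi_f >= 125:
        return "extreme_danger"
    if hi_f >= 103:
        return "danger"
    if hi_f >= 90:
        return "extreme_caution"
    if hi_f >= 80:
        return "caution"
    return "none"


hi = heat_index_c(32.2, 70)
print(round(hi, 1), risk_category(hi))
```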

Tool Definition Quality

A 3.8/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description says 'Pure compute' and 'free' implying no side effects, but lacks details on rate limits, authentication needs, or exact behavior (e.g., return format).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with essential info, no filler words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple compute tool with complete schema and no output schema, description adequately explains inputs and scope. Could mention output units or risk categories, but still clear.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions; description adds no further meaning, so baseline 3 as per guidelines.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

States exactly what it computes (heat index and risk category) and from which inputs (temperature, humidity). Distinguishes from sibling tools like weather_risk_air_quality or weather_risk_current_weather.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions it's free and pure compute but gives no guidance on when to use this vs alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_risk_severe_flags (A)

Severe-weather flags (heavy rain / high wind / extreme heat / frost) derived from the Open-Meteo forecast, each with its threshold and worst value. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)
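
The description says each flag comes with its threshold and worst value but does not give the thresholds. The sketch below shows how such flags could be derived from daily forecast arrays. The input keys, the output shape, and every threshold value here are hypothetical placeholders, not the server's actual settings.

```python
import operator

# (input series, reducer picking the worst value, comparison, threshold): all hypothetical
RULES = {
    "heavy_rain": ("precip_mm", max, operator.ge, 50.0),
    "high_wind": ("wind_kmh", max, operator.ge, 60.0),
    "extreme_heat": ("temp_max_c", max, operator.ge, 35.0),
    "frost": ("temp_min_c", min, operator.le, 0.0),
}


def severe_flags(daily: dict) -> dict:
    """Reduce each daily series to its worst value and compare it to the threshold."""
    result = {}
    for name, (series, worst_of, compare, threshold) in RULES.items():
        worst = worst_of(daily[series])
        result[name] = {
            "flag": compare(worst, threshold),
            "threshold": threshold,
            "worst": worst,
        }
    return result


forecast = {
    "precip_mm": [2.0, 64.5, 10.0],
    "wind_kmh": [20.0, 35.0, 41.0],
    "temp_max_c": [29.0, 31.5, 30.0],
    "temp_min_c": [18.0, 17.5, 19.0],
}
for name, info in severe_flags(forecast).items():
    print(name, info)
```

Note that frost uses the minimum of the daily lows, while the other flags use the maximum of their series.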

Tool Definition Quality

A 4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Since no annotations are provided, the description carries full responsibility. It discloses the tool is read-only, free, and specifies the output includes flags with thresholds and worst values. This is sufficient for a simple data retrieval tool, though details on rate limits or error handling are absent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at two sentences, front-loading the core purpose and following with behavioral traits. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (two required params, no output schema, no annotations), the description covers the essential aspects: purpose, input (implicit), output (flags with thresholds), and constraints (read-only, free). It is complete enough for an agent to understand and invoke correctly, though it could hint at error scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for lat and lon (range -90..90 and -180..180). The description adds no additional parameter meaning, so it meets the baseline but does not exceed it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states it provides severe-weather flags (heavy rain, high wind, extreme heat, frost) derived from Open-Meteo forecasts, with each flag's threshold and worst value. This clearly distinguishes it from sibling weather tools like current_weather or forecast.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it is read-only and free, implying usage for obtaining severe weather data without cost. However, it lacks explicit guidance on when to use this tool over alternatives (e.g., forecast for general weather) or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
