Japan Public Ledgers MCP

by com.mcp-revenue-empire

Server Details

Tamper-evident daily ledgers of 12 Japanese public-data domains, with cross-ledger entity search.

Status: Healthy
Last Tested: 2026-07-30 18:37
Transport: Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client → Glama → MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

Grade C (2.9/5.0)

Tool Descriptions: C

Average 3.4/5 across 136 of 147 tools scored. Lowest: 2.1/5.

Server Coherence: B

Disambiguation: 4/5

Tools are grouped by domain prefix (e.g., bid_watch_, carbon_, agent_), making them clearly distinguishable. Overlap exists between entity_search and individual ledger searches, but descriptions clarify their distinct purposes. With 147 tools, some confusion is possible, but the naming convention mostly keeps them separate.
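The prefix grouping described above can be illustrated with a short sketch. The tool names are taken from this listing; treating the first two underscore-separated tokens as the group prefix is an assumption made for illustration, not a rule the server documents.

```python
from collections import defaultdict

# Tool names copied from this listing. The two-token prefix rule is an
# assumption for illustration, not something the server specifies.
TOOL_NAMES = [
    "agent_audit_query",
    "agent_audit_record",
    "agent_audit_report",
    "agent_captcha_solve",
    "agent_captcha_verify_domain",
    "agent_identity_lookup",
    "agent_identity_register",
    "agent_memory_delete",
]


def group_by_prefix(names, depth=2):
    """Group tool names by their first `depth` underscore-separated tokens."""
    groups = defaultdict(list)
    for name in names:
        prefix = "_".join(name.split("_")[:depth])
        groups[prefix].append(name)
    return dict(groups)


groups = group_by_prefix(TOOL_NAMES)
for prefix, members in sorted(groups.items()):
    print(f"{prefix}: {len(members)} tools")
```

A client could use a grouping like this to show an agent only the tool family relevant to its current task.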

Naming Consistency: 3/5

Within each domain group (e.g., all 'watch' tools follow verb_search/get/recent_changes/timeline/verify_ledger), naming is consistent. However, across groups, patterns vary: some use verb_noun (alerts_create_watch), others noun_verb (agent_audit_query). The diverse prefixes (agent_, carbon_, company_registry_) lack a unifying scheme.

Tool Count: 2/5

147 tools is excessive for a server titled 'Japan Public Ledgers'. Many tools cover unrelated domains (weather, carbon estimates, commerce, generic agent utilities), diluting focus. A typical well-scoped server has 5-20 tools; this server far exceeds that, making it unwieldy for agents to navigate.

Completeness: 3/5

The core watch-ledger tools (bid, grant, landprice, license, etc.) share a consistent set of read operations (search, get, recent_changes, timeline, verify_ledger), which covers monitoring needs. However, many utility tools (carbon estimates, weather, price oracle) are tangential to the server's stated purpose, and there is no write mechanism for ledgers beyond agent identity records.

Available Tools

147 tools

agent_audit_query (Grade C)

Query agent actions with filters

Parameters

Name	Required	Description	Default
`to`	No
`from`	No
`limit`	No
`agentId`	No
`riskMin`	No
`sessionId`	No
`actionType`	No
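As a minimal sketch, a call to this tool could be wrapped in a standard MCP `tools/call` JSON-RPC request. The parameter names come from the table above; every argument value (the ID string, the ISO 8601 timestamps, the limit) is hypothetical, since the listing documents no formats.

```python
import json


def build_tool_call(name, arguments, request_id=1):
    """Build an MCP `tools/call` JSON-RPC 2.0 request body."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical values: the listing does not document formats for any of these.
query_args = {
    "agentId": "agent-123",
    "from": "2026-07-01T00:00:00Z",
    "to": "2026-07-30T00:00:00Z",
    "limit": 50,
}

request = build_tool_call("agent_audit_query", query_args)
print(json.dumps(request, indent=2))
```

All seven parameters are optional, so an empty `arguments` object should also be accepted by the schema; what the server returns in that case is not documented.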

Tool Definition Quality

Grade C (2.3/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It only implies read-only via 'Query' but does not disclose behavioral traits like auth requirements, rate limits, or whether the operation is destructive. Minimal behavioral info.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, but it underspecifies the tool. For a tool with 7 parameters and no schema descriptions, it is too concise; every sentence should provide value, and here it does not.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and no annotations, the description should explain what the tool returns, but it only mentions 'query agent actions' without describing result format, available filters, or any other contextual information. Completely inadequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, yet the description adds no meaning beyond 'with filters'. The 7 parameters (to, from, limit, agentId, riskMin, sessionId, actionType) are not explained; the agent cannot infer their formats or expected values from the description alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Query' and the resource 'agent actions', and mentions 'filters', indicating a filtering capability. However, it does not differentiate from sibling tools like agent_audit_record or agent_audit_report, lacking specificity for precise selection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives such as agent_audit_record (likely for a single action) or agent_audit_report (likely aggregated data). The description does not specify prerequisites or context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_audit_record (Grade C)

Record an agent action for audit and compliance

Parameters

Name	Required	Description	Default
`input`	No
`output`	No
`agentId`	Yes
`metadata`	No
`sessionId`	No
`actionName`	Yes
`actionType`	Yes

Tool Definition Quality

Grade C (2.6/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It only says 'Record an agent action', implying mutation, but does not disclose whether it overwrites existing records, whether it is idempotent, or what auth requirements and rate limits apply. Minimal disclosure for a write operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single short sentence, which is concise but too brief given the complexity (7 parameters, no schema descriptions). It could include essential parameter guidance without being verbose; currently under-specified.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 7 parameters, 0% schema coverage, no output schema, and no annotations, the description is fundamentally incomplete. It does not tell the agent what inputs are needed, how to structure the request, or what the response contains. Fails to provide a usable tool specification.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description provides zero information about parameters. It does not explain what 'agentId', 'actionType', 'actionName', etc., mean or how to use them. The tool has 7 parameters, all left undocumented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Record an agent action for audit and compliance', which specifies a clear verb ('Record'), resource ('agent action'), and purpose. It distinguishes from siblings like 'agent_audit_query' (querying) and 'agent_audit_report' (reporting).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives; no mention of prerequisites, exclusions, or conditions. The description is too sparse to help an agent decide when to invoke it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_audit_report (Grade C)

Generate audit report (json/markdown/soc2 format)

Parameters

Name	Required	Default
`to`	Yes
`from`	Yes
`format`	No	markdown
`agentId`	Yes

Tool Definition Quality

Grade C (2.7/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided. Description does not disclose side effects, authorization needs, rate limits, or whether the report is generated on-the-fly or stored. Lacks behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, straightforward and front-loaded. No wasted words, but lacks necessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, 4 parameters at 0% schema coverage, and minimal description, the tool definition is incomplete. Missing details on required parameters, report scope, and return format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. Description only hints at the 'format' parameter (json/markdown/soc2), but does not explain 'agentId', 'from', or 'to' meanings. Leaves semantic gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates an audit report, with verb 'generate' and resource 'audit report', and lists supported output formats. It distinguishes from siblings like agent_audit_query and agent_audit_record by focusing on report generation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., agent_audit_query). No prerequisites or use cases mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_captcha_solve (Grade C)

Solve a CAPTCHA for a domain you own or have explicit permission to access

Parameters

Name	Required	Description	Default
`type`	Yes
`action`	No
`pageUrl`	Yes
`siteKey`	No
`question`	No
`imageBase64`	No

Tool Definition Quality

Grade C (2.8/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description must disclose behavioral traits. It only says 'Solve a CAPTCHA' without detailing how the solving works, what it returns, or any side effects. This is insufficient for a complex tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no wasted words, but it is overly brief given the tool's complexity. It sacrifices necessary detail for conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, no parameter descriptions, and a vague description, the tool is severely incomplete. The agent cannot determine return format, error handling, or how to use different CAPTCHA types.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It fails to explain any of the 6 parameters (type, action, pageUrl, etc.), leaving the agent without guidance on how to construct valid inputs.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Solve a CAPTCHA' and the resource 'domain you own', distinguishing it from sibling tools like agent_captcha_verify_domain which likely verifies ownership rather than solving.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'for a domain you own or have explicit permission to access', providing some usage context but no explicit when-not-to-use or alternatives. It implies a prerequisite with the sibling agent_captcha_verify_domain but does not specify.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_captcha_verify_domain (Grade C)

Verify ownership of a domain before using CAPTCHA solving

Parameters

Name	Required	Description	Default
`domain`	Yes
`method`	Yes

Tool Definition Quality

Grade C (2.7/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided. The description does not disclose any behavioral traits such as consequences of verification, required authentication, rate limits, or what 'ownership' entails. It merely states the action without elaboration.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single short sentence, which is concise but at the expense of providing necessary detail. It front-loads the purpose but lacks structure or any additional sections.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of annotations and output schema, and only 2 parameters with 0% schema coverage, the description is insufficient. It does not explain what verification means, what the output looks like, or any side effects, leaving significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description adds no meaning to the two required parameters ('domain' and 'method'). The enum values for method are not explained, leaving the agent to infer their semantics from the enum names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Verify ownership') and resource ('domain'), with added context 'before using CAPTCHA solving' that hints at a prerequisite. It distinguishes from sibling 'agent_captcha_solve' but lacks explicit differentiation from other verification tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The phrase 'before using CAPTCHA solving' implies a usage context, but there is no explicit guidance on when to use this tool versus alternatives, nor any exclusions or prerequisites beyond the implied sequence.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_identity_activities (Grade C)

List activity records for an identity, newest first (owner only)

Parameters

Name	Required	Description	Default
`limit`	No
`identityId`	Yes

Tool Definition Quality

Grade C (2.8/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided. The description only mentions the listing action and ordering, but does not disclose other traits such as rate limits, pagination behavior, error handling, or authentication requirements beyond the owner-only hint.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence of 10 words, very concise. It front-loads the verb and resource. However, it sacrifices detail for brevity, leaving parameter descriptions unaddressed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the tool's simplicity (2 params, no output schema), the description lacks necessary details like parameter explanations and behavior for edge cases. It is minimally complete but insufficient for an agent to use reliably without additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description indirectly references identityId by mentioning 'for an identity', but does not explain either parameter (identityId or limit) with any additional meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'List', the resource 'activity records for an identity', and includes ordering ('newest first') and access restriction ('owner only'). This distinguishes it from siblings like agent_identity_record or agent_audit_query.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a context clue ('owner only') but does not explicitly state when to use this tool versus alternatives like agent_audit_query, nor does it give any explicit when-to-use or when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_identity_badge (Grade B)

Get the issuer-signed badge and signed fields for an identity

Parameters

Name	Required	Description	Default
`identityId`	Yes

Tool Definition Quality

Grade B (3.2/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so the description carries full burden. It implies read-only via 'Get' but omits details on permissions, error scenarios, or what the badge contains. Minimal disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence with no extraneous words. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple getter with one parameter and no output schema, the description is minimally adequate but lacks context about the badge structure or expected return value, which could help agents select and use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not elaborate on the single parameter 'identityId'. No format, validation, or examples are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and the specific resource 'issuer-signed badge and signed fields', distinguishing it from sibling tools like agent_identity_lookup or agent_identity_register.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, nor any prerequisites or exclusions. The agent must infer usage from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_identity_lookup (Grade B)

Look up an identity. Returns signatureValid (issuer+integrity only, NOT an authenticity/safety signal) and a disclaimer.

Parameters

Name	Required	Description	Default
`identityId`	Yes

Tool Definition Quality

Grade B (3.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description includes a useful caveat about signatureValid being only issuer+integrity, not an authenticity/safety signal. However, without annotations, it lacks disclosure of read-only nature or potential side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise: two sentences, front-loaded with the verb, and no superfluous words. Efficiently communicates the core purpose and a key behavioral note.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-parameter tool with no output schema, the description covers the action, return fields, and a caveat. However, the parameter explanation is missing, and the overall completeness is adequate but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter 'identityId' is not described beyond its name. With 0% schema description coverage, the description should explain the expected format or type, but it does not.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Look up an identity') and the return fields (signatureValid and disclaimer). It is specific but does not explicitly distinguish from sibling tools like agent_identity_record or agent_identity_badge.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., agent_identity_activities or agent_identity_record). No exclusions or context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_identity_record (Grade C)

Append a hash-chained activity record (owner only). Optional provenance (repo/version/config) is self-reported.

Parameters

Name	Required	Description
`content`	No
`identityId`	Yes
`provenance`	No	Self-reported origin of the activity (NOT verified)
`activityType`	Yes

Tool Definition Quality

Grade C (2.4/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It mentions 'hash-chained' implying immutability and ordering, but does not disclose error conditions, concurrency behavior, or whether it overwrites previous records. The 'self-reported' note for provenance adds a small behavioral hint, but overall transparency is low.
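To make the 'hash-chained' property concrete, here is a generic sketch of an append-only hash chain. It is not the server's implementation: the SHA-256 choice, the canonical JSON encoding, the all-zero genesis value, and every function and field name below are assumptions for illustration.

```python
import hashlib
import json

# Assumed genesis value for the first record's "prev" link.
GENESIS = "0" * 64


def record_hash(prev_hash, payload):
    """Hash the previous link together with a canonical JSON form of the payload."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_record(chain, payload):
    """Append a record whose hash commits to the record before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    entry = {"prev": prev_hash, "payload": payload, "hash": record_hash(prev_hash, payload)}
    chain.append(entry)
    return entry


def verify_chain(chain):
    """Recompute every link; any edited or reordered record breaks verification."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != record_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


chain = []
append_record(chain, {"activityType": "deploy", "content": "v1"})
append_record(chain, {"activityType": "deploy", "content": "v2"})
print(verify_chain(chain))  # True: the chain is intact

chain[0]["payload"]["content"] = "edited"
print(verify_chain(chain))  # False: the stored hash no longer matches
```

In a scheme like this, appending never rewrites earlier records, which is the property the evaluation expects the description to state explicitly.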

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two short sentences. However, this brevity sacrifices essential information, making it under-specified rather than efficiently compact. It front-loads the main action but omits needed details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 4 parameters (2 required), nested objects, and no output schema. The description does not explain the return value, the nature of 'hash-chained' from the caller's perspective, or prerequisites. It fails to provide a complete picture for safe and correct usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%. The description clarifies the provenance parameter as 'self-reported' and 'NOT verified', but leaves identityId and activityType unexplained. content is completely unspecified. This adds some value but fails to document the majority of parameters.
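The 25% figure is consistent with one described parameter out of four. A minimal sketch of that arithmetic follows, assuming coverage is simply the share of parameters with a non-empty schema description; the scorer's actual formula is not stated on this page.

```python
def description_coverage(params):
    """Percentage of parameters that carry a non-empty schema description."""
    if not params:
        return 0.0
    described = sum(1 for text in params.values() if text)
    return 100.0 * described / len(params)


# Parameter names and the single description are copied from this tool's table.
identity_record_params = {
    "content": "",
    "identityId": "",
    "provenance": "Self-reported origin of the activity (NOT verified)",
    "activityType": "",
}

print(description_coverage(identity_record_params))  # 25.0
```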

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it appends a hash-chained activity record, which is a specific operation on a resource. However, 'owner only' is ambiguous and does not clearly distinguish it from sibling tools like agent_identity_register or agent_identity_activities. The purpose is somewhat clear but lacks differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The mention of 'owner only' hints at access restrictions but doesn't clarify context for use or exclude any scenarios. Sibling tools are not mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_identity_register (Grade A)

Register an agent and get a unique identity ID + issuer-signed badge. agent_name/metadata are self-reported and unverified.

Parameters

Name	Required	Description	Default
`metadata`	No
`agentName`	Yes
`publicKey`	No

Tool Definition Quality

Grade A (3.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses that agent_name and metadata are self-reported and unverified, and indicates outputs. However, it does not mention whether the operation is destructive, requires authentication, or has side effects, leaving significant behavioral gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the main purpose, and contains no extraneous information. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (3 params, nested objects, no output schema), the description provides the core purpose and a caveat but lacks parameter details for publicKey, output format, and behavioral specifics. It is adequate but not fully comprehensive for an AI agent to use without additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It adds meaning for agentName and metadata (self-reported) but omits any mention of publicKey. The description partially covers 2 of 3 parameters, leaving a key parameter undocumented.
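
To make the three parameters concrete, here is a minimal Python sketch of the JSON-RPC `tools/call` request an MCP client might send for this tool. The envelope follows the MCP convention for tool calls; the argument values and the object shape of `metadata` are assumptions for illustration, and `publicKey` is left out because its expected format is not documented on this page.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP `tools/call` request body (JSON-RPC 2.0 envelope)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Argument values are illustrative assumptions; only the names come from the table.
request = build_tool_call(
    "agent_identity_register",
    {
        "agentName": "example-agent",      # required
        "metadata": {"purpose": "demo"},   # optional, self-reported and unverified
        # "publicKey" omitted: optional, and its format is undocumented
    },
)
print(json.dumps(request, indent=2))
```

The response shape (identity ID and badge) is not specified on this page, so the sketch stops at the request.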

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'register' and the resource 'agent', and specifies the outputs: 'unique identity ID + issuer-signed badge'. The caveat about self-reporting distinguishes it from verification tools like agent_identity_lookup.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (to register a new agent) but does not state when not to use it or mention alternatives. The caveat about self-reporting offers some guidance, but not enough to differentiate it clearly from sibling tools.

agent_memory_delete (C)

Delete a memory or all memories in a namespace

Parameters

Name	Required	Description	Default
`key`	No
`agentId`	Yes
`namespace`	No

Tool Definition Quality

C 2.7/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description does not disclose critical behavioral traits such as irreversibility, authentication requirements, or which inputs trigger deletion of all memories in a namespace. For a destructive action, this is insufficient.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no unnecessary words, achieving conciseness. However, it could benefit from a slightly more structured format, e.g., indicating the effect of omitted vs. provided parameters.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and no annotations, the description is too sparse. The agent lacks information about return values, error conditions, or side effects, making it incomplete for a delete operation.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, and the description does not explain the parameters beyond their names. For instance, it doesn't say what happens when 'key' is omitted or how namespace interacts with key. This leaves ambiguity for the agent.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool deletes memories, optionally all in a namespace, distinguishing it from get/search/store siblings. However, it could be more precise about the deletion behavior (e.g., key vs. namespace conditions).
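
The key-versus-namespace ambiguity can be illustrated with a small local model in Python. This is only one plausible reading of the description, not the server's confirmed behavior: it assumes that passing `key` deletes a single entry, that omitting it deletes everything in the namespace, and that the namespace falls back to `"default"` (the table lists no default for this tool). The `Store` type and `delete_memory` helper are hypothetical.

```python
from typing import Optional

# (agentId, namespace, key) -> stored value; a stand-in for the server's storage
Store = dict[tuple[str, str, str], object]


def delete_memory(store: Store, agent_id: str,
                  namespace: str = "default",
                  key: Optional[str] = None) -> int:
    """Delete one entry, or every entry in the namespace when key is omitted.

    Returns the number of entries removed. These semantics are assumed.
    """
    if key is not None:
        target = (agent_id, namespace, key)
        if target in store:
            del store[target]
            return 1
        return 0
    doomed = [k for k in store if k[:2] == (agent_id, namespace)]
    for k in doomed:
        del store[k]
    return len(doomed)


store: Store = {
    ("agent-1", "default", "a"): 1,
    ("agent-1", "default", "b"): 2,
    ("agent-1", "scratch", "a"): 3,
}
print(delete_memory(store, "agent-1", key="a"))  # 1
print(delete_memory(store, "agent-1"))           # 1 (only "b" was left in "default")
print(sorted(store))                             # [('agent-1', 'scratch', 'a')]
```

Under this reading, forgetting `key` silently widens the deletion, which is the kind of consequence the description would need to spell out.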

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs. alternatives like agent_memory_store or agent_memory_get. The description lacks context about typical use cases or when to prefer this over other memory tools.

agent_memory_get (C)

Retrieve a stored memory by key

Parameters

Name	Required	Default
`key`	Yes
`agentId`	Yes
`namespace`	No	default

Tool Definition Quality

C 2.7/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so the description must carry the burden. It only implies a read operation via 'retrieve' but provides no details on side effects, permissions, rate limits, or behavior on missing keys. For a tool with zero annotations, this is insufficient.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with action and key information. Very concise, though it could benefit from more detail without harming conciseness. No wasted words.

Completeness 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 3 parameters (2 required), no output schema, and no annotations, the description is woefully incomplete. It fails to specify what a 'memory' is, how keys work, the role of namespace, or what the return value looks like. For a tool that likely returns stored data, this is a significant gap.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% and the description does not explain any parameter meaning. AgentId, key, and namespace are not elaborated. The description adds no semantic value over the raw schema.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Retrieve a stored memory by key' clearly states the verb (retrieve), resource (stored memory), and method (by key). It distinguishes from sibling tools like agent_memory_store (write) and agent_memory_delete (remove) which have different purposes.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like agent_memory_search. No when-to-use or when-not-to-use context provided. The description simply states the action without usage scenarios.

agent_memory_search (C)

Search memories by prefix, tags, or type

Parameters

Name	Required	Default
`tags`	No
`type`	No
`limit`	No
`agentId`	Yes
`keyPrefix`	No
`namespace`	No	default

Tool Definition Quality

C 2.9/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must disclose behavioral traits. It only lists search criteria, missing details on idempotency, pagination, or whether results are ordered. Insufficient for a search tool.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no redundancy. Could be expanded with parameter details without losing conciseness.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 6 parameters, no output schema, and no annotations, the description is too minimal. Lacks return format, pagination details, and parameter usage guidance.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage. Description mentions 'prefix, tags, or type', mapping partially to keyPrefix, tags, type. It does not explain agentId, limit, or namespace. Fails to compensate for low schema coverage.
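
As a rough illustration of what the description leaves open, the Python sketch below filters an in-memory list by `keyPrefix`, `tags`, and `type`. The combination rules are assumptions: filters are ANDed together, a tag filter matches when any listed tag is present, and `limit` truncates in stored order. The record fields and the `search_memories` helper are hypothetical.

```python
from typing import Optional


def search_memories(memories: list[dict],
                    key_prefix: Optional[str] = None,
                    tags: Optional[list[str]] = None,
                    type_: Optional[str] = None,
                    limit: int = 20) -> list[dict]:
    """Return up to `limit` records matching every filter that was supplied."""
    results: list[dict] = []
    for memory in memories:
        if key_prefix is not None and not memory["key"].startswith(key_prefix):
            continue
        # Assumed "any tag matches" semantics; the server might require all tags.
        if tags and not set(tags) & set(memory.get("tags", [])):
            continue
        if type_ is not None and memory.get("type") != type_:
            continue
        results.append(memory)
        if len(results) >= limit:
            break
    return results


memories = [
    {"key": "user:name", "tags": ["profile"], "type": "fact"},
    {"key": "user:lang", "tags": ["profile", "prefs"], "type": "preference"},
    {"key": "task:1", "tags": ["todo"], "type": "fact"},
]
print([m["key"] for m in search_memories(memories, key_prefix="user:")])
print([m["key"] for m in search_memories(memories, tags=["prefs"])])
```

Each commented assumption corresponds to a question the tool description would ideally answer.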

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Search' and the resource 'memories' with specific criteria (prefix, tags, type). It distinguishes from sibling tools like agent_memory_get (single retrieval) and agent_memory_store.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., agent_memory_get for single memory). No context on prerequisites or exclusions.

agent_memory_store (B)

Store a memory for an AI agent (key-value, with TTL and metadata)

Parameters

Name	Required	Description	Default
`key`	Yes
`value`	Yes	Any JSON value
`agentId`	Yes	Agent identifier
`metadata`	No
`namespace`	No		default
`ttlSeconds`	No

Tool Definition Quality

B 3.1/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It mentions key-value nature and TTL but does not disclose important behaviors like overwrite semantics, idempotency, authorization needs, or what happens on TTL expiry. The behavioral disclosure is insufficient.
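
The undisclosed behaviors are easier to see in a toy model. The Python sketch below assumes that storing an existing key overwrites it and that an expired entry simply reads back as missing; neither is confirmed by the tool description. `TtlMemory` and its injectable clock are hypothetical names used only for this illustration.

```python
import time
from typing import Any, Callable, Optional


class TtlMemory:
    """Toy key-value store with optional per-entry expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._items: dict[tuple[str, str, str], tuple[Any, Optional[float]]] = {}

    def store(self, agent_id: str, key: str, value: Any,
              namespace: str = "default",
              ttl_seconds: Optional[float] = None) -> None:
        expires = None if ttl_seconds is None else self._clock() + ttl_seconds
        # Assumed overwrite semantics: a second store under the same key replaces the first.
        self._items[(agent_id, namespace, key)] = (value, expires)

    def get(self, agent_id: str, key: str, namespace: str = "default") -> Any:
        slot = (agent_id, namespace, key)
        entry = self._items.get(slot)
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._items[slot]  # assumed: expired entries behave as if absent
            return None
        return value


now = [0.0]  # fake clock so the example does not need to sleep
memory = TtlMemory(clock=lambda: now[0])
memory.store("agent-1", "greeting", {"text": "hello"}, ttl_seconds=10)
print(memory.get("agent-1", "greeting"))  # {'text': 'hello'}
now[0] = 10.0
print(memory.get("agent-1", "greeting"))  # None
```

A description that stated these two rules, or the server's actual alternatives, would close most of the Behavior gap noted above.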

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no wasted words. It is front-loaded with the core purpose. However, given the tool's complexity, a bit more detail would be helpful, but it remains concise and efficient.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 6 parameters, 3 required, nested objects (metadata), no output schema, and no annotations, the description is too brief. It fails to explain expected responses, key uniqueness constraints, or how metadata is used. The agent lacks sufficient context to use the tool correctly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 33% (only 2 of 6 parameters have descriptions). The description adds context by naming key-value, TTL, and metadata, which maps to parameters key, value, ttlSeconds, metadata. However, it does not explain namespace or the fact that agentId is required. Partial compensation for low coverage, but not fully adequate.
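
The 33% figure is simply the share of parameters that carry a schema description. A minimal Python sketch of that arithmetic, using the table above as input (the `description_coverage` helper is a hypothetical name, and the scoring site may compute coverage differently):

```python
def description_coverage(params: dict[str, str]) -> float:
    """Fraction of parameters whose schema description is non-empty."""
    if not params:
        return 0.0
    described = sum(1 for text in params.values() if text.strip())
    return described / len(params)


# Parameter name -> schema description, as listed for agent_memory_store
store_params = {
    "key": "",
    "value": "Any JSON value",
    "agentId": "Agent identifier",
    "metadata": "",
    "namespace": "",
    "ttlSeconds": "",
}
print(f"{description_coverage(store_params):.0%}")  # 33%
```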

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Store a memory'), the resource ('for an AI agent'), and the format ('key-value, with TTL and metadata'). This is specific and distinguishes it from sibling tools like agent_memory_get or agent_memory_delete.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. It doesn't mention that this tool is for storing new memories, while retrieval should use agent_memory_get, etc. No context is provided for appropriate usage.

agent_proxy_fetch (B)

Fetch a URL via a rotating proxy (region/type selectable). robots.txt enforced.

Parameters

Name	Required	Description	Default
`url`	Yes
`body`	No
`type`	No
`method`	No
`region`	No
`headers`	No
`sessionId`	No

Tool Definition Quality

B 3.3/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses robots.txt enforcement and proxy rotation behavior, but with no annotations, the description could provide more detail on error handling, redirects, or rate limits.
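
For readers unfamiliar with what "robots.txt enforced" usually implies, the sketch below shows a generic check using Python's standard `urllib.robotparser`. It is not the server's implementation; the rules, user agent, and URLs are made-up examples, and how the server reports a blocked URL is not documented.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network access
    return parser.can_fetch(user_agent, url)


# Example rules, invented for this illustration
rules = """User-agent: *
Disallow: /private/
"""

print(is_allowed(rules, "example-bot", "https://example.com/public/page"))   # True
print(is_allowed(rules, "example-bot", "https://example.com/private/data"))  # False
```

An agent calling the tool would benefit from knowing whether a disallowed URL yields an error, an empty body, or something else.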

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise, two sentences, front-loaded with the core action. No wasted words.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 7 parameters including nested objects and no output schema, the description is far too sparse. It does not explain how to use body, headers, method, sessionId, or what the response contains.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage, and the description only hints at 'region' and 'type' but adds no concrete meaning for parameters like body, headers, method, or sessionId.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it fetches a URL via a rotating proxy with selectable region and type, and mentions robots.txt enforcement. This distinguishes it from sibling tools like agent_proxy_session or agent_captcha_solve.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Does not mention prerequisites, alternatives, or when not to use it.

agent_proxy_session (C)

Create a sticky proxy session (same IP for multiple requests)

Parameters

Name	Required	Description	Default
`type`	No
`region`	No
`ttlSeconds`	No

Tool Definition Quality

C 2.6/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must fully disclose behavior. It only mentions 'sticky' and 'same IP', but omits details about session lifecycle, persistence after TTL, rate limits, or how the session is referenced in subsequent requests. This leaves significant behavioral uncertainty.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence, which is front-loaded. However, it sacrifices necessary detail to maintain brevity. It could be slightly expanded without losing conciseness.

Completeness 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has three optional parameters, no output schema, and many related sibling tools, the description fails to provide enough context for correct usage. Agents are left unsure about how to specify parameters, what the return value is, and how session persistence works.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, meaning no parameter descriptions exist. The description adds no information about the enum type values, the region format, or the TTL meaning. The agent must infer from parameter names alone, which is insufficient for correct invocation.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a sticky proxy session that maintains the same IP for multiple requests. This specific verb-resource combination distinguishes it from sibling tools like agent_proxy_fetch, which fetches a URL through a rotating proxy.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide any guidance on when to use this tool versus alternatives such as agent_proxy_fetch. There is no mention of context, prerequisites, or when not to use it.
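
One plausible division of labor, sketched in Python: call `agent_proxy_session` once, then pass the identifier it returns as `sessionId` on every `agent_proxy_fetch` call that should share an IP. This pairing is inferred from the parameter names only. The placeholder session ID, the `"GET"` method value, and the `sticky_fetch_arguments` helper are assumptions, and the actual response shape of the session tool is not documented here.

```python
def sticky_fetch_arguments(session_id: str, urls: list[str]) -> list[dict]:
    """Build agent_proxy_fetch argument objects that all reuse one session."""
    return [
        {"url": url, "method": "GET", "sessionId": session_id}  # "GET" is assumed
        for url in urls
    ]


# Placeholder standing in for whatever agent_proxy_session returns
session_id = "session-placeholder"

calls = sticky_fetch_arguments(
    session_id,
    ["https://example.com/login", "https://example.com/account"],
)
for arguments in calls:
    print(arguments)
```

Without a session, each fetch would presumably go through the rotating pool, so multi-step flows that depend on a stable IP are the likely use case for this tool.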

agent_tempmail_create (C)

Create a temporary email address (auto-expires)

Parameters

Name	Required	Description	Default
`ttlSeconds`	No
`preferredPrefix`	No

Tool Definition Quality

C 2.9/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description should disclose behavioral traits; it mentions 'auto-expires' which is a key behavior but omits details like what happens after expiration, whether addresses are unique, or how the TTL parameter works.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, very concise. While it is efficiently short, it sacrifices important details for brevity.

Completeness 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, 0% param coverage, and a creation tool that needs clarity on inputs and returns, the description is severely incomplete. The agent cannot infer what the tool returns or how to use the parameters correctly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% and the description does not explain the two parameters (ttlSeconds, preferredPrefix). The term 'auto-expires' weakly hints at ttlSeconds but provides no meaningful semantics.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Create' and the resource 'temporary email address', and adds behavioral note 'auto-expires'. Among siblings like get/list/wait, this creation tool is distinctly identified.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., when to create vs get or list). The description lacks context about prerequisites or scenarios.

agent_tempmail_get (C)

Get full message content with extracted verification links/codes

Parameters

Name	Required	Description	Default
`mailboxId`	Yes
`messageId`	Yes
`includeRaw`	No

Tool Definition Quality

C 2.8/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It states only that the tool gets content and extracts codes; it does not disclose whether the operation is read-only, has rate limits, requires authentication, or has side effects. This minimal disclosure warrants a 2.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single concise sentence that is front-loaded with purpose. No fluff, but it could benefit from a second sentence about parameters or usage. Still efficient.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, and the description does not explain the return format or what 'extracted verification links/codes' means. For a tool with 3 parameters and no output schema, the description is incomplete.
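
To show what "extracted verification links/codes" could mean in practice, here is a hedged Python sketch that pulls URLs and 4 to 8 digit codes out of a message body with regular expressions. The patterns, the result keys, and the `extract_verification` helper are assumptions made for illustration; the server's actual extraction rules and return format are undocumented.

```python
import re

LINK_RE = re.compile(r"https?://[^\s<>\"']+")
CODE_RE = re.compile(r"\b\d{4,8}\b")  # assumed code length range


def extract_verification(body: str) -> dict[str, list[str]]:
    """Return candidate verification links and numeric codes found in body."""
    links = LINK_RE.findall(body)
    # Blank out links first so digits inside URLs are not reported as codes.
    text_only = LINK_RE.sub(" ", body)
    return {"links": links, "codes": CODE_RE.findall(text_only)}


sample = (
    "Your code is 482913. "
    "Or click https://example.com/verify?token=12345678 to confirm."
)
print(extract_verification(sample))
```

Even a one-line statement of the returned fields in the tool description would spare agents from guessing at this structure.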

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, meaning the schema parameters have no descriptions. The tool description does not mention any parameters or add meaning to mailboxId, messageId, or includeRaw. With 3 undocumented parameters, the description fails to compensate.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Get full message content with extracted verification links/codes' clearly states the verb (get), resource (full message content), and specialized output (verification links/codes). It distinguishes itself from siblings like agent_tempmail_list (which lists messages) and agent_tempmail_create (which creates mailbox).

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives or prerequisites. It does not mention that the user must first have a mailboxId and messageId from list/create tools, nor does it explain when to use this vs agent_tempmail_wait.

agent_tempmail_list (C)

List received messages in a mailbox

Parameters

Name	Required	Description	Default
`after`	No
`limit`	No
`mailboxId`	Yes

Tool Definition Quality

C 2.4/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description must disclose behavior. It does not mention whether messages are consumed, pagination, or ordering. The description is too sparse.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single short sentence. While concise, it omits essential information, making it insufficiently informative.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description should at least mention that it returns a list of messages. It fails to do so, leaving the agent without key context.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description does not explain any parameters (after, limit, mailboxId). The agent has no help understanding what these parameters mean.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'List received messages in a mailbox' clearly states the verb and resource. It distinguishes from sibling tools like agent_tempmail_create and agent_tempmail_get. However, it could be more specific about scope.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like agent_tempmail_get or agent_tempmail_wait. No prerequisites or context provided.

agent_tempmail_wait (C)

Wait for an incoming message (long polling, max 60s)

Parameters

Name	Required	Description	Default
`mailboxId`	Yes
`fromContains`	No
`timeoutSeconds`	No
`subjectContains`	No

Tool Definition Quality

C 2.6/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description mentions 'long polling' and a maximum duration of 60 seconds, which hints at behavior but is insufficient. With no annotations, the description should disclose that this is a blocking operation, what happens on timeout or when a message arrives, and any side effects. The current text is too vague.
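
The blocking behavior in question can be modeled with a short Python loop. This is a local sketch of long-polling semantics under stated assumptions, not the server's code: the 60-second cap comes from the tool description, the 30-second default is the value cited elsewhere in this review, and returning `None` on timeout, the polling interval, and the `wait_for_message` helper are all hypothetical.

```python
import time
from typing import Callable, Optional

MAX_WAIT_SECONDS = 60  # cap stated in the tool description


def wait_for_message(
    poll: Callable[[], Optional[dict]],
    timeout_seconds: float = 30,   # default cited in the review, not shown in the table
    interval: float = 1.0,         # hypothetical polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Block until poll() yields a message or the (capped) timeout elapses."""
    deadline = clock() + min(timeout_seconds, MAX_WAIT_SECONDS)
    while True:
        message = poll()
        if message is not None:
            return message
        if clock() >= deadline:
            return None  # assumed timeout result; the server may signal it differently
        sleep(interval)


# Simulated mailbox: empty twice, then a message arrives
replies = iter([None, None, {"id": "m1", "subject": "Verify your email"}])
print(wait_for_message(lambda: next(replies), interval=0.01))
```

A caller that needs to wait longer than the cap would have to call the tool again, which is worth stating explicitly in the description.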

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise (one sentence), which is good for readability. However, it lacks structure and essential details. It could be expanded slightly while remaining concise to include behavior and parameter contexts.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters and no output schema or annotations, the description is incomplete. It does not explain filtering behavior (fromContains, subjectContains), timeout semantics, return format, or error handling. A more comprehensive description is needed for effective agent use.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description adds no meaning for mailboxId, fromContains, subjectContains, or timeoutSeconds. It mentions 'max 60s' but does not relate that cap to the default timeoutSeconds of 30, which may cause confusion.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'wait for an incoming message' with long polling and a maximum timeout of 60 seconds. It uses a specific verb 'wait' and resource 'incoming message'. While it does not explicitly distinguish from sibling tools like agent_tempmail_get, the action of waiting is distinct from creating, getting, or listing messages.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. There is no mention of prerequisites, alternatives, or conditions for use. The description simply states what the tool does without any context about appropriate scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_trust_batch (B)

Get trust scores for multiple subjects in one call (max 100)

Parameters

Name	Required	Description	Default
`subjects`	Yes

Tool Definition Quality

B 3.3/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description bears full responsibility. It only adds the max 100 limit but omits traits like read-only nature, error handling, or partial failure behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, front-loading key information (batch, max limit), with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the tool's simplicity, the description lacks return value details, parameter format, and behavioral context. Given no output schema or annotations, it is insufficient for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0% and the description provides no explanation of the 'subjects' parameter structure or the meaning of 'type' and 'value' fields, leaving the agent without necessary semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
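
Because the description caps a call at 100 subjects, a caller with a longer list has to split it. The following is a minimal sketch, assuming each subject is an object with the `type` and `value` fields mentioned above. `build_batches` and the example values are hypothetical, and the accepted `type` strings are not documented on this page.

```python
from typing import Dict, List

MAX_SUBJECTS_PER_CALL = 100  # limit stated in the tool description


def build_batches(subjects: List[Dict[str, str]],
                  size: int = MAX_SUBJECTS_PER_CALL) -> List[Dict[str, list]]:
    """Split subjects into argument payloads of at most `size` entries each."""
    return [{"subjects": subjects[i:i + size]}
            for i in range(0, len(subjects), size)]


# Hypothetical input: 250 subjects using an assumed "domain" type string.
subjects = [{"type": "domain", "value": f"example-{n}.jp"} for n in range(250)]
payloads = build_batches(subjects)
print([len(p["subjects"]) for p in payloads])  # [100, 100, 50]
```

Each payload would then be sent as the arguments of one agent_trust_batch call.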

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Get trust scores for multiple subjects in one call (max 100)', specifying verb, resource, and scope. It distinguishes itself from the sibling 'agent_trust_score', which likely handles single subjects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when multiple subjects are needed and notes a max of 100, providing context. However, it does not explicitly exclude single-subject calls or mention alternatives beyond the implied distinction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_trust_feedback (C)

Submit feedback about an agent/wallet (positive or negative)

Parameters

Name	Required	Description	Default
`rating`	Yes
`category`	Yes
`evidence`	No
`subjectType`	Yes
`subjectValue`	Yes

Tool Definition Quality

C 2.6/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose behavioral traits, but it only indicates the action (submit feedback) without explaining effects, persistence, anonymity, or rate limits. The impact on trust scores is implied but not stated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, which is concise but lacks necessary detail. It is not verbose but sacrifices completeness for brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters (4 required), enums, a nested object, and no output schema, the description is insufficient. It does not explain return values, behavior, or provide examples, leaving significant gaps for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It mentions 'positive or negative', hinting at rating, but does not explain the rating scale, categories, evidence format, or subjectType values. Adds minimal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies submitting feedback about an agent or wallet, which distinguishes it from sibling tools like agent_trust_score (retrieves a score) and agent_trust_batch (batch operations). However, it omits the subject types 'domain' and 'agent_card_url', making it slightly incomplete.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is given on when to use this tool versus alternatives (e.g., agent_trust_score). There are no prerequisites or context for appropriate use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_trust_score (C)

Get trust score for a wallet, agent card URL, or domain

Parameters

Name	Required	Description	Default
`subjectType`	Yes
`subjectValue`	Yes
`includeDetails`	No

Tool Definition Quality

C 2.7/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description carries the full burden. It fails to disclose behavioral traits such as idempotency, rate limits, required permissions, or how missing subjects are handled. The verb 'Get' suggests a read-only lookup, but the description never says so, which is a significant transparency gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with no redundant words. It is well-structured and easy to read, though its brevity limits its informativeness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of output schema, annotations, and parameter descriptions, the description is incomplete. It does not explain the return format (e.g., score range), error handling, or any constraints. Important context about the trust score's meaning and interpretation is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, meaning no parameter descriptions exist in the schema. The description only reiterates the enum values of subjectType without explaining subjectValue or includeDetails. It adds minimal value beyond the schema's own enum labels and does not help an agent understand the parameters' semantics or valid formats.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and the resource 'trust score' along with acceptable subject types (wallet, agent card URL, domain). It is easily understandable and distinguishes itself from related siblings like agent_trust_batch and agent_trust_feedback by focusing on a single subject.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No usage guidance is provided. The description does not indicate when to use this tool versus alternatives (e.g., agent_trust_batch for batch queries or agent_trust_feedback for providing feedback), nor does it mention any prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_webhook_create (C)

Create a webhook endpoint that relays requests to your agent

Parameters

Name	Required	Description	Default
`agentId`	No
`pushUrl`	No
`ttlSeconds`	No
`description`	No
`deliveryMode`	Yes
`transformRules`	No

Tool Definition Quality

C 2.6/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It only states the tool creates the endpoint but does not disclose authentication needs, rate limits, or what happens after creation (e.g., how requests are relayed, if there is a setup process).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief but at the cost of missing critical details for a 6-parameter tool. It is not properly front-loaded with essential info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 parameters, nested objects, no output schema), the description fails to explain return values or parameter behavior, leaving the agent under-informed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description adds no information about parameters like agentId, pushUrl, ttlSeconds, deliveryMode, etc. The agent must guess their meanings from names alone, which is insufficient for correct invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a webhook endpoint that relays requests to the agent, distinguishing it from sibling tools like agent_webhook_list_requests (listing), agent_webhook_poll (polling), and agent_webhook_replay (replaying).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, such as when to poll vs. use HTTP push. No prerequisites or context for usage are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_webhook_list_requests (C)

List requests received by a webhook endpoint

Parameters

Name	Required	Description	Default
`limit`	No
`offset`	No
`endpointId`	Yes

Tool Definition Quality

C 2.8/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It implies a read-only operation but does not disclose rate limits, authentication requirements, pagination behavior (though limit/offset exist in schema), or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single short sentence that is front-loaded and concise. It contains no unnecessary words, but could be slightly expanded to add value without becoming verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 0% schema coverage, the description is insufficient. It does not describe the return format, pagination details, or what constitutes a 'request' in the response.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description should explain the parameters. It does not mention endpointId, limit, or offset, leaving the user to infer from parameter names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
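
The `limit` and `offset` parameters suggest offset-based pagination, which the description leaves unexplained. A minimal sketch of how a caller might walk the pages follows. It is heavily assumption-laden: `call_tool` is a hypothetical client function, the `requests` response key is invented for illustration, and the stop condition (a page shorter than `limit`) is a guess, since no output schema is published.

```python
from typing import Any, Callable, Dict, Iterator

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def iter_requests(call_tool: ToolCaller, endpoint_id: str,
                  page_size: int = 50) -> Iterator[Dict[str, Any]]:
    """Yield stored requests page by page until a short or empty page arrives."""
    offset = 0
    while True:
        result = call_tool("agent_webhook_list_requests",
                           {"endpointId": endpoint_id,
                            "limit": page_size,
                            "offset": offset})
        page = result.get("requests", [])  # assumed response key
        yield from page
        if len(page) < page_size:
            break  # assumed signal that no further pages exist
        offset += page_size
```

A single sentence in the description stating the default page size and the ordering of results would make a loop like this unnecessary to reverse-engineer.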

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('list') and the resource ('requests received by a webhook endpoint'). It distinguishes this tool from other webhook-related siblings like agent_webhook_create, agent_webhook_poll, and agent_webhook_replay.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as agent_webhook_poll or agent_webhook_replay. The description does not specify usage context, prerequisites, or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_webhook_poll (C)

Poll for new webhook requests (long polling, max 60s)

Parameters

Name	Required	Description	Default
`after`	No
`limit`	No
`timeout`	No
`endpointId`	Yes

Tool Definition Quality

C 2.6/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that it uses long polling and has a 60s maximum timeout, which are behavioral traits beyond the schema. However, no annotations exist, and the description does not clarify whether it is read-only, what happens on timeout, or if it blocks.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely short (one sentence) but lacks essential details. While no words are wasted, the brevity comes at the cost of completeness. A middle score is appropriate as it is clear but under-specified.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters, no param descriptions, and no output schema, the description is severely incomplete. It does not explain parameters, return values, or how to use endpointId, leaving the agent with insufficient context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description should explain the parameters, but it only indirectly references timeout via 'max 60s'. The meanings of 'after', 'limit', and especially 'endpointId' are not conveyed, leaving the agent to guess.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
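
To make the gap concrete, here is a minimal sketch of the kind of loop an agent would have to write around this tool. Everything beyond the parameter names and the 60-second cap is an assumption: `call_tool` is a hypothetical client function, the `requests` response key and per-request `id` field are invented, and treating `after` as a cursor set to the last seen id is only a plausible reading of the name.

```python
from typing import Any, Callable, Dict, List, Optional

MAX_POLL_SECONDS = 60  # cap stated in the tool description

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def drain_new_requests(call_tool: ToolCaller, endpoint_id: str,
                       timeout: int = 30, limit: int = 10,
                       max_rounds: int = 5) -> List[Dict[str, Any]]:
    """Long-poll repeatedly, advancing an assumed `after` cursor each round."""
    collected: List[Dict[str, Any]] = []
    after: Optional[Any] = None
    for _ in range(max_rounds):
        args: Dict[str, Any] = {"endpointId": endpoint_id,
                                "limit": limit,
                                "timeout": min(timeout, MAX_POLL_SECONDS)}
        if after is not None:
            args["after"] = after
        batch = call_tool("agent_webhook_poll", args).get("requests", [])
        if not batch:
            break  # assumed: an empty result means the poll timed out
        collected.extend(batch)
        after = batch[-1]["id"]  # assumed cursor field
    return collected
```

Each assumption flagged in the comments corresponds to a detail the description could state directly.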

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action (poll) and resource (new webhook requests), with added detail about long polling and a max 60s timeout. However, does not explicitly distinguish from sibling agent_webhook_list_requests, which could be used for historical views.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like agent_webhook_list_requests or agent_webhook_replay. The description implies real-time polling but does not specify context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_webhook_replay (C)

Replay a stored webhook request

Parameters

Name	Required	Description	Default
`toUrl`	No
`requestId`	Yes
`endpointId`	Yes

Tool Definition Quality

C 2.7/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description must fully disclose behavior. It only says 'Replay' but does not explain whether it modifies state, creates new entries, or requires authentication. Minimal behavioral context beyond the verb.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no fluff, front-loaded purpose. However, it is too terse; conciseness should not sacrifice necessary detail. Adequate but not excellent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given lack of annotations, output schema, and parameter descriptions, the description is incomplete. An agent needs more information about the replay action, return value, and how to obtain the stored request.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% with no parameter descriptions. The description does not explain the meaning of endpointId, requestId, or toUrl. For a tool with three parameters, this is a critical gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Replay a stored webhook request' clearly states the specific verb 'Replay' and the resource 'stored webhook request'. It distinguishes itself from sibling tools like agent_webhook_create, which creates a webhook, and agent_webhook_list_requests, which lists requests.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like agent_webhook_list_requests or agent_webhook_poll. Prerequisites are also missing, such as first obtaining a stored request from a previous listing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

alerts_create_watch (A)

Create a saved watch: when a NEW matching event appears in a ledger, a notification is pushed to your destination. Filters: ledger (optional), keyword and/or entity (case-insensitive title substring). At least one filter is required. Backfill never fires.

Parameters

Name	Required	Description
`entity`	No	Additional case-insensitive substring (e.g. company name)
`ledger`	No	Ledger key, e.g. 'sanction' (omit for all ledgers)
`keyword`	No	Case-insensitive substring of the item title
`destinationType`	Yes
`destinationTarget`	Yes	Webhook URL / relay endpoint / email address

Tool Definition Quality

A 4.1/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses key behavior: only fires on NEW matching events, backfill never fires, notifications pushed to destination. No annotations provided, so description carries full burden, and it does so adequately.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, each focused: the first defines the purpose, the next two cover the filters, and the last covers backfill. No waste, front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, filters, backfill, and notification. There is no output schema, and the description does not explain the destination enums or how watches are managed afterward. Adequate for a creation tool, given that the schema covers the enums.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds meaning beyond schema by clarifying filter requirements (at least one of keyword/entity), case-insensitivity, and optional ledger. With 80% schema coverage, the description enhances understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
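
The 80% figure is consistent with four of this tool's five parameters carrying a description in the table above. A minimal sketch of that arithmetic follows, assuming coverage is simply the share of parameters with a non-empty description. The scorer's actual formula is not given on this page, and `description_coverage` is a hypothetical helper.

```python
from typing import Dict, Optional


def description_coverage(params: Dict[str, Optional[str]]) -> float:
    """Share of parameters with a non-empty description (1.0 when there are none)."""
    if not params:
        return 1.0  # assumed convention for zero-parameter tools
    described = sum(1 for text in params.values() if text)
    return described / len(params)


# Parameter descriptions copied from the alerts_create_watch table.
alerts_create_watch = {
    "entity": "Additional case-insensitive substring (e.g. company name)",
    "ledger": "Ledger key, e.g. 'sanction' (omit for all ledgers)",
    "keyword": "Case-insensitive substring of the item title",
    "destinationType": None,  # no description listed
    "destinationTarget": "Webhook URL / relay endpoint / email address",
}
print(description_coverage(alerts_create_watch))  # 0.8
```

Under the same assumption, a tool whose parameters all lack descriptions scores 0.0, matching the 0% coverage reported for most tools in this listing.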

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it creates a saved watch that pushes notifications for new matching events. It distinguishes itself from sibling 'alerts' tools (delete, list) but does not explicitly contrast with other watch creation tools like sanction_watch or grant_watch.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains filters (ledger, keyword, entity) and requires at least one filter. It implies use for notifications on new events but does not explicitly state when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
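
The filter rules in this tool's description can be read as a small matching procedure. The sketch below is one interpretation, not the server's logic. It assumes that 'at least one filter' counts any of ledger, keyword, or entity (the Parameters evaluation reads it as keyword or entity only), that keyword and entity must both match when both are given, and that matching is a case-insensitive substring test on the title. `validate_filters` and `matches` are hypothetical names.

```python
from typing import Optional


def validate_filters(ledger: Optional[str] = None,
                     keyword: Optional[str] = None,
                     entity: Optional[str] = None) -> None:
    """Raise if no filter is supplied (assumed reading of the description)."""
    if not (ledger or keyword or entity):
        raise ValueError("At least one of ledger, keyword, or entity is required")


def matches(title: str, event_ledger: str,
            ledger: Optional[str] = None,
            keyword: Optional[str] = None,
            entity: Optional[str] = None) -> bool:
    """Return True if a new event would trigger the watch under these assumptions."""
    if ledger is not None and event_ledger != ledger:
        return False
    lowered = title.casefold()
    # Assumed: every supplied substring filter must appear in the title.
    return all(term.casefold() in lowered for term in (keyword, entity) if term)


print(matches("ACME Corp added to list", "sanction", ledger="sanction", keyword="acme"))  # True
```

Stating which of these readings is correct would settle the ambiguity the evaluations above disagree on.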

alerts_delete_watch (A)

Deactivate a saved watch by id (soft delete; stops future notifications).

Parameters

Name	Required	Description	Default
`watchId`	Yes

Tool Definition Quality

A 3.9/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses soft delete and notification cessation, which is good for a simple tool without annotations. However, lacks mention of auth requirements or rate limits, which would be useful for full transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence, under 20 words, front-loaded with the action and key behavioral details. No fluff; every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple 1-parameter tool with no output schema, the description covers the action and effect. It lacks details such as whether a watch can be reactivated and what is returned, but that is acceptable given the simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single required parameter (watchId) is self-evident from the schema, but the description does not explain what it is, its format, or how to obtain it. With 0% schema description coverage, the description should compensate but barely does.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool deactivates a saved watch by ID, includes the nuanced behavior of soft delete and stopping notifications, and distinguishes itself from sibling tools like alerts_create_watch and alerts_list_watches.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage is implied (when wanting to stop a watch), but there is no explicit guidance on prerequisites or on when to use this tool versus alternatives. Given the many watch-related siblings, explicit guidance on when and when not to use it would help.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

alerts_list_watches (B)

List the calling user's saved watches.

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

B 3.4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description is minimal. It implies a non-destructive read operation returning a list, but doesn't detail what constitutes a watch, potential pagination, or ordering. Given simplicity, this is adequate but could be more explicit.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded, no wasted words. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a zero-parameter list tool without output schema, description is minimally complete. It could benefit from clarifying the return format or scope (e.g., 'Returns an array of watch objects' or 'Includes all active alert watches').

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters exist (0 params), schema coverage 100%. Baseline 4 applies as description doesn't need to add param info.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states it lists saved watches: a clear verb plus resource. It differs from alerts_create_watch and alerts_delete_watch by listing rather than creating or deleting. It does not explicitly say that these are alerts watches (as opposed to other watch types), but the name implies the scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives. Does not mention prerequisites or typical use case, leaving agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_watch_get (C)

Get a bid notice detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C 2.9/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations to rely on, and the description only mentions returned fields. Does not disclose behavioral aspects such as authentication requirements, error handling for invalid itemId, or whether the operation is read-only. For a simple get, minimal transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise with two sentences and no extraneous information. It could be slightly improved by including parameter details, but it is not verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description is adequate but incomplete. It states what is returned but does not explain other potential fields, error cases, or usage context. With no annotations, more detail would help.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter (itemId) has no description in the schema, and the description does not clarify its format, expected values, or how it relates to the tool's functionality. Schema coverage is 0%, and the description adds no parameter meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves a bid notice detail plus full event timeline, specifying returned fields. It distinguishes from siblings like bid_watch_search and bid_watch_timeline by combining both detail and timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus similar tools like bid_watch_timeline or bid_watch_search. No prerequisites or alternatives mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_watch_recent_changes (A)

Recent appearance / deadline-move / close / cancel / award events across all bid notices since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`entity`	No
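
Since `since` is the only required parameter and must be an ISO 8601 timestamp, a client could derive it as sketched below. The helper name `since_timestamp` is hypothetical, and whether the server accepts the `+00:00` offset form (as opposed to a trailing `Z`) is an assumption not confirmed by this page.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def since_timestamp(hours_back: float, now: Optional[datetime] = None) -> str:
    """Return an ISO 8601 UTC timestamp `hours_back` hours before `now`."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(hours=hours_back)).isoformat(timespec="seconds")


# A fixed reference time keeps the example reproducible.
fixed_now = datetime(2026, 7, 30, 18, 37, tzinfo=timezone.utc)
arguments = {"since": since_timestamp(24, fixed_now)}
print(arguments)
```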

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description carries the burden. It states the tool returns events with firstSeenAt and ledgerVerified, indicating a read operation. However, it does not disclose other behavioral aspects like idempotency, rate limits, or authentication requirements. There is no contradiction with annotations (none provided).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no redundancy. It front-loads the core functionality and includes key details efficiently. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and low schema coverage, the description is incomplete. It does not specify the response format, pagination, or error handling. The return values are only partially described. For a tool with this complexity, more detail is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description should explain all parameters. It only explains 'since' (the timestamp) but does not describe 'limit' or 'entity'. This leaves a significant gap in parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it tracks recent events (appearance, deadline-move, close, cancel, award) across all bid notices since a given timestamp, and specifies each item includes firstSeenAt and ledgerVerified. This distinguishes it from siblings like bid_watch_get (specific notice) and bid_watch_search (search by criteria).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for monitoring recent changes since a timestamp but does not explicitly state when to use this tool versus alternatives like bid_watch_search or bid_watch_timeline. No when-not or alternative guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_watch_search (C)

Search Japanese public-procurement bid notices (kkj.go.jp). Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No
`since`	No
`entity`	No	Procuring agency (調達機関; partial match)
`status`	No
`bidType`	No	Bid type: open competitive bidding (一般競争入札), designated bidding (指名), negotiated contract (随意), etc.

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must disclose behavioral traits. It only mentions output fields but lacks information on authentication, rate limits, error behavior, or whether it's read-only. Minimal disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose and a useful output detail. No wasted words, but could expand slightly on usage without becoming verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool has 6 parameters (none required) and no output schema or annotations. Description omits explanation of query, limit, since, entity, status, bidType. Output only mentions two fields, leaving the rest unspecified. Incomplete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 33% (2 of 6 parameters, entity and bidType, have descriptions). The description does not explain any parameters; it only mentions output fields. With coverage this low, the description should compensate, but it adds no meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool searches Japanese public-procurement bid notices from kkj.go.jp, mentions key result fields (firstSeenAt, ledgerVerified), and distinguishes it from sibling tools like bid_watch_get or bid_watch_recent_changes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like bid_watch_get or watch tools for other domains. No exclusions or context provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_watch_timeline (B)

Time-ordered events only for a bid notice (the differentiator: when it appeared, deadline moved, closed, was cancelled or awarded). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the tool returns events with 'firstSeenAt and ledgerVerified' but does not mention read-only behavior, authentication needs, rate limits, or error handling (e.g., invalid itemId).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core purpose and key differentiators. Every word adds value without repetition or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers purpose and key fields but lacks details on return format, pagination, or error scenarios. It is adequate but could be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one parameter (itemId) with 0% coverage, meaning no description in schema. The tool description does not explain what itemId represents beyond the context of 'bid notice', missing format, example, or required characteristics. This is insufficient compensation for low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'time-ordered events only for a bid notice' and lists specific event types (appearance, deadline moved, closed, cancelled, awarded). This distinguishes it from sibling tools like bid_watch_get (likely details) and bid_watch_recent_changes (likely summary).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for timeline retrieval with 'the differentiator' phrasing but does not explicitly state when to use this over alternatives or when not to use it. No exclusion criteria or alternative tool names are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_watch_verify_ledger (A)

Verify the hash-chain integrity of a bid notice (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
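
To make "hash-chain integrity" concrete, here is a minimal sketch of how such a check can work in general. It is not the server's implementation: the SHA-256 scheme, the all-zero genesis value, the event field names, and the `checked` key are all assumptions chosen for illustration. Only `chainValid` and `brokenAt` are names taken from the tool description.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first event in a chain


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash an event payload together with the previous event's hash."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(events: list, payload: dict) -> None:
    """Append an event whose hash commits to everything before it."""
    prev = events[-1]["hash"] if events else GENESIS
    events.append({"prev_hash": prev, "payload": payload,
                   "hash": event_hash(prev, payload)})


def verify_chain(events: list) -> dict:
    """Walk the chain and report the first index whose link does not match."""
    prev = GENESIS
    for index, event in enumerate(events):
        expected = event_hash(prev, event["payload"])
        if event["prev_hash"] != prev or event["hash"] != expected:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(events)}


ledger = []
for kind in ("appeared", "deadline_moved", "closed"):
    append_event(ledger, {"type": kind})
intact = verify_chain(ledger)

ledger[1]["payload"]["type"] = "cancelled"  # simulate tampering with event 1
tampered = verify_chain(ledger)
print(intact, tampered)
```

Because each hash covers the previous hash, editing any stored event invalidates that event's own hash, which is why the check can report the first broken position.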

Tool Definition Quality

A3.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explains that the tool checks hash-chain integrity for tamper detection and lists the output fields. It implies a read-only operation, and with annotations absent it is the only source of behavioral information. Details such as required permissions or rate limits are missing, but the core behavior is clear.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the purpose and efficiently lists the return values. Every word earns its place, and there is no repetition or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers purpose and output, but lacks an explanation of the itemId parameter and any prerequisites. It does not mention error cases or when this verification is valid, making it moderately complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter, itemId, is a string with no description in the schema (0% coverage). The description does not explain what itemId represents (e.g., the bid notice ID), leaving the agent without necessary guidance. This is a significant gap for a required parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies hash-chain integrity for tamper detection, and lists the return fields. It specifies 'bid notice', distinguishing it from other verify_ledger tools like grant_watch_verify_ledger or landprice_watch_verify_ledger.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide any guidance on when to use this tool versus alternatives, such as bid_watch_get or other verification tools. It neither mentions prerequisites nor scenarios where this tool is preferred.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_estimate_compute_emissions (A)

Estimate electricity CO2e (kg) from energy use (kWh) and a regional grid-intensity factor (defaults to the IEA world average). Pure compute; price 0.0 (free).

Parameters

Name	Required	Description	Default
`kWh`	Yes	Electricity consumed in kWh (>= 0)
`region`	No	Grid region (default: global).
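
The calculation described is a single multiplication: energy use times a grid-intensity factor. The sketch below shows that shape only. The factor values and the `example_low_carbon` region key are made-up placeholders, not the IEA figure or the server's actual table.

```python
# Hypothetical grid-intensity factors in kg CO2e per kWh.
# These are placeholder numbers, NOT the values the server uses.
GRID_FACTORS = {"global": 0.5, "example_low_carbon": 0.1}


def electricity_emissions_kg(kwh: float, region: str = "global") -> float:
    """Estimate electricity CO2e (kg) as kWh multiplied by a regional factor."""
    if kwh < 0:
        raise ValueError("kWh must be >= 0")
    return kwh * GRID_FACTORS[region]


result = electricity_emissions_kg(1200)
print(result)
```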

Tool Definition Quality

A4.0/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the call is 'Pure compute; price 0.0 (free)', implying no side effects or costs. However, it does not mention rate limits, caching, or whether results are deterministic. Given the lack of annotations, the transparency is adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at two sentences, with the primary action and key details front-loaded. Every word serves a purpose, and no redundant information is present.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the lack of an output schema, the description covers the essential return information (CO2e in kg). For a simple computation tool, this is largely sufficient. However, a brief mention of the output format (e.g., single number) would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description adds value by clarifying the default regional intensity ('defaults to the IEA world average') and specifying the output unit ('CO2e (kg)'), which enhances understanding beyond the schema's enum and type descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Estimate'), resource ('electricity CO2e'), and key inputs ('kWh', 'region'). The 'Pure compute; price 0.0 (free)' note indicates a simple calculation rather than a real-world API call, and the electricity focus separates it from the shipping and travel siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for electricity emissions estimation but provides no explicit guidance on when to use this tool versus alternatives like carbon_estimate_shipping_emissions or carbon_estimate_travel_emissions. No exclusions or context about prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_estimate_emission_factor (A)

Look up the published emission factor (value, unit, category) for a named activity key (e.g. shipping_air, electricity_global, travel_car, gasoline, beef). Pure compute; price 0.0 (free).

Parameters

Name	Required	Description	Default
`activity`	Yes	Canonical activity key (e.g. shipping_air, travel_car, gasoline)

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description must fully disclose behavior. It states the tool is 'pure compute' and free, but omits details on error handling for invalid activity keys, authentication requirements, or rate limits. The behavioral disclosure is minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—two sentences with no filler. It front-loads the action ('Look up the published emission factor') and efficiently provides examples and pricing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description mentions the return fields (value, unit, category) but does not elaborate on the response structure. For a simple lookup tool, this is adequate but could be more informative about expected output format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with the activity parameter already described as a canonical key. The tool description adds extra examples (beef, electricity_global) beyond the schema, providing additional context for valid values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: looking up published emission factors for a named activity key. It provides specific examples (shipping_air, electricity_global, etc.) and distinguishes itself from sibling tools like carbon_estimate_compute_emissions by focusing on a single factor lookup.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool over alternatives. While it mentions 'Pure compute; price 0.0 (free)', it does not describe when to choose this vs. other carbon estimation tools (e.g., compute_emissions, offset_estimate).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_estimate_offset_estimate (A)

Estimate voluntary-market offset cost (USD) and tree- / forest-year equivalents for a given kg CO2e, using published constants. Pure compute; price 0.0 (free).

Parameters

Name	Required	Description	Default
`kgCO2e`	Yes	Emissions to offset, in kg CO2e (>= 0)

Tool Definition Quality

A4.0/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It explicitly states 'Pure compute; price 0.0 (free)', which discloses the key behavioral traits (no side effects, no cost). It does not mention rate limits or data sources, but for a simple compute tool this is sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences, each providing essential information. It is front-loaded with the main action and resource, and every sentence earns its place. No superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description should clarify return values. It mentions 'offset cost (USD) and tree- / forest-year equivalents', which adequately informs the agent of the output types. However, it could be more explicit about the exact structure (e.g., numeric values, possibly an object). Still, it is reasonably complete for a simple compute tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for the single parameter 'kgCO2e', with a clear description. The tool description adds value by explaining what the output will be (offset cost and equivalents), but does not elaborate on the parameter beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool estimates voluntary-market offset cost and tree/forest-year equivalents for a given kg CO2e. The verb 'estimate' and resource 'offset cost' are specific, and it naturally distinguishes from sibling tools like carbon_estimate_compute_emissions which compute emissions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by stating 'for a given kg CO2e' but does not provide explicit guidance on when to use this tool versus alternatives, nor does it mention exclusions or prerequisites. The 'Pure compute; price 0.0 (free)' note gives some context but is not enough for clear usage differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_estimate_shipping_emissions (B)

Estimate freight CO2e (kg) from weight, distance and transport mode using embedded GLEC/DEFRA-order emission factors. Pure compute; price 0.0 (free).

Parameters

Name	Required	Description
`mode`	Yes	Freight transport mode.
`weightKg`	Yes	Shipment weight in kilograms (>= 0)
`distanceKm`	Yes	Transport distance in kilometres (>= 0)
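
Freight estimates of this kind are usually computed as tonnes times kilometres times a per-mode factor. The sketch below assumes that form. The mode names and factor values are hypothetical, since the page shows neither the `mode` enum nor the embedded factors.

```python
# Hypothetical factors in kg CO2e per tonne-km.
# Mode names and values are placeholders, NOT the server's GLEC/DEFRA table.
FREIGHT_FACTORS = {"air": 0.5, "road": 0.1, "sea": 0.01}


def shipping_emissions_kg(weight_kg: float, distance_km: float, mode: str) -> float:
    """Estimate freight CO2e (kg) as tonnes * km * per-mode factor."""
    if weight_kg < 0 or distance_km < 0:
        raise ValueError("weightKg and distanceKm must be >= 0")
    tonnes = weight_kg / 1000.0  # factors are per tonne-km, input is in kg
    return tonnes * distance_km * FREIGHT_FACTORS[mode]


estimate = shipping_emissions_kg(2000, 500, "road")
print(round(estimate, 3))
```

Note the unit conversion: the tool takes weight in kilograms, while tonne-km factors expect tonnes, so a client reproducing the figure by hand needs the same division by 1000.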

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears the full burden. It states 'Pure compute; price 0.0 (free)', indicating no side effects and no cost. However, it does not disclose behavior for edge cases (e.g., zero weight) or the accuracy of the estimates.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: one functional sentence plus a short note on pricing. Every word adds value; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple compute tool with all parameters described in schema, the description is mostly complete. It mentions output (CO2e in kg) and that it uses GLEC/DEFRA factors. Could indicate return format more explicitly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description reiterates inputs (weight, distance, mode) but adds no new meaning beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool estimates freight CO2e using specific inputs and emission factors. It specifies verb and resource, but does not explicitly differentiate from sibling tools like carbon_estimate_compute_emissions or carbon_estimate_travel_emissions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. No mention of prerequisites, limitations, or scenarios. Only states it's free and pure compute.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_estimate_travel_emissions (A)

Estimate passenger-travel CO2e (kg) from distance and travel mode using embedded per-passenger-km factors. Pure compute; price 0.0 (free).

Parameters

Name	Required	Description	Default
`mode`	Yes	Passenger travel mode.
`distanceKm`	Yes	Travel distance in kilometres (>= 0)

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, description adds value by stating 'Pure compute' (no side effects) and 'price 0.0 (free)'. Discloses that it uses 'embedded per-passenger-km factors' for methodology. Adequate for a simple compute tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with core purpose, additional behavior note appended. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, and the description does not specify the return structure (e.g., whether the response is a single number). The unit (kg CO2e) is given, which is adequate for a straightforward tool, but response details are missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with descriptions. Description merely reiterates 'distance and travel mode' without adding new semantic meaning, meeting baseline expectation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb 'Estimate' and resource 'passenger-travel CO2e (kg)' with inputs distance and mode. Distinguishes from sibling carbon tools like shipping_emissions by specifying 'passenger-travel'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. It mentions 'Pure compute; price 0.0 (free)' but does not direct the agent to use it for passenger-travel estimates rather than the other carbon tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

commerce_catalog_agent_readiness_score (A)

Agentic-commerce readiness score (0-100) for how well a product is structured for autonomous agents, with a transparent rationale, from a url, raw html, or inline product. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`url`	No	Product page URL to fetch (one of url / html / product)
`html`	No	Raw page HTML to parse (one of url / html / product)
`product`	No	Inline normalized product object (one of url / html / product)
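
All three parameters are optional, but each is described as "one of url / html / product". A client can enforce that before calling, as in the sketch below. The helper name is hypothetical, and it assumes exactly one input is expected; how the server treats zero or multiple inputs is not stated on this page.

```python
from typing import Any, Dict, Optional


def build_readiness_arguments(
    url: Optional[str] = None,
    html: Optional[str] = None,
    product: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return tool arguments containing exactly one of url / html / product."""
    candidates = {"url": url, "html": html, "product": product}
    provided = {name: value for name, value in candidates.items() if value is not None}
    if len(provided) != 1:
        raise ValueError(
            f"expected exactly one of url/html/product, got {sorted(provided)}"
        )
    return provided


arguments = build_readiness_arguments(url="https://example.com/products/123")
print(arguments)
```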

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses read-only and free pricing, but does not describe output structure (e.g., response format), rate limits, or authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is front-loaded with core purpose and then specifies inputs and key traits. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 3 parameters fully documented in schema and no output schema, the description covers the tool's purpose, input flexibility, read-only nature, and transparency of rationale. Missing explicit output structure slightly reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description groups the three parameters as input sources but adds no additional syntax, constraints, or usage details beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes a readiness score (0-100) for product structure for autonomous agents, with inputs from URL, HTML, or inline product. It distinguishes from sibling commerce_catalog tools (e.g., product_extract, price_compare) by its scoring focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage via specifying input sources and notes read-only/free nature, but does not explicitly state when to use this tool vs. alternatives like product_extract or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

commerce_catalog_availability_check A

Resolve product stock availability to a coarse signal (in_stock / out_of_stock / limited / preorder / unknown), from a url, raw html, or inline product. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`url`	No	Product page URL to fetch (one of url / html / product)
`html`	No	Raw page HTML to parse (one of url / html / product)
`product`	No	Inline normalized product object (one of url / html / product)
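
To make the coarse signal concrete, here is a rough sketch of how Schema.org `ItemAvailability` values could be collapsed into the five categories named in the description. The mapping table and the function name `coarse_availability` are illustrative assumptions, not the server's documented behavior.

```python
# Assumed mapping from Schema.org ItemAvailability tokens to the coarse signal.
COARSE_SIGNAL = {
    "instock": "in_stock",
    "outofstock": "out_of_stock",
    "soldout": "out_of_stock",
    "limitedavailability": "limited",
    "preorder": "preorder",
    "presale": "preorder",
}


def coarse_availability(value):
    """Map a Schema.org availability value (URL or bare token) to a coarse signal."""
    if not value:
        return "unknown"
    # "https://schema.org/InStock" -> "instock"
    token = value.rstrip("/").rsplit("/", 1)[-1].strip().lower()
    return COARSE_SIGNAL.get(token, "unknown")


print(coarse_availability("https://schema.org/InStock"))
```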

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explicitly states 'Read-only; price 0.0 (free),' which is good for transparency given no annotations. However, it does not disclose behavior for unreachable URLs, invalid input, or response structure beyond the coarse signal categories. More detail about error handling or default behavior would improve the score.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that packs key information (purpose, input types, output signal, behavioral traits). It could be slightly more structured (e.g., separate sentences for input constraints), but it is front-loaded with the most important action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has three parameters and no output schema, the description lacks critical details about the return format (e.g., JSON structure, whether it returns a string or object). It also does not specify how the output signal is presented or any constraints on the product object input. This leaves the agent guessing about how to parse the result.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for each parameter. The description adds high-level context ('from a url, raw html, or inline product') but does not provide additional semantic details, such as expected formats, object structures, or how to choose among the three inputs. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies the tool's exact function: resolving stock availability into a coarse signal with enumerated output values (in_stock, out_of_stock, limited, preorder, unknown). It lists the three input sources (url, html, inline product), distinguishing it from sibling tools like commerce_catalog_product_extract or commerce_catalog_price_compare.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives, nor when not to use it. The description implies use for stock availability checking, but it would benefit from context like 'Use this to check product availability; for full product details, see commerce_catalog_product_extract.'

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

commerce_catalog_catalog_validate A

Validate product-feed completeness: which required fields are present or missing and a completeness score, from a url, raw html, or an inline product object. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`url`	No	Product page URL to fetch (one of url / html / product)
`html`	No	Raw page HTML to parse (one of url / html / product)
`product`	No	Inline normalized product object (one of url / html / product)
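
A completeness check of this kind can be sketched in a few lines. The required-field list, the 0-100 scale, and the function name `validate_product` below are assumptions for illustration; the page does not say which fields the tool treats as required or how it scales the score.

```python
# Hypothetical required fields; the real tool's list is not documented here.
REQUIRED_FIELDS = ("name", "price", "currency", "availability")


def validate_product(product, required=REQUIRED_FIELDS):
    """Report which required fields are present or missing, plus a 0-100 score."""
    # A field counts as present when it holds a non-empty value (0 is allowed).
    present = [field for field in required if product.get(field) not in (None, "", [])]
    missing = [field for field in required if field not in present]
    score = round(100 * len(present) / len(required)) if required else 100
    return {"present": present, "missing": missing, "score": score}


print(validate_product({"name": "Green tea", "price": 980}))
```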

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, description carries full burden. Discloses read-only and free status, but lacks detail on rate limits, auth requirements, or error handling. Provides basic transparency but not exhaustive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no fluff. Front-loaded with action and output. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Describes output (fields present/missing and score) despite no output schema. Clarifies input exclusivity. For a simple validation tool, it is sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good descriptions. The description reinforces mutual exclusivity of parameters, adding minimal extra value beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action (validate), target (product-feed completeness), and output (present/missing fields and score). Distinguishes from sibling tools like commerce_catalog_product_extract by focusing on validation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies that inputs can be url, raw html, or inline product object, and mentions it's read-only and free. However, does not explicitly state when NOT to use it or provide alternatives like product_extract for extraction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

commerce_catalog_price_compare A

Compare a set of offers and return the cheapest, most expensive, spread and per-offer ranking. Requires a non-empty offers array. Pure; price 0.0 (free).

Parameters

Name	Required	Description
`offers`	Yes	Non-empty list of offers to compare.
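
The four outputs named in the description (cheapest, most expensive, spread, per-offer ranking) can be sketched as follows. The offer shape (`seller`, `price`) and the function name `compare_offers` are assumptions; the schema for individual offers is not shown on this page.

```python
def compare_offers(offers):
    """Rank offers by price and summarize cheapest, most expensive and spread.

    Assumption: each offer is a dict with a numeric "price" key.
    """
    if not offers:
        raise ValueError("offers must be a non-empty list")
    ranked = sorted(offers, key=lambda offer: offer["price"])
    cheapest, most_expensive = ranked[0], ranked[-1]
    return {
        "cheapest": cheapest,
        "most_expensive": most_expensive,
        "spread": most_expensive["price"] - cheapest["price"],
        # Rank 1 is the cheapest offer.
        "ranking": [{"rank": rank, **offer} for rank, offer in enumerate(ranked, start=1)],
    }


offers = [
    {"seller": "a", "price": 1200},
    {"seller": "b", "price": 980},
    {"seller": "c", "price": 1500},
]
print(compare_offers(offers)["spread"])
```

The empty-list check mirrors the documented requirement that the offers array be non-empty.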

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description takes on the burden of transparency. It states the tool is 'pure' (no side effects) and has 'price 0.0' (free), which are helpful behavioral cues. It does not detail error handling or exact output structure, but the mentioned traits are sufficient for a read-only operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences that convey purpose, prerequisite, and key behavioral attributes. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple input (one parameter) and no output schema, the description gives a reasonable overview of what is returned (cheapest, most expensive, spread, ranking). It could be improved by specifying the return format precisely, but it is sufficient for an agent to understand the tool's function.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers 100% of the parameter descriptions, so the baseline is 3. The description adds the requirement that the array be non-empty, which is an important constraint not fully captured by the schema. However, it does not elaborate on the individual properties beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool compares offers and returns specific results (cheapest, most expensive, spread, ranking). It uses a specific verb ('compare') and resource ('offers'), and it's distinct from sibling tools like commerce_catalog_availability_check or product_extract.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear prerequisite ('Requires a non-empty offers array') and notes the tool is free and pure. However, it doesn't explicitly state when to use this tool over alternatives, though in context it's the only price comparison tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

commerce_catalog_product_extract A

Extract a normalized product (name, price, currency, availability, brand, GTIN, ...) from Schema.org / JSON-LD markup. Provide a url to fetch or raw html. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`url`	No	Product page URL to fetch (one of url / html)
`html`	No	Raw page HTML to parse (one of url / html)
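
For readers unfamiliar with JSON-LD extraction, the following is a minimal standard-library sketch of pulling a Schema.org `Product` out of raw HTML. It is not the server's implementation: the output keys, the choice of the first offer, and the names `_JsonLdCollector` and `extract_product` are all assumptions.

```python
import json
from html.parser import HTMLParser


class _JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_product(html):
    """Return a normalized dict for the first Product node, or None if absent."""
    collector = _JsonLdCollector()
    collector.feed(html)
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # Skip malformed blocks instead of failing the whole page.
        for node in data if isinstance(data, list) else [data]:
            if not isinstance(node, dict) or node.get("@type") != "Product":
                continue
            offers = node.get("offers") or {}
            if isinstance(offers, list):
                offers = offers[0] if offers else {}
            if not isinstance(offers, dict):
                offers = {}
            brand = node.get("brand")
            if isinstance(brand, dict):
                brand = brand.get("name")
            return {
                "name": node.get("name"),
                "price": offers.get("price"),
                "currency": offers.get("priceCurrency"),
                "availability": offers.get("availability"),
                "brand": brand,
                "gtin": node.get("gtin13") or node.get("gtin"),
            }
    return None


sample = (
    '<html><head><script type="application/ld+json">'
    '{"@type": "Product", "name": "Tea", "brand": {"@type": "Brand", "name": "Acme"},'
    ' "offers": {"@type": "Offer", "price": "12.50", "priceCurrency": "JPY",'
    ' "availability": "https://schema.org/InStock"}}'
    "</script></head></html>"
)
print(extract_product(sample))
```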

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the burden. It declares read-only and free behavior, which is helpful. However, it does not disclose error handling, rate limits, or what happens for invalid markup. The behavioral disclosure is minimal but accurate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—two short sentences. It front-loads the verb and resource, includes key details (fields extracted, input requirements), and contains no unnecessary information. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having no output schema, the description only partially describes the output (list of fields with ellipsis). It doesn't explain the return format (e.g., JSON), error conditions, or what happens if no markup is found. Given the tool's simplicity, this is adequate but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers 100% of parameters with descriptions, but the description adds the important constraint that exactly one of url or html should be provided (mutual exclusivity). This semantic guidance is not present in the schema, adding value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool extracts a normalized product from Schema.org/JSON-LD markup, listing specific fields. It distinguishes from other commerce catalog tools (e.g., availability_check, price_compare) by focusing on structured data extraction. However, it could be more explicit about its uniqueness among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for use (when you have a URL or HTML with markup) and notes it is read-only and free. It does not explicitly state when not to use this tool or suggest alternative tools, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

company_registry_company_profile A

Fetch a company profile (name, status, incorporation / dissolution dates, type, address) by jurisdiction + company number via OpenCorporates. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`jurisdiction`	Yes	Jurisdiction code (e.g. gb, us_de)
`companyNumber`	Yes	Registry company number
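
As an illustration of how a client would invoke this tool, the sketch below builds the JSON-RPC 2.0 `tools/call` request that MCP uses for tool invocation. The helper name `tools_call_request` and the company number `01234567` are placeholders, and transport details (session setup, headers) are left out.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build an MCP tools/call request body (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Placeholder company number, used only to show the argument shape.
request = tools_call_request(
    1,
    "company_registry_company_profile",
    {"jurisdiction": "gb", "companyNumber": "01234567"},
)
print(json.dumps(request, indent=2))
```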

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the burden. It states 'Read-only' and 'price 0.0 (free)', indicating safe, free usage. However, it lacks details on rate limits, authentication needs, error handling (e.g., company not found), or data source caching, which would enhance transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that includes the core purpose, source, method, and cost. Every element is necessary, and there is no redundant or verbose language.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given there is no output schema, the description provides a useful list of returned fields (name, status, dates, type, address). It mentions the source (OpenCorporates) and cost. However, it could mention the response format or additional fields for completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers both parameters fully (100% coverage). The description simply reiterates the parameters ('by jurisdiction + company number') without adding new semantic meaning or constraints beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Fetch a company profile') and specifies the resource (company profile with fields like name, status, dates, type, address). It distinguishes from sibling tools like company_registry_search_company and company_registry_officers by requiring both jurisdiction and company number for a direct profile lookup.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when you have a specific company number and jurisdiction, and it notes the tool is read-only and free. However, it does not explicitly mention alternatives or when not to use it, though the context of sibling tools provides some differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

company_registry_jurisdiction_info A

Look up metadata for a jurisdiction code (name, registry details). Pure compute; price 0.0 (free).

Parameters

Name	Required	Description
`code`	Yes	Jurisdiction code (e.g. gb, us_de)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description says 'Pure compute' indicating no side effects, but lacks details on error handling, rate limits, or behavior for invalid codes.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, directly to the point, no unnecessary words. Excellent conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple lookup tool with one parameter and no output schema, the description covers the purpose and pricing. Could mention the return type but is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter, and the description only names the input ('a jurisdiction code'). No additional meaning is added beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'look up metadata for a jurisdiction code (name, registry details)', which distinguishes it from sibling tools like company_profile or officers that focus on specific company data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions 'Pure compute; price 0.0 (free)' but does not specify when to use this tool over others (e.g., when you need jurisdiction metadata vs company search). Usage is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

company_registry_officers A

List a company's officers (name, position, start / end dates, current flag) by jurisdiction + company number via OpenCorporates. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`jurisdiction`	Yes	Jurisdiction code (e.g. gb, us_de)
`companyNumber`	Yes	Registry company number

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It states the tool is read-only and free, which are key behavioral traits. However, it does not disclose potential rate limits, pagination, data freshness, or authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is front-loaded with the action and key details. Every word contributes meaning—no filler or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (2 parameters, no output schema), the description adequately covers what the tool does and its inputs. It lists the output fields (name, position, dates, current flag), which is helpful. Missing details on output structure or limits are minor given the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema coverage is 100%, so the schema already documents both parameters. The description names the inputs ('by jurisdiction + company number'), and the example codes ('e.g. gb, us_de') appear only in the schema's description. No additional semantics beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists company officers with specific fields (name, position, start/end dates, current flag) and specifies required inputs (jurisdiction + company number). It distinguishes itself from sibling tools like company_registry_company_profile by focusing on officers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions required inputs ('by jurisdiction + company number') and declares the tool as read-only and free. However, it does not provide explicit guidance on when to use this tool versus alternatives (e.g., search_company) or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

company_registry_search_company A

Search for companies by name (optionally scoped to a jurisdiction) via the OpenCorporates open API. Returns candidate companies for disambiguation. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`name`	Yes	Company name to search (partial match)
`limit`	No	Max candidates to return
`jurisdiction`	No	Optional jurisdiction code (e.g. gb, us_de)

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Explicitly states 'Read-only; price 0.0 (free)', which covers safety and cost. With no annotations provided, this disclosure is valuable. Could mention rate limits or pagination, but acceptable for a simple search.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first states purpose and API source, second adds key context (returns candidates, read-only, free). No redundancy, efficiently front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 3-param search tool with no output schema or nested objects, the description covers purpose, scope, and safety. Missing details like result format or error handling, but these are predictable for a search tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions. Description adds little beyond 'optionally scoped to a jurisdiction' and 'partial match' which mirrors schema. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Search for companies by name (optionally scoped to a jurisdiction)' with a specific verb and resource. Distinguishes from sibling tools like company_profile and officers by focusing on disambiguation search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly suggests use for initial disambiguation, but no explicit guidance on when to use this vs alternatives or when not to use it. Mentions read-only and free, which is helpful for safety but not comparative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

company_registry_validate_number A

Validate a company-registration number against the expected format for its jurisdiction. Pure compute; returns valid flag, normalized value and reason. Price 0.0 (free).

Parameters

Name	Required	Description
`jurisdiction`	Yes	Jurisdiction code (e.g. gb, us_de, au)
`companyNumber`	Yes	Company number to validate
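
A format check that returns a valid flag, a normalized value and a reason might look like the sketch below. The single `gb` pattern (eight digits, or two letters followed by six digits) is an illustrative assumption, as are the normalization rules and the function name `validate_number`; the tool's actual per-jurisdiction rules are not listed on this page.

```python
import re

# Illustrative pattern only; real jurisdiction rules may differ.
PATTERNS = {
    "gb": re.compile(r"(?:\d{8}|[A-Z]{2}\d{6})"),
}


def validate_number(jurisdiction, company_number):
    """Check a company number against the assumed format for its jurisdiction."""
    # Normalize: drop whitespace and hyphens, then uppercase.
    normalized = re.sub(r"[\s-]", "", company_number).upper()
    pattern = PATTERNS.get(jurisdiction.lower())
    if pattern is None:
        return {"valid": False, "normalized": normalized,
                "reason": f"no format rule for jurisdiction '{jurisdiction}'"}
    if pattern.fullmatch(normalized):
        return {"valid": True, "normalized": normalized, "reason": "matches expected format"}
    return {"valid": False, "normalized": normalized,
            "reason": "does not match expected format"}


print(validate_number("gb", " sc123456 "))
```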

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It declares 'Pure compute' meaning no side effects, and mentions price 0.0 (free). It does not detail rate limits or idempotency, but the core behavior is well disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences with no redundancy. Every sentence adds value: purpose, behavior, and return info. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description mentions the return values (valid flag, normalized value, reason). It covers what the tool does, its inputs, its side-effect-free nature, and pricing. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and includes descriptions for both parameters. The description adds no new parameter-level details beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Validate') and resource ('company-registration number'), and clarifies the jurisdictional context. It clearly distinguishes from sibling tools like company_profile or search_company.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states it's 'Pure compute' and free, implying safe repeated use, but does not explicitly say when to use it instead of alternatives or when not to use it. Context is clear but no exclusions or alternatives mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_authenticity_ai_likelihood (grade A)

Heuristic likelihood (0-100) that a passage of text is AI-generated, from lexical-diversity and burstiness signals with a transparent rationale. Pure, no network; price 0.0 (free).

Parameters

Name	Required	Description	Default
`text`	Yes	Text passage to score
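
The listing does not publish the scoring formula, so the following is only a minimal sketch of how lexical-diversity and burstiness signals could be combined into a 0-100 score with a rationale. The function names, the weights, and the choice of type-token ratio and sentence-length variation are all assumptions made for illustration, not the server's implementation.

```python
import re
from statistics import mean, pstdev


def lexical_diversity(text: str) -> float:
    """Type-token ratio: distinct words divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths, measured in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return pstdev(lengths) / mean(lengths)


def ai_likelihood(text: str) -> dict:
    """Hypothetical scoring: uniform, repetitive text scores higher."""
    if not text.strip():
        return {"score": 0, "rationale": ["empty text, nothing to score"]}
    ttr = lexical_diversity(text)
    burst = burstiness(text)
    # Illustrative weights only: each signal contributes 0-50 points.
    diversity_points = round(min(1.0, max(0.0, 1.0 - ttr)) * 50)
    burst_points = round(min(1.0, max(0.0, 1.0 - burst)) * 50)
    return {
        "score": diversity_points + burst_points,
        "rationale": [
            f"lexical diversity {ttr:.2f} -> {diversity_points} points",
            f"burstiness {burst:.2f} -> {burst_points} points",
        ],
    }
```

Returning each signal's contribution alongside the total is what makes a heuristic like this "transparent": a caller can see why a passage scored the way it did.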

Tool Definition Quality

A (4.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description adds some behavioral context: it's heuristic, free, and has no network dependency. But it doesn't disclose potential limitations, such as text length constraints or the format of the 'transparent rationale' in the output.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences) and front-loaded with the core purpose and output range. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema and no annotations. The description mentions the output is a 0-100 likelihood and 'transparent rationale', but doesn't fully describe the response structure. For a simple tool, it's somewhat complete but could specify the exact output fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage, so baseline is 3. The description adds meaning by explaining the tool uses lexical-diversity and burstiness signals, beyond the schema's simple 'Text passage to score' description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool gives a heuristic likelihood (0-100) of AI-generated text using lexical-diversity and burstiness signals. This distinguishes it from sibling tools like watermark_detect or provenance_check, which focus on other authenticity aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the tool is 'Pure, no network; price 0.0 (free)', indicating it's a local, free heuristic. It doesn't explicitly state when to use this tool versus alternatives like C2PA inspection or domain reputation checks, but the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_authenticity_c2pa_inspect (grade A)

Detect a C2PA / Content Credentials manifest by scanning the raw image bytes for JUMBF / C2PA markers and lifting any human-readable claim generator. Provide an image url. Read-only; one HTTPS GET; price 0.0 (free).

Parameters

Name	Required	Description	Default
`url`	Yes	Image URL to inspect
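
As a rough illustration of what "scanning the raw image bytes for JUMBF / C2PA markers" can mean, the sketch below searches a byte string for the JUMBF box types `jumb` and `jumd` and the `c2pa` label. The function name and the returned field names are hypothetical, and a plain byte search like this can only signal probable presence; it does not parse or validate a manifest.

```python
def scan_c2pa_markers(data: bytes) -> dict:
    """Report the first offset of each marker found in the raw bytes."""
    markers = {
        "jumbf_superbox": b"jumb",     # JUMBF superbox type
        "jumbf_description": b"jumd",  # JUMBF description box type
        "c2pa_label": b"c2pa",         # label used by C2PA manifest stores
    }
    offsets = {}
    for name, signature in markers.items():
        position = data.find(signature)
        if position != -1:
            offsets[name] = position
    # Assumption: treat a superbox plus a c2pa label as a likely manifest.
    likely = "jumbf_superbox" in offsets and "c2pa_label" in offsets
    return {"likelyC2pa": likely, "offsets": offsets}
```

In practice the bytes would come from the single HTTPS GET the description mentions; the fetch is left out here so the sketch stays side-effect free.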

Tool Definition Quality

A (3.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the tool is read-only, performs one HTTPS GET, and is free. With no annotations, this adds essential behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: the first explains the core action, the second states the required input, and the third adds behavioral notes. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers input requirements but does not describe output format or contents of the manifest. Suitable for a simple input schema but lacks completeness for a scanning tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter 'url' with schema description 'Image URL to inspect'. The description repeats this, adding no extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool detects C2PA/Content Credentials manifests by scanning raw image bytes for JUMBF markers and lifting claim generators. Distinct from sibling tools like watermark_detect or ai_likelihood.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides basic instruction ('Provide an image url.') but does not explicitly specify when to use this vs. alternatives or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_authenticity_domain_reputation (grade B)

Transparent heuristic reputation score (0-100) for a domain combining age, TLS validity, DNS / MX and SPF / DMARC signals via free HTTPS (DoH / RDAP / CT). Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`domain`	Yes	Domain name to score

Tool Definition Quality

B (3.4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses read-only nature and zero cost, and mentions it uses free HTTPS endpoints. However, with no annotations, the description carries the full burden; it does not address rate limits, error handling, or data freshness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with key information (heuristic, score range, signals used), followed by a short read-only and pricing note. Every word adds value; no wasted text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description provides sufficient context: what the score represents and the signals considered. It could mention the return format explicitly, but the score range is given.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter 'domain' with a schema description. Schema coverage is 100%, so the description adds no extra meaning beyond what the schema provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it computes a heuristic reputation score (0-100) for a domain using various signals. However, it does not differentiate from the sibling tool domain_intel_reputation, which could cause confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions it is read-only and free, but provides no guidance on when to use this tool versus alternatives like domain_intel_reputation or other reputation tools. Lacks explicit when-to-use or when-not-to-use context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_authenticity_provenance_check (grade A)

Combined media-provenance verdict for an image url: C2PA presence plus the hosting domain reputation, with rationale. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`url`	Yes	Image URL to check

Tool Definition Quality

A (3.8/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the burden. It discloses it is read-only and free (price 0.0), which is helpful. However, it does not mention rate limits, authentication requirements, or behavior for invalid URLs, leaving gaps in expected behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences: the first front-loads the purpose, and the second states key constraints (read-only, free). Every part is essential, and there is no wasted wording.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With one parameter, no output schema, and no annotations, the description covers purpose, cost, and read-only nature. It lacks details on the return format (e.g., structure of verdict and rationale), but the simplicity of the tool reduces the need for extensive context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds minimal extra value for parameters. It confirms the URL is an image URL, but the schema already states 'Image URL to check'. No parameter-specific guidance beyond the schema is provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that it provides a combined media-provenance verdict for an image URL, based on C2PA presence and hosting-domain reputation, with a rationale. This distinguishes it from sibling tools like content_authenticity_c2pa_inspect and content_authenticity_domain_reputation, which each cover only one aspect.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for a quick combined verdict, but does not explicitly state when to use this tool versus the separate C2PA or domain reputation tools. No exclusions or alternatives are mentioned, leaving the agent to infer context from sibling names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_authenticity_watermark_detect (grade A)

Best-effort watermark and provenance-marker detection from the raw image bytes (e.g. C2PA / IPTC / XMP markers). Provide an image url. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`url`	Yes	Image URL to inspect

Tool Definition Quality

A (4.0/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description adds 'Best-effort', 'Read-only', and 'price 0.0 (free)' – useful but incomplete. It does not disclose behavior on errors, missing watermarks, or output format, which would help an agent anticipate tool results.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no extra words. The first states the function, the second gives the usage instruction, and the third covers read-only behavior and cost. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers the essential purpose and input. It could be improved by hinting at the output (e.g., 'returns detected markers or confidence'), but overall it is sufficiently complete for its simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already fully describes the 'url' parameter as 'Image URL to inspect' (100% coverage). The description adds 'raw image bytes' context, but that is marginal – the baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool performs 'watermark and provenance-marker detection' from image bytes, using examples like C2PA/IPTC/XMP. This distinguishes it from siblings like content_authenticity_ai_likelihood and content_authenticity_c2pa_inspect, which focus on AI likelihood and full C2PA metadata respectively.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description tells the user to 'Provide an image url' and notes 'Read-only; price 0.0 (free)'. While it implicitly guides usage for watermark detection, it does not explicitly contrast with alternatives like content_authenticity_c2pa_inspect, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_intel_dns_lookup (grade A)

Resolve A / AAAA / MX / TXT / NS / CNAME records for a domain via DNS-over-HTTPS (Cloudflare 1.1.1.1 JSON). Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`types`	No	Subset of record types to resolve (default: all).
`domain`	Yes	Domain name (bare host; URLs/trailing dots tolerated)
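
For orientation, here is a minimal sketch of a DNS-over-HTTPS lookup against Cloudflare's JSON endpoint, including the input tolerance the `domain` parameter describes (URLs and trailing dots). The helper names are hypothetical and the server's own request logic is not shown in the listing; the endpoint, the `name` and `type` query parameters, the `accept: application/dns-json` header, and the `Answer` array are part of Cloudflare's public DoH JSON API.

```python
import json
from urllib.parse import urlencode, urlparse
from urllib.request import Request, urlopen

DOH_ENDPOINT = "https://cloudflare-dns.com/dns-query"


def normalize_domain(value: str) -> str:
    """Reduce a URL or dotted host to a bare lowercase host name."""
    value = value.strip()
    if "://" in value:
        value = urlparse(value).hostname or ""
    else:
        value = value.split("/", 1)[0]
    return value.rstrip(".").lower()


def doh_url(domain: str, record_type: str) -> str:
    """Build the query URL for one record type."""
    query = urlencode({"name": normalize_domain(domain), "type": record_type})
    return f"{DOH_ENDPOINT}?{query}"


def extract_answers(payload: dict) -> list:
    """Pull the record data out of a DoH JSON response body."""
    return [answer["data"] for answer in payload.get("Answer", [])]


def resolve(domain: str, record_type: str) -> list:
    """Perform the lookup (network call; not exercised in this sketch)."""
    request = Request(doh_url(domain, record_type),
                      headers={"accept": "application/dns-json"})
    with urlopen(request, timeout=10) as response:
        return extract_answers(json.load(response))
```

Calling `resolve("example.com", "MX")` would issue one HTTPS GET; resolving "all" record types, as the default suggests, would mean one such call per type.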

Tool Definition Quality

A (3.7/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description must fully disclose behavior. It states the tool is read-only and free, and mentions the backend (Cloudflare 1.1.1.1). However, it does not cover error handling, response format, or any potential rate limits, leaving gaps in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core purpose and support information. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description does not explain the return value format (e.g., list of records). While the tool is relatively simple, this omission reduces completeness. The description could include a note about the response structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; both parameters (domain and types) are described in the schema. The description adds no extra meaning beyond what the schema provides, so it meets the baseline but does not exceed it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Resolve' and the resource 'A / AAAA / MX / TXT / NS / CNAME records for a domain'. It distinguishes from sibling tools like domain_intel_whois_summary by specifying the exact record types and the backend (Cloudflare DNS-over-HTTPS).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the tool is read-only and free, providing some context for safe usage. However, it does not explicitly guide when to use this tool versus alternative domain intelligence tools (e.g., domain_intel_reputation) or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_intel_email_auth_check (grade A)

Parse SPF / DMARC / DKIM presence and policy from TXT records (via DoH). SPF all-qualifier (strict/softfail/neutral/pass), DMARC p= policy + pct + rua, and best-effort DKIM selector probing. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`domain`	Yes	Domain name to check
`dkimSelectors`	No	Optional DKIM selectors to probe (default: common selectors).
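
To make the SPF and DMARC part concrete, the sketch below parses the two TXT record formats once they have been fetched. The function names and the returned dictionary keys are assumptions; the mapping of `-all` to "strict" follows the labels in the description, and the `pct` default of 100 comes from the DMARC specification. DKIM selector probing is omitted.

```python
from typing import Optional

# Labels follow the description: strict / softfail / neutral / pass.
SPF_QUALIFIERS = {"-": "strict", "~": "softfail", "?": "neutral", "+": "pass"}


def spf_all_qualifier(record: str) -> Optional[str]:
    """Return the label for the 'all' mechanism, or None if absent."""
    if not record.lower().startswith("v=spf1"):
        return None
    for term in record.split()[1:]:
        if term.lower().lstrip("+-~?") == "all":
            # A bare "all" has an implicit "+" qualifier.
            prefix = term[0] if term[0] in SPF_QUALIFIERS else "+"
            return SPF_QUALIFIERS[prefix]
    return None


def parse_dmarc(record: str) -> Optional[dict]:
    """Extract p=, pct= and rua= from a DMARC TXT record."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    if tags.get("v", "").upper() != "DMARC1":
        return None
    return {
        "policy": tags.get("p"),
        "pct": int(tags.get("pct", "100")),  # defaults to 100 when omitted
        "rua": tags.get("rua"),
    }
```

The SPF record lives at the domain itself, while the DMARC record is published at the `_dmarc` subdomain, so a checker needs two TXT lookups before either parser applies.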

Tool Definition Quality

A (3.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses read-only, free, and use of DoH. It also mentions 'best-effort DKIM selector probing', which is a behavioral trait. However, it does not detail rate limits, error conditions, or response format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three clear, front-loaded sentences. The first states the core action and method, the second lists the extracted details (SPF all-qualifier, DMARC tags, DKIM probing), and the third notes that the tool is read-only and free. No superfluous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 2 parameters and no output schema. The description covers inputs, behavior, and pricing, but is missing what the return value looks like (e.g., JSON structure with SPF/DMARC/DKIM results). Given the lack of output schema, this is a notable gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already describes both parameters. The description adds minimal value: it restates the domain parameter and implies default behavior for dkimSelectors ('best-effort'). However, it does not explain the format of domain or selector values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it parses SPF/DMARC/DKIM from TXT records via DoH, listing specific extracts (SPF all-qualifier, DMARC policy, DKIM selectors). It distinguishes from sibling tools like domain_intel_dns_lookup by being specifically about email authentication.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for email authentication checks and mentions read-only and free, but does not explicitly state when to use this tool versus alternatives like general DNS lookup or reputation checks. No when-not-to-use or alternative suggestions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_intel_reputation (grade A)

Transparent heuristic reputation score (0-100) combining domain age, TLS validity and email-auth strictness (SPF/DMARC) plus an MX signal. Every contribution is returned in rationale (no black box). Informational only, not a security guarantee. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`domain`	Yes	Domain name to score
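
The "no black box" idea, where every contribution to the score is returned in the rationale, can be sketched as follows. The thresholds, point values, and type names here are invented for illustration and are not the server's actual weights; only the list of signals (domain age, TLS validity, SPF/DMARC strictness, MX) comes from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Signals:
    age_days: int
    tls_valid: bool
    spf: Optional[str]           # e.g. "strict", "softfail", or None
    dmarc_policy: Optional[str]  # e.g. "reject", "quarantine", "none"
    has_mx: bool


def reputation(signals: Signals) -> dict:
    """Sum per-signal points and keep every contribution in the rationale."""
    rationale = []

    def add(points: int, reason: str) -> None:
        rationale.append({"points": points, "reason": reason})

    # Hypothetical weights chosen so the maximum total is 100.
    if signals.age_days >= 365:
        add(30, "domain older than one year")
    elif signals.age_days >= 30:
        add(15, "domain older than 30 days")
    if signals.tls_valid:
        add(25, "currently valid TLS certificate")
    if signals.spf == "strict":
        add(15, "SPF ends in -all")
    elif signals.spf == "softfail":
        add(8, "SPF ends in ~all")
    if signals.dmarc_policy in ("reject", "quarantine"):
        add(20, f"DMARC policy {signals.dmarc_policy}")
    elif signals.dmarc_policy == "none":
        add(5, "DMARC published in monitor-only mode")
    if signals.has_mx:
        add(10, "MX record present")

    score = min(100, sum(item["points"] for item in rationale))
    return {"score": score, "rationale": rationale}
```

Because the score is just the sum of the listed contributions, a caller can audit the result by adding up the rationale entries, which is also why such a score remains informational rather than a security guarantee.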

Tool Definition Quality

A (4.4/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses read-only, free, and that every contribution is returned in 'rationale' (no black box). No annotations provided, so description carries full burden and meets it well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences front-load the key information: score range and components, rationale, limitations, and cost. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description adequately explains return value (rationale with contributions). Slightly lacking explicit output format details, but sufficient for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Single parameter 'domain' is fully described in schema (100% coverage). Description adds no additional parameter semantics beyond schema, so baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly defines tool as heuristic reputation score (0-100) combining domain age, TLS, email-auth, and MX signals. Distinguishes from siblings like domain_intel_dns_lookup and domain_intel_email_auth_check by being a composite score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

States 'Informational only, not a security guarantee' and 'Read-only; price 0.0 (free)', guiding appropriate use. However, lacks explicit alternatives or when-not-to-use scenarios, though sibling tools are available for deeper checks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_intel_tls_cert_info (grade A)

TLS certificate issuer / validity / SAN summary sourced from public Certificate-Transparency logs (crt.sh JSON). Picks the freshest leaf and reports currentlyValid + daysUntilExpiry. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`domain`	Yes	Domain name to inspect
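
A sketch of the "picks the freshest leaf and reports currentlyValid + daysUntilExpiry" step is shown below. It assumes crt.sh JSON entries carry `not_before`, `not_after`, `issuer_name`, and newline-separated `name_value` fields, and it interprets "freshest" as the latest `not_before`; both are assumptions, as is the function name. The output keys `currentlyValid` and `daysUntilExpiry` are taken from the description.

```python
from datetime import datetime, timezone
from typing import Optional


def _parse(timestamp: str) -> datetime:
    """Parse an ISO timestamp and treat it as UTC."""
    return datetime.fromisoformat(timestamp).replace(tzinfo=timezone.utc)


def summarize_certificates(entries: list,
                           now: Optional[datetime] = None) -> Optional[dict]:
    """Summarize the most recently issued certificate in the list."""
    if not entries:
        return None
    now = now or datetime.now(timezone.utc)
    # Assumption: "freshest" means the latest not_before date.
    leaf = max(entries, key=lambda entry: _parse(entry["not_before"]))
    not_before = _parse(leaf["not_before"])
    not_after = _parse(leaf["not_after"])
    names = {n for n in leaf.get("name_value", "").split("\n") if n}
    return {
        "issuer": leaf.get("issuer_name"),
        "san": sorted(names),
        "currentlyValid": not_before <= now <= not_after,
        "daysUntilExpiry": (not_after - now).days,
    }
```

Certificate-Transparency logs record issuance, not deployment, so a summary built this way describes the newest certificate issued for the name rather than the one a server is necessarily presenting.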

Tool Definition Quality

A (4.2/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the tool is read-only and free, which is helpful. However, it does not explain behavior on invalid domains, rate limits, or error cases. Annotations are absent, so the description carries the full burden but lacks some important behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loads the core purpose, and includes essential details (source, output fields, cost). No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers input, source, and key output fields. It is mostly complete, but could mention error handling or response format for full completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with a description for the 'domain' parameter. The description adds value by explaining the source (crt.sh) and what the tool reports, going beyond the schema's minimal description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool provides TLS certificate issuer, validity, and SAN summary from crt.sh. It specifies it picks the freshest leaf and reports currentlyValid and daysUntilExpiry. This distinguishes it from other domain_intel tools like dns_lookup or whois_summary.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly implies the tool is for inspecting TLS certificates, but it does not explicitly state when to use it versus alternatives, and no exclusions are mentioned. The intended use is clear, but the guidance is incomplete.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_intel_whois_summary (grade A)

Domain age / registrar / expiry summary via RDAP over HTTPS (no legacy port-43 WHOIS). Returns registeredAt, expiresAt, ageDays, registrar, status and nameservers. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`domain`	Yes	Domain name to look up
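
The sketch below shows one way the listed return fields could be derived from an RDAP domain response. The `events`, `eventAction`, `eventDate`, `status`, `nameservers`, `ldhName`, `entities`, `roles`, and `vcardArray` members are standard RDAP JSON; the function names and the exact extraction logic are assumptions, and only the output keys (registeredAt, expiresAt, ageDays, registrar, status, nameservers) come from the description.

```python
from datetime import datetime, timezone
from typing import Optional


def _event_date(rdap: dict, action: str) -> Optional[datetime]:
    """Find the date of the first event with the given eventAction."""
    for event in rdap.get("events", []):
        if event.get("eventAction") == action:
            # Normalize a trailing "Z" for older Python versions.
            return datetime.fromisoformat(
                event["eventDate"].replace("Z", "+00:00"))
    return None


def _registrar_name(rdap: dict) -> Optional[str]:
    """Read the registrar's formatted name from its vCard, if present."""
    for entity in rdap.get("entities", []):
        if "registrar" in entity.get("roles", []):
            for prop in entity.get("vcardArray", ["vcard", []])[1]:
                if prop[0] == "fn":
                    return prop[3]
    return None


def summarize_rdap(rdap: dict, now: Optional[datetime] = None) -> dict:
    """Build a registration summary from an RDAP domain object."""
    now = now or datetime.now(timezone.utc)
    registered = _event_date(rdap, "registration")
    expires = _event_date(rdap, "expiration")
    return {
        "registeredAt": registered.isoformat() if registered else None,
        "expiresAt": expires.isoformat() if expires else None,
        "ageDays": (now - registered).days if registered else None,
        "registrar": _registrar_name(rdap),
        "status": rdap.get("status", []),
        "nameservers": [ns["ldhName"].lower()
                        for ns in rdap.get("nameservers", [])
                        if "ldhName" in ns],
    }
```

Every field is optional in practice, since registries differ in what they publish, which is why the sketch returns `None` or an empty list instead of raising.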

Tool Definition Quality

A (4.0/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explicitly says 'Read-only; price 0.0 (free)' and mentions 'RDAP over HTTPS (no legacy port-43 WHOIS)', providing useful behavioral context. However, it lacks details on potential limitations like rate limits or handling of unregistered domains.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: purpose, return fields, and safety/cost. Every sentence is informative without redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's input, output fields, and behavioral traits adequately for a simple lookup. It could mention error handling or response format, but given the lack of output schema, it is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter 'domain' is described in the schema, and the description does not add extra meaning beyond what the schema already provides. Schema coverage is 100%, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides a 'Domain age / registrar / expiry summary' using RDAP over HTTPS. It lists the exact return fields, distinguishing it from sibling tools like domain_intel_dns_lookup (DNS records) and domain_intel_reputation (reputation scores).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly defines usage by listing the returned data (registeredAt, expiresAt, etc.), suggesting it is the tool for basic registration info. However, it does not explicitly state when to use alternatives or exclude cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

entity_search (A)

Search a company/person name across ALL ledgers (sanctions, licenses, recalls, pharma, bids, subsidies, grants, public comments, ordinances, ToS, real-estate, land-price). Returns hits grouped per ledger, each with matchedField, a summary, detailUrl and ledgerVerified (hash-chain integrity). ledgerVerified proves the records were not altered after they were recorded here — NOT the truth of the underlying data.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Max hits per ledger (1-100)
`query`	Yes	Entity name (company / person), partial match
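
The distinction the description draws for ledgerVerified (integrity of the record, not truth of the data) can be illustrated with a generic hash chain. This is a minimal sketch, not the server's actual scheme: the SHA-256 choice, the all-zero genesis value, the canonical JSON encoding, and the `prevHash`/`hash` field names are all assumptions made for illustration.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value; the real ledger format is not documented here


def link_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads):
    """Append payloads in order, each entry committing to the one before it."""
    chain, prev = [], GENESIS
    for payload in payloads:
        digest = link_hash(prev, payload)
        chain.append({"payload": payload, "prevHash": prev, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain) -> bool:
    """Return True only if every entry still matches its recorded hash and predecessor."""
    prev = GENESIS
    for entry in chain:
        if entry["prevHash"] != prev:
            return False
        if entry["hash"] != link_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

A chain built this way verifies even if a payload was wrong when it was recorded; verification only fails when an entry is changed afterwards, which is the same limit the description states for ledgerVerified.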

Tool Definition Quality

A4.2/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It explains behavior beyond schema: returns hits grouped per ledger with fields (matchedField, summary, detailUrl, ledgerVerified) and clarifies ledgerVerified's meaning (proves record integrity, not data truth). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no wasted words. The first front-loads purpose and scope, the second lists the return fields, and the third clarifies ledgerVerified. Efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description explains return structure (grouped per ledger with fields). It lacks mention of pagination or total limits across ledgers, but overall is quite complete for a search tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (both parameters have descriptions). Description adds no extra meaning beyond what schema provides; baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it searches a company/person name across ALL listed ledgers, distinguishing it from sibling tools that search within a single ledger (e.g., sanctions_watch_search). The verb 'search' and resource 'all ledgers' are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for broad cross-ledger searches but does not explicitly state when to use this over a ledger-specific search tool. No 'when-not' or alternatives are named.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

freshness_list (A)

List the calling user's freshness webhook subscriptions (the stored secret is never returned).

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses that the stored secret is never returned, which is a useful behavioral detail. However, it lacks other behavioral info like pagination or response format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with a parenthetical note, very concise and front-loaded. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no parameters and no output schema, the description covers the core purpose and a key behavioral note. Could be more complete by hinting at the output structure, but still adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 0 parameters with 100% coverage. Baseline for 0 parameters is 4. Description adds no parameter info as none are needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list) and the resource (the calling user's freshness webhook subscriptions), with an additional note about the secret not being returned. It distinguishes from sibling tools like freshness_subscribe and freshness_unsubscribe.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing existing subscriptions, but does not explicitly state when to use this tool vs alternatives (e.g., when to subscribe/unsubscribe). No exclusions or guidance on when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

freshness_subscribe (A)

Subscribe a machine/AI pipeline to ledger-change notifications: when a matching record changes, a signed "stale" event (with an F-037 receipt proving the change) is POSTed to your callback_url so your RAG/index can re-index. Filters: ledger (optional), entity and/or topic (case-insensitive title substring). At least one filter is required. Body is HMAC-signed with your secret (X-Receipt-Signature). Backfill never fires. Price 0.0.

ParametersJSON Schema

Name	Required	Description
`topic`	No	Additional case-insensitive substring
`entity`	No	Case-insensitive substring of the item title
`ledger`	No	Ledger key, e.g. 'sanction' (omit for all ledgers)
`secret`	Yes	Shared secret used to HMAC-sign the POST body
`callbackUrl`	Yes	HTTPS endpoint the stale event is POSTed to
`jurisdiction`	No	Jurisdiction code (default 'jp')
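
On the receiving side, a callback handler would typically recompute the HMAC over the raw POST body and compare it with the X-Receipt-Signature header before trusting the event. The sketch below assumes HMAC-SHA256 with a lowercase hex digest; the description does not state the hash algorithm or the encoding, so both are assumptions, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign_body(secret: str, body: bytes) -> str:
    """Compute an HMAC over the raw body (assumed: SHA-256, hex-encoded)."""
    return hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()


def verify_signature(secret: str, body: bytes, header_value: str) -> bool:
    """Check a received X-Receipt-Signature value against the raw request body."""
    expected = sign_body(secret, body).encode("ascii")
    received = header_value.strip().lower().encode("utf-8")
    # compare_digest avoids leaking the position of the first mismatch via timing
    return hmac.compare_digest(expected, received)
```

The comparison should run on the bytes exactly as received, before any JSON parsing or re-serialization, since re-encoding can change whitespace or key order and invalidate the signature.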

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It details the signed POST event, the F-037 receipt, and that backfill never fires. Price info is included. Lacks rate limits or retry behavior but covers core behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately sized and front-loaded with the main purpose. Each sentence adds value (subscription, filters, signing, backfill, price). Could be slightly tightened but effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 6 parameters, the description covers the essential use case, filter constraints, and security. It doesn't detail the response format, but for a subscription creation tool, it's sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by clarifying the combined filter requirement (at least one of entity/topic), HMAC signing, and callbackUrl's role. This goes beyond the schema's individual descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Subscribe a machine/AI pipeline to ledger-change notifications', which is a specific verb+resource. It clearly distinguishes this tool from siblings like 'freshness_list' and 'freshness_unsubscribe'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains that at least one filter (entity/topic) is required, describes the signing mechanism, and notes that backfill never fires. It provides context for use without explicitly listing alternatives, which is adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

freshness_unsubscribe (A)

Deactivate a freshness webhook subscription by id (soft delete; stops future deliveries).

ParametersJSON Schema

Name	Required	Description	Default
`subscriptionId`	Yes

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses the soft delete behavior and that future deliveries stop. However, it does not mention whether the action is reversible or any side effects, which would add value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the purpose and includes key details (soft delete, stops deliveries). No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description covers the essential aspects: action, identifier, and effect. It is complete enough for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description should compensate. It only mentions 'by id' but does not explain what subscriptionId is or how to obtain it, failing to add meaningful guidance beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (deactivate), the resource (freshness webhook subscription), and the method (by id, soft delete). It distinguishes from sibling tools like freshness_subscribe (creation) and freshness_list (listing).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use it: to stop deliveries of a freshness webhook subscription. It does not explicitly state when not to use it or provide alternatives, but given the narrow scope, the guidance is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_intel_bbox_center (A)

Compute the center point plus width/height (km) of a geographic bounding box. Pure math; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description
`maxLat`	Yes	Bounding-box maximum latitude
`maxLon`	Yes	Bounding-box maximum longitude
`minLat`	Yes	Bounding-box minimum latitude
`minLon`	Yes	Bounding-box minimum longitude
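
Since the description does not say how the center and dimensions are derived, the following is only one plausible reading: the midpoint of the corner coordinates, height measured along a meridian, and width measured along the parallel at the center latitude. The spherical Earth radius, the returned key names, and the refusal to handle boxes that cross the antimeridian are assumptions of this sketch.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical model is assumed


def bbox_center(min_lat: float, min_lon: float, max_lat: float, max_lon: float) -> dict:
    """Midpoint plus approximate width/height in km of a lat/lon bounding box."""
    if min_lat > max_lat or min_lon > max_lon:
        raise ValueError("min values must not exceed max values (antimeridian boxes not handled)")
    center_lat = (min_lat + max_lat) / 2
    center_lon = (min_lon + max_lon) / 2
    # Height: arc length along a meridian between the two latitudes.
    height_km = math.radians(max_lat - min_lat) * EARTH_RADIUS_KM
    # Width: arc length along the parallel at the center latitude.
    width_km = (
        math.radians(max_lon - min_lon)
        * EARTH_RADIUS_KM
        * math.cos(math.radians(center_lat))
    )
    return {
        "centerLat": center_lat,
        "centerLon": center_lon,
        "widthKm": width_km,
        "heightKm": height_km,
    }
```

Measuring width at the center latitude is a design choice; a box spanning many degrees of latitude is noticeably wider at its equator-facing edge than at its pole-facing edge.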

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It mentions 'Pure math; price 0.0 (free)' but lacks details on output format, error handling, or constraints (e.g., valid lat/lon ranges).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no waste. Front-loaded with the action verb and resource.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Minimal but sufficient for a simple math tool. Missing return structure details (e.g., how center and dimensions are formatted). Without output schema, more detail would help.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for each parameter. The description adds no extra meaning beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Compute the center point plus width/height (km) of a geographic bounding box' – a specific verb and resource. It distinguishes from sibling geo tools like geocode or distance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly suggests usage for bounding box center calculation, but no explicit when-to-use or when-not-to-use guidance, nor mention of alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_intel_distance (A)

Great-circle (haversine) distance between two lat/lon coordinates, in kilometres and miles. Pure math; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description
`lat1`	Yes	First point latitude
`lat2`	Yes	Second point latitude
`lon1`	Yes	First point longitude
`lon2`	Yes	Second point longitude
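
For reference, the haversine formula named in the description can be written as a short function. This is a generic sketch, not the tool's code: inputs are assumed to be decimal degrees, the Earth radius is the commonly used mean value, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius (assumed; the tool's constant is not documented)
KM_PER_MILE = 1.609344


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(d_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    )
    # Clamp to 1.0 to guard against rounding just above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def haversine_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Same distance converted to statute miles."""
    return haversine_km(lat1, lon1, lat2, lon2) / KM_PER_MILE
```

Haversine treats the Earth as a sphere, so results can differ from ellipsoidal (geodesic) distances by a fraction of a percent.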

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses that the computation is pure math, free, and involves no external resources. This is transparent and sufficient for a deterministic function with no side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences that efficiently convey the core purpose, algorithm, units, and pricing. No redundant or extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple mathematical tool with fully described parameters and no output schema, the description provides essential context: method (great-circle), units, and cost. Minor improvement could mention decimal degrees assumption, but overall complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter having a basic description (e.g., 'First point latitude'). The description adds context about the algorithm (haversine) and output units, but does not significantly enhance parameter understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes great-circle (haversine) distance between two lat/lon coordinates, with output in kilometres and miles. This specific verb+resource combination distinguishes it from sibling geo tools like geocoding or timezone lookups.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for distance calculations but does not explicitly state when to use this tool vs alternatives or mention exclusions. The 'Pure math; price 0.0 (free)' gives some context but lacks explicit usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_intel_geocode (A)

Forward-geocode a place / address query to coordinates via the free OpenStreetMap Nominatim API. Returns ranked results with lat/lon and display name. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	Max results to return
`query`	Yes	Place name or address to geocode

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses the API source, read-only nature, and return format, but does not mention rate limits, error conditions, or coordinate system details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences front-load the core purpose and output, then state safety and cost. No extraneous information; every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, the description covers the essential functionality and output. However, no output schema exists and details like error handling or pagination are omitted. Still, it is largely complete for a basic geocoding tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds 'ranked results' context but does not significantly enhance meaning beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool forward-geocodes a place/address query to coordinates using OpenStreetMap Nominatim API, and returns ranked results with lat/lon and display name. It distinguishes from sibling tools like reverse_geocode and distance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for forward geocoding and notes it's read-only and free, but does not explicitly state when to use versus alternatives, nor provide exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_intel_reverse_geocode (B)

Reverse-geocode a lat/lon coordinate to the nearest address / place via OpenStreetMap Nominatim. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the tool is read-only and free (price 0.0), which is helpful for safety and cost considerations. However, it does not describe other behavioral traits such as rate limits, error handling, response format, or what happens when the coordinate cannot be reverse-geocoded. Since no annotations are provided, the description carries the full burden for behavioral disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (two short sentences), and every word adds value. The first sentence states the core functionality, and the second adds important safety/cost context. There is no fluff or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of an output schema and the simplicity of the tool (two numeric inputs), the description provides adequate high-level context: what it does, what data source it uses, and its cost. However, it omits details about the returned data structure (e.g., address format, fields included) and error handling, which could hinder an agent without additional documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Both parameters (lat and lon) are documented in the input schema with descriptions (range constraints). The description does not add any additional meaning beyond the schema; it only provides context about the data source (Nominatim). With 100% schema coverage, the baseline is 3, and the description does not exceed that.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reverse-geocodes a lat/lon coordinate to the nearest address or place using OpenStreetMap Nominatim. The verb 'reverse-geocode' combined with the resource 'lat/lon coordinate' is specific and unambiguous. However, it does not explicitly distinguish itself from the sibling tool 'geo_intel_geocode' (forward geocoding), so it lacks explicit sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention scenarios where reverse geocoding is appropriate, nor does it contrast with the forward geocoding sibling 'geo_intel_geocode'. The agent is left to infer usage from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_intel_timezone (A)

Resolve the IANA timezone, abbreviation and current UTC offset for a lat/lon coordinate via the free Open-Meteo API. Read-only; price 0.0 (free).

ParametersJSON Schema

Name	Required	Description	Default
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses read-only and free pricing, but lacks details on rate limits, error handling, or output format. For a simple API, this is minimally adequate but could be more informative.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence plus a short note, with no wasted words. It is front-loaded and efficiently conveys purpose, source, and constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 required params, no output schema), the description covers the core behavior and expected return fields. It does not explain what happens on errors or invalid inputs, but is complete enough for standard use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with minimal descriptions. The description adds meaning by naming the resolved fields (IANA timezone, abbreviation, UTC offset), which are not in the schema, helping the agent understand the output.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it resolves IANA timezone, abbreviation, and UTC offset from lat/lon coordinates, using the free Open-Meteo API. This distinguishes it from sibling geo tools like geocoding or distance calculation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when timezone data for coordinates is needed, and notes it's read-only and free. It doesn't explicitly exclude alternatives, but the purpose is clear and siblings are non-overlapping, providing sufficient context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grant_watch_get (C)

Get a grant call detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must carry the full burden. It mentions what is returned (firstSeenAt and ledgerVerified) but does not disclose side effects, required permissions, rate limits, or whether it is read-only. The lack of transparency is a concern for a tool that likely performs a read operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is only two sentences, focusing on core information with no filler. It front-loads the action and resource, but could be slightly improved by combining the return info more efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and many sibling tools, the description is too minimal. It does not explain what a 'grant call' is, what the event timeline contains, or how this tool differs from related tools like grant_watch_timeline or grant_watch_recent_changes. More context is needed for correct selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It does not explain what 'itemId' represents (e.g., a grant ID or specific call identifier) or provide any constraints or format. The parameter remains opaque.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get') and the resource ('grant call detail plus full event timeline'). It also mentions specific return values, distinguishing it from general search. However, it does not fully differentiate from the sibling 'grant_watch_timeline' which might also provide an event timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like grant_watch_search or grant_watch_timeline. The description assumes the user knows the context of 'grant call' and when to retrieve details.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grant_watch_recent_changes (A)

Recent appearance / deadline-move / close / close-early events across all grant calls since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`field`	No
`limit`	No
`since`	Yes
`funder`	No

Tool Definition Quality

A 3.5/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided. The description mentions the types of events and included fields (firstSeenAt, ledgerVerified), which gives some behavioral insight. However, it does not explicitly state that the tool is read-only, nor does it disclose any potential side effects, rate limits, or pagination behavior.
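One way to close this gap is to pair the description with MCP tool annotations. The sketch below is a hypothetical definition, not the server's actual one: it assumes the tool is read-only and idempotent (the review only suspects this), the parameter types and wording are illustrative, and the `field` and `funder` parameters are left out because their meaning is not documented.

```python
# Hypothetical tool definition. The annotation values are assumptions about
# grant_watch_recent_changes, not confirmed server behavior.
recent_changes_tool = {
    "name": "grant_watch_recent_changes",
    "description": (
        "List appearance, deadline-move, close, and close-early events "
        "across all grant calls since an ISO8601 timestamp. Read-only."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            # Types and descriptions here are illustrative guesses.
            "since": {"type": "string", "description": "ISO8601 timestamp (lower bound)"},
            "limit": {"type": "integer", "description": "Maximum number of events to return"},
        },
        "required": ["since"],
    },
    # Annotation keys defined by the MCP specification for tools.
    "annotations": {
        "readOnlyHint": True,
        "idempotentHint": True,
    },
}
```

With annotations like these in place, the description no longer has to carry the entire behavioral disclosure on its own.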

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences that efficiently communicate the tool's purpose and key output fields. It is appropriately front-loaded but could benefit from bullet points for the event types.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, so the description partially compensates by mentioning included fields (firstSeenAt, ledgerVerified). However, it does not describe the full return structure (e.g., event type, grant call identifier) or address pagination via the 'limit' parameter. The sibling context helps differentiate the domain but not the tool's placement among grant_watch_* tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. It explains only the 'since' parameter (an ISO8601 timestamp) and says nothing about 'field', 'limit', or 'funder', leaving most of the parameters undocumented.
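Since `since` is the only documented parameter, a caller can at least make sure it is well-formed before invoking the tool. The helper below is a minimal sketch under stated assumptions: the function name is hypothetical, `limit` is assumed to be an integer cap, and the undocumented `field` and `funder` filters are deliberately omitted.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def build_recent_changes_args(hours_back: int, limit: Optional[int] = None) -> dict:
    """Build arguments for grant_watch_recent_changes (hypothetical helper)."""
    since = datetime.now(timezone.utc) - timedelta(hours=hours_back)
    # isoformat() on an aware datetime yields an ISO8601 string with a UTC offset.
    args = {"since": since.isoformat(timespec="seconds")}
    if limit is not None:
        args["limit"] = limit  # assumed to be a maximum result count
    return args


args = build_recent_changes_args(hours_back=24, limit=50)
# Round-tripping through the standard parser confirms the timestamp is well-formed.
parsed = datetime.fromisoformat(args["since"])
```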

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it lists recent changes (appearance, deadline-move, close, close-early) across all grant calls since a timestamp. This distinguishes it from sibling tools like bid_watch_recent_changes and other grant_watch tools such as grant_watch_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for monitoring recent grant call events since a timestamp, but does not provide explicit guidance on when to use this tool versus alternatives like grant_watch_search or grant_watch_timeline. No when-not or exclusion criteria are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grant_watch_search (B)

Search Japanese research-grant calls-for-proposals. Each hit includes firstSeenAt and ledgerVerified (hash-chain integrity).

Parameters

Name	Required	Description
`field`	No	Research field
`limit`	No
`query`	No
`since`	No
`funder`	No	Funding agency (JST, AMED, NEDO, etc.)
`status`	No
`amountMin`	No

Tool Definition Quality

B 3.1/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden for behavioral disclosure. It only states that results include firstSeenAt and ledgerVerified, and does not disclose whether the tool is read-only or whether it has side effects, authentication needs, or rate limits. The word 'Search' implies a read operation, but the description gives no explicit safety guarantee.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences with no redundancy. The first sentence states the core purpose, and the second provides relevant output detail. Every word adds value, and it is appropriately front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters, no required fields, and no output schema, the description is incomplete. It does not explain search behavior (e.g., pagination, filtering logic, defaults), result structure beyond two fields, or how to use parameters effectively. The agent would lack key context for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 29% (2 of 7 parameters have descriptions). The tool description adds no parameter-level information beyond reiterating the output fields. For undocumented parameters like query, since, status, amountMin, the agent receives no guidance on their meaning or valid values, failing to compensate for the low schema coverage.
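The coverage figure is simply the share of parameters whose schema entry has a non-empty description. The sketch below reproduces the calculation for the seven parameters listed in the table above; the function name is hypothetical, and the two description strings are the schema's original Japanese text.

```python
def schema_description_coverage(properties: dict) -> float:
    """Fraction of schema properties that carry a non-empty description."""
    if not properties:
        return 0.0
    described = sum(
        1 for spec in properties.values() if spec.get("description", "").strip()
    )
    return described / len(properties)


# Parameters of grant_watch_search as listed in the parameter table.
grant_watch_search_properties = {
    "field": {"description": "研究分野"},  # research field
    "limit": {},
    "query": {},
    "since": {},
    "funder": {"description": "配分機関 (JST/AMED/NEDO 等)"},  # funding agency
    "status": {},
    "amountMin": {},
}

coverage = schema_description_coverage(grant_watch_search_properties)
print(f"{coverage:.0%}")  # 2 of 7 described, prints 29%
```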

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches Japanese research-grant calls-for-proposals, which is specific and distinguishes it from sibling tools like grant_watch_get (by ID) and other domain-specific watch/search tools. It also highlights unique output fields (firstSeenAt, ledgerVerified), adding clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives (e.g., grant_watch_get, grant_watch_recent_changes). The description does not mention prerequisites, exclusions, or context for selection, leaving the agent to infer usage from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grant_watch_timeline (A)

Time-ordered events only for a grant call (the differentiator: when it opened, deadline moved, closed, or closed early). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A 3.5/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses included fields (firstSeenAt, ledgerVerified) but does not mention whether the tool is read-only, any side effects, pagination, or rate limits. The behavioral information is partial.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise (two sentences) and front-loaded with the core purpose. It wastes no words, though it omits any parameter context. For its length, it serves its purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description is moderately complete. It specifies the events and fields but lacks detail on the parameter and output format. It is adequate but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not mention the single parameter 'itemId' at all. With 0% schema coverage, the description should explain what itemId represents (e.g., the grant call identifier) but fails to do so, leaving the agent to infer from the tool name.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'Time-ordered events only for a grant call' and specifies the types of events (opened, deadline moved, closed, closed early). It distinguishes itself from sibling grant_watch tools by calling out 'the differentiator' explicitly, and it mentions included fields such as firstSeenAt and ledgerVerified.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when a timeline of key grant call events is needed, contrasting with other grant watch tools that provide current state or recent changes. However, it does not explicitly state when not to use this tool or mention any alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grant_watch_verify_ledger (C)

Verify the hash-chain integrity of a grant call (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C 2.9/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must disclose all behavioral traits. It only states the operation (verify integrity) and lists return fields, but doesn't clarify if it's read-only, whether it modifies state, error conditions, or permission requirements. This is insufficient for full transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single clear sentence followed by a list of return fields. It is concise and front-loads the purpose. The return list could be formatted more cleanly (e.g., as a structured list with one field per entry), but there are no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has one required parameter, no output schema, and no annotations, the description provides the essential purpose and return fields. However, it lacks parameter explanation, error handling notes, and any usage context. It is minimally adequate but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the tool description does not explain the single parameter 'itemId' either. The schema only says it is a required string. Without a description or context, the agent has no guidance on what value to provide (e.g., a grant ID, a hash, a ledger reference).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Verify the hash-chain integrity of a grant call (tamper detection)', specifying a clear verb and resource. The return fields further clarify the tool's output. This makes it distinct from sibling tools such as 'grant_watch_get' and 'grant_watch_recent_changes'.
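For readers unfamiliar with the technique, hash-chain verification can be sketched as follows. This is a generic illustration, not the server's implementation: SHA-256, the all-zero genesis value, the canonical-JSON encoding, and the `prevHash`/`hash`/`payload` field names are all assumptions, and the event data is made up. Only the result keys `chainValid` and `brokenAt` come from the tool description.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link


def link_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical encoding of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode("utf-8")).hexdigest()


def append_event(events: list, payload: dict) -> None:
    prev = events[-1]["hash"] if events else GENESIS
    events.append({"prevHash": prev, "payload": payload, "hash": link_hash(prev, payload)})


def verify_chain(events: list) -> dict:
    """Walk the chain and report the first index where a link does not match."""
    prev = GENESIS
    for index, event in enumerate(events):
        expected = link_hash(prev, event["payload"])
        if event["prevHash"] != prev or event["hash"] != expected:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(events)}


chain = []
append_event(chain, {"type": "appearance", "deadline": "2026-09-01"})
append_event(chain, {"type": "deadline-move", "deadline": "2026-09-15"})
append_event(chain, {"type": "close"})

intact = verify_chain(chain)
chain[1]["payload"]["deadline"] = "2026-10-01"  # simulate tampering with a stored event
tampered = verify_chain(chain)
```

Because each hash covers the previous one, editing any stored event invalidates that link, which is why a single `brokenAt` index is enough to localize the tampering.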

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when or why to use this tool over alternatives. It doesn't mention prerequisites, typical use cases, or when not to use it. The description leaves the agent to infer usage from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyb_report (A)

Turn a company/entity name into a structured English KYB / due-diligence report for foreign-inbound buyers. Aggregates cross-ledger hits (administrative sanctions, licenses, public bids, recalls, etc.) via entity_search, summarizes them (counts + deterministic risk_flags), and attaches an F-037 provenance receipt to EACH hit so the screening is auditable. No model-invented facts: every asserted fact originates from a ledger hit and carries a receipt. Informational only; not legal advice. Price 0.0.

Parameters

Name	Required	Description	Default
`query`	Yes	Company / entity name (partial match across all ledgers)
`jurisdiction`	No	Jurisdiction code (default 'jp')

Tool Definition Quality

A 4.4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses that facts are ledger-sourced with provenance receipts, no model-invented facts, and that it's not legal advice. It does not mention read-only behavior or rate limits, but the report generation context implies safety.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (five sentences) with no wasted words. Every sentence adds value: purpose, method, integrity guarantee, disclaimer, and pricing. It is front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately describes the output as a structured report with counts, risk_flags, and provenance receipts. It misses details on output format but covers the essential behavioral aspects.
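To make "counts + deterministic risk_flags" concrete, here is a minimal sketch of a rule-based summary. Everything in it is hypothetical: the ledger names, the flag names, the rule table, and the shape of a hit (including the `receipt` field) are illustrative stand-ins, since the actual output format is not documented.

```python
from collections import Counter

# Hypothetical rules: a flag is raised when the named ledger has at least one hit.
FLAG_RULES = {
    "administrative_sanctions": "HAS_SANCTION_HISTORY",
    "recalls": "HAS_RECALL_HISTORY",
}


def summarize_hits(hits: list) -> dict:
    """Count hits per ledger and derive flags from fixed rules (no model involved)."""
    counts = Counter(hit["ledger"] for hit in hits)
    flags = sorted(flag for ledger, flag in FLAG_RULES.items() if counts[ledger] > 0)
    # Sorting keys and flags keeps the output identical for identical input.
    return {"counts": dict(sorted(counts.items())), "risk_flags": flags}


hits = [
    {"ledger": "public_bids", "receipt": "r-1"},
    {"ledger": "administrative_sanctions", "receipt": "r-2"},
    {"ledger": "public_bids", "receipt": "r-3"},
]
summary = summarize_hits(hits)
```

The point of the sketch is the property the description claims: the same set of ledger hits always yields the same counts and flags, so the summary can be audited against the per-hit receipts.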

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3. The description adds context about cross-ledger hits and partial matching, enhancing understanding beyond schema defaults. It also implies the jurisdiction parameter's relevance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool converts a company/entity name into a structured KYB report, specifying the use case (foreign-inbound buyers), and distinguishes from sibling tools like entity_search or sanctions_screen_entity by emphasizing the aggregated cross-ledger nature.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for aggregated due-diligence, but lacks explicit guidance on when not to use it compared to specific watch tools. However, the context of foreign-inbound buyers provides clear situational guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

landprice_watch_get (A)

Get a 地価公示 standard-land record detail plus its full event timeline (price revisions, reissues, vanish events). Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A 3.7/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must bear the burden. It mentions the return content (record detail, event timeline, firstSeenAt, ledgerVerified) but omits safety aspects like read-only guarantee, authentication requirements, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences that front-load the core purpose and add relevant return fields. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description is nearly complete. It lacks parameter explanation, but otherwise covers the output sufficiently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not explain the meaning or format of the itemId parameter. With 0% schema description coverage, the tool description should compensate, but it adds no value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get', the resource 'standard-land record', and includes the additional scope of 'full event timeline' and specific return fields. This distinguishes it from siblings like landprice_watch_search and landprice_watch_timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives. While the purpose is clear, there is no guidance on when not to use it or mention of related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

landprice_watch_recent_changes (C)

List landprice events observed after the given ISO8601 timestamp. Optional prefectureCode filter.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`prefectureCode`	No

Tool Definition Quality

C 2.6/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It only states the basic operation ('list events') but fails to disclose important behaviors such as ordering, pagination, event types (e.g., creation/update/deletion), or any side effects, authentication needs, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short, front-loaded sentences that convey the core action efficiently. However, omitting details such as the purpose of 'limit' or the output format makes it feel under-specified rather than truly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and moderate complexity (3 parameters), the description is incomplete. It does not describe the return value structure (e.g., array of events with fields), making it difficult for an agent to correctly parse the response without additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must add meaning. It clarifies that 'since' is an ISO8601 timestamp and that 'prefectureCode' is an optional filter, but it does not explain the 'limit' parameter at all, and the prefectureCode format is unspecified.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List landprice events' after an ISO8601 timestamp, with an optional filter. This explicitly defines the action and resource, distinguishing it from other landprice_watch tools (get, search, timeline, verify_ledger).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks guidance on when to use this tool versus alternatives like landprice_watch_search or landprice_watch_get. No explicit when-to-use or when-not-to-use context is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

landprice_watch_search (A)

Search Japanese 地価公示 (MLIT 国土数値情報 L01) standard-land snapshots. Each hit is one standard point for a given year, ledgered for tamper-detection. Returns ledgerVerified per hit.

Parameters

Name	Required	Description
`year`	No	Fiscal year (e.g. 2026)
`limit`	No
`query`	No	Partial match on municipality name / location / current land use
`areaCode`	No	5-digit prefecture+municipality code (e.g. "13101")
`registry`	No
`prefectureCode`	No	JIS X 0401 prefecture code (e.g. "13")

Tool Definition Quality

A 3.9/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses ledgering for tamper-detection and that each hit returns ledgerVerified, which is key behavioral information for a search tool. With no annotations, the description carries the burden, and it does well to highlight data integrity features. However, it does not mention rate limits, authentication, or pagination behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the main purpose and adding key details about ledgering and return value. Every sentence adds value; there is no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of output schema and annotations, the description leaves out details like return format, pagination handling, and parameter interaction (e.g., AND/OR logic). It mentions ledgerVerified but not other possible fields. While it covers the core purpose, it is incomplete for a search tool with 6 parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already describes 4 of 6 parameters (year, query, areaCode, prefectureCode) with full descriptions. The tool description does not add any parameter semantics beyond the schema. With 67% schema coverage, a baseline of 3 is appropriate; no additional meaning is provided for limit or registry.
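The two documented code formats are specific enough to check on the client side. The sketch below validates them before a call; the function name is hypothetical, and the rule that `areaCode` must begin with `prefectureCode` when both are given is an assumption drawn from the "prefecture+municipality" wording, not something the server is known to enforce.

```python
import re
from typing import Optional


def validate_location_filters(
    prefecture_code: Optional[str] = None, area_code: Optional[str] = None
) -> list:
    """Return a list of problems with the landprice_watch_search location filters."""
    errors = []
    if prefecture_code is not None:
        # JIS X 0401 assigns two-digit codes 01 through 47 to the prefectures.
        if not re.fullmatch(r"[0-9]{2}", prefecture_code) or not 1 <= int(prefecture_code) <= 47:
            errors.append("prefectureCode must be a two-digit JIS X 0401 code from 01 to 47")
    if area_code is not None:
        if not re.fullmatch(r"[0-9]{5}", area_code):
            errors.append("areaCode must be five digits")
        elif prefecture_code is not None and not area_code.startswith(prefecture_code):
            # Assumed consistency rule: the first two digits identify the prefecture.
            errors.append("areaCode must start with prefectureCode")
    return errors
```

For example, `validate_location_filters("13", "13101")` returns an empty list, while passing `"13"` together with `"14101"` reports the mismatch.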

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches Japanese 地価公示 standard-land snapshots, specifying the data source (MLIT 国土数値情報 L01) and that each hit is a standard point for a given year with ledger verification. This distinguishes it from siblings like landprice_watch_get (single point) and landprice_watch_verify_ledger (verification).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for searching land price snapshots but does not explicitly differentiate when to use this tool versus alternatives like landprice_watch_get (for single point retrieval) or landprice_watch_timeline (for historical changes). No 'when to use' or 'when not to use' guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

landprice_watch_timeline (B)

Timeline view of one standard-land point — all observation events with diff, observedAt and chain-hash entries. ledgerVerified is computed end-to-end against IDENTITY_SIGNING_SECRET.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B 3.3/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavior. It mentions 'ledgerVerified is computed end-to-end against IDENTITY_SIGNING_SECRET,' but does not clarify idempotency, side effects, authentication requirements, or output format. This provides only minimal insight into how the tool behaves.
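The description does not say how ledgerVerified is computed against IDENTITY_SIGNING_SECRET. One plausible reading is a keyed (HMAC) chain, sketched below purely as an illustration: HMAC-SHA256, the chaining format, the function names, and the sample entries are all assumptions, and the demo secret stands in for a value that would normally come from the server's environment.

```python
import hashlib
import hmac


def sign_entry(secret: bytes, prev_signature: str, entry: str) -> str:
    """Sign an entry together with the previous signature so entries are chained."""
    message = (prev_signature + "\n" + entry).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def ledger_verified(secret: bytes, entries: list, signatures: list) -> bool:
    """Recompute every signature end-to-end and compare in constant time."""
    if len(entries) != len(signatures):
        return False
    prev = ""
    for entry, signature in zip(entries, signatures):
        expected = sign_entry(secret, prev, entry)
        if not hmac.compare_digest(expected, signature):
            return False
        prev = signature
    return True


secret = b"demo-secret"  # stand-in for IDENTITY_SIGNING_SECRET; never hard-code a real key
entries = ["observation 1", "observation 2"]  # illustrative data only
signatures = []
prev = ""
for entry in entries:
    prev = sign_entry(secret, prev, entry)
    signatures.append(prev)
```

Under this reading, verification requires the secret, so only the server can compute ledgerVerified. That is exactly the kind of behavior the description would need to state for an agent to interpret the flag correctly.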

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences and mostly front-loaded with the primary purpose. The second sentence about ledgerVerified is somewhat tangential but brief. It is concise, though it could be more streamlined by integrating the verification detail less prominently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no output schema or annotations, the description explains the content of the timeline (observation events with specific fields) but does not describe the output structure, pagination, error conditions, or how to use ledgerVerified. It provides adequate context for a simple retrieval but leaves notable gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter, itemId, is required but has no description in the schema (0% coverage). The description says 'one standard-land point,' implying itemId identifies the point, but does not specify its format, source, or validation rules. This adds little meaning beyond the parameter name.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides a 'timeline view' of 'one standard-land point' including observation events with specific fields (diff, observedAt, chain-hash). This distinguishes it from siblings like landprice_watch_get (current state) and landprice_watch_search (find points), and the name already differentiates from other watch timeline tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for retrieving history of a single land price point but does not explicitly state when to use this tool versus alternatives like get or search. No exclusions or prerequisites are mentioned, leaving the agent to infer the use case.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

landprice_watch_verify_ledger (A)

Verify the hash-chain integrity of a landprice record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
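
As a rough illustration of what a hash-chain check of this kind involves, the Python sketch below recomputes each event's hash from its payload and the previous hash, then reports the first mismatch. The hashing scheme (SHA-256 over the previous hash plus canonical JSON), the `payload` / `hash` event keys, the `checked` field name and the helper names are all assumptions made for illustration. Only `chainValid` and `brokenAt` come from the tool description, and the server's actual scheme is not documented on this page.

```python
import hashlib
import json


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash one event together with its predecessor's hash (assumed scheme)."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Hypothetical helper: link payloads into a chain of hashed events."""
    events, prev = [], ""
    for payload in payloads:
        prev = event_hash(prev, payload)
        events.append({"payload": payload, "hash": prev})
    return events


def verify_chain(events: list) -> dict:
    """Recompute every hash and report the first index where the chain breaks."""
    prev = ""  # assumed genesis value
    for index, event in enumerate(events):
        if event["hash"] != event_hash(prev, event["payload"]):
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(events)}


chain = build_chain([{"price": 100}, {"price": 105}, {"price": 103}])
intact = verify_chain(chain)
chain[1]["payload"]["price"] = 999  # simulate tampering with the second event
tampered = verify_chain(chain)
```

Because every hash depends on the one before it, editing any stored payload invalidates that event, which is what lets a `brokenAt`-style field point at the first altered entry.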

Tool Definition Quality

A 3.9/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description clearly explains return values (chainValid, brokenAt, etc.) and the read-only nature of verification. It does not mention permissions or rate limits, but for a simple verify tool this is sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence plus return field list. No redundancy, each part adds value. Easy to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a one-parameter tool with no output schema, the description covers the action and output fields. Lacks detail on return field semantics or potential errors, but overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%; the parameter itemId is undocumented. The description only implies its purpose ('of a landprice record') but provides no format, constraints, or additional meaning. The agent must infer that itemId is the record identifier.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action ('verify the hash-chain integrity') and the object ('landprice record'), with tamper detection context. Lists specific return fields, distinguishing it from sibling verify_ledger tools for other record types.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies when to use (when integrity of a landprice record needs checking) but gives no explicit guidance on when not to use or alternatives. With many sibling verify_ledger tools, more context on selection would aid the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

license_watch_get (B)

Get a license registration detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B 3/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It mentions returning firstSeenAt and ledgerVerified, but does not disclose read-only nature, error handling, authentication needs, or limitations (e.g., pagination of timeline).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with the main action front-loaded. The second sentence, about return values, could be integrated or expanded, but it is not verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, and the description only partially covers return values (two fields). It omits full response structure, error conditions, and edge cases, leaving the agent underinformed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage for the only parameter (itemId). The description does not explain what itemId represents (e.g., license ID, registration ID), failing to add meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it retrieves a license registration detail and full event timeline, using the verb 'Get' and specifying the resource. This distinguishes it from sibling tools like license_watch_timeline (timeline only) and license_watch_recent_changes (changes only).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs. alternatives like license_watch_timeline. The description implies use when both detail and timeline are needed, but lacks direct comparison or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

license_watch_recent_changes (B)

Recent appearance / revoked / suspended events across all license ledgers since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`registry`	No
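
Since `since` must be an ISO8601 timestamp, a caller will usually derive it from a look-back window. The Python sketch below builds such an argument object. The helper name is hypothetical, and the UTC offset form (`+00:00`), the integer type of `limit` and the reuse of the `fsa-kinyushohin` registry value from `license_watch_search` are assumptions, since the schema documents none of them.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def recent_changes_args(
    hours_back: int,
    now: Optional[datetime] = None,
    limit: Optional[int] = None,
    registry: Optional[str] = None,
) -> dict:
    """Build arguments for license_watch_recent_changes (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    since = (now - timedelta(hours=hours_back)).isoformat(timespec="seconds")
    args = {"since": since}  # the only required parameter
    if limit is not None:
        args["limit"] = limit  # assumed to be a maximum item count
    if registry is not None:
        args["registry"] = registry  # assumed to filter by ledger
    return args


fixed_now = datetime(2026, 7, 30, 18, 37, tzinfo=timezone.utc)
args = recent_changes_args(24, now=fixed_now, limit=50, registry="fsa-kinyushohin")
```

Passing a timezone-aware datetime avoids ambiguity about which offset the server should assume for a bare local time.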

Tool Definition Quality

B 3/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses that the tool returns events (firstSeenAt, ledgerVerified), implying a read-only query. However, it does not explicitly state that it is non-destructive, nor does it mention rate limits, authorization needs, or pagination behavior. The transparency is adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences) and includes a key detail about the return items. It could be slightly improved by front-loading the verb and resource, but overall it is efficient and avoids verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (no output schema, 3 params), the description provides a basic understanding but omits details about pagination, event structure beyond two fields, and the significance of 'registry'. It is minimally complete but leaves gaps for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description partially compensates by explaining the 'since' parameter as an ISO8601 timestamp. However, the 'limit' and 'registry' parameters are not described at all, leaving their meaning and constraints unclear. This is insufficient for an agent to use the tool effectively.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns recent appearance, revoked, or suspended events across license ledgers, filtered by timestamp. It specifies the resource (license ledgers) and action (watch recent changes), effectively distinguishing it from sibling watch tools for other domains (bids, grants, etc.). However, it could explicitly differentiate from other license_watch tools (e.g., license_watch_get).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. For example, it does not mention that license_watch_get is for current state or that this tool is for historical changes. No context on prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

license_watch_search (C)

Search Japanese license / registration ledgers (FSA menkyo [licenses]: financial instruments business operators [金融商品取引業者], deposit-taking financial institutions [預金取扱金融機関] …). Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
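`query`	No	Business name or registration number (partial match)
`since`	No
`status`	No
`licensor`	No	Licensing authority (Director-General of the Kanto Local Finance Bureau, Prime Minister (Financial Services Agency), etc.)
`registry`	No	Registry type (fsa-kinyushohin / fsa-ginkou, etc.)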
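
For orientation, an MCP client invokes a tool through a JSON-RPC `tools/call` request whose `params` carry the tool name and an `arguments` object. The Python sketch below assembles such a request for this tool. The helper name and the argument values (the query term 証券, "securities", and the limit of 5) are hypothetical; only the parameter names and the `fsa-kinyushohin` registry value appear in the table above.

```python
import json


def tools_call_request(request_id: int, tool_name: str, arguments: dict) -> dict:
    """Wrap tool arguments in an MCP tools/call JSON-RPC envelope."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = tools_call_request(
    1,
    "license_watch_search",
    {"query": "証券", "registry": "fsa-kinyushohin", "limit": 5},  # hypothetical values
)
# ensure_ascii=False keeps the Japanese query readable in the serialized body.
body = json.dumps(request, ensure_ascii=False)
```

In practice an MCP client library builds this envelope itself; the sketch only shows where the parameters from the table end up.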

Tool Definition Quality

C 2.9/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden for behavioral disclosure. It only mentions that results include firstSeenAt and ledgerVerified, but does not state whether the operation is read-only, destructive, or any rate limits/auth requirements. While search implies read-only, the description is insufficient for a tool with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, concise, and front-loaded with the core purpose. Every part is relevant, with no wasted words. It efficiently communicates the tool's domain and key output features.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 parameters, no output schema, many sibling tools), the description is incomplete. It lacks explanation of pagination, relationship to other license_watch tools (like get or recent_changes), and the meaning of return fields beyond the mention. An agent would need additional context to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 50% with descriptions for query, licensor, and registry. The description adds value by stating that hits include firstSeenAt and ledgerVerified, which clarifies output semantics. However, it does not elaborate on other parameters like limit, since, or status, nor does it compensate with additional context. Overall, it provides marginal improvement over the schema.
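
The coverage percentage quoted in these reviews is presumably the share of schema properties that carry a description. A minimal Python sketch of that calculation follows, using this tool's six parameters. The function name, the treatment of an empty schema as fully covered and the paraphrased English description strings are assumptions; the page does not publish the scorer's actual formula.

```python
def description_coverage(input_schema: dict) -> float:
    """Fraction of properties with a non-empty description (assumed metric)."""
    properties = input_schema.get("properties", {})
    if not properties:
        return 1.0  # assumption: nothing to document counts as fully covered
    described = sum(
        1 for spec in properties.values() if spec.get("description", "").strip()
    )
    return described / len(properties)


# Property types are omitted because the page does not list them.
search_schema = {
    "type": "object",
    "properties": {
        "limit": {},
        "query": {"description": "Business name or registration number (partial match)"},
        "since": {},
        "status": {},
        "licensor": {"description": "Licensing authority"},
        "registry": {"description": "Registry type (fsa-kinyushohin / fsa-ginkou, etc.)"},
    },
}
coverage = description_coverage(search_schema)  # 3 of 6 properties described
```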

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches Japanese license/registration ledgers, with specific examples (FSA menkyo). It also mentions output fields (firstSeenAt, ledgerVerified). However, it does not differentiate from sibling tools like grant_watch_search or other domain-specific searches, leaving the agent unclear about when to use this versus similar tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives such as license_watch_get, license_watch_recent_changes, or other watch search tools. There are no when-to-use or when-not-to-use statements, making it difficult for an agent to select appropriately among many similar search tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

license_watch_timeline (B)

Time-ordered events only for a license registration (the differentiator: when it appeared, when it was revoked / expired / suspended). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B 3.2/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions it returns time-ordered events and lists two fields, but it does not disclose whether the operation is read-only, any prerequisites, pagination, or rate limits. The description is too sparse to fully inform an agent about side effects or usage constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, at one sentence plus an additional clause, and the key differentiator is front-loaded. It could be slightly more structured (e.g., a separate usage note) but remains effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description should enumerate typical response fields or event structure beyond the two mentioned. It lacks details on pagination, filtering, or how to interpret the timeline, leaving the agent underinformed about what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter, itemId, has no description in the schema (0% coverage) and the tool description does not explain what itemId represents or how to obtain it. The agent cannot determine how to use this required parameter without additional knowledge.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns time-ordered events for a license registration, explicitly listing event types (appearance, revocation, expiration, suspension) and calling out the differentiator versus sibling tools. It distinguishes itself from tools like license_watch_get or license_watch_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use this tool (for timeline events) and contrasts it with other tools by calling out 'the differentiator'. However, it does not explicitly state when not to use or name alternative tools, leaving some room for ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

license_watch_verify_ledger (A)

Verify the hash-chain integrity of a license registration (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A 3.5/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description provides transparency about return fields (chainValid, brokenAt, etc.) and the verification nature. However, it lacks details on potential side effects, authentication needs, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with one main sentence and a clear list of return fields. No extraneous information, though structured formatting of return fields could improve readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's purpose and output but omits explanation of the input parameter and broader context of the license registration. Given the tool's low complexity, it is marginally adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter `itemId` is not described in the schema (0% coverage) and the description does not clarify its meaning or expected format. The agent lacks sufficient information to correctly populate this parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool verifies the hash-chain integrity of a license registration for tamper detection, using a specific verb and resource. It distinguishes itself from sibling tools like license_watch_get or license_watch_search by focusing on verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for tamper detection but does not explicitly state when to use this tool versus other license watch tools or alternative verification methods. No when-not or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onchain_risk_address_risk (A)

Heuristic address risk score (0-100) from on-chain reads (code, balance, nonce) plus bundled OFAC-style sanction and flagged lists, over public EVM JSON-RPC. Every contribution is returned in rationale. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`chain`	No	EVM chain (default: ethereum).
`address`	Yes	EVM address (0x-hex)
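
To make the idea of a score whose every contribution is returned in a rationale more concrete, here is a minimal Python sketch of an additive heuristic capped at 100. All weights, signal names and the shape of the `rationale` entries are invented for illustration; the tool's real rules and response format are not published on this page.

```python
def score_address(
    is_contract: bool,
    nonce: int,
    balance_wei: int,
    sanctioned: bool,
    flagged: bool,
) -> dict:
    """Sum hypothetical risk contributions and keep each one in the rationale."""
    rationale = []

    def add(points: int, reason: str) -> None:
        rationale.append({"points": points, "reason": reason})

    if sanctioned:
        add(100, "address is on the sanction list")
    if flagged:
        add(40, "address is on the flagged list")
    if is_contract:
        add(10, "address has contract code")
    if nonce == 0 and not is_contract:
        add(15, "no outgoing transactions yet")
    if balance_wei == 0:
        add(5, "zero balance")

    score = min(100, sum(item["points"] for item in rationale))
    return {"score": score, "rationale": rationale}


fresh_wallet = score_address(False, 0, 0, sanctioned=False, flagged=False)
listed = score_address(False, 12, 10**18, sanctioned=True, flagged=True)
```

Returning the individual contributions alongside the total lets a caller see why a score is high instead of treating it as an opaque number.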

Tool Definition Quality

A 4.1/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the tool is read-only, free, and returns every contribution in rationale. It explains the data sources (code, balance, nonce, sanction lists) and output (score 0-100). With no annotations, the description effectively communicates behavioral traits, though it could mention rate limits or idempotency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: concise, front-loaded with the core purpose, and every sentence adds value. No redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description clarifies the output (score range and rationale) and input sources. It is complete for an agent to understand what the tool does and what to expect, without needing additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters (chain with enum, address). The description adds no additional semantic value beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes a heuristic risk score (0-100) for an address using on-chain data and sanction/flagged lists. It distinguishes from siblings like onchain_risk_approval_risk (approval-specific) and onchain_risk_sanctions_screen (pure sanctions) by combining multiple sources into a single risk score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for a general address risk assessment but does not explicitly state when to use this tool versus siblings like sanctions_screen_check_address or onchain_risk_token_safety. It mentions read-only and free status but lacks explicit when-to-use or when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onchain_risk_approval_risk (A)

ERC-20 approval exposure: scans Approval logs granted by the address and flags unlimited / active allowances, via eth_getLogs over public EVM JSON-RPC. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`chain`	No	EVM chain (default: ethereum).
`address`	Yes	EVM address (0x-hex)
`fromBlock`	No	Optional start block for the log scan (default: a recent window).
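
The scan described here can be approximated with a single `eth_getLogs` filter on the ERC-20 `Approval(address,address,uint256)` event, with the owner as the first indexed topic. The Python sketch below builds that request and classifies one returned log. The helper names, the output keys and the choice to treat exactly 2^256 - 1 as "unlimited" are assumptions; the tool's own thresholds and its handling of later revocations are not documented here.

```python
# keccak256("Approval(address,address,uint256)"), the standard ERC-20 event topic
APPROVAL_TOPIC = "0x8c5be1e5ebec7d5bd14f71427d1e84f3dd0314c0f7b2291e5b200ac8c7c3b925"
MAX_UINT256 = 2**256 - 1


def address_topic(address: str) -> str:
    """Left-pad a 20-byte address to the 32-byte form used in indexed topics."""
    hex_part = address[2:] if address.startswith("0x") else address
    return "0x" + hex_part.lower().rjust(64, "0")


def get_logs_request(owner: str, from_block: int, request_id: int = 1) -> dict:
    """Build an eth_getLogs request for Approval events granted by `owner`."""
    log_filter = {
        "fromBlock": hex(from_block),
        "toBlock": "latest",
        "topics": [APPROVAL_TOPIC, address_topic(owner)],
    }
    return {"jsonrpc": "2.0", "id": request_id, "method": "eth_getLogs", "params": [log_filter]}


def classify_approval(log: dict) -> dict:
    """Decode one Approval log; output keys are hypothetical."""
    amount = int(log["data"], 16)
    return {
        "token": log["address"],
        "spender": "0x" + log["topics"][2][-40:],
        "unlimited": amount == MAX_UINT256,
        "nonZero": amount > 0,
    }


owner = "0x" + "11" * 20    # hypothetical addresses
spender = "0x" + "22" * 20
sample_log = {
    "address": "0x" + "aa" * 20,
    "topics": [APPROVAL_TOPIC, address_topic(owner), address_topic(spender)],
    "data": "0x" + "f" * 64,
}
request = get_logs_request(owner, 19_000_000)
result = classify_approval(sample_log)
```

A real scan would also keep only the latest log per token and spender pair, since a later approval of zero revokes an earlier one, and public RPC endpoints often cap the block range a single `eth_getLogs` call may cover.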

Tool Definition Quality

A 3.9/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description carries the transparency burden. It discloses read-only nature, zero cost, and underlying mechanism (eth_getLogs over public EVM JSON-RPC). However, it omits details like rate limits, block range limitations, or whether the scan is real-time or historical.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. Front-loads the key purpose ('ERC-20 approval exposure'). Every sentence provides value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, so the description should clarify return format or examples. It says 'flags unlimited/active allowances' but does not specify whether it returns a list, boolean, or summary. This leaves the agent guessing about the response structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents all parameters. The description adds minimal value beyond implying the 'address' is the target. No additional explanation of the enum or optional fromBlock is given beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool scans ERC-20 Approval logs for unlimited/active allowances, using specific verbs ('scans', 'flags') and a specific resource ('Approval logs'). It is distinct from siblings like address risk, contract verify, sanctions, and token safety.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions read-only and free, but does not provide explicit guidance on when to use this tool versus alternatives (e.g., onchain_risk_address_risk) or state any prerequisites or exclusions. Usage context is implied but not fully articulated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onchain_risk_contract_verify (A)

Contract / bytecode summary: whether the address is a contract, bytecode size and any embedded CBOR metadata, via public EVM JSON-RPC and block-explorer / Sourcify HTTPS. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`chain`	No	EVM chain (default: ethereum).
`address`	Yes	EVM address (0x-hex)
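
As background on the embedded CBOR metadata: the Solidity compiler by default appends a CBOR-encoded map to deployed bytecode, followed by a two-byte big-endian length. The Python sketch below applies that convention to the hex string an `eth_getCode` call would return. The function name and output keys are hypothetical, and the check is only a heuristic, since bytecode from other compilers or with metadata disabled may not follow the layout.

```python
def summarize_bytecode(code_hex: str) -> dict:
    """Report contract status, size, and a trailing CBOR metadata blob if present."""
    raw = code_hex[2:] if code_hex.startswith("0x") else code_hex
    code = bytes.fromhex(raw)
    summary = {"isContract": len(code) > 0, "size": len(code), "metadataHex": None}
    if len(code) < 3:
        return summary
    meta_len = int.from_bytes(code[-2:], "big")  # length of the CBOR section
    if meta_len == 0 or meta_len + 2 > len(code):
        return summary
    metadata = code[-2 - meta_len:-2]
    # CBOR major type 5 (a map) has initial bytes 0xa0..0xbf.
    if metadata[0] >> 5 == 5:
        summary["metadataHex"] = metadata.hex()
    return summary


# Hypothetical bytecode: 5 bytes of code, a 4-byte CBOR map {"x": 1}, then length 0x0004.
sample = "0x6080604052" + "a1617801" + "0004"
contract = summarize_bytecode(sample)
eoa = summarize_bytecode("0x")  # externally owned accounts return empty code
```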

Tool Definition Quality

A 4.3/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Despite no annotations, the description discloses that the tool is read-only and free, and specifies the data sources (public EVM JSON-RPC, block-explorer, Sourcify), providing adequate behavioral context beyond basic constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that front-loads the purpose and includes all essential information without unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 2 parameters, full schema coverage, and no output schema, the description sufficiently explains what the tool returns (contract status, bytecode size, CBOR metadata) and how it works (data sources), making it complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; both parameters (chain and address) are adequately described in the schema. The description does not add new meaning beyond what the schema provides, so baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides a summary of whether an address is a contract, bytecode size, and embedded CBOR metadata via specific data sources. It distinguishes itself from sibling tools like onchain_risk_address_risk by focusing on contract verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It implicitly indicates when to use (when needing contract info) and declares it is read-only and free. However, it does not explicitly state when not to use or list alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onchain_risk_sanctions_screen (A)

Screen an address against the bundled OFAC-style sanction list; returns the sanctioned flag, any matches and the list size. Pure list lookup. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`chain`	No	EVM chain (default: ethereum).
`address`	Yes	EVM address (0x-hex)
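
A "pure list lookup" of this kind amounts to a normalized set membership test. The Python sketch below returns the three pieces of information the description mentions. The key names `sanctioned`, `matches` and `listSize`, the case-insensitive comparison and the sample addresses are assumptions, as the response format is not specified on this page.

```python
def screen_address(address: str, sanction_list: list) -> dict:
    """Check one address against a bundled list, ignoring hex letter case."""
    normalized = {entry.lower() for entry in sanction_list}
    needle = address.lower()
    matches = [needle] if needle in normalized else []
    return {
        "sanctioned": bool(matches),
        "matches": matches,
        "listSize": len(normalized),
    }


# Hypothetical list entries, not real sanctioned addresses.
bundled = ["0x" + "AB" * 20, "0x" + "cd" * 20]
hit = screen_address("0x" + "ab" * 20, bundled)
miss = screen_address("0x" + "ef" * 20, bundled)
```

Lowercasing both sides matters because EVM addresses are often written in mixed-case checksum form, which would otherwise cause false negatives.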

Tool Definition Quality

A 4.2/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses that the tool is read-only and free (price 0.0), which covers safety and cost. It does not mention rate limits or error handling, but for a simple list lookup, the transparency is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, consisting of two sentences that convey purpose, return values, and key traits (read-only, free). No extraneous information, every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains return values. It also notes cost and read-only nature. However, with many sibling sanctions tools, it could more clearly differentiate its use case, though it's still fairly complete for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage, so the description adds minimal value beyond the schema. It provides context about the sanction list but does not enhance understanding of the parameters beyond their schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool screens an address against a bundled OFAC-style sanction list and specifies the return values (sanctioned flag, matches, list size). It distinguishes itself from sibling tools by emphasizing it's a pure list lookup and free, which sets it apart from other sanctions screening tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context by stating it's a pure list lookup, read-only, and free, implying it's suitable for simple checks. However, it does not explicitly exclude scenarios or mention alternatives, such as when to use other sanctions tools instead.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onchain_risk_token_safety (A)

ERC-20 token safety summary via on-chain eth_call reads: name / symbol / decimals / totalSupply, owner and ownership-renounced signal. Public EVM JSON-RPC. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`chain`	No	EVM chain (default: ethereum).
`address`	Yes	EVM address (0x-hex)
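The "on-chain eth_call reads" can be illustrated with the standard 4-byte function selectors. This is a sketch of the general technique, not the server's implementation: the helper names are hypothetical, and treating "owner is the zero address" as the ownership-renounced signal is an assumption based on the common OpenZeppelin Ownable convention.

```python
# Well-known selectors: the first 4 bytes of keccak256 of each function signature.
SELECTORS = {
    "name": "0x06fdde03",         # name()
    "symbol": "0x95d89b41",       # symbol()
    "decimals": "0x313ce567",     # decimals()
    "totalSupply": "0x18160ddd",  # totalSupply()
    "owner": "0x8da5cb5b",        # owner(), from Ownable; not part of ERC-20 itself
}
ZERO_ADDRESS = "0x" + "0" * 40


def eth_call_payload(token: str, field: str, request_id: int = 1) -> dict:
    """Build a JSON-RPC eth_call request that reads one field from a token contract."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "eth_call",
        "params": [{"to": token, "data": SELECTORS[field]}, "latest"],
    }


def decode_address(word_hex: str) -> str:
    """Decode a 32-byte ABI word into a 0x address (its last 20 bytes)."""
    raw = word_hex[2:] if word_hex.startswith("0x") else word_hex
    return "0x" + raw[-40:].lower()


def ownership_renounced(owner_word_hex: str) -> bool:
    # Assumption: "renounced" means owner() returns the zero address.
    return decode_address(owner_word_hex) == ZERO_ADDRESS


payload = eth_call_payload("0x" + "11" * 20, "owner")
print(payload["params"][0]["data"])
print(ownership_renounced("0x" + "0" * 64))
```

A contract without an `owner()` function will typically revert on that call, so a real client would need to treat a failed read as "unknown" rather than "renounced".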

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It explicitly states 'Read-only' and 'price 0.0 (free)', and mentions it uses 'Public EVM JSON-RPC'. This sufficiently discloses the read-only, cost-free nature and underlying mechanism. It does not contradict any annotations (none present). Minor improvements could include noting rate limits or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the tool's core purpose and capabilities. Every sentence adds value: first sentence defines the tool, second sentence notes it is public, read-only, and free. No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description lists the fields returned (name, symbol, decimals, totalSupply, owner, ownership-renounced signal), which is sufficient for an agent to understand the output. It doesn't specify the exact output format or representation of the ownership-renounced signal, but overall it is complete enough for a tool with two parameters and a clear purpose.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters ('EVM chain (default: ethereum).' and 'EVM address (0x-hex)'). The description does not add additional semantics beyond what the schema already provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides an ERC-20 token safety summary using on-chain eth_call reads, listing specific fields (name, symbol, decimals, totalSupply, owner, ownership-renounced signal). This distinguishes it from sibling tools like onchain_risk_address_risk (broader address risk), onchain_risk_approval_risk (approvals), onchain_risk_contract_verify (source verification), and onchain_risk_sanctions_screen (sanctions). The verb 'safety summary' plus resource 'token' is specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use it (when needing ERC-20 token metadata and ownership renounced status) but does not explicitly state when not to use it or provide alternatives. However, the context is clear given the sibling tool names, and the description highlights it is for token safety specifically.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ordinance_watch_get (C)

Get an ordinance detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C2.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description does not state read-only nature, side effects, or authorization needs. Only mentions return fields but lacks broader behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no unnecessary content. Efficiently communicates core functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there is no output schema, no annotations, and several sibling ordinance tools, the description is too minimal. It omits prerequisites, error conditions, and relationships to the other ordinance tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has one parameter (itemId) with 0% documentation coverage. Description does not explain what itemId refers to (e.g., ordinance ID format or source), failing to compensate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it retrieves an ordinance detail plus full event timeline, and mentions specific return fields. This implicitly sets it apart from siblings like ordinance_watch_timeline and ordinance_watch_search, though the description never contrasts them explicitly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives like ordinance_watch_timeline (which might return only timeline) or ordinance_watch_search. No context provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ordinance_watch_recent_changes (A)

Recent appearance / amendment / repeal events across all ordinances since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`issuerCode`	No
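Because `since` must be an ISO8601 timestamp, a caller might derive the arguments from a timezone-aware clock value. The following is a sketch only: the helper name, the 24-hour lookback, and the choice to send UTC are illustrative assumptions, since the listing does not document accepted offsets or the meaning of `limit` and `issuerCode`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def recent_changes_arguments(
    now: datetime,
    lookback_hours: int = 24,
    limit: Optional[int] = None,
    issuer_code: Optional[str] = None,
) -> dict:
    """Build the arguments object for ordinance_watch_recent_changes (hypothetical helper)."""
    if now.tzinfo is None:
        # A naive datetime would produce a timestamp with no offset, which is ambiguous.
        raise ValueError("now must be timezone-aware")
    since = now.astimezone(timezone.utc) - timedelta(hours=lookback_hours)
    arguments = {"since": since.isoformat(timespec="seconds")}
    # Optional parameters are omitted unless the caller sets them.
    if limit is not None:
        arguments["limit"] = limit
    if issuer_code is not None:
        arguments["issuerCode"] = issuer_code
    return arguments


args = recent_changes_arguments(datetime(2026, 7, 30, 18, 37, tzinfo=timezone.utc), limit=50)
print(args)
```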

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It mentions output fields (firstSeenAt, ledgerVerified) which is helpful. However, it does not state that it is read-only, whether pagination works, or any rate limits. The behavior is mostly transparent but lacks depth without annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences that serve distinct purposes. First sentence defines the scope and trigger, second adds output detail. No redundant words or repetition. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no output schema and low parameter coverage, the description is minimally adequate. It explains what the tool returns and the key parameter (since). However, it omits details on limit behavior, issuerCode usage, and possible empty results. Given the simplicity of the tool, it's acceptable but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, placing the burden on the description. The description only explains the 'since' parameter as an ISO8601 timestamp. The 'limit' (default 100) and 'issuerCode' (filter) are not described at all. This leaves ambiguity about how parameters affect results, especially the optional filter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns 'Recent appearance / amendment / repeal events' across all ordinances since a timestamp. It distinguishes from sibling tools like ordinance_watch_get (single entity), ordinance_watch_search (search), and ordinance_watch_timeline (chronological). The verb 'recent changes' combined with parameter 'since' makes the purpose precise.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates use 'since the given ISO8601 timestamp', which implies polling for new changes. However, it lacks explicit guidance on when not to use it (e.g., for historical data or single ordinance lookup) and does not mention alternative tools. The context from sibling names suggests alternatives, but no direct exclusion is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ordinance_watch_search (B)

Search Japanese national laws / ordinances (e-Gov 法令検索 [e-Gov Law Search] v2; Stage 1 covers the national level only, and 自治体例規 [local government ordinances] ships in a follow-up). Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	Law name / law number / abbreviation (partial match)
`since`	No
`status`	No
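`issuerCode`	No	Local government code, JIS X 0401/0402 (not required for the national level)
`jurisdiction`	No	'国' (national; Stage 1 only)
`ordinanceType`	No	法律 (act) / 政令 (cabinet order) / 省令 (ministerial ordinance) / 勅令 (imperial ordinance) / 規則 (rule) / 憲法 (constitution) …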
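With seven optional filters, a small helper can keep a search request tidy. This is a minimal sketch, assuming the server ignores or rejects unknown keys and that unset filters should simply be omitted; `search_arguments` is a hypothetical name, and the sample values are taken from the parameter table above.

```python
# Parameter names as listed in the table for ordinance_watch_search.
ALLOWED = ("limit", "query", "since", "status", "issuerCode", "jurisdiction", "ordinanceType")


def search_arguments(**filters) -> dict:
    """Return an arguments object containing only known, non-empty filters."""
    unknown = sorted(set(filters) - set(ALLOWED))
    if unknown:
        raise ValueError(f"unknown parameter(s): {', '.join(unknown)}")
    # Drop filters the caller left as None instead of sending nulls.
    return {name: value for name, value in filters.items() if value is not None}


# Stage 1 covers the national level only, so jurisdiction is '国' and issuerCode is left out.
args = search_arguments(query="個人情報", ordinanceType="法律", jurisdiction="国", limit=None)
print(args)
```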

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It mentions returned fields (firstSeenAt, ledgerVerified) but does not describe pagination, default limit, query syntax, or read-only nature. It omits behavioral traits like rate limits or authentication needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with purpose and scope. It is concise and logically ordered. However, the second sentence about hit fields could be integrated more naturally.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no output schema, and no annotations, the description is incomplete. It does not explain query syntax, status meanings, time filtering with since, or result ordering. There is also a potential inconsistency: issuerCode hints at local-government use, but the description says Stage 1 is national only.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not add meaning beyond the input schema. Schema coverage is 57%, with descriptions for query, issuerCode, jurisdiction, and ordinanceType. The description adds nothing about parameters, missing an opportunity to explain query syntax or parameter relationships.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
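The coverage figure quoted above is simple arithmetic: 4 described parameters out of 7 is about 57%. A sketch of that calculation follows, assuming coverage is just the share of parameters with a non-empty schema description; the function name and the rounding to a whole percent are illustrative choices, not the scorer's documented method.

```python
def schema_coverage(descriptions: dict) -> int:
    """Percentage of parameters that have a non-empty schema description."""
    if not descriptions:
        return 0
    described = sum(1 for text in descriptions.values() if text and text.strip())
    return round(100 * described / len(descriptions))


# ordinance_watch_search: which of the 7 parameters carry a description in the table.
ordinance_search = {
    "limit": "",
    "query": "described",
    "since": "",
    "status": "",
    "issuerCode": "described",
    "jurisdiction": "described",
    "ordinanceType": "described",
}
print(schema_coverage(ordinance_search))
```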

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches Japanese national laws/ordinances, specifies the data source (e-Gov 法令検索 v2), and distinguishes itself from siblings by indicating it is a search tool and noting that Stage 1 covers only the national level. The sibling tools (get, recent_changes, timeline, and verify_ledger) perform different operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context about scope (national only) and hints at future support for local ordinances, but does not explicitly state when to use this tool versus alternatives like ordinance_watch_get. It lacks explicit 'when-not-to-use' guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ordinance_watch_timeline (A)

Time-ordered events only for an ordinance (the differentiator: when it appeared / was amended / was repealed). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
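Since the description says the events are time-ordered but not in which direction, a client may prefer to sort them itself. The sketch below assumes each event carries an ISO8601 timestamp under a key named `occurredAt`; that key and the sample events are hypothetical, because the listing documents no output schema.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat before Python 3.11 rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def oldest_first(events: list, key: str = "occurredAt") -> list:
    """Return events sorted ascending by timestamp, whatever order the server used."""
    return sorted(events, key=lambda event: parse_ts(event[key]))


# Hypothetical events; the real field names may differ.
events = [
    {"type": "amended", "occurredAt": "2024-04-01T00:00:00Z"},
    {"type": "appeared", "occurredAt": "2020-01-15T09:00:00+09:00"},
]
for event in oldest_first(events):
    print(event["type"], event["occurredAt"])
```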

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must carry the full burden. It discloses that events are time-ordered and include firstSeenAt and ledgerVerified, but omits details like ordering direction (ascending/descending), pagination, limits, or error behavior. This is adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words. The description efficiently communicates the tool's purpose, differentiator, and key fields. It is front-loaded with the core functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple but the description leaves gaps: it does not specify output ordering, event structure beyond two fields, or limits. Without an output schema, this omission hampers an agent's ability to interpret results correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter (itemId) is required but not described in the schema (0% coverage). The description implies it's the ordinance ID, but does not explicitly explain its format or how to obtain it. A more explicit description would add value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns time-ordered events for an ordinance, specifically when it appeared, was amended, or repealed. It distinguishes itself from other ordinance tools by emphasizing 'the differentiator' and mentions included fields, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies this is the timeline tool for ordinances but does not explicitly state when to use it versus other ordinance tools like get, search, recent_changes, or verify_ledger. There is no guidance on prerequisites or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ordinance_watch_verify_ledger (B)

Verify the hash-chain integrity of an ordinance record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
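The idea behind hash-chain verification can be shown with a generic example. This is a sketch of the general technique under stated assumptions, not the server's actual scheme: the SHA-256 construction, the all-zero genesis value, and the `prevHash` / `hash` / `payload` event fields are all hypothetical. Only the result keys `chainValid` and `brokenAt` come from the description; `checked` stands in for the "checked event count".

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first event's prevHash


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Link payloads into a chain where each event commits to its predecessor."""
    events, prev = [], GENESIS
    for payload in payloads:
        digest = event_hash(prev, payload)
        events.append({"prevHash": prev, "hash": digest, "payload": payload})
        prev = digest
    return events


def verify_chain(events: list) -> dict:
    """Recompute every link; report the index of the first event that fails."""
    prev = GENESIS
    for index, event in enumerate(events):
        expected = event_hash(prev, event["payload"])
        if event["prevHash"] != prev or event["hash"] != expected:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(events)}


chain = build_chain([{"kind": "appeared"}, {"kind": "amended"}, {"kind": "repealed"}])
print(verify_chain(chain))
chain[1]["payload"]["kind"] = "tampered"  # edit an event without recomputing its hash
print(verify_chain(chain))
```

Because each hash covers the previous one, altering any stored event invalidates that event's link, which is what makes the ledger tamper-evident.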

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description carries the full burden. It describes the return fields but does not disclose whether the operation is read-only, what permissions are required, or whether there are side effects. Tamper detection implies a read-only operation, but this is not stated explicitly.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise single sentence plus a list of return fields. No wasted words, though the return field list could be integrated.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lists return fields (no output schema) and explains purpose. Could elaborate more on the hash-chain mechanism or required permissions, but adequate for a simple verification tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter is itemId, described merely as 'of an ordinance record'. Schema coverage is 0%, so the description must compensate, but it adds minimal additional meaning (no format, example, or elaboration on what constitutes a valid itemId).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool verifies hash-chain integrity for ordinance records (tamper detection). Distinguishes from sibling verify_ledger tools via domain-specific language.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like other verify_ledger tools. The domain is implied but not contrasted.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pharma_watch_get (C)

Get a pharmaceutical record detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C2.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description bears full responsibility for behavioral disclosure. It partially fulfills this by stating the return includes firstSeenAt and ledgerVerified, but it omits whether the tool is read-only, what 'full event timeline' entails, or any required permissions. Significant gaps remain.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short, efficient sentences that directly state the action, object, and key output fields. No unnecessary words or repetition. Optimal conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one param, no nested objects, no output schema), the description provides the basic action and two return fields. However, it does not describe the return structure or other potential fields, leaving the agent partially informed. The absence of an output schema amplifies this gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has a single parameter (itemId) with 0% description coverage, and the description adds no meaning to it. The agent is left without guidance on how to obtain or format itemId. This fails to compensate for the schema deficiency.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool's function: retrieving a pharmaceutical record detail and its event timeline, and mentions specific return fields (firstSeenAt, ledgerVerified). This distinguishes it from sibling tools like search or recent_changes, though it could be more explicit about its unique value.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives such as pharma_watch_search or pharma_watch_timeline. Without explicit when-to-use or when-not-to-use instructions, an agent may select it incorrectly.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pharma_watch_recent_changes (B)

Recent approval / NHI-listed / price-revised events across all pharmaceutical records since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`category`	No

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes the type of events (approval, NHI-listed, price-revised) and output fields (firstSeenAt, ledgerVerified), but does not disclose pagination behavior, authentication needs, rate limits, or whether it is read-only. Given no annotations, the description provides basic but insufficient behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (two sentences) and front-loaded with the core purpose. However, it sacrifices necessary detail for brevity, leaving gaps in parameter explanation and usage context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 3 parameters (including an undocumented 'category' string), no output schema, and no annotations, the description is incomplete. It does not explain the category parameter, the default limit, error handling, or ordering of results, leaving the agent with significant ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description only indirectly references the 'since' parameter (ISO8601 timestamp) but does not explain the 'limit' parameter or the 'category' parameter at all. With 0% schema description coverage, the description adds minimal semantic value beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns recent approval, NHI-listed, and price-revised events across pharmaceutical records. It specifies the time constraint (since ISO8601 timestamp) and mentions output fields, effectively distinguishing it from siblings like pharma_watch_search and pharma_watch_timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives such as pharma_watch_search or pharma_watch_get. The description implies use for time-based recent changes but does not contrast with other tools or state prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pharma_watch_search (B)

Search Japanese pharmaceutical approvals (PMDA) and NHI-listed drugs (MHLW yakka). Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	Brand name / ingredient name (partial match)
`since`	No
`status`	No
`category`	No	PMDA field (e.g. 第１) / MHLW segment (e.g. 内用薬, oral drugs)
`applicant`	No	Marketing authorization holder / manufacturer
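Since each hit includes firstSeenAt and ledgerVerified, a client could separate verified hits from the rest before using them. This is a sketch under assumptions: `ledgerVerified` is taken to be a boolean and `firstSeenAt` an ISO8601 string, and the sample hits (including the `itemId` values) are made up for illustration.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    # datetime.fromisoformat before Python 3.11 rejects a trailing "Z", so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def split_by_verification(hits: list) -> tuple:
    """Return (verified, unverified); verified hits are sorted newest-first by firstSeenAt."""
    verified, unverified = [], []
    for hit in hits:
        # Anything other than an explicit True is treated as unverified.
        target = verified if hit.get("ledgerVerified") is True else unverified
        target.append(hit)
    verified.sort(key=lambda hit: parse_ts(hit["firstSeenAt"]), reverse=True)
    return verified, unverified


# Hypothetical hits; real responses may carry more fields.
hits = [
    {"itemId": "a", "firstSeenAt": "2026-01-10T00:00:00Z", "ledgerVerified": True},
    {"itemId": "b", "firstSeenAt": "2026-03-02T00:00:00Z", "ledgerVerified": False},
    {"itemId": "c", "firstSeenAt": "2026-02-20T00:00:00Z", "ledgerVerified": True},
]
verified, unverified = split_by_verification(hits)
print([hit["itemId"] for hit in verified], [hit["itemId"] for hit in unverified])
```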

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description only mentions that results include 'firstSeenAt and ledgerVerified'. It does not disclose behavioral traits such as rate limits, authorization requirements, or whether the operation is read-only. The description carries the full burden but falls short.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loads the core purpose, and avoids redundancy. Every sentence adds value: the first defines the action, the second details result contents. No fluff or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of annotations, output schema, and only 50% parameter coverage, the description is insufficient. It lacks details on parameter usage, search behavior, pagination, error handling, or any other context that would help the agent use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (3 of 6 parameters have descriptions in the schema). The tool description does not add meaning to these or other parameters. It only provides a high-level statement, not compensating for the schema gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches Japanese pharmaceutical approvals (PMDA) and NHI-listed drugs (MHLW yakka), specifying the domain and data sources. The verb 'Search' and resource scope are explicit, and it distinguishes from sibling pharma_watch tools (get, timeline, etc.) and other domain-specific search tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. No explicit context, exclusions, or prerequisites are mentioned. The agent must infer usage from the tool name and domain alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pharma_watch_timeline (B)

Time-ordered events only for a pharma record (the differentiator: when it was approved / NHI-listed / price-revised). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so the description must fully disclose behavior. It mentions return fields (firstSeenAt, ledgerVerified) and that it returns events, but omits side effects, authentication needs, error handling, or performance implications. This is insufficient for a tool with no annotation support.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words: first states purpose and differentiator, second highlights output fields. Information is front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple single-parameter timeline tool with no output schema, the description covers core purpose and key output fields. However, it lacks details on event structure, ordering direction, error responses, and edge cases, leaving gaps for an agent to infer.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage for the single parameter itemId. The description only implies itemId identifies the pharma record ('only for a pharma record') but adds no format, validation rules, or examples, providing minimal added value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it provides 'time-ordered events only for a pharma record' and specifies key event types (approval, NHI-listing, price revision). This clearly differentiates it from other pharma tools like get (single record) or recent_changes. However, it could explicitly contrast with sibling timeline tools for other domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when needing a chronological event log for a pharma record, and the differentiator hints at when it's appropriate (approval, listing, price events). But it provides no explicit guidance on when not to use it or what alternatives exist.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pharma_watch_verify_ledger (B)

Verify the hash-chain integrity of a pharma record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
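
As a rough illustration of what a hash-chain integrity check involves, the Python sketch below links events with SHA-256 and reports the first broken link. The event layout (`payload`, `prevHash`, `hash`), the all-zero genesis value, and the helper names are assumptions made for this example; the server's actual hashing scheme is not documented on this page. The result keys mirror the `chainValid` and `brokenAt` fields named in the description, with `checked` standing in for the checked event count.

```python
import hashlib
import json


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash an event's payload together with the hash of its predecessor."""
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Link payloads into a chain; the first event uses a hypothetical all-zero genesis hash."""
    chain, prev = [], "0" * 64
    for payload in payloads:
        digest = event_hash(prev, payload)
        chain.append({"payload": payload, "prevHash": prev, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain: list) -> dict:
    """Recompute every link and report the first index that fails, if any."""
    prev = "0" * 64
    for index, event in enumerate(chain):
        expected = event_hash(prev, event["payload"])
        if event["prevHash"] != prev or event["hash"] != expected:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(chain)}


# Made-up events for illustration only.
chain = build_chain([
    {"type": "approved", "date": "2024-01-10"},
    {"type": "nhi_listed", "date": "2024-04-01"},
    {"type": "price_revised", "date": "2025-04-01"},
])
print(verify_chain(chain))  # intact chain

chain[1]["payload"]["date"] = "2024-05-01"  # simulate tampering with the second event
print(verify_chain(chain))  # the check now fails at index 1
```

Because each hash covers the previous one, editing any event invalidates that link and every link after it, which is why a single `brokenAt` index is enough to locate the earliest tampering.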

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description lists return fields (chainValid, brokenAt, etc.), which conveys its verification output. However, it does not disclose whether the tool is read-only, requires authentication, or has side effects, and annotations are absent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences: the core action comes first, followed by the list of return fields. There is no fluff or redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, no output schema), the description is moderately complete. It mentions key return fields but does not explain how to obtain the itemId or any usage prerequisites.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not explain the 'itemId' parameter. It is not even mentioned in the description text, leaving the agent to infer its meaning from context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states exactly what the tool does: 'Verify the hash-chain integrity of a pharma record (tamper detection).' It clearly distinguishes from sibling tools like pharma_watch_get or search by specifying the verification purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as pharma_watch_get or search. The description does not state prerequisites or context for use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

price_oracle_convert (A)

Convert an amount between any two supported currencies (crypto or fiat), routing each leg through its USD value. Returns the rate and converted amount. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`to`	Yes	Target currency (crypto ticker or fiat ISO code)
`from`	Yes	Source currency (crypto ticker or fiat ISO code)
`amount`	Yes	Amount to convert
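
The USD routing mentioned in the description can be sketched as follows: value both currencies in USD, then divide. This is a minimal illustration, not the server's implementation; the `convert` helper, the `usd_value` lookup table, and the prices in it are made up for the example.

```python
def convert(amount: float, from_ccy: str, to_ccy: str, usd_value: dict) -> dict:
    """Convert by pricing both legs in USD and dividing one by the other."""
    try:
        from_usd = usd_value[from_ccy.upper()]
        to_usd = usd_value[to_ccy.upper()]
    except KeyError as missing:
        raise ValueError(f"unsupported currency: {missing.args[0]}") from None
    rate = from_usd / to_usd  # units of to_ccy per one unit of from_ccy
    return {"rate": rate, "converted": amount * rate}


# Hypothetical USD values per unit; real prices would come from a live source.
USD_VALUE = {"USD": 1.0, "BTC": 60000.0, "ETH": 3000.0}

result = convert(0.5, "BTC", "ETH", USD_VALUE)
print(result)  # {'rate': 20.0, 'converted': 10.0}
```

Routing through one common currency means only one price per currency is needed, at the cost of compounding any error in either USD leg.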

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Despite no annotations, the description discloses key behaviors: it is read-only (no side effects), free (price 0.0), and routes through USD. This provides good transparency beyond what schema offers.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (three short sentences, under 30 words) and front-loaded with the core purpose. Every sentence adds value with zero waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple 3-param tool without output schema, the description covers the conversion method, return info (rate and converted amount), and safety (read-only, free). It lacks detail on supported currencies or error handling but is sufficient for basic use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for each parameter. The description adds context about the USD routing but does not significantly enhance parameter meaning beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool converts an amount between any two supported currencies (crypto or fiat), which is a specific verb+resource combination. It implicitly distinguishes from sibling tools like price_oracle_crypto_price or price_oracle_fx_rate by focusing on conversion with an amount.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide guidance on when to use this tool versus alternatives (e.g., price_oracle_fx_rate for just a rate). There is no mention of when-not-to-use or preferred contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

price_oracle_crypto_price (A)

Live crypto spot price via the CoinGecko public API. Returns the price of symbol in the requested fiat quote. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`quote`	No	Fiat quote currency (ISO code; default: USD)
`symbol`	Yes	Crypto ticker (e.g. BTC, ETH)

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Mentions read-only and free (price 0.0). No annotations provided, so description carries full burden. Lacks details on potential rate limits or data freshness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences covering core purpose and key attributes. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequately covers all aspects for a simple 2-parameter tool without output schema. Clear what it does, how to use, and cost.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so description adds little beyond schema. Mentions returning price in requested fiat quote, but schema already indicates default USD.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

States clear purpose: returns live crypto spot price in fiat. Specifies source (CoinGecko public API) and differentiates from siblings like price_oracle_convert or price_oracle_price_history.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies real-time usage via 'Live' but does not explicitly state when to use vs alternatives like price_oracle_convert. No when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

price_oracle_fx_rate (A)

Fiat FX rate via the Frankfurter / ECB reference API. Returns the to units per one from unit plus the reference date. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`to`	Yes	Quote fiat currency (ISO code)
`from`	Yes	Base fiat currency (ISO code)
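
The quoting direction ("`to` units per one `from` unit") is easy to get backwards, so here is a small sketch of how a caller might apply or invert such a rate. The helper names and the sample rate are hypothetical; nothing here calls the tool itself.

```python
def apply_rate(amount_from: float, rate: float) -> float:
    """Convert an amount in the base (`from`) currency into the quote (`to`) currency."""
    return amount_from * rate


def invert_rate(rate: float) -> float:
    """Turn 'to per from' into 'from per to'."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate


# Made-up rate: 150 JPY per 1 USD (from=USD, to=JPY).
usd_to_jpy = 150.0
print(apply_rate(200.0, usd_to_jpy))  # 30000.0 JPY for 200 USD
print(invert_rate(usd_to_jpy))        # USD per 1 JPY
```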

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description bears full burden. It discloses read-only nature, zero cost, and the data source (Frankfurter/ECB). However, it does not mention error handling, rate limits, or data freshness, leaving some uncertainty about behavior under edge cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences cover purpose, source, output format, and cost. No word is wasted, and critical information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has only two parameters and no output schema, the description covers essentials: data source, input meaning, output shape, and cost. It lacks explicit mention of response format (e.g., JSON), but is otherwise complete for a simple read tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions already define 'from' and 'to' as base and quote fiat currencies with ISO codes (100% coverage). The description adds value by explaining the return ratio ('to units per one from unit'), clarifying the semantic meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides fiat FX rates via the Frankfurter/ECB API and specifies the return format (to units per one from unit plus reference date). This distinguishes it from sibling tools like price_oracle_crypto_price or price_oracle_convert by explicitly focusing on fiat currencies.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions read-only and free, but does not explicitly state when to use this tool versus alternatives. Implicit differentiation via 'fiat' is present, but no direct guidance on when not to use it or which sibling to choose for conversions or crypto rates.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

price_oracle_price_history (A)

OHLC price history for a crypto symbol via the CoinGecko public API, with a min / max / change summary. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`days`	No	Look-back window in days (default: 7).
`symbol`	Yes	Crypto ticker (e.g. BTC, ETH)
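
To make the "min / max / change summary" concrete, the sketch below derives one from a list of candles. The candle layout `[timestamp_ms, open, high, low, close]` and the definition of change (first open to last close, in percent) are assumptions for this example; the tool's actual output shape is not documented on this page.

```python
def summarize_ohlc(candles: list) -> dict:
    """Summarize candles given as [timestamp_ms, open, high, low, close]."""
    if not candles:
        raise ValueError("no candles to summarize")
    ordered = sorted(candles, key=lambda c: c[0])  # oldest first
    first_open = ordered[0][1]
    last_close = ordered[-1][4]
    return {
        "min": min(c[3] for c in ordered),  # lowest low in the window
        "max": max(c[2] for c in ordered),  # highest high in the window
        "changePct": (last_close - first_open) / first_open * 100.0,
    }


# Made-up candles for illustration only.
candles = [
    [1700000000000, 100.0, 110.0, 95.0, 105.0],
    [1700086400000, 105.0, 120.0, 100.0, 118.0],
    [1700172800000, 118.0, 125.0, 110.0, 112.0],
]
print(summarize_ohlc(candles))
```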

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

States read-only and free, which is good for a no-annotation case, but lacks details like rate limits, response size, or any side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no filler. Front-loaded with key info: OHLC, source, summary, cost.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, so the description should specify the full return structure beyond the min / max / change summary. It also omits the default look-back window (7 days), which appears only in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already describes both parameters well (symbol and days). Description adds no extra meaning beyond the schema's own descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it provides OHLC price history with summary, and distinguishes from sibling tools like price_oracle_crypto_price (current price) and price_oracle_convert (conversion). Notes read-only and free aspect.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied use for historical data, but no explicit when-to-use or alternatives among siblings. Does not mention when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

price_oracle_stablecoin_peg (A)

Stablecoin peg check: signed deviation (basis points) of the current price from the 1 USD target, with a banded status. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`symbol`	Yes	Stablecoin ticker.
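
The signed deviation in basis points is simple arithmetic: the difference from the target, divided by the target, times 10,000. The sketch below shows that calculation with a banded status on top. The band thresholds (25 and 100 bps) and the status labels are invented for this example, since the description does not say which bands the tool uses.

```python
def peg_check(price_usd: float, target: float = 1.0,
              warn_bps: float = 25.0, alert_bps: float = 100.0) -> dict:
    """Return the signed deviation from the peg in basis points plus a banded status."""
    deviation_bps = (price_usd - target) / target * 10_000
    magnitude = abs(deviation_bps)
    if magnitude < warn_bps:        # hypothetical band: within 0.25% of target
        status = "on_peg"
    elif magnitude < alert_bps:     # hypothetical band: within 1% of target
        status = "drifting"
    else:
        status = "depegged"
    return {"deviationBps": round(deviation_bps, 2), "status": status}


print(peg_check(0.9985))  # 15 bps below target
print(peg_check(1.0150))  # 150 bps above target
```

A negative value means the coin trades below 1 USD and a positive value means it trades above.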

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the tool is read-only and states "price 0.0 (free)", adding useful context beyond the schema. No annotations are provided, so the description carries the full burden; it does well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no unnecessary words. Front-loaded with key purpose and output details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description explains return includes deviation and banded status, and clarifies price is free. Sufficient for a simple check tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a single parameter fully described by enum and description. The description adds no extra semantic meaning beyond what schema provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks stablecoin peg, provides signed deviation in basis points from $1 target, and a banded status. It is distinct from sibling tools like price_oracle_crypto_price and price_oracle_fx_rate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance, but the purpose is clear enough to infer usage context. Missing explicit alternatives or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pubcom_watch_get (B)

Get a public-comment notice detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose behavioral traits. It only states what is returned, not details like idempotency, error handling, or data freshness. Minimal transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences that quickly communicate the core functionality, though it could include slightly more context without losing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description should fully explain the return value. It mentions only two fields and vaguely refers to 'full event timeline', leaving the agent uncertain about the complete output structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and the description adds no meaning to the only parameter 'itemId'. It does not specify what constitutes a valid ID or provide examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves a public-comment notice detail and full event timeline, and mentions specific return fields. This distinguishes it from sibling tools like pubcom_watch_search and pubcom_watch_recent_changes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when a specific item ID is known, but does not explicitly state when to use this vs alternatives or provide any exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pubcom_watch_recent_changes (C)

Recent appearance / deadline-move / close / result-published events across all notices since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`agency`	No
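
Since only `since` is documented (as an ISO8601 timestamp), a caller has to guess at the rest. The sketch below builds an MCP `tools/call` JSON-RPC request for this tool from a look-back window. The helper name and the assumptions that `limit` is an integer, that `agency` is a plain string, and that the server accepts a UTC offset of the form `+00:00` are all unverified guesses for illustration.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Optional


def recent_changes_request(hours_back: int, limit: Optional[int] = None,
                           agency: Optional[str] = None, request_id: int = 1,
                           now: Optional[datetime] = None) -> dict:
    """Build a tools/call request asking for events in the last `hours_back` hours."""
    now = now or datetime.now(timezone.utc)
    since = (now - timedelta(hours=hours_back)).replace(microsecond=0)
    arguments = {"since": since.isoformat()}  # ISO8601 with explicit UTC offset
    if limit is not None:
        arguments["limit"] = limit    # assumed: maximum number of events
    if agency is not None:
        arguments["agency"] = agency  # assumed: filter by responsible agency
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "pubcom_watch_recent_changes", "arguments": arguments},
    }


# A fixed clock keeps the example reproducible.
fixed_now = datetime(2026, 7, 30, 18, 37, tzinfo=timezone.utc)
request = recent_changes_request(24, limit=20, now=fixed_now)
print(json.dumps(request, indent=2))
```

Optional arguments are left out of the payload when unset, so the server's own defaults apply.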

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so description bears full burden. It discloses that events include 'firstSeenAt' and 'ledgerVerified' fields, but omits critical behavioral traits like destructive actions, authentication, rate limits, pagination behavior, or error scenarios.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first defines the action and key parameter, second describes output fields. Efficient and front-loaded, but could be slightly improved by briefly noting optional parameters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 3 parameters, no output schema, and no annotations, the description is insufficient. It fails to explain optional parameters (limit, agency) and the full return structure, making it hard for an agent to use without additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description only explains the 'since' parameter (ISO8601 timestamp). 'limit' and 'agency' parameters are not mentioned, leaving the agent to infer or guess their purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool retrieves recent events (appearance, deadline-move, close, result-published) across all notices based on a timestamp. This scope sets it apart from other pubcom tools such as get, search, timeline, and verify_ledger, but only implicitly: the description never contrasts itself with these siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Indicates usage by providing a timestamp ('since') to filter events, implying when to use it (for recent changes). However, no when-not-to-use guidance or mention of alternative tools for specific notice lookups (e.g., pubcom_watch_get) is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pubcom_watch_search (A)

Search e-Gov public-comment notices. Each hit includes firstSeenAt and ledgerVerified (hash-chain integrity).

Parameters

Name	Required	Description
`limit`	No
`query`	No
`since`	No
`agency`	No	所管府省・行政機関 (responsible ministry or administrative agency)
`status`	No

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It partially addresses response structure by noting that hits include firstSeenAt and ledgerVerified, but lacks details on pagination, error behavior, or implicit read-only nature.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: one for the main purpose and one for notable output fields. It is front-loaded and contains no redundant information, achieving maximum conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema and annotations, the description provides minimal but useful context about output fields. However, it omits details on parameter usage, required combinations, and response format, leaving gaps for an agent to use effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 20% schema description coverage, most parameters lack descriptions. The tool description does not elaborate on any parameter meanings beyond what is inferable from names. The agency parameter has a Japanese description, which may be unclear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Search' and the resource 'e-Gov public-comment notices'. It distinguishes from siblings like 'pubcom_watch_get' by implying this tool returns multiple results. The mention of output fields adds specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains what the tool does but provides no guidance on when to use it versus alternatives (e.g., when to use pubcom_watch_get for a single notice). No prerequisites or when-not-to-use indications are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pubcom_watch_timeline (A)

Time-ordered events only for a notice (the differentiator: when it opened, deadline moved, closed, or result was published). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided. Description mentions the tool returns events with fields firstSeenAt and ledgerVerified, suggesting read-only behavior. However, it does not disclose potential limitations (e.g., rate limits, data freshness, error handling when itemId is invalid). Without annotations, more detail would be beneficial.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, the first with parenthetical examples, making it efficient. However, the phrase 'the differentiator' is slightly unclear; front-loading the key action and separating the details would improve clarity. Still, it is appropriately sized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With one simple parameter and no output schema, the description covers the basic purpose and returned fields. However, it lacks information about pagination, ordering, date ranges, or any filtering capabilities. Given the simplicity, it is marginally adequate but could be more thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% – the schema defines one required parameter 'itemId' but provides no description. The tool description does not elaborate on the parameter beyond being an identifier for the notice. No format, example, or constraints are given, so the description adds minimal value for parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'time-ordered events only for a notice' and lists specific event types (opened, deadline moved, closed, result published). It distinguishes from sibling tools like pubcom_watch_get which likely retrieves current state, and pubcom_watch_recent_changes which covers changes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage: use this tool to get a chronological event history for a specific notice. Does not explicitly state when not to use it or mention alternatives, but the sibling tool names provide implicit differentiation. Could be improved by directly contrasting with pubcom_watch_get and pubcom_watch_recent_changes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pubcom_watch_verify_ledger (C)

Verify the hash-chain integrity of a notice (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
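
For readers unfamiliar with hash-chain verification, the following minimal sketch shows one common way such a check can work and how it could yield the `chainValid`, `brokenAt`, and checked-count fields named in the description. The hashing scheme (SHA-256 over the previous hash plus a canonical JSON payload), the genesis value, and the event fields are all assumptions; the server's actual ledger format is not documented on this page.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prevHash for the first event in a chain


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(chain: list, payload: dict) -> None:
    """Append a payload, linking it to the hash of the previous event."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append(
        {"payload": payload, "prevHash": prev_hash, "hash": event_hash(prev_hash, payload)}
    )


def verify_chain(chain: list) -> dict:
    """Recompute every link and report the first index where the chain breaks."""
    prev_hash = GENESIS
    for index, event in enumerate(chain):
        expected = event_hash(prev_hash, event["payload"])
        if event["prevHash"] != prev_hash or event["hash"] != expected:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev_hash = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(chain)}


chain = []
append_event(chain, {"type": "opened", "at": "2026-01-05T00:00:00+00:00"})
append_event(chain, {"type": "deadline_moved", "at": "2026-01-20T00:00:00+00:00"})
append_event(chain, {"type": "closed", "at": "2026-02-04T00:00:00+00:00"})
intact = verify_chain(chain)

# Simulate tampering: edit a stored payload without recomputing its hash.
chain[1]["payload"]["at"] = "2026-01-25T00:00:00+00:00"
tampered = verify_chain(chain)
```

In this sketch, editing any stored payload invalidates that event's hash, so verification stops at the first altered event and reports its index.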

Tool Definition Quality

C2.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions the tool performs tamper detection and returns specific fields, but it does not address side effects, permissions, or any constraints beyond the return values.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences, front-loading the purpose and listing return fields efficiently. No unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has only one required parameter and no output schema or annotations, the description partially fulfills completeness by explaining purpose and output fields. However, the lack of parameter description and usage context leaves gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no meaning to the single parameter 'itemId' beyond what is in the input schema. Schema description coverage is 0%, so the description should have explained the parameter's purpose, but it did not.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('verify'), the resource ('hash-chain integrity of a notice'), and lists return fields. It is specific enough to understand the tool's function, though it does not explicitly differentiate from sibling verify_ledger tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description lacks context for selecting it among similar sibling tools like bid_watch_verify_ledger or grant_watch_verify_ledger.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realestate_watch_get (B)

Get a real-estate transaction record detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It reveals that the tool is a read operation ('Get') and specific return fields, but does not disclose important behavioral details such as required authentication, potential rate limits, data freshness, or whether the timeline is paginated or truncated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise—two sentences with the purpose front-loaded. However, it could be slightly more informative about the parameter without becoming verbose, so a 4 is appropriate.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema and rich sibling tools, the description should provide more context. It mentions return fields but does not describe the structure of the 'full event timeline' or any constraints (e.g., pagination). The lack of parameter explanation further reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one parameter (itemId) with no description, and schema description coverage is 0%. The description does not add any meaning to this parameter; it does not explain what itemId represents, its format, or how to obtain a valid ID. This leaves the agent without sufficient information to correctly invoke the tool.
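
One plausible way an agent might obtain a valid ID is to search first and pass an identifier from a hit to this tool. The sketch below illustrates that flow against a stub. It assumes that search hits carry an `itemId` field, which the page does not confirm; `call_tool` and `fake_call_tool` are hypothetical stand-ins for a real MCP client.

```python
from typing import Any, Callable, Dict, Optional

# Any function that takes (tool name, arguments) and returns the tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Any]


def get_first_match(call_tool: ToolCaller, query: str) -> Optional[dict]:
    """Search for a record, then fetch the detail of the first hit (if any)."""
    hits = call_tool("realestate_watch_search", {"query": query, "limit": 1})
    if not hits:
        return None
    # Assumption: each search hit exposes the identifier as "itemId".
    return call_tool("realestate_watch_get", {"itemId": hits[0]["itemId"]})


def fake_call_tool(name: str, arguments: Dict[str, Any]) -> Any:
    """Stub transport that returns canned data instead of calling the server."""
    if name == "realestate_watch_search":
        return [{"itemId": "demo-001"}]
    if name == "realestate_watch_get":
        return {"itemId": arguments["itemId"], "ledgerVerified": True}
    raise KeyError(name)


record = get_first_match(fake_call_tool, "Shinjuku")
missing = get_first_match(lambda name, arguments: [], "no such place")
```

Stating this search-then-get relationship in the tool description itself would remove the guesswork the review points out.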

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Get a real-estate transaction record detail plus full event timeline.' It also specifies return fields (firstSeenAt and ledgerVerified), distinguishing it from siblings like realestate_watch_search (which likely returns lists) and realestate_watch_timeline (which may only return timeline).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. The description implies it is used to retrieve a specific record by itemId, but does not mention prerequisites, alternatives, or situations where other tools (like realestate_watch_search) would be more appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realestate_watch_recent_changes (A)

Recent appearance / revised events across all real-estate records since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`areaCode`	No
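
To make the required `since` parameter concrete, here is a minimal sketch that builds an MCP `tools/call` JSON-RPC request for this tool with a timezone-aware ISO 8601 timestamp. The value types chosen for `limit` (integer) and `areaCode` (string such as "13") are assumptions, since the schema carries no descriptions for them; the helper name is hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def build_recent_changes_call(since, limit=None, area_code=None, request_id=1) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` request for realestate_watch_recent_changes."""
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware to yield an unambiguous timestamp")
    arguments = {"since": since.astimezone(timezone.utc).isoformat()}
    # Optional filters are only sent when provided (their types are assumed).
    if limit is not None:
        arguments["limit"] = limit
    if area_code is not None:
        arguments["areaCode"] = area_code
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "realestate_watch_recent_changes", "arguments": arguments},
    }


now = datetime(2026, 7, 30, 18, 37, tzinfo=timezone.utc)
request = build_recent_changes_call(now - timedelta(days=1), limit=20, area_code="13")
print(json.dumps(request, indent=2))
```

Normalizing to UTC before serializing avoids sending a naive local time that the server might interpret differently.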

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, and the description only states that items include firstSeenAt and ledgerVerified. It does not disclose whether the tool is read-only, any rate limits, pagination behavior, or what happens if the timestamp is too far back. The behavioral scope is minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, both essential: one states the core functionality and the other lists the key output fields. No wasted words, and the most important information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description provides some output structure (firstSeenAt, ledgerVerified) but lacks details on pagination, sorting order, or other fields. Without an output schema, more detail would help the agent understand the response format fully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description only adds meaning for the 'since' parameter (ISO8601 timestamp). The 'limit' and 'areaCode' parameters are not mentioned, leaving their purpose unexplained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns recent appearance/revised events for real-estate records since a timestamp, with specific fields (firstSeenAt, ledgerVerified). This distinguishes it from sibling tools like realestate_watch_get (single record), realestate_watch_search (search), realestate_watch_timeline (timeline), and realestate_watch_verify_ledger (verification).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies it is for events since a given ISO8601 timestamp, providing clear context. However, it does not explicitly mention when to use this tool versus search or timeline, nor does it state when not to use it, though the purpose is implicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realestate_watch_search (B)

Search Japanese real-estate transaction prices (MLIT reinfolib XIT001). Each hit is a single trade snapshot, ledgered for tamper-detection. Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	District name / property type / prefecture name, partial match (original: 地区名 / 物件種別 / 都道府県名・部分一致)
`period`	No	"YYYY-QN" (e.g. "2024-Q1")
`areaCode`	No	JIS X 0401 prefecture code (e.g. "13") (original: JIS X 0401 都道府県コード)
`priceType`	No	"transaction" (Stage 1)
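
The `period` and `areaCode` formats in the table above lend themselves to a quick client-side check before calling the tool. The sketch below is one way to do that; the helper names are hypothetical, and it assumes quarters run 1 to 4 and that valid JIS X 0401 prefecture codes are the two-digit values "01" through "47". Whether the server itself enforces these rules is not stated on this page.

```python
import re

PERIOD_RE = re.compile(r"([0-9]{4})-Q([1-4])")
AREA_CODE_RE = re.compile(r"[0-9]{2}")


def parse_period(period: str) -> tuple:
    """Parse a "YYYY-QN" string into (year, quarter); raise ValueError otherwise."""
    match = PERIOD_RE.fullmatch(period)
    if match is None:
        raise ValueError(f"period must look like '2024-Q1', got {period!r}")
    return int(match.group(1)), int(match.group(2))


def is_prefecture_code(area_code: str) -> bool:
    """True for two-digit JIS X 0401 prefecture codes "01" to "47"."""
    return AREA_CODE_RE.fullmatch(area_code) is not None and 1 <= int(area_code) <= 47


year, quarter = parse_period("2024-Q1")
tokyo_ok = is_prefecture_code("13")
```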

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries the burden. Mentions ledgering for tamper-detection and included fields (firstSeenAt, ledgerVerified), giving some behavioral insight. However, does not cover rate limits, authentication, or other potential side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three terse sentences: purpose, output nature, and notable fields. Every sentence adds value with no redundancy or unnecessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and no annotations, the description is reasonably complete: explains source, output as trade snapshots, and verification features. Could elaborate on pagination or parameter combination but overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 80%, so baseline is 3. Description does not add extra meaning to parameters beyond what schema provides. However, it clarifies output structure, which indirectly helps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

States it searches Japanese real-estate transaction prices from a specific source (MLIT reinfolib XIT001). Clear verb+resource, but does not explicitly distinguish from other watch search tools (e.g., bid_watch_search, landprice_watch_search).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives or prerequisites. Lacks explicit context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realestate_watch_timeline (A)

Time-ordered events only for a real-estate record (the differentiator: when it appeared / was revised). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It discloses that the tool returns time-ordered events and includes specific fields (firstSeenAt, ledgerVerified), but omits behavioral details such as pagination, sorting, rate limits, authentication requirements, or error handling. This is adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, efficiently conveying the core purpose and key differentiators. While it is not highly structured, it avoids verbosity and remains easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description partially explains return data (time-ordered events, specific fields). However, it does not mention whether events are paginated, how they are sorted, or any limits. For a simple one-parameter tool, this is minimally complete but has clear gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter (itemId) has 0% schema description coverage. The tool description does not explicitly explain what itemId represents or provide any additional semantic meaning beyond the implied 'identifier for the real-estate record'. The description fails to compensate for the lack of schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides 'time-ordered events' for a real-estate record, specifying the differentiator ('when it appeared / was revised') and mentioning included fields (firstSeenAt, ledgerVerified). This distinguishes it from sibling timeline tools by domain and from other realestate_watch tools by function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for retrieving the timeline/history of a record but does not explicitly state when to use this tool versus alternatives like realestate_watch_get, realestate_watch_recent_changes, or other timeline tools. No exclusion criteria or when-not-to-use guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realestate_watch_verify_ledger (A)

Verify the hash-chain integrity of a real-estate record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses return fields (chainValid, brokenAt, etc.) and that it checks tamper detection. With no annotations, provides good but not exhaustive transparency; lacks mention of authentication or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: a clear purpose followed by the list of return fields. No wasted words; efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequately explains purpose and return values despite missing output schema. Could mention that itemId references a real-estate record, but overall complete for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% and description does not describe the sole parameter 'itemId'. While the parameter is simple, the agent receives no guidance on what value to provide.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Verify the hash-chain integrity of a real-estate record (tamper detection)', specifying verb and resource. Distinguishes from other verify_ledger tools via domain (real estate).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage: verify a real-estate record's integrity. No explicit when/when-not or comparison to alternatives, though domain specificity helps differentiate from sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_watch_get (C)

Get a recall detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It only mentions return fields; it does not address mutability, permissions, rate limits, or side effects. For a read operation, it lacks safety confirmation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very short (two sentences) and front-loads the core action. However, it could be slightly more informative without losing conciseness. Every sentence adds value, but the brevity leaves gaps.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema, annotations, and parameter descriptions, the tool's description is incomplete. It does not explain the broader context of recall details, the structure of the timeline, or how this tool fits with sibling watch tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has a single required parameter (itemId) with 0% description coverage. The description does not explain what itemId represents or how to obtain it, leaving the parameter semantically opaque.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves a recall detail and a full event timeline. It specifies two returned fields (firstSeenAt, ledgerVerified), which adds specificity. However, it does not explicitly differentiate from sibling tools like recall_watch_search or recall_watch_timeline, though the verb 'get' implies retrieval by ID.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as recall_watch_search (for searching) or recall_watch_timeline (possibly for timeline-only). The description does not mention prerequisites, exclusions, or contextual scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_watch_recent_changes (A)

Recent appearance / severity-escalated events across all recalls since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`agency`	No

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are absent, so the description must convey behavioral traits. It discloses that events are 'recent appearance/severity-escalated' and output includes 'firstSeenAt and ledgerVerified', indicating read-only information retrieval. However, it does not mention pagination, rate limits, or whether authentication is required, leaving gaps in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two front-loaded sentences that efficiently convey the tool's purpose and key output fields. No superfluous text; every part earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema and no annotations, the description should provide richer context. It fails to specify the return format (list? array?), pagination behavior, or prerequisites (e.g., need a watch ID?). The brief mention of output fields helps but is insufficient for a complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. It explains the 'since' parameter (ISO8601 timestamp) but does not describe 'limit' (default 100) or 'agency' (string). With 3 parameters and only one elaborated, the description adds marginal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'Recent appearance / severity-escalated events across all recalls since the given ISO8601 timestamp', specifying the action (listing recent changes), the resource (recalls), and the scope (all recalls). It distinguishes itself from siblings like recall_watch_get (single recall) and recall_watch_search (search recalls) by focusing on time-based change events.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for fetching changes after a given timestamp, but does not explicitly state when to use it vs alternatives like recall_watch_timeline or recall_watch_verify_ledger. No when-not or prerequisite guidance is provided, making the usage context clear but incomplete.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_watch_search (B)

Search Japanese product / food recall notices (consumer-affairs-agency aggregator). Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	Product name / business operator name, partial match (original: 商品名 / 事業者名・部分一致)
`since`	No
`agency`	No	Responsible agency, e.g. the Consumer Affairs Agency (original: 所管 (消費者庁等))
`status`	No
`recallClass`	No	Recall class: refund/recall, recall order, advisory, etc. (original: リコール区分 (返金／回収 / 回収命令 / 注意喚起等))

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are present, so the description bears full responsibility for behavioral disclosure. It mentions result fields but fails to describe read-only nature, rate limits, pagination, or authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise—one sentence plus a result note—but could front-load the key action and resource. No redundancy, though the result mention could be integrated more efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description helpfully notes result fields. However, it lacks guidance on pagination, default behavior, or how to use the six parameters effectively, which would improve completeness for a search tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 50% coverage with descriptions in Japanese for some parameters, but the tool description adds no extra meaning to parameters. It only mentions result fields not in the input schema, leaving parameter semantics under-explained.
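
The coverage figure can be reproduced with a small calculation: the share of input properties that carry a non-empty `description`. The sketch below assumes that is how coverage is defined (the page does not spell out the formula) and uses a schema reconstructed from the parameter table above, with property types omitted because they are not shown.

```python
def description_coverage(schema: dict) -> float:
    """Fraction of schema properties that have a non-empty description."""
    properties = schema.get("properties", {})
    if not properties:
        return 1.0  # nothing to describe; treat as fully covered in this sketch
    described = sum(
        1 for spec in properties.values() if str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# Reconstructed from the recall_watch_search parameter table; types are not shown there.
recall_watch_search_schema = {
    "properties": {
        "limit": {},
        "query": {"description": "商品名 / 事業者名・部分一致"},
        "since": {},
        "agency": {"description": "所管 (消費者庁等)"},
        "status": {},
        "recallClass": {"description": "リコール区分 (返金／回収 / 回収命令 / 注意喚起等)"},
    }
}

coverage = description_coverage(recall_watch_search_schema)
```

Three of the six parameters are described, which matches the 50% quoted in the review.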

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches Japanese product/food recall notices from a consumer-affairs-agency aggregator, with specific result fields. This distinctively sets it apart from sibling tools like recall_watch_get or recall_watch_timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for searching recalls but does not explicitly differentiate when to use this versus other recall tools (e.g., get, timeline, verify). No alternatives are mentioned, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_watch_timeline

Time-ordered events only for a recall (the differentiator: when it appeared, when severity escalated, when it was completed). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
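
As a rough sketch of how a client might invoke this tool: MCP tool calls are JSON-RPC 2.0 requests with the method `tools/call`. The helper name `build_tool_call` and the `itemId` value `recall-0001` are placeholders invented for illustration; the listing does not document the real identifier format.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build the body of an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# "recall-0001" is a made-up placeholder, not a documented itemId.
request = build_tool_call("recall_watch_timeline", {"itemId": "recall-0001"})
print(json.dumps(request, indent=2))
```

The same helper would apply to any other tool on this page by swapping the name and arguments.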

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions output fields (firstSeenAt, ledgerVerified) but does not disclose read-only nature, required authentication, rate limits, or full response behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, concise and front-loaded with purpose. However, it lacks parameter explanation and full behavioral detail, which slightly reduces efficiency.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description should cover more. It identifies the tool's purpose and mentions two output fields but does not describe the full output or the parameter.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter itemId is not explained in the description. Schema description coverage is 0%, and the description adds no semantic context beyond the parameter name.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides time-ordered events for a recall, with specific examples (when it appeared, severity escalated, completed), which distinguishes it from sibling tools like recall_watch_get or recall_watch_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives context that this tool is for time-ordered events and mentions specific fields, but does not explicitly state when to use this versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_watch_verify_ledger

Verify the hash-chain integrity of a recall record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
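
To make "hash-chain integrity" concrete, here is a minimal sketch of how such a check can work in general. It is not the server's implementation: the SHA-256 scheme, the all-zero genesis hash, the event layout (`prevHash`, `payload`, `hash`), and the `checkedEvents` key are assumptions for illustration. Only `chainValid` and `brokenAt` are names taken from the description above.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first event's prevHash


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(events: list, payload: dict) -> None:
    """Append an event that is linked to the hash of the previous one."""
    prev = events[-1]["hash"] if events else GENESIS
    events.append({"prevHash": prev, "payload": payload, "hash": event_hash(prev, payload)})


def verify_chain(events: list) -> dict:
    """Walk the chain and report the first index where a link or hash fails."""
    prev = GENESIS
    for i, ev in enumerate(events):
        if ev["prevHash"] != prev or ev["hash"] != event_hash(prev, ev["payload"]):
            return {"chainValid": False, "brokenAt": i, "checkedEvents": i + 1}
        prev = ev["hash"]
    return {"chainValid": True, "brokenAt": None, "checkedEvents": len(events)}


ledger = []
for payload in ({"event": "appeared"}, {"event": "severity_escalated"}, {"event": "completed"}):
    append_event(ledger, payload)

intact = verify_chain(ledger)

# Editing a stored payload without recomputing hashes breaks the chain at that event.
tampered_ledger = copy.deepcopy(ledger)
tampered_ledger[1]["payload"]["event"] = "edited"
tampered = verify_chain(tampered_ledger)
```

Because each hash covers the previous hash, changing any earlier event invalidates every later link, which is what makes the ledger tamper-evident.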

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so the description carries full burden. It discloses return fields (chainValid, brokenAt, etc.) and implies a read operation, but lacks details on error behavior, authentication requirements, or rate limits. Adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one sentence plus a list of return fields, which is concise and front-loaded. It could be more structured, with clear sections, but it avoids unnecessary verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers purpose and return fields. It lacks usage guidelines and parameter details, making it minimally viable but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, meaning the description must explain the parameter 'itemId'. The description only mentions 'recall record' but does not clarify that itemId is the record identifier or its expected format, adding minimal value.
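
One way to close this gap is to document `itemId` in the input schema and declare behavior through MCP tool annotations. The definition below is a hypothetical sketch, not the server's actual schema: the parameter wording, the suggestion that the ID comes from recall_watch_search, and the read-only and idempotent hints are all assumptions.

```python
import json

# Hypothetical tool definition; field values are illustrative assumptions.
tool_definition = {
    "name": "recall_watch_verify_ledger",
    "description": (
        "Verify the hash-chain integrity of a recall record (tamper detection). "
        "Returns chainValid, brokenAt (if any), checked event count, "
        "firstSeenAt and ledgerVerified."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "itemId": {
                "type": "string",
                # Assumed meaning: the listing does not document this parameter.
                "description": "Identifier of the recall record to verify, "
                               "for example one returned by recall_watch_search.",
            }
        },
        "required": ["itemId"],
    },
    # MCP annotation hints; whether the tool really is read-only is assumed here.
    "annotations": {"readOnlyHint": True, "idempotentHint": True},
}

print(json.dumps(tool_definition, indent=2))
```

With a description on `itemId`, schema coverage for this tool would move from 0% to 100%.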

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: verifying hash-chain integrity for tamper detection. It specifies the resource (recall record) and action (verify ledger), differentiating it from sibling tools like recall_watch_get or other verify_ledger tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. Given the presence of many sibling verify_ledger tools and related recall_watch tools, the absence of selection criteria leaves the agent without contextual direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screen_by_country

Summarize sanction programs and entity counts associated with a country (name or ISO-3166 code) across the consolidated lists. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`country`	Yes	Country name or ISO-3166 alpha-2 code

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses read-only and cost-free nature, which is sufficient given no annotations. Does not mention potential side effects or response format, but for a summary tool this is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no fluff, front-loaded with action. Every word earns its place. Ideal length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description provides sufficient high-level idea of output (sanction programs and entity counts). Could be more detailed, but complexity is low. Complete for its scope.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already covers the parameter with description, so baseline is 3. Description adds no new semantic info beyond what is in schema, but it reinforces the purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool summarizes sanction programs and entity counts for a country, using a specific verb and resource. Distinguishes it from sibling tools such as sanctions_screen_entity, which screens individual entity names.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly indicates usage for country-level overview, but does not explicitly contrast with sibling tools like sanctions_screen_entity or list_programs. The read-only and free note helps, but no direct when-to-use or when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screen_check_address

Screen a physical or crypto address string against address entries in the consolidated sanctions lists. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`address`	Yes	Physical or crypto address to screen

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden for behavioral disclosure. It states the operation is read-only and free, and clarifies the scope ('address entries in the consolidated sanctions lists'). However, it does not describe the output format, error handling, or authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, each adding value. It front-loads the purpose and includes essential behavioral indicators (read-only, free). No filler or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 parameter, no output schema), the description covers input and action but omits output details (e.g., what the result looks like). For a screening tool, an agent might benefit from knowing the return format, making the description slightly incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (the 'address' parameter has a clear description). The tool description adds the context that the screening is against 'address entries in the consolidated sanctions lists,' but does not enhance the parameter's meaning (e.g., format or length constraints). Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Screen') and resource ('address entries in the consolidated sanctions lists'), clearly distinguishing it from sibling tools like sanctions_screen_entity (which screens entities by name) or onchain_risk_sanctions_screen (likely for on-chain addresses). It specifies both physical and crypto addresses.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (when you have an address to check against sanctions lists) but does not explicitly mention when not to use it or provide alternatives among siblings (e.g., sanctions_screen_by_country, sanctions_screen_entity). The 'read-only; free' note provides context but no exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screen_entity

Screen an entity name against public consolidated sanctions lists (OFAC SDN / UN / EU), fetched at request time. Returns scored fuzzy matches with programs and countries. Informational only, not legal advice. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`name`	Yes	Entity / person / vessel name to screen
`limit`	No	Max matches to return
`types`	No	Restrict to sanction subject types.
`minScore`	No	Minimum match score 0-100 (default heuristic threshold)
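
To illustrate how `minScore` and `limit` typically interact in a scored fuzzy search, here is a small sketch using Python's standard `difflib`. The server's actual matching algorithm and default threshold are not documented, so the similarity measure, the default of 80, and the candidate names are all assumptions.

```python
from difflib import SequenceMatcher


def match_score(query: str, candidate: str) -> int:
    """Similarity on a 0-100 scale, case-insensitive (assumed scoring method)."""
    ratio = SequenceMatcher(None, query.casefold(), candidate.casefold()).ratio()
    return round(100 * ratio)


def screen(name: str, candidates: list, min_score: int = 80, limit: int = 5) -> list:
    """Return up to `limit` (candidate, score) pairs scoring at least `min_score`."""
    scored = [(c, match_score(name, c)) for c in candidates]
    hits = [pair for pair in scored if pair[1] >= min_score]
    hits.sort(key=lambda pair: pair[1], reverse=True)  # best matches first
    return hits[:limit]


# Made-up names for illustration only; not real sanctions entries.
candidates = ["Example Trading Co", "Exampel Trading Company", "Unrelated Shipping Ltd"]

exact_only = screen("Example Trading Co", candidates, min_score=100)
loose = screen("Example Trading Co", candidates, min_score=0, limit=2)
```

Raising `minScore` trades recall for precision, while `limit` only caps how many of the surviving matches are returned.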

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool is read-only, free (price 0.0), and uses fuzzy matching with scores. It does not mention rate limits, caching behavior, or what happens if the entity is not found. This is adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (four short sentences) and front-loaded with the core action. It covers the main purpose, data sources, output type, and disclaimers without unnecessary fluff. Could be slightly improved by listing the supported sanction types inline.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 4 parameters and no output schema, the description provides reasonable context. It explains the sources, return nature, and pricing. However, it lacks details about error handling, response structure, or rate limits, which would be helpful for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds minimal extra meaning beyond the schema: it mentions 'scored fuzzy matches' which relates to minScore, but it does not elaborate on the other parameters (limit, types) beyond what the schema already defines.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool screens an entity name against public consolidated sanctions lists (OFAC SDN / UN / EU) and returns scored fuzzy matches. It distinguishes itself from sibling tools like sanctions_screen_by_country or sanctions_screen_list_programs by specifying the scope (entity names) and the output (scored matches with programs/countries).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'Informational only, not legal advice' and that it is read-only and free, giving some context. However, it does not explicitly state when to use this tool versus alternatives like sanctions_screen_by_country or sanctions_screen_check_address, nor does it provide explicit prerequisites or caveats about data freshness beyond 'fetched at request time'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screen_list_programs

List sanction programs known across the consolidated lists, with per-program entity counts. Optionally filter by source. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`source`	No	Optional source id filter (e.g. ofac / un / eu)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It states read-only and free, which are key behavioral traits. No mention of rate limits or auth, but adequate for a simple list tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two highly informative sentences, front-loading core purpose, then optional filter and behavioral info. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one optional param and no output schema, the description covers purpose, filter, and behavioral traits. Could mention output format, but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already describes the source parameter. The description merely reiterates the filter option without adding new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists sanction programs with entity counts and optional source filtering, distinguishing it from screening tools like sanctions_screen_entity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when a list of programs is needed, but lacks explicit guidance on when not to use it or on alternatives. The 'Read-only; price 0.0' note adds some context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screen_list_sources

List the sanctions data sources queried (OFAC SDN / UN / EU) with metadata. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so description carries full burden. It states read-only and free (price 0.0), which are key behavioral traits. No other hidden behaviors likely.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence with two clauses; concise and front-loaded. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With zero parameters and no output schema, the description fully informs what the tool returns (list of sources with metadata) and its cost/free status. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has no parameters, so description doesn't need to add param info. Baseline 4 for zero-param tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists sanctions data sources (OFAC SDN, UN, EU) with metadata. It distinguishes itself from sibling tools like sanctions_screen_entity by focusing on listing available sources rather than screening entities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage: to view available sanctions lists before screening. It lacks explicit when-not-to-use guidance or alternatives, but the purpose is clear enough for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanction_watch_get

Get a sanction detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose behavioral traits. It states returns (firstSeenAt, ledgerVerified) and implies a read operation (Get), but does not mention idempotency, authorization, rate limits, or side effects. Adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences with no redundancy. It front-loads the action and result. Could be slightly improved by integrating parameter info, but remains concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given one parameter, no output schema, and no annotations, the description covers the basic action and return values but lacks parameter explanation and usage context. It is minimally viable but has gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter 'itemId' is not described in the schema (0% coverage) and the description does not explain its meaning or format. The agent must infer from the name, which is ambiguous ('itemId' could be any identifier).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get' and the resource 'sanction detail plus full event timeline', and distinguishes the tool from sibling watch tools by specifying 'sanction'. It also mentions specific return fields (firstSeenAt and ledgerVerified), making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like sanction_watch_search or other watch_get tools. The context implies usage when a sanction itemId is known, but lacks when-not-to-use or sibling comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanction_watch_recent_changes

Recent appearance / lift events across all sanctions since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`limit`	No
`since`	Yes
`regulator`	No
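
Since `since` is the only required parameter and must be an ISO8601 timestamp, a caller might build it as sketched below. The helper name `since_days_ago` is hypothetical, and whether the server expects a `+00:00` offset, a `Z` suffix, or accepts both is not documented, so the exact format here is an assumption. The undocumented `limit` and `regulator` parameters are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def since_days_ago(days: int, now: Optional[datetime] = None) -> str:
    """Return an ISO8601 UTC timestamp for `days` days before `now`."""
    now = now or datetime.now(timezone.utc)
    return (now - timedelta(days=days)).isoformat(timespec="seconds")


# A fixed reference time keeps the example reproducible.
reference = datetime(2026, 7, 30, 18, 37, tzinfo=timezone.utc)

# Arguments for a sanction_watch_recent_changes call covering the last 7 days.
arguments = {"since": since_days_ago(7, reference)}
print(arguments)
```

Using timezone-aware datetimes avoids ambiguity about which local time the cutoff refers to.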

Tool Definition Quality

C2.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses return fields but omits behavioral details like read-only nature, pagination, rate limits, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences with front-loaded purpose and return field info. No wasted words, but could add parameter details without becoming verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a listing tool with no output schema, the description should explain pagination behavior and result structure. Missing these leaves the agent guessing about handling large result sets.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description only addresses the 'since' parameter implicitly. The 'limit' and 'regulator' parameters are unexplained, leaving the agent uncertain about pagination and filtering.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns recent appearance/lift events across all sanctions since a timestamp, specifying return fields. However, it does not explicitly differentiate from sibling watch tools like sanction_watch_search or sanction_watch_get.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance provided on when to use this tool vs. alternatives such as sanction_watch_search or sanction_watch_get. The description lacks context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanction_watch_search

Search Japanese administrative sanctions (FSA jirei archive). Each hit includes firstSeenAt and ledgerVerified.

Parameters

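Name	Required	Description
`limit`	No
`query`	No	Name of the sanctioned party (partial match)
`since`	No
`status`	No
`regulator`	No	Sanctioning authority (e.g. FSA)
`sanctionType`	No	Business improvement order / business suspension, etc. (partial match)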

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It adds that each hit includes firstSeenAt and ledgerVerified, which gives some behavioral context. However, it does not disclose whether the operation is read-only, destructive, or any rate limits or authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences with no fluff. It front-loads the core purpose and key output fields. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters, no output schema, and no annotations, the description is too brief. It does not explain pagination, parameter interactions, or return format beyond two fields. A more complete description would address these gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (only query, regulator, and sanctionType have descriptions). The tool description does not add any additional meaning for the undocumented parameters (limit, since, status). It fails to compensate for the missing schema documentation.
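
The coverage figure can be reproduced from the parameter table as the share of parameters that carry a description. The sketch below assumes that definition; the function name `description_coverage` and the choice to treat a parameterless schema as fully covered are illustrative conventions, not the scorer's documented method.

```python
def description_coverage(properties: dict) -> float:
    """Fraction of schema properties that have a non-empty description."""
    if not properties:
        return 1.0  # assumed convention: nothing is left undocumented
    described = sum(
        1 for spec in properties.values() if spec.get("description", "").strip()
    )
    return described / len(properties)


# Parameters of sanction_watch_search as listed in the table above.
# Description text is abbreviated; only its presence matters here.
sanction_watch_search_props = {
    "limit": {},
    "query": {"description": "name of sanctioned party (partial match)"},
    "since": {},
    "status": {},
    "regulator": {"description": "sanctioning authority"},
    "sanctionType": {"description": "type of order (partial match)"},
}

coverage = description_coverage(sanction_watch_search_props)
print(f"{coverage:.0%}")
```

Three of the six parameters are described, which matches the 50% reported for this tool; a tool whose only parameter is an undescribed `itemId` would come out at 0%.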

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches Japanese administrative sanctions from the FSA jirei archive, with a specific verb and resource. It also mentions returned fields (firstSeenAt and ledgerVerified), which distinguishes it from sibling tools like sanction_watch_get (which likely retrieves a single sanction) and sanctions_screen_entity (global screening).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for searching Japanese FSA sanctions, but it does not provide explicit guidance on when to use this tool versus alternatives (e.g., sanctions_screen_entity for non-Japanese sanctions). No 'when not to use' or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanction_watch_timelineBInspect

Time-ordered events only for a sanction (the differentiator: when it appeared and when it was lifted). Includes firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavior. It states the tool includes 'firstSeenAt' and 'ledgerVerified' and is time-ordered, but it does not clarify if the operation is read-only, whether it requires special permissions, or what the response format is. This is insufficient for an agent to understand side effects or security context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The two-sentence description is concise and front-loaded with key terms. However, it could be structured more clearly by separating the purpose from the included fields. The brevity is acceptable given the tool's simplicity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema and annotations, the description should cover return value structure and event types. It only mentions 'firstSeenAt' and 'ledgerVerified', leaving ambiguity about other possible event fields. The one-parameter tool is simple, but the description is too sparse for reliable agent invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter 'itemId' is a bare string in the schema with no description. The description also omits any explanation of what itemId represents (e.g., a sanction ID from a search) or its format. With 0% schema coverage, the description fails to add any meaning to the parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool as returning time-ordered events for a single sanction, specifically the events marking when it appeared and when it was lifted. It distinguishes itself from sibling timeline tools (e.g., bid_watch_timeline) by focusing on sanction-specific events.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies it is for obtaining chronological event data for a sanction, but it does not explicitly explain when to use this versus related tools like sanction_watch_get or sanction_watch_search. The phrase 'the differentiator' hints at uniqueness but lacks direct guidance on alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanction_watch_verify_ledgerAInspect

Verify the hash-chain integrity of a sanction record (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`itemId`	Yes
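
The description names hash-chain verification but not the scheme behind it. As a rough illustration only, the sketch below shows how a hash-chain check of this kind can work in general. Everything in it is an assumption for illustration: the SHA-256 choice, the canonical-JSON encoding, the genesis value, the event shape (`payload`, `prev_hash`, `hash`), and the `checkedEvents` key. Only `chainValid` and `brokenAt` are names taken from the description above; the server's actual algorithm is not documented here.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed prev_hash for the first event (hypothetical)


def event_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(payloads: list) -> list:
    """Link payloads into a chain where each event commits to its predecessor."""
    events, prev = [], GENESIS
    for payload in payloads:
        digest = event_hash(prev, payload)
        events.append({"payload": payload, "prev_hash": prev, "hash": digest})
        prev = digest
    return events


def verify_chain(events: list) -> dict:
    """Recompute every link; report the index of the first event that fails."""
    prev = GENESIS
    for index, event in enumerate(events):
        expected = event_hash(prev, event["payload"])
        if event["prev_hash"] != prev or event["hash"] != expected:
            return {"chainValid": False, "brokenAt": index, "checkedEvents": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checkedEvents": len(events)}
```

Because each hash covers the previous hash, editing any stored payload invalidates that event and every later link, which is what makes the ledger tamper-evident.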

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It states the tool checks integrity and lists return fields, but does not disclose side effects, permissions, rate limits, or whether the operation is read-only. Partial but incomplete transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences: the first front-loads the verb and resource, and the second lists the return fields. No unnecessary words; every part is useful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description covers the function and return fields. However, it lacks details on how to obtain itemId and any prerequisites, leaving some gaps for an agent unfamiliar with the domain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0% for the single required parameter 'itemId'. The description does not elaborate on the format, source, or expected value of itemId beyond the tool's name implying it is a sanction record ID. This is minimal addition over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies hash-chain integrity of a sanction record for tamper detection, listing specific return fields. It distinguishes itself from siblings by specifying 'sanction record', and within the sanction_watch family it is the only tool that verifies ledger integrity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage is implied from the description, but no explicit guidance on when to use this versus other sanction_watch tools like get, search, or timeline. No alternatives or exclusions are mentioned, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subsidy_watch_getCInspect

Get a subsidy program detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`programId`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description should disclose behavioral traits. It mentions what is returned but does not cover side effects, error handling, rate limits, authentication needs, or data freshness. Minimal transparency beyond the basic action.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very short (two sentences) and front-loaded with the action. However, it is too terse and omits necessary details. Each sentence earns its place, but the description could be more informative without sacrificing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with one required parameter and no output schema, the description is incomplete. It does not explain what a subsidy program is, the structure of the event timeline, or the context of the returned fields. The agent lacks sufficient information to correctly interpret results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 0% description coverage for the single parameter 'programId'. The description does not explain what 'programId' is, how to obtain it, or any constraints (e.g., format, required). This is a critical gap, as the agent has no additional meaning beyond the schema's bare type.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves a subsidy program detail and full event timeline, mentioning specific return fields ('firstSeenAt' and 'ledgerVerified'). The difference from the search and timeline siblings is only implicit, but the purpose itself is clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like subsidy_watch_search, subsidy_watch_timeline, or subsidy_watch_verify_ledger. The description does not provide any contextual usage hints, leaving the agent to infer.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subsidy_watch_recent_changesCInspect

Recent appearance / change / close events across all programs since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`since`	Yes
`category`	No
`issuerCode`	No
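
Since the description only says "ISO8601 timestamp", a caller has to decide how to build `since` and what to do with the optional filters. The sketch below is one way to assemble the call, assuming the standard MCP `tools/call` JSON-RPC request shape. The helper name `recent_changes_request` is hypothetical, and whether the server accepts the `+00:00` offset form (rather than `Z`) or omits unset filters the same way is an assumption, since the schema does not say.

```python
from datetime import datetime, timezone


def recent_changes_request(since, limit=None, category=None, issuer_code=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` payload for subsidy_watch_recent_changes."""
    if since.tzinfo is None:
        # A naive datetime is ambiguous; insist on an explicit timezone.
        raise ValueError("since must be timezone-aware")
    arguments = {"since": since.astimezone(timezone.utc).isoformat(timespec="seconds")}
    # Only send optional filters the caller actually set (assumed to be acceptable).
    optional = {"limit": limit, "category": category, "issuerCode": issuer_code}
    arguments.update({key: value for key, value in optional.items() if value is not None})
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "subsidy_watch_recent_changes", "arguments": arguments},
    }


request = recent_changes_request(datetime(2026, 7, 1, tzinfo=timezone.utc), limit=20)
```

Here `request["params"]["arguments"]` contains only `since` and `limit`; `category` and `issuerCode` are left out because they were not supplied.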

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description bears full responsibility. It discloses that events include firstSeenAt and ledgerVerified, but does not mention pagination, rate limits, auth requirements, or what happens to deleted events. The phrase 'across all programs' may mislead since optional filters exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the purpose and followed by the key output fields. However, it could benefit from a brief note about the optional filters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having 4 parameters and no output schema, the description is brief. It lacks essential context about how to interpret events, how pagination works, and how the filters behave. Completeness is low given the tool's complexity and lack of supporting metadata.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, yet the description does not explain any parameter (limit, since, category, issuerCode). It loosely references 'ISO8601 timestamp' which relates to `since`, but fails to specify format, default, or how other parameters affect results.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns recent appearance, change, and close events across all programs since an ISO8601 timestamp. It specifies the items include firstSeenAt and ledgerVerified, distinguishing it from sibling watch tools like get, search, timeline. However, the description does not mention the optional filters (category, issuerCode), slightly reducing clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this vs. alternative sibling tools (e.g., subsidy_watch_search, subsidy_watch_timeline). The description implies it's for recent changes, but does not state when not to use it or what alternatives exist.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subsidy_watch_searchBInspect

Search the current state of subsidy programs. Each hit includes firstSeenAt and ledgerVerified (hash-chain integrity).

ParametersJSON Schema

Name	Required	Description
`limit`	No
`query`	No
`status`	No
`category`	No
`amountMin`	No
`issuerCode`	No	JIS X 0401/0402 municipality code

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It adds one behavioral trait (each hit includes firstSeenAt and ledgerVerified for integrity). However, it does not mention whether the operation is read-only, costs, rate limits, or any side effects. It provides minimal but non-contradictory transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, both providing essential information: the purpose and a key feature. No redundant or wasted words. It is appropriately front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a search tool with six parameters, no output schema, and no annotations, the description is far from complete. It lacks explanation of parameters, pagination, error handling, or any context about the response format beyond two fields. A fuller description is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (one of six parameters has a description). The tool description does not add any parameter-level explanations, leaving the agent to infer meaning from names alone. This is insufficient given the low schema coverage.
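
The 17% figure is consistent with a simple ratio of described parameters to total parameters. The sketch below reproduces that arithmetic under the assumption that coverage is computed this way; the scoring method is not stated on this page. The schema dict is a hypothetical reconstruction of the parameter table for subsidy_watch_search, with types omitted because they are not shown.

```python
def description_coverage(input_schema: dict) -> float:
    """Fraction of properties that carry a non-empty `description`."""
    properties = input_schema.get("properties", {})
    if not properties:
        return 0.0  # no parameters: placeholder value, coverage is not meaningful
    described = sum(
        1 for spec in properties.values() if str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# Hypothetical reconstruction of the table above: only issuerCode is described.
subsidy_watch_search_schema = {
    "type": "object",
    "properties": {
        "limit": {},
        "query": {},
        "status": {},
        "category": {},
        "amountMin": {},
        "issuerCode": {"description": "JIS X 0401/0402 municipality code"},
    },
}

coverage_percent = round(100 * description_coverage(subsidy_watch_search_schema))
```

One described parameter out of six gives 1/6, which rounds to 17%.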

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches subsidy programs, but does not distinguish from sibling watch_search tools (e.g., grant_watch_search, bid_watch_search). The verb and resource are specific, but sibling differentiation is missing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. No mention of prerequisites, use cases, or exclusions. The agent is given no context for selecting this tool over siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subsidy_watch_timelineAInspect

Time-ordered events only for a program (the differentiator: when it appeared, changed, closed). Includes firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`programId`	Yes

Tool Definition Quality

A3.5/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description must cover behavioral traits. It minimally indicates a read operation by mentioning 'time-ordered events', but fails to disclose whether the operation is read-only, requires authentication, has rate limits, or any other behaviors. This is insufficient for a tool with no annotation support.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at two sentences, front-loading the key differentiator and including important output fields (firstSeenAt, ledgerVerified). Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no output schema and single parameter, the description should be more complete. It mentions output fields but does not clarify if there is pagination, ordering, or how to handle multiple events. The tool's behavior is under-specified, making it hard for an agent to use correctly without external knowledge.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description only adds the context that the tool is 'for a program', implying programId identifies a program. It does not explain the format, expected values, or any constraints of programId, leaving the agent to guess. The description adds very little semantic value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns time-ordered events for a program, explicitly differentiating it from sibling tools by mentioning 'only for a program' and specifying the event types (appeared, changed, closed). This meets the criteria for a specific verb+resource and sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context by stating the tool is for obtaining event timeline data, which implies when to use it (e.g., for tracking changes over time). However, it does not explicitly name alternative sibling tools like subsidy_watch_get or subsidy_watch_recent_changes, so it lacks full exclusion guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subsidy_watch_verify_ledgerCInspect

Verify the hash-chain integrity of a program (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`programId`	Yes

Tool Definition Quality

C2.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavior. It lists return fields but does not state read-only nature, authentication needs, rate limits, or any side effects. This is minimal behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, the second a brief list of return fields. It is front-loaded and free of extraneous words, though it could say slightly more about the parameter.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With one required parameter, no output schema, and no annotations, the description lacks essential context such as how to acquire programId and interpretation of return values. It is incomplete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%; the parameter 'programId' has no description. The tool description does not explain what a programId is or how to obtain it, failing to add meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it verifies hash-chain integrity (tamper detection) for a program, and the tool name specifies 'subsidy_watch_verify_ledger', distinguishing it from sibling get/search/timeline tools. The verb 'verify' and resource 'ledger' with 'program' input are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool vs alternatives like get or search, nor does it mention prerequisites for the programId. Usage context is only implied by the verification purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

temporal_queryAInspect

Reconstruct what a ledger item (or items matching a query) officially said AS OF a past date T, by replaying the kept event history. Returns each item's point-in-time state plus an F-037 provenance receipt (observed_at = the state's effective time). existence:false with a first_seen receipt when the item did not yet exist at T; latest state with as_of_clamped:false when T is in the future. Read-only; price 0.0.

ParametersJSON Schema

Name	Required	Description
`as_of`	Yes	ISO-8601 point in time T.
`query`	No	Title substring to reconstruct every matching item (capped at 25). Use instead of item_id.
`ledger`	Yes	Ledger key (e.g. 'sanction', 'license', 'subsidy', 'recall', 'pharma', 'bid', 'grant', 'pubcom', 'ordinance', 'tos', 'realestate', 'landprice').
`item_id`	No	Stable item id (single-item reconstruction).
`jurisdiction`	No	Jurisdiction code (default 'jp').
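
To make "replaying the kept event history" concrete, the sketch below shows the general idea of point-in-time reconstruction: apply events in time order up to T and stop. It is an illustration under assumptions, not the server's implementation. The event shape (`at`, `changes`) and the helper name `state_as_of` are hypothetical; `existence`, `first_seen`, and `observed_at` echo terms from the description, but the real response layout and the F-037 receipt format are not documented here.

```python
from datetime import datetime


def _parse(timestamp: str) -> datetime:
    # Timestamps are assumed to be ISO-8601 strings with an explicit UTC offset.
    return datetime.fromisoformat(timestamp)


def state_as_of(events: list, as_of: str) -> dict:
    """Replay events up to and including `as_of` and return the resulting state."""
    cutoff = _parse(as_of)
    ordered = sorted(events, key=lambda event: _parse(event["at"]))
    if not ordered:
        return {"existence": False, "first_seen": None}
    if _parse(ordered[0]["at"]) > cutoff:
        # The item had not appeared yet at T; report when it was first seen.
        return {"existence": False, "first_seen": ordered[0]["at"]}
    state, effective = {}, None
    for event in ordered:
        if _parse(event["at"]) > cutoff:
            break
        state.update(event["changes"])  # later events overwrite earlier fields
        effective = event["at"]
    return {"existence": True, "state": state, "observed_at": effective}


history = [
    {"at": "2025-01-10T00:00:00+00:00", "changes": {"status": "open", "title": "Example program"}},
    {"at": "2025-03-01T00:00:00+00:00", "changes": {"status": "closed"}},
]
snapshot = state_as_of(history, "2025-02-01T00:00:00+00:00")
```

In this example `snapshot` reflects only the first event, so the status is still `open` and `observed_at` is the time of that first event.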

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses behavior: replaying event history, handling future T (as_of_clamped), non-existent items (existence:false), and return of provenance receipt. Also declares read-only and price 0.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise with 4 sentences, front-loaded with the main action. Every sentence contributes meaning without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and no annotations, the description covers all key aspects: behavior for past, future, and non-existent items, return format, and cost. It is complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the 'query' parameter's behavior (title substring capped at 25) and the overall interaction with parameters like as_of and item_id, enhancing understanding beyond schema.
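
The parameter interactions noted here can be checked on the client before a call is made. The following is a minimal sketch with several assumptions: the helper name `check_temporal_query_args` is hypothetical, the ledger list is copied from the schema's "e.g." examples and may be incomplete, and treating `item_id` and `query` as mutually exclusive is an interpretation of "Use instead of item_id" rather than a documented rule.

```python
from datetime import datetime

# Copied from the schema's examples; the real list may be longer (assumption).
KNOWN_LEDGERS = {
    "sanction", "license", "subsidy", "recall", "pharma", "bid",
    "grant", "pubcom", "ordinance", "tos", "realestate", "landprice",
}


def check_temporal_query_args(args: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    for required in ("as_of", "ledger"):
        if required not in args:
            problems.append(f"missing required parameter: {required}")
    if "as_of" in args:
        try:
            datetime.fromisoformat(args["as_of"])
        except (TypeError, ValueError):
            problems.append("as_of is not a parseable ISO-8601 timestamp")
    if "ledger" in args and args["ledger"] not in KNOWN_LEDGERS:
        problems.append(f"unrecognized ledger key: {args['ledger']}")
    if ("item_id" in args) == ("query" in args):
        # Assumed rule: exactly one selector, either a single item or a title query.
        problems.append("pass exactly one of item_id or query")
    return problems
```

A call such as `check_temporal_query_args({"as_of": "2025-01-01T00:00:00+00:00", "ledger": "subsidy", "item_id": "x"})` returns an empty list under these assumptions.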

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reconstructs historical state of ledger items as of a past date, with specific verb 'reconstruct' and resource 'ledger item'. It distinguishes from sibling tools by focusing on temporal query.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context on when to use (historical reconstruction), mentions alternatives within the description ('Use instead of item_id'), and notes read-only nature and zero cost. Lacks explicit when-not-to-use but is generally clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tos_watch_getAInspect

Get a ToS snapshot detail plus full event timeline. Returns firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so description must bear full burden. It discloses return of 'firstSeenAt' and 'ledgerVerified', but says nothing about authentication, rate limits, or side effects. Safe read is implied but not stated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no extraneous text. The main verb and resource are front-loaded in the first sentence.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple get-by-ID tool with no output schema, the description adequately states it returns detail and timeline. Mentioning two specific fields adds value, but could be more complete about the response structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter 'itemId' with 0% schema coverage. Description does not explain its required format or meaning beyond being an identifier for the ToS snapshot. Agent must infer from context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool gets a ToS snapshot detail and full event timeline, distinguishing it from siblings like tos_watch_search and tos_watch_timeline. Mentions specific returned fields, making purpose explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Context is clear: to retrieve a specific ToS snapshot by its ID, differentiating from search and timeline tools. However, no explicit when-not-to-use or alternatives are stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tos_watch_recent_changesBInspect

Recent revised events across all SaaS ToS documents since the given ISO8601 timestamp. Each item includes firstSeenAt and ledgerVerified.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No
`since`	Yes
`vendor`	No

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that results include 'firstSeenAt' and 'ledgerVerified', providing output context beyond the schema. However, it does not state that the tool is read-only, nor does it mention rate limits, authentication, or any side effects. With no annotations, the description carries the full burden but only partially addresses it.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the core purpose, followed by a useful output detail. Every word adds value with no redundancy. Highly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has three parameters, no output schema, and no annotations, the description is incomplete. It omits 'limit' and 'vendor' explanations, does not describe the return format (list? pagination?), and lacks behavioral context. Significant gaps remain for an agent to use it effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains the 'since' parameter (ISO8601 timestamp) but ignores 'limit' and 'vendor'. With 0% schema description coverage, the description was expected to compensate but only covers one of three parameters, leaving significant gaps for the agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it retrieves recent revised events of SaaS ToS documents since a timestamp. The verb 'watch recent changes' is specific, and the resource is identified. However, it does not explicitly differentiate from sibling tools like tos_watch_search or other domain watch tools, though the name and context make it distinct enough.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as tos_watch_search or tos_watch_timeline. The description lacks any context about prerequisites, exclusions, or typical use cases, leaving the agent to infer usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tos_watch_search (A)

Search Japanese / English-language SaaS Terms of Service snapshots (Stripe / Anthropic / AWS / Google Cloud / GitHub …). Stage 1 covers 'terms' docs. Each hit includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description
`limit`	No
`query`	No	Title / leading excerpt of the body text; partial match (translated from Japanese)
`vendor`	No	'stripe' / 'anthropic' / 'aws' / 'gcp' / 'github' …
`docType`	No

Tool Definition Quality

A (3.5/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses that search covers 'terms' docs and each hit includes firstSeenAt and ledgerVerified, but does not mention read-only nature, rate limits, or other behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no wasted words. First sentence states purpose and examples; second adds stage scope and hit metadata.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema; description mentions hit content (firstSeenAt, ledgerVerified) but fails to explain the four input parameters (limit, query, vendor, docType). The tool's search capabilities are under-specified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 50%; the description does not explain the query, vendor, or docType parameters. It adds no meaning beyond the schema's existing Japanese descriptions, which are already present.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool searches Japanese/English-language SaaS Terms of Service snapshots, lists example vendors, and specifies it's for 'terms' documents in Stage 1. This distinguishes it from siblings like tos_watch_get or tos_watch_timeline.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage for searching snapshot content, but no explicit when-to-use or alternatives. Siblings are not mentioned, so the agent must infer when to use search vs. other TOS tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tos_watch_timeline (B)

Time-ordered events only for a ToS document (the differentiator: when it appeared and each revision since). Includes firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes

Tool Definition Quality

B (3.3/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must disclose behavioral traits. It only mentions return fields (firstSeenAt, ledgerVerified) but does not explain auth needs, rate limits, error handling, or whether the operation is read-only. Minimal transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with key purpose, no redundant information. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema and no annotations, the description is too sparse. It does not specify output format, pagination, errors, or prerequisites. For a tool with a single parameter, it lacks sufficient detail for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description mentions 'a ToS document' but does not explicitly define the itemId parameter or provide format/example. It adds little meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides time-ordered events for a ToS document, including firstSeenAt and ledgerVerified. This distinguishes it from sibling tools like tos_watch_get (current state) and tos_watch_recent_changes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for historical timeline by saying 'when it appeared and each revision since', but it does not explicitly state when to use this tool over alternatives like tos_watch_get or tos_watch_recent_changes. No when-not guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tos_watch_verify_ledger (A)

Verify the hash-chain integrity of a ToS document (tamper detection). Returns chainValid, brokenAt (if any), checked event count, firstSeenAt and ledgerVerified.

Parameters

Name	Required	Description	Default
`itemId`	Yes
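
To illustrate what hash-chain verification generally involves, here is a minimal sketch under stated assumptions. The event layout (`prevHash`, `payload`, `hash`), the SHA-256 construction, and the `checked` field name are all hypothetical; only `chainValid` and `brokenAt` come from the tool description, and the server's real ledger format is not documented on this page.

```python
import hashlib
import json

GENESIS = "0" * 64  # hypothetical starting value for the first event


def event_hash(prev_hash, payload):
    """Hash the previous hash together with a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_event(events, payload):
    """Append an event linked to the hash of the previous one."""
    prev = events[-1]["hash"] if events else GENESIS
    events.append(
        {"prevHash": prev, "payload": payload, "hash": event_hash(prev, payload)}
    )


def verify_chain(events):
    """Walk the chain and report the first event whose link or hash fails."""
    prev = GENESIS
    for index, event in enumerate(events):
        recomputed = event_hash(prev, event["payload"])
        if event["prevHash"] != prev or event["hash"] != recomputed:
            return {"chainValid": False, "brokenAt": index, "checked": index + 1}
        prev = event["hash"]
    return {"chainValid": True, "brokenAt": None, "checked": len(events)}


ledger = []
append_event(ledger, {"type": "first_seen", "at": "2026-01-01T00:00:00Z"})
append_event(ledger, {"type": "revised", "at": "2026-03-01T00:00:00Z"})
intact = verify_chain(ledger)

ledger[0]["payload"]["type"] = "tampered"  # simulate an edit after the fact
broken = verify_chain(ledger)
print(intact, broken)
```

Because each hash covers the previous hash, changing any earlier event invalidates every later link, which is what makes the ledger tamper-evident rather than tamper-proof.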

Tool Definition Quality

A (3.7/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses return fields but does not mention side effects, authentication, or rate limits. For a read-only verification tool, this is adequate but not thorough.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One concise sentence plus a list of return fields. No unnecessary words; purpose and outputs are front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple schema (1 param) and no output schema, the description covers purpose, inputs, and return fields. Missing some usage guidance but overall complete for the tool's scope.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description mentions 'ToS document' but does not explicitly link itemId to that document. With 0% schema description coverage, the description should provide more detail about the parameter, such as format or example.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('verify the hash-chain integrity') and the resource ('ToS document'), with tamper detection noted. It distinguishes from sibling tools like tos_watch_get or tos_watch_search by focusing on integrity verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use vs. alternatives. It implies use for tamper detection, but does not discuss exclusions or compare to other verify_ledger siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

verify_receipt (A)

Verify a provenance receipt (F-037): recomputes the HMAC signature, checks the intra-receipt chain linkage, and — when the receipt carries an external anchor reference — confirms it against the published F-028 anchor. Returns {valid:true, anchor:"verified"|"pending"|"unverified"} or {valid:false, reason:"signature_mismatch"|"chain_link_broken"|"anchor_mismatch"|"malformed"}. Tamper-evident; price 0.0 (free).

Parameters

Name	Required	Description	Default
`receipt`	Yes	A receipt JSON object as produced by any provenance-emitting tool.
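
The signature step can be sketched as follows, assuming HMAC-SHA256 over a canonical JSON encoding of the receipt body. The receipt layout (`body`, `signature`), the canonicalization, and the shared secret are hypothetical, since the F-037 format is not shown on this page. The chain-linkage and anchor checks are omitted, so a passing receipt reports `anchor: "unverified"`.

```python
import hashlib
import hmac
import json


def sign(secret, body):
    """Return a hex HMAC-SHA256 over a canonical JSON encoding of body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify_receipt(secret, receipt):
    """Recompute the signature and compare it in constant time."""
    if (
        not isinstance(receipt, dict)
        or "body" not in receipt
        or not isinstance(receipt.get("signature"), str)
    ):
        return {"valid": False, "reason": "malformed"}
    expected = sign(secret, receipt["body"])
    supplied = receipt["signature"]
    if not hmac.compare_digest(expected.encode("utf-8"), supplied.encode("utf-8")):
        return {"valid": False, "reason": "signature_mismatch"}
    # Chain linkage and external anchor checks are not modeled in this sketch.
    return {"valid": True, "anchor": "unverified"}


secret = b"demo-secret"  # hypothetical key, for illustration only
body = {"tool": "example_tool", "result": 42}
good = {"body": body, "signature": sign(secret, body)}
forged = {"body": {"tool": "example_tool", "result": 43}, "signature": good["signature"]}

print(verify_receipt(secret, good))
print(verify_receipt(secret, forged))
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading characters of a forged signature were correct.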

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

In the absence of annotations, the description details the verification steps, return value fields, and notes that the tool is tamper-evident and free. However, it does not mention authentication requirements, idempotency, or potential side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense and front-loads the core action. It efficiently conveys functionality and return format, though bullet points or shorter sentences would make it slightly clearer.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with one parameter and no output schema, the description fully explains what the tool does, how it verifies, and what the return values mean. No gaps remain for correct invocation or interpretation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter 'receipt' is described as 'A receipt JSON object as produced by any provenance-emitting tool,' which adds meaningful context beyond the schema. Schema coverage is 100%, so the description complements it well.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: verifying a provenance receipt (F-037) by recomputing HMAC, checking intra-receipt chain linkage, and confirming external anchor references. It specifies exact return format, making it distinct from any sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates this is a free, tamper-evident verification tool but does not explicitly state when to use it versus alternatives (e.g., content_authenticity_provenance_check). No when-not or prerequisite guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_risk_air_quality (A)

Air-quality snapshot (PM2.5, PM10, US / European AQI and a coarse category) for a lat/lon via the free Open-Meteo air-quality API. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)

Tool Definition Quality

A (4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden and does well by stating 'Read-only; price 0.0 (free)'. This discloses that the tool is safe (no mutations) and has no cost. However, it could add more details like rate limits or source freshness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that directly states the tool's purpose, key outputs, and essential traits (read-only, free). Every element contributes uniquely, and it is front-loaded with the primary action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity (2 params, no output schema), the description adequately informs an agent about what the tool returns and its safety profile. It could mention that the snapshot is for current conditions, though 'snapshot' implies that. No output schema exists, so completeness is high but not perfect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already covers both parameters (lat, lon) with descriptions. The description merely mentions 'for a lat/lon', adding no extra semantic value or usage hints beyond what the schema provides. Since schema coverage is 100%, baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies the tool provides an 'Air-quality snapshot' including specific pollutants (PM2.5, PM10), both US and European AQI, and a coarse category, for a given lat/lon. This clearly distinguishes it from sibling weather tools like weather_risk_current_weather or weather_risk_forecast, which focus on different aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for air quality queries via the Open-Meteo API, but it does not explicitly state when to use this tool over alternatives (e.g., for current vs. forecast air quality) or mention prerequisites or limitations like data freshness.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_risk_current_weather (A)

Current weather (temperature, apparent temperature, humidity, precipitation, wind, weather code) for a lat/lon via the free Open-Meteo API. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)

Tool Definition Quality

A (3.8/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses read-only nature and pricing (free), plus the underlying Open-Meteo API. No annotations provided, so the description carries transparency well. No mention of rate limits or data freshness, but adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence effectively conveys purpose, data elements, source, read-only status, and cost. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lists return variables but no output schema. For a simple current weather tool, this is fairly complete. Could specify units or format, but not essential given the context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameter descriptions (lat/lon ranges). Description adds no extra parameter meaning beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'current weather' and lists specific data elements, distinguishing it from forecast or other weather siblings. However, it does not explicitly contrast with sibling tools like weather_risk_forecast.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage for current weather data, but no explicit guidance on when to choose this over alternatives like forecast or air quality. The read-only and free attributes are noted but not contextualized.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_risk_forecast (A)

Daily weather forecast (temp max/min, precipitation probability, wind, weather code) for up to N days for a lat/lon via the free Open-Meteo API. Read-only; price 0.0 (free).

Parameters

Name	Required	Description
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)
`days`	No	Forecast horizon in days (bounded)
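
Since the description leaves the bound on `days` unstated, the following sketch shows one way a wrapper could validate inputs and clamp the horizon before querying Open-Meteo. The 16-day cap, the 7-day default, and the exact list of daily variables are assumptions based on the public Open-Meteo forecast API, not on this server's code; `forecast_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlparse

OPEN_METEO_FORECAST = "https://api.open-meteo.com/v1/forecast"
MAX_DAYS = 16     # assumption: upper bound on the forecast horizon
DEFAULT_DAYS = 7  # assumption: the tool's real default is not documented


def forecast_url(lat, lon, days=DEFAULT_DAYS):
    """Validate coordinates, clamp the horizon, and build the request URL."""
    if not -90 <= lat <= 90:
        raise ValueError("lat must be within -90..90")
    if not -180 <= lon <= 180:
        raise ValueError("lon must be within -180..180")
    days = max(1, min(int(days), MAX_DAYS))
    query = {
        "latitude": lat,
        "longitude": lon,
        # Daily variables matching the fields named in the tool description.
        "daily": ",".join(
            [
                "temperature_2m_max",
                "temperature_2m_min",
                "precipitation_probability_max",
                "wind_speed_10m_max",
                "weather_code",
            ]
        ),
        "forecast_days": days,
        "timezone": "auto",
    }
    return f"{OPEN_METEO_FORECAST}?{urlencode(query)}"


url = forecast_url(35.68, 139.76, days=99)
print(parse_qs(urlparse(url).query)["forecast_days"])
```

Clamping silently is one design choice; a stricter wrapper could reject out-of-range values instead so the caller learns the bound.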

Tool Definition Quality

A (3.7/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the tool is read-only and free, and mentions the underlying API (Open-Meteo). However, it does not discuss rate limits, data update frequency, or error handling, which would be helpful given no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two short sentences, highly concise with no redundant information. Every part adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lists returned data fields but no output schema is provided. It omits units (e.g., Celsius/Fahrenheit) and data resolution. Adequate for a simple weather tool but could be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds vague 'up to N days' but does not clarify bounds or units (e.g., allowed max days, default). The lat/lon descriptions are sufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides a daily weather forecast with specific elements (temp max/min, precipitation probability, wind, weather code) for a lat/lon location. It is distinct from sibling tools like weather_risk_current_weather which provides current conditions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for daily forecasts but does not provide explicit when-to-use or when-not-to-use guidance compared to siblings like weather_risk_current_weather or weather_risk_severe_flags.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_risk_heat_index (A)

Compute the heat index (feels-like temperature) and risk category from air temperature (Celsius) and relative humidity. Pure compute; price 0.0 (free).

Parameters

Name	Required	Description	Default
`tempC`	Yes	Air temperature in degrees Celsius
`humidityPct`	Yes	Relative humidity percent (0-100)
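
The page does not say which formula the tool uses. As one plausible reading, the sketch below applies the US National Weather Service approach: a simple estimate for mild conditions and the Rothfusz regression once that estimate reaches 80 °F, with the NWS low- and high-humidity adjustments omitted for brevity. The category names and the function names are hypothetical; the Fahrenheit cutoffs follow the NWS heat index chart.

```python
def c_to_f(celsius):
    return celsius * 9.0 / 5.0 + 32.0


def f_to_c(fahrenheit):
    return (fahrenheit - 32.0) * 5.0 / 9.0


def heat_index_c(temp_c, humidity_pct):
    """Heat index in Celsius from air temperature (C) and relative humidity (%)."""
    if not 0.0 <= humidity_pct <= 100.0:
        raise ValueError("humidityPct must be within 0-100")
    t = c_to_f(temp_c)
    rh = humidity_pct
    # Simple estimate, averaged with the air temperature.
    simple = 0.5 * (t + 61.0 + (t - 68.0) * 1.2 + rh * 0.094)
    hi = (simple + t) / 2.0
    if hi >= 80.0:
        # Rothfusz regression (coefficients are for degrees Fahrenheit).
        hi = (
            -42.379
            + 2.04901523 * t
            + 10.14333127 * rh
            - 0.22475541 * t * rh
            - 0.00683783 * t * t
            - 0.05481717 * rh * rh
            + 0.00122874 * t * t * rh
            + 0.00085282 * t * rh * rh
            - 0.00000199 * t * t * rh * rh
        )
    return f_to_c(hi)


def risk_category(heat_index_celsius):
    """Map a heat index to a coarse category (labels are hypothetical)."""
    hi_f = c_to_f(heat_index_celsius)
    if hi_f >= 125.0:
        return "extreme_danger"
    if hi_f >= 103.0:
        return "danger"
    if hi_f >= 90.0:
        return "extreme_caution"
    if hi_f >= 80.0:
        return "caution"
    return "none"


hot = heat_index_c(f_to_c(90.0), 70.0)
print(round(hot, 1), risk_category(hot))
```

The regression is fitted in Fahrenheit, so the sketch converts at the boundaries rather than rescaling the coefficients.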

Tool Definition Quality

A (3.8/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries burden. It discloses no side effects ('Pure compute') and cost, but does not describe output format (e.g., risk category levels) which would help agent anticipate behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no waste. Purpose is front-loaded, and every sentence adds value. Ideal length for a simple compute tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a straightforward compute tool with full schema coverage, the description is mostly complete. However, noting the risk category output format would improve completeness, but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description adds context (mentions heat index and risk category) but does not significantly extend meaning beyond schema. Baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Compute') and resource ('heat index and risk category'), and clearly distinguishes the tool from sibling weather tools by naming the computation and noting that it is a free, pure-compute operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage via context (heat index computation) but lacks explicit when-to-use, when-not-to-use, or alternative tools. The 'Pure compute; price 0.0' is informative but not a usage guideline.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_risk_severe_flags (A)

Severe-weather flags (heavy rain / high wind / extreme heat / frost) derived from the Open-Meteo forecast, each with its threshold and worst value. Read-only; price 0.0 (free).

Parameters

Name	Required	Description	Default
`lat`	Yes	Latitude (-90..90)
`lon`	Yes	Longitude (-180..180)
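
As an illustration of "each with its threshold and worst value", the sketch below derives four flags from daily forecast series. Every threshold, field name, and the output shape are hypothetical placeholders; the server's actual cutoffs and response format are not documented here.

```python
import operator

# (flag, input field, threshold, how to pick the worst value, crossing test)
# All thresholds below are illustrative assumptions, not the server's values.
RULES = [
    ("heavy_rain", "precip_mm", 50.0, max, operator.ge),
    ("high_wind", "wind_max_kmh", 60.0, max, operator.ge),
    ("extreme_heat", "temp_max_c", 35.0, max, operator.ge),
    ("frost", "temp_min_c", 0.0, min, operator.le),
]


def severe_flags(daily):
    """Return one entry per rule with its threshold and worst observed value."""
    flags = []
    for name, field, threshold, worst_of, crossed in RULES:
        values = daily.get(field) or []
        if not values:
            continue  # skip rules whose input series is missing
        worst = worst_of(values)
        flags.append(
            {
                "flag": name,
                "active": crossed(worst, threshold),
                "threshold": threshold,
                "worst": worst,
            }
        )
    return flags


daily = {
    "precip_mm": [0.0, 62.5, 3.0],
    "wind_max_kmh": [20.0, 35.0, 28.0],
    "temp_max_c": [31.0, 36.2, 29.0],
    "temp_min_c": [4.0, 2.5, 1.0],
}
for entry in severe_flags(daily):
    print(entry)
```

Frost uses the minimum rather than the maximum because "worst" means coldest for that flag, which is why each rule carries its own selector and comparison.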

Tool Definition Quality

A (4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description discloses read-only behavior and zero cost, which are key. It also states the data source (Open-Meteo) and that each flag includes threshold and worst value. However, it does not mention rate limits or response format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two short sentences, front-loading the purpose and key attributes (input, flags, read-only, cost). No superfluous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with two simple parameters and no output schema, the description provides sufficient context: what flags are returned, that they include thresholds and worst values, and that it is read-only and free. It lacks example output but is complete enough for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already describes lat and lon with full coverage. The description adds no parameter-specific meaning beyond stating the data source, so it meets the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns severe-weather flags (heavy rain, high wind, extreme heat, frost) derived from Open-Meteo, with threshold and worst value. This distinguishes it from sibling weather tools like current_weather or forecast.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for retrieving severe weather flags but does not explicitly compare with siblings (e.g., current_weather or forecast) or state when not to use it. No alternatives or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.

