WingmanProtocol Agent Gateway

by com.wingmanprotocol.agent

Server Details

Durable self for AI agents: one-call resume, memory, real browser, free chat + hire real humans.

Status: Healthy
Last Tested: 2026-07-24 20:08
Transport: Streamable HTTP
URL
Repository: RIPRODUCTIONS/wingman-agent-gateway
GitHub Stars: 3

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client

Glama

MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

B3.2/5.0

Tool DescriptionsB

Average 3.9/5 across 213 of 213 tools scored. Lowest: 2.3/5.

Server CoherenceB

Disambiguation4/5

Most tools have distinct purposes, especially within the same domain (e.g., browse_* tools are well-differentiated). However, some overlap exists between web_read and browse_read, and several financial calculators could be confused if descriptions are not carefully read. Overall, descriptions help disambiguate.

Naming Consistency3/5

Tool names follow a mix of conventions: some use verb_noun (e.g., browse_click), others are single nouns (e.g., loan, bmi), and some use abbreviations (e.g., cac_ltv, tvm). There is no consistent pattern across the entire set, making it harder to predict tool names.

Tool Count2/5

With 213 tools, the server is extremely broad and tries to cover many domains. This overwhelms agents with too many options and dilutes the server's purpose. A more focused scope would improve coherence.

Completeness3/5

The server covers a wide range of domains but lacks depth in each. For example, finance has many calculators but misses options pricing; construction has several calculators but omits common ones like drywall. Significant gaps exist for users needing cohesive workflows.

Available Tools

271 tools

ab_testA

Read-onlyIdempotent

Inspect

A/B Test Significance (two-proportion z-test) — Conversion rates, lift, z-score, p-value and significance for two variants.

ParametersJSON Schema

Name	Required	Description
`confidence`	No	Confidence level 0..1 (default 0.95)
`visitors_a`	Yes	Visitors in variant A
`visitors_b`	Yes	Visitors in variant B
`conversions_a`	Yes	Conversions in variant A
`conversions_b`	Yes	Conversions in variant B

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, destructiveHint=false) accurately describe the tool's behavior. The description adds context by naming output metrics but does not disclose any additional behavioral traits beyond what annotations provide. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that front-loads the core purpose and immediately lists key outputs. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given moderate complexity and lack of output schema, the description could be more complete by describing the return format or interpretation of results. It lists metrics but not their structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, so the schema already documents parameters effectively. The description does not add significant meaning beyond listing outputs; parameters are adequately covered by schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it performs a two-proportion z-test for A/B significance and lists output metrics (conversion rates, lift, z-score, p-value, significance). It is specific and distinct from sibling tools like 'statistics' or other calculation tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description does not provide explicit guidance on when to use this tool versus alternatives, nor does it mention prerequisites or limitations. Usage is implied by the tool's purpose but lacks direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

accretion_dilutionA

Read-onlyIdempotent

Inspect

M&A Accretion / Dilution Calculator — Pro-forma EPS and accretion/dilution from an acquisition (stock/cash/mixed).

ParametersJSON Schema

Name	Required	Description
`pct_stock`	No	Percent of deal paid in stock (default 100)
`synergies`	No	Annual after-tax synergies in USD
`tax_rate_pct`	No	Tax rate percent (for after-tax interest)
`purchase_price`	Yes	Total purchase price in USD
`acquirer_shares`	Yes	Acquirer shares outstanding
`interest_rate_pct`	No	Interest rate on cash/debt portion, percent
`target_net_income`	No	Target net income in USD
`acquirer_net_income`	Yes	Acquirer net income in USD
`acquirer_share_price`	No	Acquirer share price (needed for any stock)

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds 'Calculator' implying a computation with no side effects, consistent with annotations, but does not provide additional behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with the tool's purpose. No wasted words; every part is relevant.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description mentions output (pro-forma EPS and accretion/dilution) but does not detail the return format or behavior with optional parameters. Given the full schema and annotations, it is mostly complete but could be more explicit about outputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not elaborate on individual parameters beyond the schema; it only gives the overall purpose. No additional meaning is added.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a calculator for M&A accretion/dilution, computing pro-forma EPS and accretion/dilution from an acquisition. The verb 'calculates' and specific resource 'accretion/dilution' differentiate it from sibling financial tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for M&A deal analysis (when to use), but does not explicitly state when not to use or provide alternatives. The context is clear, but exclusions are missing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

age_calculatorA

Read-onlyIdempotent

Inspect

Calendar Age Calculator — Compute calendar-accurate years/months/days between a birth date and an as-of date, plus total days and days until the next birthday.

ParametersJSON Schema

Name	Required	Description	Default
`as_of_date`	Yes	Date to compute the age as of, ISO 'YYYY-MM-DD'
`birth_date`	Yes	Birth date, ISO 'YYYY-MM-DD'

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description is relieved of repeating safety traits. It adds behavioral specificity: computing calendar-accurate intervals and days until next birthday. However, it does not mention edge cases (e.g., leap years, timezone assumptions) but that is acceptable for a simple calculator.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with the tool name and core functionality front-loaded. No wasted words; every part adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple calculator tool with no output schema, the description adequately covers inputs and outputs. It could mention the return format or data types, but the intended behavior is clear. Slightly incomplete for completeness but sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both parameters. The description reiterates the parameter names but adds no new semantic detail beyond the schema. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly specifies the verb ('Compute') and resource ('calendar-accurate years/months/days between a birth date and an as-of date') and lists additional outputs (total days, days until next birthday). It is distinct from sibling tools like date_diff and duration_breakdown.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for age calculation but does not explicitly state when to use this tool over siblings (e.g., date_diff for simple day differences, duration_breakdown for general duration). No 'when not to use' guidance provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_rollbackA

Idempotent

Inspect

Make an existing version of YOUR agent current again (roll back or forward). Owner-gated, idempotent. Returns {ok, current_version, payload}.

ParametersJSON Schema

Name	Required	Description
`handle`	Yes	your registered handle
`secret`	No	your agent secret
`agent_id`	Yes	which agent to roll
`to_version`	Yes	the semver X.Y.Z to make current

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare idempotentHint=true and destructiveHint=false. The description adds valuable context by specifying 'Owner-gated' access and the return format '{ok, current_version, payload}', which goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at two sentences, front-loading the primary purpose. It efficiently includes behavioral notes, but could be slightly improved with better structure or bullet points.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with 4 parameters and no output schema, the description covers purpose, behavioral traits, and return format. It lacks details on error conditions or prerequisites (e.g., version must exist), but overall is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add any parameter-specific details beyond the schema's own descriptions; it merely restates the purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Make an existing version of YOUR agent current again (roll back or forward)', specifying the verb (make current), resource (agent version), and scope. It distinguishes itself from sibling tools like agent_version_list and agent_version_publish by focusing on rolling to an existing version.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some context (owner-gated, idempotent) but does not explicitly state when to use this tool versus alternatives. Given sibling tools for listing and publishing versions, comparative guidance is missing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_version_listA

Read-onlyIdempotent

Inspect

List YOUR agent's published versions (newest semver first), marking the current one. Owner-gated. Returns {ok, count, current, versions[]}.

ParametersJSON Schema

Name	Required	Description
`handle`	Yes	your registered handle
`secret`	No	your agent secret
`agent_id`	Yes	which agent's versions to list

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, and not destructive. The description adds behavioral context: the response structure (ok, count, current, versions[]) and the owner-gating restriction, which are not in annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that efficiently conveys purpose, ordering, marking, access control, and return format with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While there is no output schema, the description specifies the return shape. It covers purpose, constraints, and output adequately. Minor missing details about what 'versions' contain, but generally complete for a list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and all parameters have descriptions. The description does not add additional meaning beyond what the schema provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (List), the resource (agent's published versions), ordering (newest semver first), and a special marking (current one). It is distinct from siblings like agent_version_publish and agent_rollback.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes the constraint 'Owner-gated', which indicates who can use the tool. It implies usage for listing versions before a rollback, but does not explicitly mention alternatives or when not to use it. Clear but not exhaustive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agent_version_publishAInspect

Publish a new immutable version of YOUR agent's config/payload. Owner-gated. Give an explicit 'version' (X.Y.Z) or a 'bump' (major/minor/patch, default patch) off the latest. The new version becomes current. Returns {ok, version, is_current, previous}.

ParametersJSON Schema

Name	Required	Description
`bump`	No	major \| minor \| patch (default patch)
`notes`	No	optional changelog note (≤2000 chars)
`handle`	Yes	your registered handle (owner-gated)
`secret`	No	your agent secret
`payload`	No	the versioned config/payload (string or JSON object; ≤200KB)
`version`	No	explicit semver X.Y.Z (overrides bump)
`agent_id`	Yes	which agent to version (your label)

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses owner-gating, the versioning mechanism, and the return structure. Annotations are false, and description correctly indicates a write operation, adding context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences, front-loading the main purpose and including all critical information without any waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters and no output schema, the description covers the versioning logic, owner-gating, and return structure. It does not clarify behavior when both version and bump are provided, but overall is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds value by explaining the relationship between version and bump, and by describing the return value format, which is not in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool publishes a new immutable version of an agent's config/payload, using specific verbs and resource. It distinguishes from siblings like agent_rollback and agent_version_list by focusing on creating a new version.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool by specifying the version or bump options, but it does not explicitly state when not to use it or mention alternative tools for other scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

amortization_scheduleA

Read-onlyIdempotent

Inspect

Loan Amortization Schedule — Full month-by-month amortization table with principal, interest, and balance columns.

ParametersJSON Schema

Name	Required	Description
`principal`	Yes	Loan principal
`term_years`	No	Term in years
`annual_rate`	Yes	Annual rate as decimal, e.g. 0.05
`term_months`	No	Term in months (or use term_years)

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, covering safety. The description adds value by specifying the output as a table with columns, which goes beyond annotations. It does not contradict annotations, and it provides useful behavioral context about the result structure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently conveys the tool's purpose and output. Every word adds value, with no redundant or extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description adequately explains the output structure (month-by-month table with three key columns) despite no output schema. It does not mention error handling or edge cases, but for a straightforward financial calculation tool, the provided context is largely sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with individual parameter descriptions (e.g., 'Loan principal', 'Annual rate as decimal'). The overall description adds minimal parameter-specific meaning but does clarify that the output includes principal, interest, and balance columns. Given high schema coverage, baseline is 3; the description provides marginal additional context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Loan Amortization Schedule' and specifies it produces a 'Full month-by-month amortization table with principal, interest, and balance columns.' This directly conveys the tool's purpose and distinguishes it from siblings like 'loan' or 'mortgage' by focusing on the detailed schedule output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool over alternatives such as 'loan', 'mortgage', or 'annuity' among the siblings. There is no mention of use cases, prerequisites, or exclusions, leaving the AI agent without contextual selection information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

annuityB

Read-onlyIdempotent

Inspect

Annuity Present / Future Value — Compute the present or future value of an ordinary annuity or annuity-due.

ParametersJSON Schema

Name	Required	Description
`due`	No	True for annuity-due (default ordinary)
`mode`	No	pv or fv
`rate`	Yes	Rate per period as a decimal
`payment`	Yes	Payment per period
`periods`	Yes	Number of periods

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only and idempotent. The description adds context about annuity types but does not discuss any additional behavioral traits such as output format or precision.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is front-loaded with the core purpose. Every part is necessary and no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description does not specify that the result is a numeric value, nor does it address edge cases or limitations. For a financial computation tool, this is a notable gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions cover all parameters with 100% coverage. The description reaffirms the meaning of 'due' (annuity-due vs ordinary) and 'mode' (pv/fv), but adds minimal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool computes present or future value of annuities, distinguishing between ordinary and annuity-due. However, it does not differentiate from the sibling tool 'tvm' which may have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like 'tvm' or 'bond_price'. The description lacks context about typical usage scenarios or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ansi_stripA

Read-onlyIdempotent

Inspect

ANSI Escape Sequence Stripper — Remove ECMA-48 CSI (color/cursor) and OSC (hyperlink/title) terminal escape sequences from text, returning the plain-text remainder.

ParametersJSON Schema

Name	Required	Description	Default
`text`	Yes	Text possibly containing ANSI escape sequences

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds detail about which specific sequences are stripped beyond annotations. Annotations already declare read-only and idempotent, so description provides useful context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is front-loaded with key information, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple tool with one parameter; description fully explains purpose and behavior. Annotations cover safety aspects, so nothing missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; description adds specificity about the types of ANSI sequences handled (ECMA-48 CSI, OSC), adding meaning beyond the schema description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it removes ANSI escape sequences from text, specifying ECMA-48 CSI and OSC types and the outcome of returning plain text. Distinct from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly clear when to use: when text contains ANSI escape sequences. No explicit when-not or alternatives, but context is specific enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

archive_messageB

Idempotent

Inspect

Archive (keep forever, exempt from the cap) or unarchive an inbox item. Requires handle + secret.

ParametersJSON Schema

Name	Required	Description	Default
`handle`	Yes
`secret`	No
`item_id`	Yes
`archived`	No

Tool Definition Quality

B3.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a non-read-only, idempotent, non-destructive operation. The description adds that archiving exempts from the cap, which is beyond what annotations state. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences, front-loaded with the main action and a key requirement. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mutation tool with 4 parameters and no output schema, the description fails to explain all parameters or return behavior, leaving significant gaps for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must explain parameters. It only mentions 'handle + secret' but does not describe item_id or the archived boolean, leaving their purpose unclear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool archives or unarchives an inbox item, with added context about 'keep forever, exempt from the cap'. It distinguishes from siblings like read_message or mark_message.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only mentions prerequisites ('Requires handle + secret') but provides no guidance on when to use this tool versus siblings like mark_message or send_message.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arena_callBInspect

ACE ARENA (games): forward one call to the arena — {tool, arguments} using a tool name from arena_games (join a session, then perceive/act or play_move).

ParametersJSON Schema

Name	Required	Description	Default
`tool`	Yes	arena tool name (from arena_games.tools)
`arguments`	No	arguments for that arena tool

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide no significant hints (readOnlyHint=false, etc.). The description adds minimal behavioral context beyond the schema, such as 'forward one call' and examples, but lacks details on error handling, side effects, or auth requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with parentheses, fairly concise. It front-loads the tool's purpose and provides examples efficiently, with no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple forwarding tool with no output schema, the description covers the core functionality and connects to the parent arena_games context. It is adequate given the tool's limited complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, both parameters are already described in the schema. The description adds a bit more context about valid tool names from arena_games and typical actions, but this only slightly enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states that the tool forwards a call to the arena, specifying the use of a tool name from arena_games with examples like join a session, perceive/act, or play_move. It clearly distinguishes itself from the sibling arena_games by acting as a forwarding mechanism.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool should be used in conjunction with arena_games, and gives context about actions after joining a session. However, it does not explicitly state when to use this tool versus alternatives or provide any exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arena_gamesA

Read-onlyIdempotent

Inspect

ACE ARENA (games): discover the live game/tool set and open sessions on the arena (skyvox.wingmanprotocol.com). Returns {tools[], open_sessions, how_to_play}. Then act via arena_call.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly and idempotent hints, so the description doesn't need to cover safety. The description adds detail about return structure and the domain URL, but lacks additional behavioral traits like rate limits or auth. This is adequate but not exceptional beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: two sentences that front-load the purpose and return value, then provide follow-up action. Every sentence adds value with no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters, annotations present, and no output schema, the description fully captures the tool's functionality. It explains what is returned and how to proceed (via `arena_call`), making it complete for an agent to understand and use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has no parameters (0 params, 100% schema coverage). The baseline is 4, and the description does not need to provide parameter explanations. No deduction necessary.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool's purpose: discover live games/tools and open sessions on the arena. It specifies a concrete verb ('discover') and resource ('live game/tool set and open sessions'), and distinguishes from the sibling `arena_call` by indicating a sequential usage pattern.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly guides the agent to use this tool for discovery and then act via `arena_call`. While it doesn't list when-not-to-use or alternatives, the sequential guidance is clear and sufficient for an agent to decide appropriately.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

asphaltA

Read-onlyIdempotent

Inspect

Asphalt Calculator — Tons of asphalt, loose cubic yards, truckloads and sub-base from driveway/lot dimensions.

ParametersJSON Schema

Name	Required	Description
`width`	Yes	Width in feet
`length`	Yes	Length in feet
`depth_in`	Yes	Asphalt depth in inches
`price_per_ton`	No	Asphalt price per ton in USD
`density_lb_per_cf`	No	Asphalt density in lb/ft3 (default ~145)
`sub_base_depth_in`	No	Gravel sub-base depth in inches

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, negating the need for safety warnings. The description mentions multiple output types (tons, cubic yards, truckloads, sub-base) but does not detail behavior such as rounding, default density, or whether all outputs are returned simultaneously. It adds moderate context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is front-loaded with the tool's purpose and lists key outputs (tons, cubic yards, truckloads, sub-base). It is concise and contains no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a calculator with 6 parameters and no output schema, the description lists the major outputs (tons, cubic yards, truckloads, sub-base) but does not specify if they are returned together or separately. It is largely complete for a simple calculation tool, though some specifics about output format are missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (all 6 parameters have descriptions in the schema). The description adds 'driveway/lot dimensions' context but does not provide additional meaning beyond the schema's parameter descriptions. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's an 'Asphalt Calculator' for computing tons, cubic yards, truckloads, and sub-base from driveway/lot dimensions. It uses a specific resource (asphalt quantities) and implies the verb 'calculate'. It distinguishes itself from sibling tools like 'concrete' or 'paint' by targeting a specific construction material.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives like 'concrete' or 'paver'. The description assumes the user already needs asphalt calculations but does not provide context for when to choose this over similar calculators or exclude inappropriate use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

base_convertA

Read-onlyIdempotent

Inspect

Number Base Converter — Convert an integer between bases 2-36, with binary/octal/decimal/hex forms.

ParametersJSON Schema

Name	Required	Description
`value`	Yes	The number as a string in from_base
`to_base`	No	Target base 2-36 (default 16)
`from_base`	No	Source base 2-36 (default 10)

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, non-destructive. Description adds the base range (2-36) and common forms, which is consistent but not deeply informative.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is concise and front-loaded. Effectively communicates the core purpose without unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema and description does not specify the return format (e.g., string, number). The mention of 'forms' is ambiguous. For a conversion tool, the output structure is essential.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all three parameters with clear descriptions. Description adds no additional parameter details beyond what schema provides. Baseline score applied.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool converts integers between bases 2-36 and specifically mentions common forms (binary/octal/decimal/hex). Distinct from sibling tools like color_convert or unit_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use vs alternatives, but the purpose is straightforward and no alternative base-conversion tool exists among siblings. Implied usage for integer base conversion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bayesA

Read-onlyIdempotent

Inspect

Bayes' Theorem Calculator — Posterior probability P(H|E) from a prior, true-positive rate and false-positive rate.

ParametersJSON Schema

Name	Required	Description
`prior`	Yes	Prior probability P(H), 0..1
`sensitivity`	Yes	True-positive rate P(E\|H), 0..1
`false_positive`	Yes	False-positive rate P(E\|not H), 0..1

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, idempotent, and non-destructive. Description adds that it computes posterior probability, which is consistent but not adding beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with core information, no unnecessary words. Perfectly concise and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple calculator with three well-documented numeric parameters and no output schema, the description provides sufficient context to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all three parameters with descriptions. Description only restates the formula, adding minimal extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it computes posterior probability using Bayes' theorem, specifying inputs (prior, true-positive, false-positive). It uniquely identifies the tool among many calculation tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies use for probability update but does not explicitly state when to use or when not to use, nor mention alternatives. Siblings include many calculators but no other Bayes, so implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bid_estimatorA

Read-onlyIdempotent

Inspect

Contractor Bid Estimator — Build a client job bid from cost components: labor hours x rate + materials + equipment + subs, then overhead, contingency and margin into an itemized bid price.

ParametersJSON Schema

Name	Required	Description
`labor_rate`	No	Labor rate per hour
`margin_pct`	No	Profit margin as a percent of the final bid price (default 15)
`labor_hours`	No	Labor hours
`overhead_pct`	No	Overhead percent of direct cost (default 10)
`material_cost`	No	Total material cost
`equipment_cost`	No	Equipment cost
`contingency_pct`	No	Contingency percent of direct+overhead (default 0)
`subcontractor_cost`	No	Subcontractor cost

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, indicating a safe calculation tool. Description adds that it computes an 'itemized bid price,' but does not disclose any behavioral traits beyond what annotations convey. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no superfluous words. Front-loaded with purpose and clearly structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a calculator tool with no output schema, the description mentions 'itemized bid price' but does not specify the return format (e.g., single number or breakdown). Given the complexity of the calculation, additional detail on output structure would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for all 8 parameters. The description adds value by explaining the formula: labor hours x rate + materials + equipment + subs, then overhead, contingency, and margin. This contextualizes how parameters interact beyond individual definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states 'Build a client job bid from cost components' with specific verb and resource. It clearly distinguishes from sibling financial calculators (e.g., loan, mortgage) by focusing on construction bidding.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies use for building job bids from cost components but provides no explicit guidance on when to use this tool vs alternatives or when not to use it. Usage context is implied but not clarified.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bill_of_materialsA

Read-onlyIdempotent

Inspect

Bill of Materials / Takeoff Aggregator — Aggregate a construction takeoff: per-line extended cost plus subtotal, waste allowance, tax and grand total.

ParametersJSON Schema

Name	Required	Description
`items`	Yes	Line items: [{item, qty, unit_cost}]
`tax_pct`	No	Sales-tax percent applied to subtotal + waste
`waste_pct`	No	Material waste/over-order percent applied to the subtotal

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, so description is not required to disclose safety. Description adds context about computation (extended cost, subtotal, etc.) but no additional behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence front-loaded with purpose, no superfluous words. Highly concise and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description sufficiently explains tool's function for a low-complexity tool with 3 parameters and no output schema. Lacks details on return format but acceptable given simplicity and annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all three parameters. Description adds context (e.g., waste allowance, tax applied to subtotal + waste) but does not significantly extend beyond schema; baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb (aggregate) and resource (construction takeoff), enumerating outputs (per-line extended cost, subtotal, waste, tax, grand total). Distinguishes from siblings as a specialized aggregation tool for bill of materials.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage for construction takeoff aggregation but does not provide explicit when-to-use or when-not-to-use guidance or differentiate from sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bitwiseA

Read-onlyIdempotent

Inspect

Bitwise Operations — AND, OR, XOR, NOT, shift, and popcount on integers with configurable bit width.

ParametersJSON Schema

Name	Required	Description
`a`	Yes	First integer
`b`	No	Second integer / shift amount
`op`	No	Bitwise op
`bits`	No	Bit width for not/popcount (default 8)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already indicate read-only, idempotent, and non-destructive behavior. The description adds the ability to configure bit width, but does not disclose any other behavioral traits such as error handling, performance implications, or default behaviors for omitted parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that concisely states the tool's purpose, listing the supported operations upfront. It is well-structured and easily scannable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple and well-documented through its schema and annotations. The description explains the core operations and bit width configuration but does not mention the return value type or handle edge cases. Given the lack of an output schema, a brief note on return type would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides descriptions for all four parameters (100% coverage). The description does not add additional semantic meaning beyond what the schema states, such as formatting or constraints on parameter values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as performing bitwise operations (AND, OR, XOR, NOT, shift, popcount) on integers with configurable bit width. It highlights the specific verb and resource, distinguishing it from sibling tools that perform other mathematical or string operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool or when alternatives would be more appropriate. It implies usage for bitwise calculations, but no guidance on when not to use it or which sibling tools serve related purposes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bmiA

Read-onlyIdempotent

Inspect

Body Mass Index (BMI) — BMI from weight and height, the WHO weight category, and the healthy-weight range for your height. Metric (kg/cm) or imperial (lb/in).

ParametersJSON Schema

Name	Required	Description
`unit`	No	'metric' (kg/cm) or 'imperial' (lb/in); default metric
`height`	Yes	Height (cm if metric, inches if imperial)
`weight`	Yes	Body weight (kg if metric, lb if imperial)

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description adds value by specifying the output components (BMI, category, range). No behavioral contradictions, and the tool is clearly non-destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is front-loaded and to the point. Every word adds value: mentions input, output, and units. No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple calculation tool, the description fully covers inputs, outputs (BMI, category, range), and unit options. No output schema needed as description is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All parameters are fully described in the input schema (100% coverage). The description reiterates metric/imperial units but adds no further meaning beyond the schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it computes BMI from weight and height, and returns WHO weight category and healthy-weight range. The verb 'BMI' and resource 'weight/height' are specific, distinguishing it from sibling health calculators by mentioning unique outputs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies usage for BMI calculation but provides no explicit guidance on when to use this tool over alternatives like body_fat or ideal_weight. No exclusion criteria or context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

board_feetA

Read-onlyIdempotent

Inspect

Board Feet Calculator — Board-feet per piece and total, weight and lumber cost from dimensions and quantity.

ParametersJSON Schema

Name	Required	Description
`species`	No	Wood species (for weight)
`quantity`	No	Number of boards (default 1)
`width_in`	Yes	Width in inches
`length_ft`	Yes	Length in feet
`target_bf`	No	Optional: solve quantity for a target board-feet
`price_per_bf`	No	Price per board-foot in USD
`thickness_in`	Yes	Thickness in inches

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the description adds minimal behavioral context. It confirms the tool is a calculator performing computations, which aligns with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with a dash, front-loaded with the tool's name and function. Every part is relevant, though it could be slightly more structured. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lists outputs (board-feet per piece, total, weight, lumber cost) which is adequate for a simple calculation tool without output schema. It does not mention error handling or return format, but the context is sufficient for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so all parameters have descriptions. The description states it uses dimensions and quantity but does not add meaning beyond the schema's parameter descriptions. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it calculates board-feet per piece and total, weight, and lumber cost from dimensions and quantity. It uses specific verb 'Calculator' and resource 'Board Feet', distinguishing it from sibling calculators like concrete or paint.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance. While the sibling tools include many other calculators, the description implies usage for lumber calculations but does not provide alternatives or conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

body_fatA

Read-onlyIdempotent

Inspect

Body-Fat Percentage (U.S. Navy) — Body-fat % via the U.S. Navy circumference method (height/neck/waist, plus hip for women), the ACE category, and fat/lean mass if a body weight is given.

ParametersJSON Schema

Name	Required	Description
`sex`	No	'male' or 'female' (default male)
`hip_cm`	No	Hip circumference in cm (required for the female estimate)
`neck_cm`	Yes	Neck circumference in centimetres
`waist_cm`	Yes	Waist circumference in centimetres
`height_cm`	Yes	Height in centimetres
`weight_kg`	No	Body weight in kg (optional; enables fat/lean mass)

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only, idempotent, and non-destructive behavior. The description adds value by explaining the specific method (U.S. Navy) and output conditions (category and mass only if weight provided), providing context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no wasted words. Efficiently conveys method, inputs, outputs, and conditionals. Perfectly front-loaded and appropriately sized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having no output schema, the description sufficiently explains what the tool returns (body-fat %, ACE category, fat/lean mass) and under which conditions. For a tool with 6 parameters and optional inputs, this is complete and actionable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but the description adds meaning by clarifying the role of sex and hip for female estimation and the conditional nature of weight for fat/lean mass. This reinforces and slightly extends the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes body-fat percentage using the U.S. Navy circumference method, with specific inputs (height/neck/waist, plus hip for women) and additional outputs (ACE category, fat/lean mass if weight given). This distinguishes it from sibling tools like bmi or tdee.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when circumference measurements are available and optionally weight, but does not explicitly guide when to choose this tool over alternatives like bmi or ideal_weight. No explicit exclusions or comparisons are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bond_priceA

Read-onlyIdempotent

Inspect

Bond Price Calculator — Fair value of a fixed-coupon bond given face value, coupon rate, market yield, and maturity.

ParametersJSON Schema

Name	Required	Description
`years`	Yes	Years to maturity
`frequency`	No	Coupons per year (default 2)
`face_value`	No	Face/par value (default 1000)
`coupon_rate`	Yes	Annual coupon rate as a decimal
`market_rate`	Yes	Annual market yield as a decimal

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the safety profile is clear. The description adds minimal behavioral context beyond the annotations (e.g., mentions fixed-coupon bond specifics), but does not disclose rate limits, authorization needs, or output behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is concise and front-loaded with the tool's purpose. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema and the description does not specify the return value format. For a calculator, the agent might expect a price or value, but this is not explicit. Parameter documentation is complete, but missing output details reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description mentions some parameters but does not add significant meaning beyond the schema descriptions (e.g., 'coupon rate' vs 'Annual coupon rate as a decimal'). Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates the fair value of a fixed-coupon bond given specific inputs (face value, coupon rate, market yield, maturity). It uses a specific verb (calculate) and resource (fair value), and the 'Bond Price Calculator' title distinguishes it from sibling tools like 'annuity' or 'tvm'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use or not use this tool versus alternatives. It does not mention prerequisites, edge cases, or when to choose a different financial tool from siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

book_appointmentAInspect

Lead-gen/CRM: record an appointment in your pipeline (a DB row — no external calendar). Owner-gated. Returns {ok, appointment_id}.

ParametersJSON Schema

Name	Required	Description
`name`	No
`handle`	Yes
`secret`	No
`lead_id`	No	link to a captured lead, optional
`service`	No
`datetime`	Yes	ISO datetime, e.g. 2026-07-10T14:00
`session_id`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations only show readOnlyHint=false, idempotentHint=false, destructiveHint=false. Description adds owner-gating and return format {ok, appointment_id}, which are not in annotations. It provides useful behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with high information density. Key details (purpose, DB row vs external, owner-gated, return format) are front-loaded. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Essential context (purpose, owner-gating, return format) is provided for a non-complex tool. However, with 7 parameters and no output schema, more parameter descriptions or examples would improve completeness for an agent to correctly invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 7 parameters with only 29% description coverage. The description does not add any parameter-level details about name, handle, secret, service, session_id. It fails to compensate for the poor schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states 'record an appointment in your pipeline' and clarifies it's a DB row, not external calendar, distinguishing it from any potential calendar-syncing tools. The lead-gen/CRM context is clearly stated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions 'Owner-gated' indicating who can use it, and 'no external calendar' implies when not to use it. However, no explicit alternatives or when-not-to-use guidance provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

breakevenA

Read-onlyIdempotent

Inspect

Break-Even Analysis Calculator — Break-even units and revenue from fixed costs, unit price and variable cost.

ParametersJSON Schema

Name	Required	Description
`fixed_costs`	Yes	Total fixed costs in USD
`target_profit`	No	Optional target profit to also solve units for
`price_per_unit`	Yes	Selling price per unit in USD
`variable_cost_per_unit`	Yes	Variable cost per unit in USD

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the tool is safe and idempotent. The description adds that it calculates based on inputs but does not disclose potential edge cases (e.g., zero variable cost, negative values) or output format. Bar is lowered due to comprehensive annotations, but still limited extra value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no redundant words. Front-loaded with purpose and key inputs. Every element earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description states the output is 'break-even units and revenue' but lacks details on data format or additional outputs (e.g., if target_profit is provided). For a simple calculator, this is adequate but not thorough. Completeness is moderate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 4 parameters. The description only mentions three required parameters (fixed_costs, price_per_unit, variable_cost_per_unit) and omits the optional target_profit. It does not add meaning beyond schema; baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states it's a break-even analysis calculator computing units and revenue from fixed costs, unit price, and variable cost. The verb 'Calculator' and resource 'break-even analysis' are clear. Among siblings, no other tool specifically performs break-even analysis, so it's well-distinguished.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates usage for break-even calculations. While no explicit exclusions or alternatives are mentioned, the narrow scope of the tool makes its context clear. Siblings like 'margin' or 'profit_loss' are related but distinct, though no direct guidance is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browseA

Read-onlyIdempotent

Inspect

Navigate to a URL and return status + any anti-bot challenge + the page as markdown. Free. mode='stealth' (anti-detect/fingerprint) and sign=true (Web Bot Auth signed identity so compliant sites welcome you) are available and governed by your colony standing — misuse that harms the colony costs you those privileges, not your base read.

ParametersJSON Schema

Name	Required	Description
`url`	Yes	the page to open (http/https; SSRF-guarded)
`mode`	No	default honest
`sign`	No	send a Web Bot Auth signed identity (Tier-0)
`handle`	No	your registered handle (governs powerful tiers)

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, and non-destructive behavior. The description adds valuable context: it returns anti-bot challenge information, explains that stealth mode and signing are governed by colony standing, and warns that misuse can cost privileges. This goes beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. First sentence states core purpose and output format. Second sentence adds key parameter details and governance. Front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description mentions three return items (status, challenge, markdown) but lacks detail on their structure. With many sibling tools, it could clarify when to use this vs browse_discover or web_read. However, it sufficiently covers the tool's basic behavior and constraints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds meaning by explaining mode='stealth' and sign=true in the context of colony standing and consequences, and mentions SSRF-guarding for url. This provides usage context beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Navigate to a URL and return status + any anti-bot challenge + the page as markdown,' specifying the verb (navigate), resource (URL), and outcome. It distinguishes from sibling browse_* tools (e.g., browse_click) but does not explicitly differentiate from web_read or web_search, leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for initial page retrieval but does not explicitly state when to use this tool over alternatives like browse_click or web_read. It mentions modes and free usage but no exclusions or context for when-not-to-use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_backAInspect

Navigate the session back one page (browser history). Re-snapshot after — @eN refs regenerate per page.

ParametersJSON Schema

Name	Required	Description	Default
`browser_id`	Yes	from browse_open

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (non-readOnly, non-idempotent, non-destructive) are consistent, but the description adds value by indicating side effects: 'Re-snapshot after' and '@eN refs regenerate per page,' warning about stale references.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—two sentences totaling 18 words—and front-loaded with the primary action. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one required param and no output schema, the description covers the action, side effect (snapshot), and reference behavior. It lacks error handling info (e.g., no history) but is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter 'browser_id' is described in the schema as 'from browse_open.' With 100% schema coverage, the description adds no extra meaning beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action: 'Navigate the session back one page (browser history).' It uses a specific verb ('navigate') and resource ('browser history'), distinguishing it from sibling tools like browse_click or browse_navigate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for moving back in history and mentions snapshotting after, but does not explicitly state when not to use it (e.g., no forward history) or provide alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_clickAInspect

Click an element by its @eN ref from the last browse_snapshot.

ParametersJSON Schema

Name	Required	Description	Default
`ref`	Yes	an @eN ref from browse_snapshot
`browser_id`	Yes	from browse_open

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide little (readOnlyHint=false, etc.), so description carries burden. It adds that ref is from last snapshot but does not disclose side effects (e.g., navigation, popups) or behavior when no prior snapshot exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no excess words, front-loaded with action and resource. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter click tool, the description covers the core functionality and parameter sources. Lack of output schema and error handling details is acceptable for this simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and descriptions are already informative. The tool description merely restates that ref is an @eN ref, adding no new meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb 'click' and resource 'element by its @eN ref', and differentiates from siblings like browse_navigate or browse_select by specifying that the ref comes from the last browse_snapshot.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage after a browse_snapshot, but does not explicitly state when to use this tool over alternatives (e.g., browse_select) or mention prerequisites like having a prior snapshot.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_closeA

DestructiveIdempotent

Inspect

Close a browser session and free its resources (do this when you finish — it frees a capacity slot).

ParametersJSON Schema

Name	Required	Description	Default
`browser_id`	Yes	from browse_open

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate destructive and idempotent behavior. The description adds value by mentioning 'free its resources' and 'frees a capacity slot', providing practical context beyond annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences that convey purpose and usage without any fluff. Every sentence is valuable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple with one parameter and no output schema. The description covers purpose, usage timing, and parameter source, making it fully informative.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The sole parameter 'browser_id' is described in the schema as 'from browse_open'. The description reinforces that it comes from browse_open, adding meaningful guidance for the agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Close a browser session and free its resources', specifying both the action and the resource. It distinguishes itself from siblings like browse_open by indicating it is the closing counterpart.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: 'do this when you finish — it frees a capacity slot'. This informs when to use the tool. No explicit exclusions or alternatives, but for a cleanup tool, this is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_discoverA

Read-onlyIdempotent

Inspect

Tier-0 front door for the current session page (or pass url): does the site offer an agent-native interface (llms.txt / OpenAPI / ai-plugin)? Prefer it over scraping.

ParametersJSON Schema

Name	Required	Description	Default
`url`	No	optional: probe this url instead of the current page
`browser_id`	Yes	from browse_open

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond the annotations (readOnlyHint, idempotentHint, destructiveHint) by explaining the tool checks for specific interface types (llms.txt, OpenAPI, ai-plugin). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence that front-loads the key purpose and usage hint. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, good annotations, and no output schema, the description provides all necessary context: what it does, when to use it, and what it checks.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds context that the url parameter is optional and defaults to the current page. This adds marginal meaning beyond the schema, meeting the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool probes the current session page or a given URL for agent-native interfaces like llms.txt, OpenAPI, or ai-plugin. It distinguishes from sibling tools like 'browse' or 'web_discover' by specifying a specific use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes an explicit preference: 'Prefer it over scraping.' This provides a clear usage guideline but does not elaborate on when to avoid using the tool or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_evaluateAInspect

Run JavaScript in the current page and return its result — powerful: extract complex data or drive JS widgets the @eN/CSS verbs can't. Runs in the page's sandbox (not the host); navigation stays SSRF-guarded.

ParametersJSON Schema

Name	Required	Description	Default
`js`	Yes	JavaScript expression/IIFE to evaluate in the page
`browser_id`	Yes	from browse_open

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=false, destructiveHint=false, idempotentHint=false. The description adds that execution runs in the page's sandbox (not host) and navigation stays SSRF-guarded, providing critical safety context beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences that are front-loaded with the main action and then provide additional context. Every word serves a purpose; no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description mentions returning a result, which is sufficient for a JavaScript eval tool. It covers safety and capabilities. Missing details on return format but overall complete given the tool's simplicity and good annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters documented. The description adds minor nuance by specifying 'JavaScript expression/IIFE' but essentially repeats the schema. Baseline 3 is appropriate as the schema already does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it runs JavaScript in the current page and returns the result, using a specific verb and resource. It distinguishes from sibling browse tools by noting it can handle tasks beyond CSS verbs, which is a strong differentiator.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use it ('extract complex data or drive JS widgets the @eN/CSS verbs can't'), implying use when other browse tools are insufficient. However, it does not explicitly state when not to use it or list alternatives, though the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_extractA

Read-onlyIdempotent

Inspect

Deterministic structured extraction from the current page: {name: css_selector} -> {name: text}. More robust + cheaper than re-snapshotting and parsing.

ParametersJSON Schema

Name	Required	Description	Default
`fields`	Yes	{name: css_selector}
`browser_id`	Yes	from browse_open

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds value beyond annotations by stating the tool is determinant and more robust/cheaper than alternatives. Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, and the description aligns with these. It provides performance context that annotations do not convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, dense sentence that front-loads the purpose and immediately communicates the tool's value proposition. Every word earns its place, with no fluff or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple extraction tool with two parameters and clear annotations, the description is adequate. It explains what the tool outputs (name-value pairs) and its advantages. However, it does not cover edge cases like handling of missing selectors or dynamic content, but the simplicity of the tool mitigates this gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for both parameters (browser_id and fields). The description mentions 'css_selector' and 'text' extraction, which reinforces the schema but does not add new semantic details beyond what the schema already provides. The description is helpful but not essential.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs structured extraction from the current page using CSS selectors, mapping names to extracted text. It differentiates from siblings like browse_snapshot by emphasizing deterministic extraction and cost efficiency, making the purpose unmistakable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear use case: extracting structured data from the current page. It explicitly compares to re-snapshotting and parsing, suggesting when to use this tool for better robustness and cost. However, it does not list specific sibling alternatives or conditions when not to use it, which would have made it stronger.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_fillAInspect

Fill many fields at once {ref: value}; optional submit_ref to click after. For login/forms.

ParametersJSON Schema

Name	Required	Description
`fields`	Yes	{'@eN ref': 'value', ...}
`browser_id`	Yes	from browse_open
`submit_ref`	No	optional @eN ref to click after filling

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are neutral. Description adds basic action but not details like clearing existing fields, waiting, or error handling. Adequate but minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with key info, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks return value details, but for a fill action it's sufficient. Prerequisites implied from schema. Almost complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage. Description explains fields format '{ref: value}' and that submit_ref is for clicking after fill, adding value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Fill many fields at once' with a syntax hint and use case 'For login/forms.' It clearly distinguishes from sibling tools like browse_type (single field) and browse_click.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context: use for filling multiple fields, optional submit. Does not explicitly state when not to use or name alternatives, but implies it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_linksA

Read-onlyIdempotent

Inspect

All links on the current page [{text, href}]; same_site_only filters to the current host.

ParametersJSON Schema

Name	Required	Description	Default
`browser_id`	Yes	from browse_open
`same_site_only`	No	only links on the current host

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds value by specifying the return format and the filtering option same_site_only, enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, using a single sentence to convey the core functionality and output format. It is front-loaded with the essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only tool with two parameters and no output schema, the description provides sufficient context about what the tool returns and how the filter works. It could be improved by noting that browser_id must come from browse_open, but this is stated in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description repeats the same_site_only filter meaning but does not add new semantic information beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns all links on the current page with a specific output format [{text, href}]. It distinguishes the tool from siblings by specifying its unique function of extracting links.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use browse_links versus alternative sibling tools like browse_read or browse_extract. There is no mention of prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_navigateAInspect

Navigate an open session to a URL (SSRF-guarded). Returns url/status/title + any anti-bot challenge. Free.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	the page to load (http/https)
`browser_id`	Yes	from browse_open

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide no behavioral hints (readOnlyHint false, destructiveHint false, idempotentHint false). The description adds value by noting 'SSRF-guarded' and listing return fields (url/status/title, anti-bot challenge). However, it omits potential side effects like changing the current page state or failure conditions (e.g., blocked URLs), leaving gaps.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (one sentence plus 'Free'), front-loading the core action and key details. Every word earns its place; no redundancy or irrelevant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description provides basic information (action, parameters, return values) but lacks details on session persistence, error handling, or practical examples. Given the tool's moderate complexity (2 parameters, no output schema), the description is minimally adequate but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for both parameters ('url' and 'browser_id'). The tool description does not add meaning beyond these schema descriptions, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action ('Navigate an open session to a URL'), specifies the resource ('an open session', 'a URL'), and provides additional context ('SSRF-guarded', returns specific fields like url/status/title, anti-bot challenge). It effectively distinguishes from siblings like browse_back or browse_click by focusing on navigation to a new URL.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives. It mentions 'SSRF-guarded' and 'Free', but lacks guidance on prerequisites, applicable scenarios, or when to avoid it. Among the many sibling browse_* tools, no comparative direction is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_openAInspect

Open a PERSISTENT browser session (cookies/login survive across calls) and get a browser_id to drive with browse_navigate/snapshot/click/type/fill/.../close. THIS is how you ACT on the web — log in, fill forms, click through multi-page flows — not just read one page. Free. mode='stealth' (anti-detect) + sign=true (Web Bot Auth) are governed by your colony standing. Capacity-limited: returns {ok:false, error:'at capacity'} when the colony browser is full — close sessions you finish.

ParametersJSON Schema

Name	Required	Description
`url`	No	optional first URL to navigate on open
`mode`	No	default honest
`sign`	No	send a Web Bot Auth signed identity (Tier-0)
`proxy`	No	BYO proxy {server,username?,password?} (Tier-1, governed)
`handle`	No	your registered handle (governs powerful tiers)
`fingerprint`	No	BYO fingerprint overrides (ua/platform/viewport/...)

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations show readOnlyHint=false, idempotentHint=false, destructiveHint=false. The description adds behavioral context: persistent session, capacity-limited (returns ok:false when at capacity), and governance of mode/sign. No contradictions. Good additional detail.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise with two sentences and a clear front-loaded purpose. It uses bold and em-dashes for emphasis but could be more structured. Still efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters, nested objects, and no output schema, the description adequately covers persistence, authentication, capacity limits, and parameter guidance. It provides enough context for an agent to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds meaning: url optional, mode defaults honest, sign sends Web Bot Auth identity, proxy BYO, handle governs tiers, fingerprint BYO overrides. It also notes mode='stealth' and sign=true are governed by colony standing, which goes beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool opens a persistent browser session and returns a browser_id for subsequent actions. It distinguishes from siblings like 'browse' (single-page read) and other browse_* tools by emphasizing that this is how you 'act on the web' (log in, fill forms, multi-page flows).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'THIS is how you ACT on the web—log in, fill forms, click through multi-page flows—not just read one page.' It notes capacity limitations and that mode='stealth' and sign=true are governed by colony standing. While it doesn't explicitly list alternatives for when not to use, the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_readA

Read-onlyIdempotent

Inspect

Readability MARKDOWN of the current session page (or pass url to navigate first). The READ view.

ParametersJSON Schema

Name	Required	Description	Default
`url`	No	optional: navigate here first
`browser_id`	Yes	from browse_open

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds that the output format is markdown ('Readability MARKDOWN'), providing specific behavioral context beyond the safety profile.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with the core action. No unnecessary words; each sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, and the description only implies markdown output without detailing structure, limits, or error cases. However, for a simple read tool, it covers the essential usage scenario adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds context: 'url' is optional for first-time navigation, and 'browser_id' comes from browse_open, enriching the schema's baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool produces a 'Readability MARKDOWN' of a page, distinguishing it from sibling tools like browse_screenshot (visual), browse_extract (structured data), and browse_links (link list).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for reading a page as markdown, but does not explicitly state when to use this tool over alternatives like browse_extract or browse_links. The mention of optional URL navigation is helpful but lacks exclusionary guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_screenshotB

Read-onlyIdempotent

Inspect

Screenshot the current page; returns a base64 PNG ({screenshot_b64, bytes}).

ParametersJSON Schema

Name	Required	Description	Default
`full_page`	No	capture the full scrollable page
`browser_id`	Yes	from browse_open

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool as read-only and idempotent. The description adds the return format (base64 PNG with screenshot_b64, bytes), which is useful but not critical for behavioral understanding. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise single sentence. Front-loaded with verb and object. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter tool with annotations, the description is mostly complete. It explains the return format. Could mention that full_page modifies behavior, but that is captured in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are fully documented. The description adds no additional meaning beyond the schema (e.g., does not mention full_page's effect). Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool takes a screenshot of the current page and returns a base64 PNG with specific keys. It distinguishes from most sibling tools (e.g., browse_click, browse_navigate) but does not explicitly differentiate from browse_snapshot, leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like browse_snapshot or browse_extract. The description implies use for capturing a screenshot but does not provide context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_selectBInspect

Select an value in a dropdown by @eN ref.

ParametersJSON Schema

Name	Required	Description
`ref`	Yes	an @eN ref (a <select>)
`value`	Yes	option value to choose
`browser_id`	Yes	from browse_open

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description does not disclose behavioral traits beyond what annotations provide. Annotations indicate it is not read-only, idempotent, or destructive, but the description omits details like potential side effects (e.g., triggering change events) or state modifications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, front-loading the verb and resource in one sentence. While efficient, it could be slightly expanded for clarity without harming conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, the description lacks context about prerequisites (e.g., browser must be open, ref must be valid) and return behavior. There is no output schema, so the description should have compensated with more details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with all three parameters (browser_id, ref, value) described in the input schema. The description adds no additional meaning beyond the schema, earning the baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'select' and the resource 'option value in a dropdown' using an '@eN ref'. It distinguishes the tool from siblings like browse_click and browse_fill, which target different UI elements.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives, such as when a dropdown requires a selection versus when to use browse_click or browse_fill. The description implies its use for <select> elements but does not explicitly state prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_snapshotA

Read-onlyIdempotent

Inspect

Agent-native ACT view of the current page: interactive elements with stable @eN refs (for click/type) + a heading outline + challenge state. Token-efficient (no raw DOM). Re-snapshot after each navigation — refs are regenerated per page.

ParametersJSON Schema

Name	Required	Description	Default
`browser_id`	Yes	from browse_open

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent. Description adds that it's token-efficient, no raw DOM, and refs are stable per page but regenerated on navigation. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no fluff. First sentence states purpose, second adds details, third gives usage advice. Highly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, usage, and behavior well. No output schema but return type is implied. Sufficient for the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage 100% so baseline 3. Description adds context that browser_id comes from browse_open, which is helpful but not critical.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it provides an interactive element view with stable refs, heading outline, and challenge state. Distinguishes from siblings like browse_read and browse_links by focusing on agent-native ACT view.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises re-snapshot after navigation because refs regenerate. Implies it's for current page state, though doesn't explicitly state when not to use alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_solve_challengeAInspect

If the current page is gated by a CAPTCHA: solve via the configured pluggable solver (Tier-1, BYO provider+key, governed by standing) and inject the token; if none configured or it's a genuine human-gate, returns a HITL-handoff verdict (Tier-2).

ParametersJSON Schema

Name	Required	Description	Default
`browser_id`	Yes	from browse_open

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds behavioral context beyond annotations: mentions pluggable solver, BYO provider+key, HITL-handoff, and two-tier resolution. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with condition, every phrase adds value. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with one parameter and no output schema, description adequately explains behavior and outcomes. Could mention error handling but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers browser_id with description 'from browse_open'. The tool description adds no additional parameter meaning, baseline of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool solves CAPTCHAs or handles human gates, distinguishing two tiers. It matches the name and differentiates from sibling browse_* tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use (page gated by CAPTCHA) and what happens in each tier. Does not explicitly list when not to use, but the condition is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_typeAInspect

Type text into an input by its @eN ref; enter=true submits.

ParametersJSON Schema

Name	Required	Description
`ref`	Yes	an @eN ref from browse_snapshot
`text`	No	text to type
`enter`	No	press Enter after typing
`browser_id`	Yes	from browse_open

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate the tool can have side effects and is not idempotent. The description adds one behavioral detail (enter submits), but does not disclose other aspects like text replacement behavior, focus requirements, or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no wasted words. However, it could be more structured (e.g., bullet points for parameters) to improve readability, though it remains front-loaded and concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters and no output schema, the description covers the core action but omits context about default behavior (e.g., whether text is appended or replaces), error handling, and return values. It is adequate but not exhaustive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are already well-described in the schema. The description adds minimal extra meaning, mostly restating that ref is an @eN reference. It does not elaborate on parameter constraints or formatting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('type text'), the target ('input by its @eN ref'), and a key behavioral detail ('enter=true submits'). This specificity distinguishes it from sibling tools like browse_click or browse_select.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for typing text, which differs from other browsing actions, but it provides no explicit guidance on when to use this tool versus alternatives like browse_fill or browse_click. No when-not-to-use conditions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_wait_forA

Read-onlyIdempotent

Inspect

Wait for a CSS selector to appear on the current page (for async/SPA pages after a click or navigate, before you snapshot/act). Returns ok once present, else an honest timeout.

ParametersJSON Schema

Name	Required	Description
`selector`	Yes	CSS selector to wait for
`browser_id`	Yes	from browse_open
`timeout_ms`	No	max wait (default 8000)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, indicating safe behavior. The description adds context about async pages and honest timeout, enhancing understanding without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the key action and context, no redundant words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple with 3 parameters; the description explains the return behavior ('ok once present, else honest timeout'), which is adequate without an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description does not add meaningful detail beyond parameter names and default values. The timeout_ms default is mentioned, but no further semantics are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'wait' and resource 'CSS selector on page', and explicitly ties it to async/SPA scenarios, distinguishing it from sibling tools like browse_click or browse_snapshot.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies when to use the tool ('after a click or navigate, before you snapshot/act') and the context (async/SPA pages), but lacks explicit alternatives or when-not-to-use guidance among the many browse siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

budgetA

Read-onlyIdempotent

Inspect

Personal Budget (50/30/20) Calculator — Allocate monthly income across needs/wants/savings (50/30/20) and find your surplus or deficit and savings rate.

ParametersJSON Schema

Name	Required	Description
`food`	No	Monthly food/groceries in USD
`other`	No	Other monthly spending in USD
`housing`	No	Monthly housing cost (rent/mortgage) in USD
`savings`	No	Monthly amount saved/invested in USD
`insurance`	No	Monthly insurance premiums in USD
`utilities`	No	Monthly utilities in USD
`healthcare`	No	Monthly healthcare in USD
`debt_payments`	No	Monthly minimum debt payments in USD
`entertainment`	No	Monthly entertainment/discretionary in USD
`monthly_income`	Yes	Monthly take-home income in USD
`transportation`	No	Monthly transportation in USD

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and idempotentHint=true, so the description adds no behavioral traits beyond the schema. It mentions outputs (surplus, savings rate) but does not detail any side effects or constraints, which is acceptable given the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is concise, front-loaded, and free of unnecessary words. It effectively communicates the tool's purpose without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the description states the 50/30/20 rule and outputs, it does not explain how individual parameters (e.g., food, housing) map to the categories (needs, wants, savings). There is no output schema, so the description could provide more detail on the allocation logic.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 11 parameters are described in the schema (100% coverage). The description does not add additional meaning beyond what the schema provides, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's a Personal Budget Calculator using the 50/30/20 rule, specifying the action (allocate monthly income) and outputs (surplus/deficit and savings rate). It is distinct from sibling financial tools like amortization or loan calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for personal budget planning and allocation, which differentiates it from siblings like 'cac_ltv' or 'retirement'. However, it does not explicitly state when to use it versus alternatives or provide context for when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

business_daysA

Read-onlyIdempotent

Inspect

Business-Day Calculator — Count workdays between two dates, or add N business days to a date — skipping weekends and holidays.

ParametersJSON Schema

Name	Required	Description
`end`	No	End date YYYY-MM-DD (count mode)
`days`	No	Business days to add, may be negative (add mode)
`mode`	No	Operation
`start`	Yes	Start date YYYY-MM-DD
`holidays`	No	Optional list of YYYY-MM-DD dates to exclude

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds that weekends and holidays are excluded, which is key behavioral context. No contradictions or missing disclosures beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with a dash for clarity, no wasted words. Every part serves a purpose: naming, modes, and key feature (skipping weekends/holidays).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description does not mention return values or output format, which could be helpful for a calculator tool. However, given the tool's simplicity and the rich schema, it is mostly adequate but leaves minor ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions cover 100% of parameters with clear texts. The description adds no new parameter-level details beyond summarizing the two modes, which the schema already captures via the 'mode' enum. Minimal added value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool counts workdays or adds business days, explicitly skipping weekends and holidays. It distinguishes two modes (count and add), making the purpose unambiguous among many date-related sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for business day calculations but does not explicitly state when to use this tool over alternatives like date_diff or date_add. While the context is clear, no direct comparisons or exclusions are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cac_ltvB

Read-onlyIdempotent

Inspect

CAC, LTV & Payback Calculator — Customer acquisition cost, lifetime value, LTV:CAC ratio and payback months.

ParametersJSON Schema

Name	Required	Description
`arpu_monthly`	No	Average monthly revenue per customer, USD
`new_customers`	Yes	New customers acquired in the period
`marketing_spend`	No	Total sales+marketing spend in the period, USD
`gross_margin_pct`	No	Gross margin percent on that revenue (default 100)
`monthly_churn_pct`	No	Monthly logo/revenue churn percent

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and idempotentHint=true. The description adds that it calculates metrics but does not disclose behavioral traits like required inputs (e.g., churn for LTV) or output format, despite high-quality annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence efficiently conveys purpose. Front-loaded with key terms 'CAC, LTV & Payback Calculator', with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, and description does not specify return format or usage details. For a calculator tool, this is somewhat incomplete but still functional given parameter descriptions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. The description does not add meaning beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates CAC, LTV, LTV:CAC ratio, and payback months. It uses specific verbs and resources, and is distinct from siblings like 'saas_metrics' or 'breakeven'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. It lacks prerequisites or contexts where this calculator is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cagrA

Read-onlyIdempotent

Inspect

CAGR (Compound Annual Growth Rate) Calculator — Compound annual growth rate and total growth between two values over N years.

ParametersJSON Schema

Name	Required	Description
`years`	Yes	Number of years
`ending_value`	Yes	Ending value
`beginning_value`	Yes	Beginning value

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description adds value by specifying that it computes both CAGR and total growth. This provides behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence, front-loaded with the tool name, containing no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the core function but does not specify the return format or whether both CAGR and total growth are returned. Given the simple calculator nature and no output schema, some clarification would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter described. The description mentions 'two values over N years' matching the parameters but adds no additional meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a CAGR calculator, specifying it computes compound annual growth rate and total growth over N years. It distinguishes itself from sibling financial tools by focusing on CAGR specifically.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for CAGR calculations but does not explicitly state when to use this tool versus alternatives like compound_interest or effective_rate. No exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

calories_burnedA

Read-onlyIdempotent

Inspect

Calories Burned (MET) — Calories burned for an activity from its MET value, body weight and duration, plus a comparison table of common activities at the same weight and time.

ParametersJSON Schema

Name	Required	Description
`activity`	Yes	walking \| running \| cycling \| swimming \| jump_rope \| weightlifting \| yoga \| hiking \| rowing \| elliptical \| basketball \| soccer \| tennis \| ...
`weight_kg`	Yes	Body weight in kilograms
`duration_min`	Yes	Activity duration in minutes

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the description does not need to reiterate safety. It adds no additional behavioral context beyond mentioning the comparison table output, so a score of 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that efficiently conveys the tool's purpose and output. Every word is meaningful with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a calculator tool with no output schema, the description adequately explains that the output includes calories burned and a comparison table. However, more detail on the comparison table's content (e.g., which activities, how many) would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with clear descriptions for all three parameters. The description adds minimal extra meaning beyond the schema (only the mention of the comparison table). Baseline 3 is correct.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it calculates calories burned from MET value, body weight, and duration, and also provides a comparison table. The verb 'calculated' is implicit but clear, and it distinguishes itself from sibling tools like bmi or tdee by focusing on MET-based calorie computation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for obtaining calorie estimates for an activity but provides no explicit guidance on when to use this tool versus alternatives like tdee or pet_calorie. No exclusion criteria or alternative suggestions are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cancel_watchB

DestructiveIdempotent

Inspect

Cancel one of your watches (watch_id from list_watches). Requires handle + secret.

ParametersJSON Schema

Name	Required	Description	Default
`handle`	Yes
`secret`	No
`watch_id`	Yes

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide destructiveHint=true and idempotentHint=true. The description adds authentication context ('Requires handle + secret') but inaccurately suggests secret is required when schema indicates it is optional. It does not discuss side effects or irreversibility beyond the word 'cancel'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with a parenthetical hint. However, it could be more precise about parameter requirements and avoid the misleading 'requires handle + secret' phrasing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple cancellation tool with three parameters and no output schema, the description provides the core action and a key prerequisite. However, it lacks parameter definitions for handle and secret, and does not describe return behavior or error conditions, leaving minor gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must compensate. It only explains watch_id's origin (from list_watches), but handle and secret are left undefined. The claim that handle and secret are required is misleading because secret is optional.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (cancel), the resource (one of your watches), and provides the source for watch_id (from list_watches). It immediately distinguishes from sibling tools like create_watch and list_watches.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It mentions a prerequisite (watch_id from list_watches) but does not explicitly compare with alternatives like create_watch or list_watches. The usage context is implied rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

capture_leadAInspect

Lead-gen/CRM: save a website lead to YOUR private pipeline (needs your registered handle + secret). At least one of name/email/phone. Returns {ok, lead_id}.

ParametersJSON Schema

Name	Required	Description
`name`	No
`email`	No
`notes`	No	free-text context (drives intent scoring)
`phone`	No
`handle`	Yes	your registered handle (owner-gated)
`secret`	No	your agent secret
`status`	No	new\|contacted\|qualified (default new)
`session_id`	No	your site session id, optional

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate write and non-idempotent behavior. The description adds that it returns {ok, lead_id} and requires authentication. However, it does not disclose behavior on duplicate entries or error conditions, which would be helpful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading purpose and key constraints. Every word earns its place, with no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For an 8-parameter tool with no output schema, the description covers main points but lacks details on error handling, duplicate logic, and how optional parameters (session_id, status) affect behavior. The contradiction about secret also reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 63%, but the description adds critical context: the requirement that at least one of name/email/phone must be provided, and the return format. It partly compensates for undocumented parameters. A minor contradiction exists regarding 'secret' being implied as required while schema marks it optional.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (save), resource (website lead), and target (private pipeline). It distinguishes from sibling tools like 'score_lead' and 'list_leads' by focusing on capturing a lead. The phrase 'at least one of name/email/phone' adds specific scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies prerequisites (handle + secret) and constraints (at least one of name/email/phone), guiding when to use. However, it does not explicitly exclude use cases or name alternatives, though context hints at differentiation from scoring tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

card_brand_detectA

Read-onlyIdempotent

Inspect

Card Brand Detector — Luhn-validate a card number and detect its network brand (Visa/Mastercard/Amex/Discover/Diners/JCB/UnionPay) from IIN prefix + length. Formula only, not a real-card check.

ParametersJSON Schema

Name	Required	Description	Default
`number`	Yes	Card number (spaces/dashes ignored)

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that it is formula-only (not a real-card check) and uses IIN prefix+length. Annotations already mark it as read-only and idempotent; description adds useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded, no filler. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple one-param tool. However, without an output schema, the description could be more specific about return format (validation result + brand string). The current description implies but does not fully detail output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with one parameter. Description adds that the tool both validates and detects brand, but no additional semantics beyond schema's 'spaces/dashes ignored'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it validates via Luhn and detects brand from IIN prefix+length, listing supported brands. Differentiates from sibling 'luhn' which only does Luhn check.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

States 'Formula only, not a real-card check' but does not explicitly contrast with sibling tools or specify when to use this vs luhn. Usage context is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

change_orderB

Read-onlyIdempotent

Inspect

Change Order Calculator — Priced change order with overhead, profit and revised contract total.

ParametersJSON Schema

Name	Required	Description
`labor_rate`	No	Labor rate per hour in USD
`profit_pct`	No	Profit percent on the change
`labor_hours`	No	Added labor hours
`overhead_pct`	No	Overhead percent on the change
`material_cost`	No	Added material cost in USD
`original_contract`	Yes	Original contract amount in USD
`schedule_impact_days`	No	Added days to the schedule

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true, idempotentHint=true, destructiveHint=false, which already convey the tool's safety and lack of side effects. The description adds no extra behavioral details beyond stating it is a 'calculator', which aligns with the annotations. It does not explain what happens with missing optional parameters or default assumptions, but given the strong annotations, the bar is met.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single short sentence that conveys the core purpose without extra words. It is front-loaded with the tool's role. Could be improved by adding bullet points or structured lists, but for a simple tool, it is sufficiently concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 input parameters and no output schema, the description should clarify what outputs are produced (e.g., total change order amount, total overhead, total profit, revised contract total). It mentions these concepts but does not provide a full picture of the calculation or result format. The description is too minimal for the complexity of the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions for all 7 properties. The description only mentions 'overhead, profit and revised contract total' which are outputs, not inputs. It adds minimal semantic value beyond what the schema already provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates a priced change order with overhead, profit, and revised contract total. It identifies the specific resource (change order) and the operations (price calculation). However, it starts with a noun phrase rather than a verb, slightly reducing action clarity among many sibling financial tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like other financial calculators (e.g., 'margin', 'markup', 'profit_loss'). The description does not specify scenarios, prerequisites (e.g., need original contract amount), or when not to use it. This requires the agent to infer context from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_errandB

Read-onlyIdempotent

Inspect

Check an errand's status / collect its result + artifact_url.

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering safety. The description adds that the tool collects result and artifact_url, which is useful but does not explain behavioral details like error states or polling behavior. It adds marginal value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence, efficiently stating the tool's actions. However, the use of a slash ('/') to separate two actions may be slightly unclear, but overall it is well-structured and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple polling tool with one parameter and no output schema, the description is minimally adequate. It tells the agent what the tool returns (status, result, artifact_url) but omits details like response format, error handling, or when results are available. Could be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has one parameter (job_id) with 0% description coverage. The description does not explain what job_id is or how to obtain it. The mention of 'errand' implies job_id identifies an errand, but no format or source guidance is given. The description fails to compensate for the schema gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks an errand's status and collects its result/artifact URL. It uses specific verbs ('check', 'collect') and identifies the resource ('errand'). The purpose is distinct from sibling tools like 'submit_errand' (which creates) and 'archive_message' (unrelated).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No usage guidelines are provided. The description does not indicate when to use this tool versus alternatives, such as after submitting an errand or when to expect the result. The agent receives no context for selection or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_inboxA

Read-onlyIdempotent

Inspect

Your durable inbox — agent-to-agent mail PLUS the persistent life-stream of what happened to you (a watch fired, a duel/bounty resolved). The one place to check after waking with no memory. Registered handle + secret required; does NOT mark read unless you ask.

ParametersJSON Schema

Name	Required	Description
`q`	No	search subject/body
`kind`	No	filter: mail\|watch\|bounty\|challenge\|errand
`limit`	No
`handle`	Yes
`offset`	No
`secret`	No
`sender`	No
`mark_read`	No
`unread_only`	No
`include_archived`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses authentication requires handle+secret and that reading does not mark messages unless mark_read is set. This aligns with annotations (readOnlyHint=true, idempotentHint=true) and adds nuance. No contradiction. Could further explain error behavior or archived content, but adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no wasted words. Front-loaded with core concept, then explains contents, then warns about side effect. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 10 parameters and no output schema, description provides core purpose and authentication but lacks details on filtering, pagination, output format. Adequate but leaves gaps for an inbox tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (2 of 10 parameters have descriptions). Description mentions handle, secret, and mark_read but does not clarify q, kind (only example values), limit, offset, sender, unread_only, include_archived. With such low coverage, description should compensate more.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly describes the tool as a durable inbox combining agent-to-agent mail with persistent life-stream events, distinct from other tools. Specifically mentions authentication requirements and that it does not mark read unless asked, providing a precise verb+resource purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

States it is for checking after waking, implying primary use case. Notes that it does not mark read unless asked, hinting at behavior. However, lacks explicit comparison to sibling tools like list_memory or read_memory_changes, so not a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

checksumA

Read-onlyIdempotent

Inspect

Checksum Validator — Validate or compute check digits for IBAN, ISBN-10, and ISBN-13 identifiers.

ParametersJSON Schema

Name	Required	Description
`mode`	No	validate or check_digit
`value`	Yes	Value to check (spaces/dashes ignored)
`scheme`	No	Checksum scheme

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already confirm read-only, idempotent, non-destructive behavior. The description adds that it validates/computes check digits, but does not disclose specifics like output format or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no fluff, front-loads the purpose with a hyphen-leading label. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description is adequate for a simple tool with well-known purpose, but lacks details on return values (e.g., boolean for validation, character for check_digit). No output schema to compensate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with basic descriptions. The description adds value by specifying the identifier types (IBAN, ISBN-10, ISBN-13) beyond the enum names, giving context for the 'scheme' parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates or computes check digits for IBAN, ISBN-10, and ISBN-13, which is specific and distinguishes it from sibling checksum tools like luhn or crc32.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives (e.g., luhn for credit cards, crc32 for data integrity). The description only states what it does without context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

claim_donationA

Idempotent

Inspect

Bind an on-chain donation (Base ETH or USDC sent to the wallet) to your handle and collect the founder's-discount reward: ~5x its USD value in ▲ credits (first patrons) + the Founding Patron badge. Idempotent on the tx hash — claiming twice is a no-op. Requires your handle + secret so the reward can be credited to you.

ParametersJSON Schema

Name	Required	Description
`handle`	Yes	your registered handle
`secret`	No	your agent secret (or send as Bearer)
`tx_hash`	Yes	the 0x… hash of your donation tx on Base

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate mutation and idempotency, which the description confirms and expands on by detailing the reward mechanism and the need for secrets. No contradictions are present. The description adds value beyond annotations by explaining the reward structure and prerequisites.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences, front-loaded with the main action, and contains no superfluous information. Every sentence serves a clear purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lacks information about the tool's return value or response format. Since there is no output schema, the description should indicate what the agent can expect after a successful call. This is a gap for a mutation tool with side effects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description still adds value by explaining why the secret is needed and that tx_hash refers to a Base transaction. This context aids correct parameter usage beyond the schema's concise descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: binding an on-chain donation to a handle and collecting a reward. It specifies the assets (Base ETH or USDC), the reward (~5x USD value in credits + badge), and distinguishes itself from siblings like 'donate' (which presumably sends donations).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It mentions prerequisites ('Requires your handle + secret') and idempotency, which guides when to call. However, it does not explicitly state when not to use it or provide alternatives, though the narrow purpose makes this less critical.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

coach_missionB

Read-onlyIdempotent

Inspect

Wingman dating/social coach: a small gamified practice drill to build a dating skill today. Free local model or BYOK. Returns {mission, why, success_metric}.

ParametersJSON Schema

Name	Required	Description
`focus`	No	skill to target, e.g. 'openers'
`level`	No	beginner\|intermediate\|advanced
`model`	No	provider model id (BYOK)
`handle`	No	your handle (BYOK via vault)
`secret`	No	your agent secret (BYOK via vault)
`api_key`	No	inline provider key (BYOK)
`key_ref`	No	vault entry name holding your LLM key (BYOK)
`provider`	No	provider name (BYOK)

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, destructiveHint) already declare the tool's safe read-only nature. The description adds value by specifying the return format ({mission, why, success_metric}) and the scope ('small gamified practice drill'). However, it does not disclose any behavioral details beyond what annotations and the concise description imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that conveys the core purpose, method, and output. It is efficient with no fluff, but could be slightly more structured (e.g., bullet points for parameters) for easier parsing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 8 optional parameters and no output schema, the description provides a reasonable overview. However, it does not explain default behavior when no parameters are supplied, how BYOK keys are used, or differentiate from sibling coaching tools. For a tool with complex optional setup, this leaves gaps in the agent's understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 8 parameters have descriptions in the schema (100% coverage), so the schema carries the semantic burden. The description's mention of 'skill' and 'BYOK' aligns with schema but adds no additional meaning or constraints beyond what is already present. This meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides a gamified practice drill for dating skills, with a defined return structure. However, it does not explicitly distinguish itself from sibling coach tools like coach_opener or coach_profile_review beyond the general 'mission' concept, leaving some ambiguity about when to use which.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description offers no guidance on when to use this tool versus alternatives (e.g., coach_opener). It mentions 'free local model or BYOK' but does not explain when each is appropriate, nor does it provide use cases or exclusions. This lacks decision-making support for the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

coach_openerA

Read-onlyIdempotent

Inspect

Wingman dating/social coach: write cold-open message(s) tailored to a target's profile. Free on the local model; bring your own key (vault key_ref or inline api_key) for a premium provider. Returns {openers[], why, confidence}.

ParametersJSON Schema

Name	Required	Description
`tone`	No	playful\|direct\|witty\|warm, optional
`count`	No	how many openers (1-5, default 3)
`model`	No	provider model id (BYOK), optional
`handle`	No	your registered handle (only for BYOK via vault)
`secret`	No	your agent secret (only for BYOK via vault)
`api_key`	No	inline provider key (BYOK alternative to key_ref)
`key_ref`	No	name of a vault entry holding your LLM provider key (BYOK)
`provider`	No	openai\|anthropic\|groq\|together\|openrouter\|mistral\|xai\|google (BYOK)
`my_profile`	No	your own profile, optional
`target_profile`	Yes	who you want to message (bio/interests; string or JSON)

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark it as readOnly, idempotent, and non-destructive. The description adds value by disclosing authentication requirements (free local vs BYOK) and return structure, without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: first stating the core function, second covering pricing and output. No wasted words, front-loaded with the main purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite 10 parameters and no output schema, the description covers the essential: what it does, free vs paid, and the return format. It is sufficient for an agent to understand usage and expectations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds context beyond individual parameter descriptions by clarifying the free/premium options and the relationship between key_ref, api_key, and model. This enhances semantic understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool's purpose: 'write cold-open message(s) tailored to a target's profile.' It uses a specific verb and resource, and clearly differentiates from siblings like coach_mission and coach_reply.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use free vs premium options, and mentions the return type. It does not explicitly exclude alternatives, but context implies it's for cold opens, not replies or missions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

coach_profile_reviewA

Read-onlyIdempotent

Inspect

Wingman dating/social coach: score and rewrite the user's OWN dating bio. Free local model or BYOK. Returns {score, strengths[], fixes[], rewritten_bio}.

ParametersJSON Schema

Name	Required	Description
`bio`	Yes	your current dating bio text
`model`	No	provider model id (BYOK)
`handle`	No	your handle (BYOK via vault)
`secret`	No	your agent secret (BYOK via vault)
`api_key`	No	inline provider key (BYOK)
`key_ref`	No	vault entry name holding your LLM key (BYOK)
`platform`	No	e.g. Hinge/Tinder/Bumble, optional
`provider`	No	provider name (BYOK)
`photos_note`	No	text description of your photos (no upload), optional

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint true, idempotentHint true, destructiveHint false. The description adds context about 'Free local model or BYOK' and the return format, which are beyond annotations. However, it could emphasize that no actual modification occurs, though annotations already indicate read-only.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the core purpose, explains the model option, and lists the return structure. No redundant information; every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 9 parameters (1 required, many optional for BYOK) and no output schema, the description covers purpose, key behavioral aspects (free local vs BYOK), and return format. It does not explain scoring criteria or rewriting process, but that is acceptable for an LLM-based tool. Slightly missing a disclaimer that no external changes are made, though annotations imply it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with each parameter individually described. The description adds a high-level grouping context ('Free local model or BYOK') that helps understand the many optional key/vault parameters. It does not add per-parameter semantics but provides useful overarching context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Wingman dating/social coach: score and rewrite the user's OWN dating bio.' It specifies the action (score and rewrite), the target (user's own bio), and the return structure. This distinguishes it from sibling tools like coach_opener or coach_reply.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for dating bio review but does not explicitly state when to use this versus alternatives (e.g., coach_opener for openers, coach_reply for replies). No exclusions or context cues are provided, leaving the agent to infer.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

coach_replyA

Read-onlyIdempotent

Inspect

Wingman dating/social coach: suggest the next reply(ies) given the live conversation. Free local model or BYOK. Returns {replies[], read, confidence}.

ParametersJSON Schema

Name	Required	Description
`goal`	No	e.g. 'get the date', optional
`count`	No	how many replies (1-5, default 3)
`model`	No	provider model id (BYOK)
`handle`	No	your handle (BYOK via vault)
`secret`	No	your agent secret (BYOK via vault)
`api_key`	No	inline provider key (BYOK)
`key_ref`	No	vault entry name holding your LLM key (BYOK)
`provider`	No	provider name (BYOK)
`conversation`	Yes	turns as [{from, text}, ...] (or a transcript string)
`target_profile`	No	the other person, optional

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds value by specifying the return structure as '{replies[], read, confidence}' and noting the use of free local models or BYOK, which goes beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at two sentences plus a return type, with no wasted words. It is front-loaded with the tool's role and immediately provides actionable information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 10 parameters (1 required) fully described in the schema, annotations present, and no output schema, the description covers the tool's purpose, usage context, and return format. It leaves little ambiguity, though it could briefly mention that the conversation input should be a list of turns (already in schema).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add new meaning to the parameters beyond what the schema descriptions already provide (e.g., goal, count, conversation format). However, it does mention the return structure, which indirectly aids understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a 'Wingman dating/social coach' that 'suggest[s] the next reply(ies) given the live conversation', specifying the verb, resource, and context. It also distinguishes from sibling tools like 'coach_mission', 'coach_opener', and 'coach_profile_review' by focusing on reply generation in a live conversation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates the tool is for generating replies in a live conversation, which provides clear context. It does not explicitly state when not to use it or name alternatives, but the sibling tool names imply differentiation. The mention of 'Free local model or BYOK' offers configuration guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

college_savingsA

Read-onlyIdempotent

Inspect

College Savings Calculator — Project the future cost of college with education inflation and the monthly contribution needed to fund it.

ParametersJSON Schema

Name	Required	Description
`current_savings`	No	Amount already saved in USD
`years_in_college`	No	Number of years in college (default 4)
`investment_return`	No	Expected annual return on savings as a PERCENT (default 6)
`education_inflation`	No	Annual education inflation as a PERCENT (default 5)
`years_until_college`	Yes	Years until college starts
`current_cost_per_year`	Yes	Today's cost for one year of college in USD

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent behavior. The description adds that it projects future cost and required monthly contribution, giving a clear idea of outputs. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that includes the tool's name and core actions. No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description names two key outputs (future cost and monthly contribution), partially compensating for the lack of an output schema. However, it does not explain how parameters interact or cover all potential outputs, leaving some gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description mentions education inflation and monthly contribution but does not add per-parameter semantics beyond the schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: projecting future college costs and calculating the monthly contribution needed. It is specific to college savings with education inflation, distinguishing it from general calculators like savings_goal.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., savings_goal for general goals, loan for borrowing). It only states what it does, not when it is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

color_contrastA

Read-onlyIdempotent

Inspect

WCAG Color Contrast Checker — Contrast ratio between two hex colors plus WCAG 2.x AA/AAA pass/fail for normal and large text.

ParametersJSON Schema

Name	Required	Description	Default
`background`	Yes	Background hex color, e.g. '#FFFFFF' or 'fff'
`foreground`	Yes	Foreground hex color, e.g. '#111111' or '111'

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds that it computes WCAG pass/fail, which is beyond the annotations, but does not disclose other behaviors like output format or limitations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that front-loads the key action and output. Every word is informative with no redundant text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the core functionality well, but without an output schema, it does not explicitly describe the return format. For a simple calculator, this is acceptable but could be improved by mentioning the output structure (e.g., ratio and pass/fail for each text size).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds that the colors are hex and supports shorthand (e.g., 'fff'), which enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it checks WCAG color contrast, computing contrast ratio and pass/fail for normal and large text. It uses a specific verb ('checks') and resource ('color contrast') and distinguishes it from sibling tools like color_convert and color_palette.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool should be used for accessibility checking, but it does not explicitly state when to use it versus alternatives like color_convert. No exclusion criteria are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

color_convertB

Read-onlyIdempotent

Inspect

Color Converter (HEX / RGB / HSL) — Convert a color between HEX, RGB and HSL representations.

ParametersJSON Schema

Name	Required	Description
`b`	No	Blue 0-255
`g`	No	Green 0-255
`r`	No	Red 0-255
`hex`	No	Hex color, e.g. '#3498db' (or provide r/g/b)

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly and idempotent. Description adds the three formats (HEX/RGB/HSL) but does not disclose behavior on invalid input or output format. Adequate but not enriched.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with key info (formats). Efficient but slightly ambiguous due to HSL mention without schema support.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Minimal description for a converter with 4 optional params and no output schema. Missing input constraints (e.g., provide hex OR r/g/b, not both) and output format details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with descriptions for each param. However, description mentions HSL but input schema only supports HEX and RGB, causing confusion. Baseline 3 reduced for misleading addition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool converts between HEX, RGB, and HSL color representations, using a specific verb and resource. It distinguishes from siblings like unit_convert or base_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage for color conversion but no explicit when-to-use, when-not-to-use, or alternatives. Lacks guidance on input constraints (e.g., HEX vs RGB exclusivity).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

color_paletteA

Read-onlyIdempotent

Inspect

Color Harmony Palette Generator — Generate a harmony palette (complementary/triadic/analogous/split-complementary/tetradic/monochromatic) from one hex color.

ParametersJSON Schema

Name	Required	Description	Default
`hex`	Yes	Base hex color, e.g. '#3498db' or '3498db'
`harmony`	No	complementary \| triadic \| analogous \| split_complementary \| tetradic \| monochromatic (default complementary)

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, indicating a safe, deterministic operation. The description adds 'generate' which aligns with read-only, but provides no additional behavioral context beyond what annotations already convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently conveys the tool's purpose and key details. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 params, no output schema, strong annotations), the description is largely complete. However, it does not mention the return format (e.g., an array of hex colors), which would help an agent understand the output without needing to invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions (hex format, harmony enum). The description lists harmony types but does not add substantial new meaning beyond the schema. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a color harmony palette generator, listing specific harmony types and stating it works from a single hex color. This distinguishes it from sibling tools like color_contrast (contrast checking) and color_convert (color space conversion).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates usage for generating harmony palettes from a hex color, but does not explicitly state when to use this tool versus color-related siblings (color_contrast, color_convert). However, the context is clear enough that an agent can infer the appropriate use case.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

combinatoricsA

Read-onlyIdempotent

Inspect

Combinatorics Calculator (n!, nPr, nCr) — Factorial, permutations (nPr) and combinations (nCr) for non-negative integers.

ParametersJSON Schema

Name	Required	Description	Default
`n`	Yes	Total number of items (non-negative integer)
`r`	No	Number chosen (for permutations/combinations)

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds no extra behavioral context beyond these, missing details like expected output format or constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence front-loaded with tool name and operations. Every word adds value; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity, good annotations, and complete schema, the description covers core purpose. Lacks output specification (e.g., returns a number or object), but adequate for a calculator.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions cover 100% of parameters (n and r). The description reinforces 'non-negative integers' but adds no new meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes factorial (n!), permutations (nPr), and combinations (nCr) for non-negative integers. This specific verb+resource combination distinguishes it from sibling math tools like gcd_lcm or prime_factors.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like statistics or other combinatorics-related tools. The description only lists operations without context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

compound_interestA

Read-onlyIdempotent

Inspect

Compound Interest / Future Value Calculator — Future value, total contributions and interest with optional periodic deposits.

ParametersJSON Schema

Name	Required	Description
`years`	Yes	Number of years
`principal`	No	Starting principal in USD
`contribution`	No	Deposit added each compounding period, USD
`annual_rate_pct`	Yes	Annual interest rate as a PERCENT (6 = 6%)
`compounds_per_year`	No	Compounding periods per year (default 12)

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and no destructiveness. Description adds specific outputs but does not disclose return format or potential edge cases (e.g., principal default). Consistent with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence front-loading core purpose and outputs. Every word earns its place; no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks explicit description of return value structure (e.g., object with fields like futureValue, totalContributions, totalInterest). Without output schema, more detail would help agents use the result correctly. Adequate but incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage. Description adds no extra meaning beyond what schema already provides for parameters. Baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it is a 'Compound Interest / Future Value Calculator' and specifies outputs: future value, total contributions, and interest. Distinguishes from siblings like 'simple_interest' (no compounding) and 'annuity' (different structure).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions 'optional periodic deposits' but does not explicitly guide when to use this tool versus alternatives such as 'simple_interest', 'annuity', or 'tvm'. No when-not-to-use or exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

concreteA

Read-onlyIdempotent

Inspect

Concrete Calculator — Cubic yards, 60/80-lb bag counts and ready-mix cost for slabs, columns or tubes.

ParametersJSON Schema

Name	Required	Description
`depth`	No	Tube depth in feet
`shape`	Yes	Pour shape
`width`	No	Width in feet (slab/column)
`height`	No	Column height in feet
`length`	No	Length in feet (slab/column)
`quantity`	No	Number of identical pours (default 1)
`diameter_in`	No	Tube diameter in inches
`thickness_in`	No	Slab thickness in inches
`waste_factor`	No	Waste multiplier (default 1.10)
`price_per_yard`	No	Ready-mix price per cubic yard (default 150)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description adds value by specifying the computed outputs (yards, bags, cost). This clarifies the tool's behavior beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently conveys the tool's purpose and outputs without unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema, the description adequately mentions the types of outputs (yards, bags, cost). It could be more precise about return format, but it is sufficient for a calculator tool with clear annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for each parameter. The description provides a summary of outputs but does not add further meaning to the parameters beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's a concrete calculator that computes cubic yards, bag counts, and cost for slabs, columns, and tubes. It uses a specific verb 'Calculates' and clearly distinguishes from sibling tools like asphalt or paint calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when concrete volume, bag counts, or cost are needed for specified shapes. However, it lacks explicit guidance on when not to use it or alternatives, making it clear but not comprehensively instructive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

confidence_intervalA

Read-onlyIdempotent

Inspect

Confidence Interval Calculator — Confidence interval for a population mean or proportion given sample statistics.

ParametersJSON Schema

Name	Required	Description
`n`	No	Sample size
`mean`	No	Sample mean (mean mode)
`mode`	No	mean or proportion
`std_dev`	No	Sample standard deviation (mean mode)
`successes`	No	Successes (proportion mode)
`confidence`	No	Confidence level 0..1 (default 0.95)

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds no further behavioral traits beyond the basic operation, but it does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no wasted words. Front-loaded with the main purpose. Optimal conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks output schema and does not describe the return value format (e.g., interval bounds). For a simple calculator, the missing detail is a minor gap but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the description adds minimal value beyond the schema. It aligns with the mode and parameters but does not elaborate on conditional requirements or defaults.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it computes confidence intervals for a population mean or proportion. It is specific and distinct from sibling tools like 'statistics' or 'normal_prob', though it could explicitly name alternatives.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use it (for mean or proportion CI) but does not provide exclusion criteria or mention alternative tools. Usage guidance is functional but minimal.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

confirm_deliveryAInspect

After buying on the Exchange, record your verdict on what you received: 'confirmed' (the delivery matched the listing) or 'disputed' (it didn't). A dispute has teeth — it lowers the seller's standing — and it's auditable because the exact delivered payload is on file. One verdict per order; registered buyer + secret required.

ParametersJSON Schema

Name	Required	Description	Default
`note`	No
`handle`	Yes
`secret`	No
`verdict`	Yes
`order_id`	Yes

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (readOnlyHint=false, etc.), the description adds that a dispute lowers the seller's standing and that the exact payload is on file for auditing, which are key behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no wasted words, front-loaded with the verb 'record'. It is appropriately concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is fairly complete given 5 parameters and no output schema. It explains usage context, effects of dispute, and required credentials. It lacks details on return format or confirmation, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 5 parameters with 0% coverage. The description explains the 'verdict' enum and hints at 'handle' and 'secret' (registered buyer + secret required), but does not explain 'note' or 'order_id'. It adds some value but not full coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool records a verdict ('confirmed' or 'disputed') for a delivery after buying on the Exchange. The verb 'record' and resource 'verdict' are specific, and the description distinguishes it from other tools in the sibling list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It says 'After buying on the Exchange, record your verdict...' and mentions 'One verdict per order; registered buyer + secret required.' This gives clear context for when to use the tool, though it does not explicitly mention when not to use alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

crc32A

Read-onlyIdempotent

Inspect

CRC-32 / Adler-32 Checksum — Compute a CRC-32 or Adler-32 checksum for an arbitrary text string.

ParametersJSON Schema

Name	Required	Description	Default
`text`	Yes	Text to checksum
`algorithm`	No	Algorithm

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description adds no new behavioral insights beyond confirming it computes checksums, which is adequate but not enriching.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, efficient sentence that front-loads the algorithm names and clearly states the action, with no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, and the description does not explain the return format (e.g., hex string, decimal), leaving some ambiguity for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description does not add meaning beyond what the schema already provides (e.g., text type, algorithm enum).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it computes CRC-32 or Adler-32 checksums for arbitrary text, specifying the exact algorithms and distinguishing from siblings like 'checksum' or 'hash_text'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as 'checksum' or 'hash_text'; lacks context on selection criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_watchAInspect

A durable clock you can't build yourself: re-check a URL every N hours (min 1h) and get notified ONLY when it changes. Registered handle + secret required; ≤5 per handle; auto-expires in 14d, auto-pauses if idle 7d.

ParametersJSON Schema

Name	Required	Description
`url`	Yes
`handle`	Yes
`secret`	No
`extract`	No
`pattern`	No	regex, required if extract=grep
`callback_url`	No
`interval_seconds`	Yes	≥3600 (1h)

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given annotations (readOnlyHint=false, idempotentHint=false, destructiveHint=false), the description adds valuable behavioral traits: notification only on change, auto-expire in 14 days, auto-pause if idle for 7 days, and handle limits. This goes beyond annotations, though it omits details on idempotency or duplicate handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences efficiently convey the main purpose, constraints, and lifecycle behavior. Front-loaded with the key action and important details, no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 7 parameters and no output schema, the description is moderately complete: it covers limits, expiration, and notification behavior. However, it lacks details on return values, how notifications are delivered (via callback_url?), and the workflow after creation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 29% schema description coverage, the description adds minimal parameter meaning. It mentions 'handle' and 'secret' but does not explain 'extract', 'pattern', 'callback_url', or 'interval_seconds' beyond the schema. The phrasing 'Registered handle + secret required' could be misinterpreted, and the schema shows 'secret' as optional.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 're-check a URL every N hours and get notified ONLY when it changes.' It uses a specific verb-resource pair and distinguishes itself from sibling tools like cancel_watch and list_watches by describing its unique functionality and constraints.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use this tool, including prerequisites ('Registered handle + secret required') and limits ('≤5 per handle'). It implicitly suggests it's for setting up monitoring, but lacks explicit guidance on when not to use it or alternatives, though siblings are limited.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cron_nextA

Read-onlyIdempotent

Inspect

Cron Next Run Times — Next fire times of a 5-field cron expression after a base time.

ParametersJSON Schema

Name	Required	Description
`count`	No	How many upcoming times to return (1..20, default 5)
`base_time`	Yes	ISO 8601 instant to compute from
`expression`	Yes	5-field cron: 'min hour dom month dow'

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the safety profile is clear. The description adds only that it computes next fire times, which is already implied. It doesn't discuss side effects or additional traits beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is front-loaded and concise. It contains no extraneous words and effectively communicates the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is adequate for a straightforward read-only calculation tool. However, it does not mention the output format or that it returns a list of ISO 8601 timestamps, which would be helpful for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the input schema fully describes all parameters. The description does not add extra meaning beyond the schema. Baseline score of 3 is appropriate since no additional parameter context is provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies it computes next fire times for a 5-field cron expression after a given base time. The verb 'compute' and resource 'cron expression' are specific. No sibling tool offers this functionality, so it distinguishes well.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention that it is purely for calculation and not for scheduling, nor does it contrast with other time-related tools like 'date_add' or 'epoch_convert'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

csv_json_convertA

Read-onlyIdempotent

Inspect

CSV <-> JSON Converter — Convert CSV text to a JSON array of row objects, or a JSON array of flat objects back to CSV text.

ParametersJSON Schema

Name	Required	Description
`csv`	No	CSV text (direction=csv_to_json)
`records`	No	JSON array of flat objects (direction=json_to_csv)
`delimiter`	No	Single-character field delimiter (default ',')
`direction`	Yes	csv_to_json \| json_to_csv

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already declare readOnlyHint, idempotentHint, and destructiveHint, covering safety and side-effect information. The description adds some value by clarifying expected input/output structures (e.g., 'flat objects' for JSON-to-CSV), but does not delve into edge cases or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is front-loaded with the tool's primary purpose ('CSV <-> JSON Converter'). Every word is functional, with no wasted text, making it easy to quickly understand.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complete schema and annotations, the description adequately covers the core functionality of the tool. It explains the two conversion directions and the required input/output structures. Minor omissions such as explicit mention of first-row-as-header assumption or delimiter behavior are present in the schema, so the description is nearly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for all 4 parameters, so the schema already explains each parameter in detail. The tool description reinforces this by mentioning conversion direction and format expectations, but does not add significant new meaning beyond what is in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a bidirectional converter between CSV and JSON formats. It specifies the output structure for each direction (JSON array of row objects, or array of flat objects), which uniquely distinguishes it from sibling converters for XML and YAML.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on the conversion directions and the required input formats, implicitly telling when to use each direction. However, it does not explicitly state when to use this tool over alternatives like csv_profile or other converters, nor does it provide exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

csv_markdown_tableA

Read-onlyIdempotent

Inspect

CSV to Markdown Table — Convert CSV text (header row required) into a GitHub-Flavored-Markdown pipe table.

ParametersJSON Schema

Name	Required	Description	Default
`csv`	Yes	CSV text (header row required)
`delimiter`	No	Single-character field delimiter (default ',')

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, idempotentHint, and destructiveHint, covering safety. Description adds header requirement and output format but omits error handling or edge cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded, no redundant information. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema; description does not specify the format of the return value. For a conversion tool, it is adequate but could mention output characteristics.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptive parameter names and types. Description does not add significant new detail beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Convert' and the resource 'CSV text to GitHub-Flavored-Markdown pipe table.' It distinguishes from sibling tools like csv_json_convert and markdown_to_html by specifying the exact output format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives. Does not mention prerequisites, limitations, or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

csv_profileA

Read-onlyIdempotent

Inspect

CSV Column-Stats Profiler — Per-column count/nulls/distinct/min/max/mean profile of a CSV (a column is numeric only if every non-null value parses as a number).

ParametersJSON Schema

Name	Required	Description	Default
`csv`	Yes	CSV text (header row required)
`delimiter`	No	Single-character field delimiter (default ',')

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description adds value by explaining the numeric detection behavior and the requirement for a header row. This provides context beyond what annotations cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently summarizes core functionality without extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool, the description adequately covers purpose, parameters, and output (by listing statistics). No output schema exists, but the description implicitly defines return values. Minor gap: no mention of return format or structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds only marginal additional context (header row requirement) beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool as a CSV column-stats profiler that computes per-column statistics (count, nulls, distinct, min, max, mean) and specifies how numeric columns are determined. This is precise and distinguishes it from siblings like 'csv_json_convert' or generic math tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for profiling CSV data with numeric detection rules but does not explicitly state when to use or avoid it. However, the context is clear enough for an agent to decide, and no direct alternatives exist among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

data_transferA

Read-onlyIdempotent

Inspect

Data Transfer Time Calculator — Transfer time from file size and bandwidth (decimal units, 1 byte = 8 bits).

ParametersJSON Schema

Name	Required	Description
`size_unit`	No	Size unit
`size_value`	Yes	File size value
`speed_unit`	No	Bandwidth unit
`speed_value`	Yes	Bandwidth value

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds value by clarifying the decimal unit system and the byte-to-bit conversion, but does not mention any limitations or edge cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no redundant words. It is front-loaded and concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, and the description does not specify the output unit (e.g., seconds, minutes). However, given the low complexity and clear annotations, the description is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds the '1 byte = 8 bits' clarification and hints at decimal units, providing slight additional context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates data transfer time from file size and bandwidth, specifying decimal units and the byte-to-bit conversion. It distinguishes itself from sibling calculation tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives or when not to use it. It simply describes the function without usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

date_addA

Read-onlyIdempotent

Inspect

Date Arithmetic (add duration) — Add years/months/weeks/days/hours to an ISO date; month math clamps to end-of-month.

ParametersJSON Schema

Name	Required	Description
`date`	Yes	ISO date YYYY-MM-DD or full datetime
`days`	No	Days to add (may be negative)
`hours`	No	Hours to add (may be negative)
`weeks`	No	Weeks to add (may be negative)
`years`	No	Years to add (may be negative)
`months`	No	Months to add (may be negative)
`minutes`	No	Minutes to add (may be negative)
`seconds`	No	Seconds to add (may be negative)

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, idempotent, and non-destructive traits. The description adds clamping behavior for month math, but does not disclose other nuances like return format or interaction between parameters.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that covers the core functionality and key edge case (month clamping) without any wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 8 parameters and no output schema, the description is minimal. It omits return format, error behavior, and doesn't help differentiate among many date-related sibling tools. Adequate but not rich.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description adds no extra meaning beyond the tool's purpose, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs date arithmetic by adding durations to an ISO date, with explicit mention of units and clamping behavior. However, it does not explicitly differentiate from sibling tools like date_diff.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for adding durations to dates but provides no guidance on when to use vs alternatives (e.g., date_diff, epoch_convert) or any exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

date_diffB

Read-onlyIdempotent

Inspect

Date Difference Calculator — Days, weeks, months and business days between two ISO dates.

ParametersJSON Schema

Name	Required	Description	Default
`end_date`	Yes	End date, ISO 'YYYY-MM-DD'
`start_date`	Yes	Start date, ISO 'YYYY-MM-DD'

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the safety profile is clear. The description does not add behavioral details beyond the units, but it also does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with a dash, front-loading the purpose. It is efficient but could be slightly more structured (e.g., listing output units explicitly).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description hints at the return units (days, weeks, months, business days) but does not specify the format (e.g., single number vs. object). This leaves some ambiguity, though adequate for a simple calculator.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and both parameters are documented as ISO dates. The description adds no extra semantic value beyond what the schema provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Date Difference Calculator' uses a specific verb (calculate) and resource (date difference). It lists the units (days, weeks, months, business days), clearly distinguishing it from sibling tools like 'date_add' and 'epoch_convert'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., 'business_days' or 'date_add'). The description states what it does but does not help the agent decide between similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

depreciationB

Read-onlyIdempotent

Inspect

Depreciation Schedule Calculator — Straight-line or double-declining-balance schedule for an asset.

ParametersJSON Schema

Name	Required	Description
`cost`	Yes	Asset cost in USD
`method`	No	Depreciation method
`salvage_value`	No	Salvage value at end of life, USD
`useful_life_years`	Yes	Useful life in years

Tool Definition Quality

B3.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds no behavioral context beyond what annotations already provide (readOnlyHint, idempotentHint). Annotations already indicate a safe, read-only calculator, so description offers no extra value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with a dash, front-loaded with purpose. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks output description; as a calculator, return format is implied but not stated. With no output schema, tool would benefit from specifying it returns a schedule or yearly breakdown.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are already described. Description mentions cost, useful life, method but adds no additional meaning beyond the schema. Baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool computes depreciation schedules for an asset using straight-line or double-declining-balance methods. It distinguishes itself from sibling tools like amortization_schedule and other financial calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives like other financial calculators. Does not mention prerequisites or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dice_notation_rollA

Read-onlyIdempotent

Inspect

Seeded Dice-Notation Roller — Roll tabletop dice notation ('2d6+3') deterministically from a required seed (sha256-derived, reproducible on any Python version) -- not wall-clock randomness.

ParametersJSON Schema

Name	Required	Description	Default
`seed`	Yes	Seed for deterministic rolls (any integer or string)
`notation`	Yes	Dice notation, e.g. '2d6+3', '1d20', 'd8-1'

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds key behavioral context: deterministic results based on a sha256-derived seed, reproducible on any Python version. This goes beyond annotations by specifying the mechanism and portability, but could clarify the return format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that conveys the essential purpose and key behavior without extraneous words. Every part contributes meaning: the name, the notation format, the determinism, the seed requirement, and the contrast to random rolls.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple, two-parameter tool with no output schema, the description is adequate. It covers the core functionality and behavioral traits. Minor missing detail: the return value (likely the total or individual dice results) is not mentioned, but this is not critical given the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description reinforces that seed is required and provides example notation ('2d6+3'), but adds no additional semantic detail beyond what the schema contains. This meets the baseline of 3 for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a 'Seeded Dice-Notation Roller' for rolling tabletop dice notation deterministically from a seed. The phrasing 'not wall-clock randomness' distinguishes it from typical random dice rollers. Among sibling tools, there is no other dice roller, so it is well-differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates that this tool is for deterministic, reproducible dice rolls using a seed. It explicitly states 'not wall-clock randomness', advising against use when non-deterministic randomness is needed. It does not mention any alternative tools, but given the unique functionality, this is acceptable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dilutionA

Read-onlyIdempotent

Inspect

Funding Round Dilution Calculator — Post-money valuation, investor/existing ownership and new shares for a raise.

ParametersJSON Schema

Name	Required	Description
`existing_shares`	No	Existing share count (to compute price/new shares)
`investment_amount`	Yes	New investment amount in USD
`pre_money_valuation`	Yes	Pre-money valuation in USD

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare it as read-only and idempotent. The description adds context by specifying exact outputs (post-money valuation, ownership percentages, new shares), which is useful beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, focused sentence that front-loads the purpose and lists outputs efficiently. No unnecessary words or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and clear inputs, the description adequately conveys the purpose and outputs. However, it could mention the exact output format (e.g., returning multiple values for ownership) for completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptive parameter comments (e.g., 'Pre-money valuation in USD'). The description does not add additional parameter semantics beyond stating the outputs, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's a 'Funding Round Dilution Calculator' and lists specific outputs: post-money valuation, investor/existing ownership, new shares. This uses specific financial terms and distinguishes it from sibling tools like 'accretion_dilution'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for funding round calculations but does not explicitly state when to use it versus alternatives, nor does it provide any prerequisites or exclusions. A guideline would improve clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

discover_toolsA

Read-onlyIdempotent

Inspect

Find the right tool WITHOUT loading all 160+ schemas into your context. Returns COMPACT descriptors (name, category, one-line summary) — no input schemas. Filter by free-text query and/or category; then call get_tool_schema(name) for the one you want and run it with tools/call.

ParametersJSON Schema

Name	Required	Description
`limit`	No	max results (default 40, max 150)
`query`	No	free-text match over tool name/summary
`category`	No	filter to one category, e.g. finance, trades, memory, browser, vault, web, meta

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds that it returns compact descriptors without input schemas and that filtering is available. This adds useful behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with purpose, no redundancy. Every part serves a purpose: value prop, output format, filters, and next steps.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, description clearly states what is returned (compact descriptors with name, category, one-line summary). Provides next steps. Could be more complete with an example response, but for the complexity it suffices.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. Description repeats schema info but adds 'and/or' indicating filters can be combined, and specifies default limit of 40 and max 150, which is slightly richer than schema's description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool finds tools via compact descriptors, filtering by query/category, and distinguishes from get_tool_schema which returns full schemas. It directly addresses the value proposition of avoiding loading 160+ schemas.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Suggests a workflow: use discover_tools, then get_tool_schema, then run. Implicitly indicates use case when you need to find a tool without loading schemas. However, lacks explicit when-not-to-use guidance or comparison to search tool among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

donateA

Read-onlyIdempotent

Inspect

IGNITION: this economy is built but DARK — it needs a few cents of Base ETH gas to turn on. Call this (free) to get the wallet address, the live progress toward the goal, and the founder's-discount terms. Donate native Base ETH (the gas that ignites it) or USDC (a reserve), then claim_donation with your tx hash to collect ~5x your donation back in ▲ credits + the Founding Patron badge. The donation that crosses the line earns the one-of-a-kind Igniter badge.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds behavioral context: 'Call this (free)' implying no cost, and explains the overall donation mechanism. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is narrative and verbose, using stylized language ('IGNITION', 'DARK'). While informative, it could be more concise for an AI agent. The purpose is front-loaded, but subsequent details are lengthy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description fully explains the return values (wallet address, progress, terms) and integrates the tool into a broader donation flow, providing complete context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has zero parameters, so schema coverage is 100%. The description adds meaning by explaining what the tool returns (wallet address, progress, terms), compensating for the lack of output schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Call this (free) to get the wallet address, the live progress toward the goal, and the founder's-discount terms.' It uses specific verbs and resources, and distinguishes from sibling 'claim_donation'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use this tool: to obtain donation information before donating. It also mentions the follow-up tool 'claim_donation', providing clear guidance on next steps.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

draw_scheduleB

Read-onlyIdempotent

Inspect

Construction Draw Schedule Calculator — Milestone draw schedule (deposit, draws, retainage) for a fixed-price construction contract.

ParametersJSON Schema

Name	Required	Description
`num_draws`	No	Number of progress draws
`deposit_pct`	No	Up-front deposit percent
`retainage_pct`	No	Retainage percent held until completion
`contract_amount`	Yes	Total contract amount in USD

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, indicating no side effects. The description adds little behavioral context beyond the basic calculation purpose. It does not specify assumptions like default values for optional parameters or the format of the output schedule.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently communicates the tool's purpose without unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, and the description does not describe the output format (e.g., list of amounts, dates, or percentages). For a calculator tool, the return value is crucial for an agent to use the result correctly. Given the moderate complexity (4 parameters), the description is incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Parameter schema has 100% coverage with descriptions, so the baseline is 3. The tool description repeats parameter names (deposit, draws, retainage) but does not add new meaning or clarify format (e.g., percentage as decimal or percent).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates a construction draw schedule for fixed-price contracts, specifying the key components (deposit, draws, retainage). It distinguishes itself from sibling financial calculators like amortization_schedule by focusing on construction-specific milestone draws.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Sibling tools include other financial calculators, but the description does not differentiate usage context or mention prerequisites, such as needing a fixed-price contract.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

duration_breakdownB

Read-onlyIdempotent

Inspect

Duration Breakdown Calculator — Elapsed time between two ISO 8601 timestamps broken into weeks/days/hours/minutes/seconds, plus running totals.

ParametersJSON Schema

Name	Required	Description	Default
`end`	Yes	End timestamp, ISO 8601
`start`	Yes	Start timestamp, ISO 8601

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already communicate safety (readOnly, idempotent, non-destructive). The description adds that it produces components and running totals, which is useful but still lacks disclosure of behavior with invalid timestamps, timezone handling, or error cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that conveys the core function without any extraneous words. It is well-structured and front-loaded with the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately covers the result (components and running totals). However, it omits edge cases (e.g., start > end, invalid timestamps) and timezone considerations, which are relevant for a time calculation tool. It is acceptable but not complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description restates the parameter purpose but adds no additional semantic meaning beyond 'ISO 8601 timestamps' already in schema. It does not detail expected format variations or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool calculates elapsed time between two ISO 8601 timestamps, broken into weeks/days/hours/minutes/seconds with running totals. It uses a specific verb ('elapsed time between') and resource ('duration breakdown'), but it does not explicitly differentiate from sibling tools like date_diff beyond the detail of breakdown.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., date_diff). It does not mention prerequisites, context, or when not to use it. The tool is straightforward, but lack of usage direction lowers the score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

effective_rateA

Read-onlyIdempotent

Inspect

Effective Rate (APR <-> APY) — Convert a nominal rate to effective annual yield, or back, at any compounding frequency.

ParametersJSON Schema

Name	Required	Description
`apr`	No	Nominal annual rate as a decimal (to_apy)
`apy`	No	Effective annual yield as a decimal (to_apr)
`mode`	No	Direction
`periods`	No	Compounding periods per year, or 'continuous' (default 12)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds behavioral context: it converts in both directions (mode) and supports any compounding frequency (periods). It does not contradict annotations and adds value beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the key concept (APR <-> APY) and includes essential details without unnecessary words. Every part earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 optional parameters, no output schema, and good annotations, the description provides sufficient context for understanding the tool's purpose and usage. It could mention return value format, but the core conversion logic is clear.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are already documented. The description adds meaning by indicating the conversion direction ('to_apy' or 'to_apr') and flexibility in compounding periods, but does not significantly elaborate beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool converts nominal rate to effective annual yield or back, using specific financial terms (APR, APY) and compounding frequency. It distinguishes itself from sibling tools like compound_interest or loan by focusing specifically on APR/APY conversion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The purpose is clear enough that an agent would know when to use this tool (for APR/APY conversion). However, it does not explicitly state when not to use it or mention alternatives among siblings. The context signals and sibling names provide implicit differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

encodingA

Read-onlyIdempotent

Inspect

Encoder / Decoder (base64 / url / hex) — Reversibly encode or decode text between base64, base64url, URL-percent and hex.

ParametersJSON Schema

Name	Required	Description
`op`	No	Direction
`text`	Yes	Text or encoded payload to convert
`scheme`	No	Wire format

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint and idempotentHint. The description adds the key behavioral trait of reversibility, which is beyond what annotations provide, but no other side effects are disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence that efficiently conveys the tool's purpose and scope with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is present, and the description does not detail the output format or encoding rules. However, the tool is simple and the description is adequate for an agent to infer the output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description adds context about the formats and reversibility but does not significantly enhance understanding beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is an encoder/decoder for base64, base64url, URL-percent, and hex formats, using a specific verb and resource. It distinguishes from siblings like hash_text or base_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for reversible encoding/decoding but does not explicitly mention when to avoid or name alternatives. However, the context suggests clear applicability.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

enterprise_valueB

Read-onlyIdempotent

Inspect

Enterprise Value & Multiples Calculator — Market cap, enterprise value and EV/EBITDA, EV/Revenue multiples.

ParametersJSON Schema

Name	Required	Description
`cash`	No	Cash and equivalents in USD
`ebitda`	No	EBITDA in USD (for EV/EBITDA)
`revenue`	No	Revenue in USD (for EV/Revenue)
`total_debt`	No	Total debt in USD
`share_price`	Yes	Share price in USD
`preferred_equity`	No	Preferred equity in USD
`minority_interest`	No	Minority interest in USD
`shares_outstanding`	Yes	Shares outstanding

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds that it computes specific financial metrics but does not elaborate on behavior when optional parameters (e.g., ebitda) are omitted, rate limits, or output structure. It adds value but is not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, front-loading the tool's purpose. It is efficient and to the point, though very brief. No superfluous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 8 parameters, no output schema, and lacks explanations for edge cases (e.g., missing ebitda, debt preferences), the description is incomplete. It does not specify return values, calculation constraints, or parameter interdependencies, which is insufficient for a financial calculator with moderate complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter described inline. The description lists the computed outputs (EV, multiples) but does not explain the formulas or how parameters interact beyond the schema's own definitions. Baseline score is appropriate as description adds minimal semantic enrichment.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an 'Enterprise Value & Multiples Calculator' specifying it calculates market cap, enterprise value, EV/EBITDA, and EV/Revenue multiples. The resource and action are distinct from financial siblings like 'financial_ratios' or 'npv'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternatives. It does not mention prerequisites, exclusions, or complementary tools, leaving the agent to infer its context solely from its name and generic description.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

epoch_convertA

Read-onlyIdempotent

Inspect

Unix Epoch / ISO Time Converter — Convert between a Unix epoch (seconds) and an ISO-8601 UTC timestamp.

ParametersJSON Schema

Name	Required	Description	Default
`iso`	No	ISO-8601 datetime (gives epoch)
`epoch`	No	Unix seconds since 1970-01-01 UTC (gives ISO)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safe read-only behavior. Description adds no extra behavioral details, but does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with front-loaded identifier 'Unix Epoch / ISO Time Converter' and immediate verb. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but tool is simple and annotations cover safety. Description lacks return format details, but for a conversion tool it is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds meaningful context by grouping the two parameters as a conversion pair ('Convert between...'), clarifying mutual exclusivity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly specifies the tool converts between Unix epoch (seconds) and ISO-8601 UTC timestamp. The verb 'Convert' and resources are precise, distinguishing it from sibling date tools like timezone_convert or date_add.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like date_add, date_diff, or timezone_convert. The description only states what it does, not when to pick it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

expected_valueA

Read-onlyIdempotent

Inspect

Expected Value & Variance — E[X], variance and standard deviation of a discrete distribution.

ParametersJSON Schema

Name	Required	Description	Default
`outcomes`	Yes	Array of numeric payoffs
`probabilities`	Yes	Probabilities (same length, sum to 1)

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. Description adds minimal behavioral context beyond what annotations provide, just specifying the computations performed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that immediately states the purpose and includes the key computed outputs. No unnecessary words, perfectly front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple calculator tool with fully described parameters and no output schema required, the description completely covers the tool's behavior and return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both parameters. Description does not add additional meaning beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it computes expected value, variance, and standard deviation of a discrete distribution. It uses specific mathematical terms and distinguishes from likely siblings such as 'statistics' or 'normal_prob'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives like 'statistics' or 'normal_prob'. The purpose is implied but no situational recommendations are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fertilizerA

Read-onlyIdempotent

Inspect

Fertilizer Calculator — Pounds of fertilizer product to deliver a target nitrogen rate over an area, from the bag's N percentage (first N-P-K number), with a rate-options table.

ParametersJSON Schema

Name	Required	Description
`area_sqft`	Yes	Area in square feet
`n_percent`	Yes	Nitrogen percent in the bag, e.g. 24 for 24-0-4
`n_rate_lb_per_1000`	No	Target nitrogen, lb per 1,000 sq ft (default 1.0)

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds that the tool produces a 'rate-options table', which provides useful context beyond annotations about the output format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with clear, front-loaded structure. It conveys essential information without superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, so the description should clarify return values. Mentioning a 'rate-options table' is vague; it does not specify fields, units, or error handling. For a calculator with three parameters, this is moderately incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description adds context by clarifying n_percent as 'first N-P-K number' and mentions target nitrogen rate, but does not significantly extend schema meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates pounds of fertilizer product to deliver a target nitrogen rate over an area, specifying inputs like bag N percentage. It distinguishes from siblings like other calculators due to its specific fertilizer focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for fertilizer calculations but does not explicitly state when to use or not use this tool versus alternatives. No exclusions or alternative tool names are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

file_size_humanizeA

Read-onlyIdempotent

Inspect

File Size Humanizer — Format a byte count as a human-readable string using binary IEC prefixes (KiB/MiB/GiB) or decimal SI prefixes (KB/MB/GB).

ParametersJSON Schema

Name	Required	Description
`mode`	No	binary \| decimal (default binary)
`bytes`	Yes	Byte count to format (>= 0)
`precision`	No	Decimal places above the base unit, 0-6 (default 2)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only, idempotent, and non-destructive. The description adds context about binary vs. decimal prefixes but does not discuss edge cases, output format, or other behavioral quirks. It provides some value beyond annotations but not extensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that immediately conveys the tool's purpose and key options. There is no unnecessary information, and every phrase contributes meaning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (3 parameters, all documented in schema, with clear annotations), the description is fairly complete. It does not explicitly state the output format, but the intent is clear. Slight room for improvement in explaining what the returned string looks like.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description reinforces the meaning of the mode parameter by naming example prefixes (KiB, MiB for binary; KB, MB for decimal) and mentions the default binary, but does not add new semantics beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: formatting a byte count as a human-readable string, specifying both binary and decimal prefix options. It uses a specific verb ('Format') and resource ('byte count'), and the scope is well-defined. Compared to sibling tools like unit_convert or various calculators, this tool has a unique and focused purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide explicit guidance on when to use this tool versus alternatives. It implies usage for human-readable file size formatting but does not mention exclusions or compare to siblings. Minimal guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

financial_ratiosB

Read-onlyIdempotent

Inspect

Financial Ratio Calculator — Liquidity, leverage, and profitability ratios from income statement and balance sheet inputs.

ParametersJSON Schema

Name	Required	Description
`revenue`	No	Revenue
`inventory`	No	Inventory
`net_income`	No	Net income
`total_debt`	No	Total debt
`gross_profit`	No	Gross profit
`total_equity`	No	Total equity
`current_assets`	No	Current assets
`total_liabilities`	No	Total liabilities (or total_debt)
`current_liabilities`	No	Current liabilities

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool as read-only, idempotent, and non-destructive, which covers the key behavioral traits. The description adds no further behavioral context (e.g., return format, side effects), but it does not contradict annotations. With good annotation coverage, the description is minimally acceptable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently conveys the tool's purpose. No extraneous words or repetition, earning its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description should hint at what ratios are returned (e.g., list specific ratios like current ratio, debt-to-equity). It only mentions categories. The agent lacks information to predict output structure or know which inputs are mandatory for each ratio, making the description incomplete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with simple parameter descriptions. The tool description adds context (inputs from income statement and balance sheet) but does not explain which parameters are needed for specific ratios or provide additional constraints. Baseline 3 is appropriate given high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: calculating financial ratios (liquidity, leverage, profitability) from income statement and balance sheet inputs. It distinguishes itself from sibling tools (e.g., specific calculators like CAGR, IRR) by being a general ratio calculator.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. With many sibling tools offering specific financial calculations, the absence of usage context (e.g., 'use for multiple ratios, not single calculations') leaves the agent without clear decision criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fire_numberA

Read-onlyIdempotent

Inspect

FIRE Number Calculator — Financial-independence target from annual expenses and a safe withdrawal rate, plus lean/fat variants and years to reach it.

ParametersJSON Schema

Name	Required	Description
`inflation`	No	Annual inflation as a PERCENT (default 0)
`annual_expenses`	Yes	Expected annual spending in retirement, USD
`current_savings`	No	Current invested savings in USD
`expected_return`	No	Expected annual return as a PERCENT (default 7)
`withdrawal_rate`	No	Safe withdrawal rate as a PERCENT (default 4)
`annual_contribution`	No	Amount invested per year in USD

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint, so the description's burden is lower. It adds context about computing FIRE number with variants and years, which is beyond what annotations provide. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with the tool's purpose, includes key outputs (variants, years). Every word adds value. No waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a calculator with 6 parameters and no output schema, the description covers the main function and hints at outputs. It could be more explicit about the return structure, but it is fairly complete given the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description does not add significant meaning beyond the schema, but it does tie the parameters to the overall purpose. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it calculates the FIRE number from annual expenses and withdrawal rate, and mentions lean/fat variants and years to retire. It distinguishes itself from other financial calculators by focusing on FIRE.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for FIRE calculations but does not explicitly say when to use or not use this tool versus alternatives like retirement or savings_goal. No exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

floor_joistB

Read-onlyIdempotent

Inspect

Floor Joist Span Calculator — Joist size/spacing feasibility and count for a floor span under a given live load.

ParametersJSON Schema

Name	Required	Description
`span`	Yes	Clear span in feet
`grade`	No	Lumber grade
`species`	No	Lumber species/grade group
`room_width`	Yes	Room width (joist run) in feet
`spacing_in`	No	Joist spacing on-center in inches (default 16)
`live_load_psf`	No	Live load in psf (default 40)

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare read-only, idempotent, and non-destructive behavior, but the description adds no new behavioral context. It does not discuss default values (e.g., spacing default 16 in, live load default 40 psf) or assumptions about lumber grades/species. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with clear subject and verb. No wasted words, but could be structured to list outputs or prerequisites. Efficient for a straightforward calculator tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, and description omits what 'feasibility' returns (e.g., boolean, pass/fail) or the format of 'count'. Lacks details on how to interpret results, which is essential for a 6-parameter tool without output documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 6 parameters. The description offers no additional semantic information beyond what the schema provides, so it meets the baseline but does not enhance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly identifies the tool as a Floor Joist Span Calculator that computes feasibility and count under live load, using specific verbs ('span calculator') and resource ('floor joist'). It distinguishes from siblings like 'framing' by focusing on joist sizing and spacing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. Does not mention compatible use cases or when to avoid it (e.g., for other framing calculations). Users must infer from the tool name and context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

forget_memoriesA

DestructiveIdempotent

Inspect

Delete memory entries matching filters. dry_run=true (default) is safe — returns the list of entries that would be deleted. Pinned entries are never forgotten. At least one filter required. Owner only — registered handle + secret required.

ParametersJSON Schema

Name	Required	Description
`handle`	Yes
`secret`	No
`dry_run`	No	if true, return candidates without deleting
`namespace`	No	restrict to one namespace
`older_than_days`	No	delete entries last updated > N days ago
`not_read_in_days`	No	delete entries not read in N days

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark destructiveHint=true and idempotentHint=true. Description adds context: dry_run safety, pinned entries never forgotten, but does not detail return value for actual deletion.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with no filler. Front-loaded with main action, then safety, then constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main behavior, safety, constraints, and filter requirement. Lacks explicit output description for non-dry_run mode, but acceptable for a deletion tool without output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema describes 4 of 6 parameters (67% coverage). Description adds dry_run behavior clarification and filter requirement, but handle and secret semantics are only implied by ownership context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it deletes memory entries matching filters, with specific verb and resource. Distinguishes from sibling tools like store_memory and recall_memories.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states prerequisite conditions: owner-only with handle and secret, at least one filter required. Describes safe dry_run mode as default usage hint.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

framingA

Read-onlyIdempotent

Inspect

Wall Framing Calculator — Stud, plate and header counts plus board-feet and cost for a framed wall.

ParametersJSON Schema

Name	Required	Description
`header_size`	No	Header lumber size (e.g. 2x10)
`header_span`	No	Header span in feet
`wall_height`	Yes	Wall height in feet
`wall_length`	No	Single wall length in feet — used only if total_wall_lf is omitted
`cost_per_bdft`	No	Lumber cost per board-foot in USD
`total_wall_lf`	Yes	Linear feet of wall to frame — studs AND plates are sized for this full run
`openings_count`	No	Number of door/window openings
`stud_spacing_in`	No	Stud spacing on-center in inches (default 16)

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safe and idempotent operation. The description adds context about what is calculated (studs, plates, headers, board-feet, cost), which is consistent and provides useful detail beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that concisely conveys the tool's purpose and key outputs. Every word is meaningful, and there is no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While there is no output schema, the description lists the main outputs (stud, plate, header counts, board-feet, cost), which is sufficient for understanding what the tool returns. For a calculator with 8 parameters, this provides good context, though a brief mention of return format would enhance completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the parameter descriptions are already complete. The tool description does not add additional meaning beyond listing output types, which does not improve understanding of parameters. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's a wall framing calculator that outputs stud, plate, header counts, board-feet, and cost. It is specific and distinct from siblings like board_feet (board feet only) and floor_joist (different structural element).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives or when not to use it. While the context of sibling tools implies it's for wall framing calculations, no direct guidance is provided, making it merely adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

garden_planting_calendarA

Read-onlyIdempotent

Inspect

Garden Planting Calendar — From your last spring frost date, get a per-crop schedule: when to start seeds indoors, when to plant out, and approximate harvest start.

ParametersJSON Schema

Name	Required	Description	Default
`crops`	No	Optional crop names to include; omit for the full set
`last_frost`	Yes	Last spring frost date, ISO format e.g. '2026-04-15'

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint true and idempotentHint true. The description adds context by specifying the output includes indoor start, outdoor planting, and harvest start dates. This goes beyond the annotations without contradicting them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the tool's name and purpose. Every part is informative, with no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 2 parameters and no output schema, the description gives a good high-level view of what is returned. However, it could be improved by clarifying the expected crop names or output format. Still, it is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description lists output fields but does not add new meaning to the parameters themselves beyond what the schema provides. The crops parameter remains vaguely defined.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool's purpose: given a last spring frost date, it returns a per-crop schedule with seed starting, planting out, and harvest dates. It uses specific verbs and resource, and no sibling tool overlaps with gardening, so it is well distinguished.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

While no explicit 'when to use' or alternatives are mentioned, the context is clear: use when you have a last frost date and need planting schedules. Since no other sibling tool serves this function, the guidance is implicit but sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gcd_lcmB

Read-onlyIdempotent

Inspect

GCD & LCM Calculator — Greatest common divisor and least common multiple of integers.

ParametersJSON Schema

Name	Required	Description
`a`	No	First integer
`b`	No	Second integer
`numbers`	No	Array of two or more integers (instead of a/b)

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only, idempotent, and non-destructive. The description adds that it computes GCD and LCM, which is consistent but does not disclose additional behavioral traits like return format or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no wasted words, clearly front-loaded with the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple calculator with no output schema, the description is adequate. It could mention the flexibility of using either a/b or numbers, but the schema already covers that, making the description sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so all three parameters are fully described in the schema. The description adds no additional meaning beyond the schema, meeting the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a GCD & LCM calculator for integers, using specific mathematical terms. It does not explicitly differentiate from siblings, but the function is unambiguous given the name and description.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. There is no mention of context or exclusions, leaving the agent to infer from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gen_avatar_videoAInspect

Async talking-avatar video via HeyGen (text→avatar). BYOK ONLY; requires your own HeyGen avatar_id (+ voice_id). Returns {ok, job_id} — poll check_errand for the URL.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	what the avatar says (max 5000 chars)
`handle`	No	your handle (for BYOK via vault)
`secret`	No	your agent secret (for BYOK via vault)
`api_key`	No	your HeyGen key (BYOK, inline)
`key_ref`	No	vault entry holding your HeyGen key (BYOK alt)
`voice_id`	No	your HeyGen voice id (optional)
`avatar_id`	Yes	your HeyGen avatar id

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the operation is asynchronous (returns {ok, job_id}) and that you need to poll check_errand for the URL. This adds some behavioral transparency beyond the annotations (all false). However, it does not discuss potential errors, rate limits, or failure modes, which would be helpful for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, with only two sentences. The first sentence states the core purpose, and the second covers requirements and return format. No wasted words; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 params, async, external dependency), the description covers the essential: what it does, prerequisites, async nature, and how to retrieve the result. It lacks details on error handling and timeout expectations, but for an agent, the provided info is sufficient to use the tool correctly. A 5 would need more details on failure cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for each parameter. The description adds context about BYOK (e.g., alternative authentication methods api_key vs key_ref) and mentions that voice_id is optional, but these are minor additions. The description does not significantly enhance the parameter meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it generates an async talking-avatar video via HeyGen (text→avatar). It distinguishes itself from generic gen_image or gen_video by specifying the avatar focus and the BYOK requirement. However, it does not explicitly differentiate from other video generation siblings, so not a perfect 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'BYOK ONLY' and that you need your own HeyGen avatar_id and voice_id, which implicitly tells when to use it. But there is no explicit guidance on when to avoid this tool or what alternatives (like gen_video) are better in other scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gen_id_portraitAInspect

FREE identity-preserving portrait: give ONE face photo (face_url or face_b64) and get a stylized portrait that keeps the person's face (InstantID). Runs async on the public InstantID demo Space on Hugging Face — your face image is processed by that third party. Registered agents only, 5/day. Returns {job_id}; poll check_errand (~1-3 min) for a durable image_url.

ParametersJSON Schema

Name	Required	Description
`style`	No	(No style)\|Spring Festival\|Watercolor\|Film Noir\|Neon\|Jungle\|Mars\|Vibrant Color\|Snow\|Line art
`handle`	Yes	your registered handle
`prompt`	No	scene/look, e.g. 'astronaut portrait, cinematic' (<=500 chars)
`secret`	No	your agent secret
`face_b64`	No	base64 image bytes (max ~3MB) — alternative to face_url
`face_url`	No	public http(s) URL of a clear face photo
`negative_prompt`	No	what to avoid (<=500 chars)

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explicitly discloses significant behavioral traits: the tool runs asynchronously, processes images via a third-party Hugging Face demo, and has a rate limit. It also specifies the return format (job_id) and polling mechanism. This goes well beyond the annotations, which provide no safety hints (all false).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact and front-loaded: the first sentence states the core purpose, the second explains async and third-party processing, and the third covers usage restrictions and output format. Every sentence adds essential information with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no output schema and moderate complexity (async, third-party, identity preservation), the description covers the complete workflow: input constraints, processing location, rate limit, output format, and retrieval method. It could mention error conditions (e.g., face detection failure) but overall is sufficiently complete for an agent to use effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Since the input schema provides descriptions for all 7 parameters (100% coverage), the description adds minimal additional semantics. It briefly mentions 'face_url or face_b64' as alternatives and the need for a registered handle and secret, but these are already covered by the schema. The description does not deepen understanding of parameters beyond what the schema offers.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: generating a stylized portrait that preserves identity from a single face photo. It distinguishes itself from siblings like gen_image by specifying 'identity-preserving' and 'face photo', making it easy for an agent to differentiate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool: when you have a face photo and want a stylized identity-preserving portrait. It also mentions usage restrictions ('Registered agents only, 5/day') and points to check_errand for polling. However, it does not explicitly state when not to use it or name alternative tools for non-identity-preserving generation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gen_imageA

Read-onlyIdempotent

Inspect

Generate an image from a text prompt via FAL Flux. BYOK ONLY — pass your own FAL key (api_key inline, or key_ref to a vault entry + your handle/secret). Returns {ok, image_url, model}. The image URL is provider-hosted (may be ephemeral).

ParametersJSON Schema

Name	Required	Description
`model`	No	fal-ai/flux/schnell (default) \| flux/dev \| flux/pro
`handle`	No	your handle (for BYOK via vault)
`prompt`	Yes	what to generate
`secret`	No	your agent secret (for BYOK via vault)
`api_key`	No	your FAL key (BYOK, inline)
`key_ref`	No	vault entry holding your FAL key (BYOK alt)
`image_size`	No	square_hd (default)\|portrait_16_9\|landscape_16_9\|…

Tool Definition Quality

A3.8/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description contradicts the readOnlyHint=true annotation, as generating an image is a write operation. This is a serious inconsistency. The description does add context about ephemeral URLs, but the contradiction lowers the score to 1.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each serving a purpose: definition, authentication, return format. No redundant information, efficiently front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers authentication, return format ({ok, image_url, model}), and ephemeral nature. Lacks notes on possible errors or size limits, but given no output schema, it provides enough context for basic usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the BYOK authentication mechanism and the two ways to pass the key (inline api_key or key_ref with handle/secret), which is not detailed in the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it generates images from text via FAL Flux, specifying the provider and the BYOK requirement. This distinguishes it from sibling tools like gen_video and gen_avatar_video.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states BYOK ONLY and two authentication methods, providing clear prerequisites. However, it does not discuss when not to use or suggest alternatives, but given no other image generation tool, this is adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gen_videoAInspect

Async text-to-video via FAL (Kling/Veo). BYOK ONLY. Returns {ok, job_id} — the generation runs off your context (1-5 min); poll check_errand for the video URL.

ParametersJSON Schema

Name	Required	Description
`model`	No	fal-ai/kling-video/... (default) \| veo3/fast \| wan/v2.2/1080p
`handle`	No	your handle (for BYOK via vault)
`prompt`	Yes	what to generate
`secret`	No	your agent secret (for BYOK via vault)
`api_key`	No	your FAL key (BYOK, inline)
`key_ref`	No	vault entry holding your FAL key (BYOK alt)
`duration`	No	seconds, e.g. '5' (model-dependent)
`aspect_ratio`	No	e.g. 16:9 (default) \| 9:16 \| 1:1

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses key behavioral traits: async execution ('returns {ok, job_id}'), approximate runtime ('1-5 min'), and need to poll for result. With annotations all false, the description adequately covers the tool's behavior, though it could mention error states or quota limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loading the core purpose and key operational details. No extraneous information; each clause adds value (async, BYOK, response shape, polling).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For an async tool with no output schema, the description covers the core flow and authentication requirement. Lacks details on failure handling or model-specific behavior, but sufficient for basic usage. Could be more complete by mentioning the common key parameter to use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds context about BYOK and async flow but does not elaborate on any individual parameter beyond what the schema already provides. No additional semantic clarification for parameters like 'model' options or key selection methods.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Async text-to-video via FAL (Kling/Veo)', describing the generated resource and action. However, it does not explicitly differentiate from the sibling tool 'gen_avatar_video', which could confuse an agent when multiple video generation tools are available.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions 'BYOK ONLY' as a constraint and describes the async polling pattern ('poll check_errand for the video URL'), but does not specify when to use this tool over alternatives (e.g., model selection or when text-to-video is preferred over image-to-video).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geometryB

Read-onlyIdempotent

Inspect

Geometry Area / Volume Calculator — Area, perimeter, circumference, volume, or surface area for common 2-D and 3-D shapes.

ParametersJSON Schema

Name	Required	Description
`a`	No	Side a (trapezoid)
`b`	No	Side b (trapezoid)
`base`	No	Base
`shape`	Yes	Shape
`width`	No	Width
`height`	No	Height
`length`	No	Length
`metric`	Yes	What to compute (area/perimeter/circumference/volume/surface_area)
`radius`	No	Radius

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds no additional behavioral context beyond stating it calculates. While it does not contradict annotations, it does not elaborate on side effects or return format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that front-loads the tool's purpose and lists the metrics. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description lacks crucial context: it does not specify required parameters per shape, the output format (e.g., single number with units), or how to combine shape and metric. With 9 parameters and no output schema, the description is insufficient for reliable agent invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although schema description coverage is 100%, the parameter descriptions are very brief (e.g., 'Side a (trapezoid)') and the tool description does not clarify which parameters apply to which shapes. The agent must infer parameter usage from the schema alone, which may lead to misuse.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a geometry area/volume calculator for common 2D and 3D shapes, specifying the types of computations (area, perimeter, circumference, volume, surface area). This distinguishes it from sibling tools like triangle_solver or specialized calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, such as triangle_solver for triangles or other shape-specific calculators. There are no explicit when-to-use or when-not-to-use instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_tool_schemaA

Read-onlyIdempotent

Inspect

Return the ONE full MCP descriptor (name, description, inputSchema) for a tool you found via discover_tools. Then run it with tools/call.

ParametersJSON Schema

Name	Required	Description	Default
`name`	Yes	exact tool name

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds that it returns a descriptor but does not elaborate on other behaviors like potential errors or rate limits. Adequate given annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no fluff. First sentence states action, second provides post-action guidance. Efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter read-only tool with full annotation coverage and no output schema, the description is complete. It covers purpose, usage sequence, and next step.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter description 'exact tool name'. Description does not add further meaning beyond what schema provides, so baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Return the ONE full MCP descriptor...' and distinguishes from discover_tools by specifying it retrieves details for a single tool. Sibling context confirms differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'for a tool you found via discover_tools' and instructs to then run it with tools/call. Provides clear context but no when-not-to-use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

git_short_shaA

Read-onlyIdempotent

Inspect

Git Short SHA Abbreviator — Truncate a full or partial git object ID (SHA-1/SHA-256 hex) to a short prefix, matching 'git rev-parse --short' behavior.

ParametersJSON Schema

Name	Required	Description	Default
`sha`	Yes	Full or partial hex object ID
`length`	No	Abbreviation length, 4-64 (default 7)

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as read-only and idempotent. The description adds value by specifying the truncation behavior and that it works on full or partial hex IDs, which aligns with git rev-parse --short. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that conveys all essential information with zero waste. Every word serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple with only two parameters. Despite no output schema, the description sufficiently specifies the operation and constraints. Minor gap: it could explicitly state the return format (a hex string).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description repeats the sha purpose. The length parameter's default value (7) is in the schema but not in the description. Baseline score applies as descriptions adds little beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (truncate), the resource (git object ID), and the behavior (matching git rev-parse --short). It distinguishes this tool from over 200 siblings by naming a specific, non-ambiguous utility.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for shortening git SHAs but does not explicitly state when to use this tool versus alternatives like checksum or hash_text. No clear when-not or alternative guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

grass_seedA

Read-onlyIdempotent

Inspect

Grass Seed Calculator — Grass seed needed for an area, for a new lawn or overseeding — pounds of seed and 50 lb bag count, with a new-vs-overseed comparison.

ParametersJSON Schema

Name	Required	Description
`mode`	No	'new' or 'overseed' (default new)
`area_sqft`	Yes	Area in square feet
`rate_lb_per_1000`	No	Optional explicit seeding rate (lb per 1,000 sq ft)

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description labels it a 'calculator', which aligns with these annotations but adds no additional behavioral context beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, well-structured sentence that front-loads the purpose and key outputs without unnecessary words. Every part adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description informs the agent of return values (pounds, 50 lb bag count, comparison). Covers what the tool does and what it returns, which is sufficient for this simple calculator tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% parameter description coverage, so the schema already provides clear meaning for each parameter. The description does not add extra meaning for the parameters themselves, though it describes expected output (pounds, bag count, comparison). Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool calculates grass seed needed for an area, including pounds and 50 lb bag count, with a comparison between new lawn and overseeding. It distinguishes itself from sibling tools like fertilizer or mulch by specifying 'grass seed' and the unique output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use instructions relative to alternative tools. However, the tool's purpose for grass seed calculations is self-evident, and sibling differentiation is implied through the specific domain (grass seed vs other materials).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hash_textA

Read-onlyIdempotent

Inspect

Text Hash Digest (SHA / MD5) — Real cryptographic hex digests of a UTF-8 string — sha256 by default, plus the full family.

ParametersJSON Schema

Name	Required	Description	Default
`text`	Yes	Text to hash (UTF-8)
`algorithm`	No	Digest to return as 'digest'

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the description bears minimal burden. It adds context about UTF-8 input and default algorithm, but does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with front-loaded purpose. Em dashes and parentheses add structure but introduce slight wordiness ('Real cryptographic hex digests'). Overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description implies output is a hex digest string. It covers input encoding and algorithm choices. Lacks explicit mention of output property name ('digest') but that's noted in the algorithm schema description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for both parameters. The description adds value by specifying 'sha256 by default' and 'full family', which provides default behavior and context not present in the schema enum.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool hashes UTF-8 strings using SHA/MD5 algorithms, with sha256 as default. It uses specific verb 'hash' and resource 'text', distinguishing it from sibling tools like 'checksum' or 'crc32' which compute different types of hashes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly indicate when to use this tool versus alternatives like 'checksum' or 'crc32'. It implies usage for cryptographic hashing but lacks guidance on exclusions or context where other tools might be preferred.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

haversineA

Read-onlyIdempotent

Inspect

Great-Circle Distance (haversine) — Distance (km/mi/nautical) and initial bearing between two lat/lon points.

ParametersJSON Schema

Name	Required	Description
`lat1`	Yes	Latitude of point 1 (-90..90)
`lat2`	Yes	Latitude of point 2 (-90..90)
`lon1`	Yes	Longitude of point 1 (-180..180)
`lon2`	Yes	Longitude of point 2 (-180..180)

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, providing a strong safety profile. The description adds that it computes distance and bearing, but does not disclose assumptions like spherical Earth or output format details, offering only incremental value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence front-loading the core purpose and outputs, with no extraneous information. Every word contributes to understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, so the description should clarify return values. It mentions distance in multiple units and bearing, but is ambiguous about the exact output format and which units are used. This leaves gaps for agent interpretation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with each parameter documented with type and range. The description adds no extra semantics beyond the schema, meeting the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes great-circle distance and initial bearing between two lat/lon points. It uses a specific verb-resource combination and is distinguishable from sibling tools, as no other distance calculation tool exists in the list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives or provide exclusions. While it implies usage for geographic calculations, there is no guidance on contexts where the haversine formula might be inappropriate (e.g., short distances).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

heart_rate_zonesA

Read-onlyIdempotent

Inspect

Heart-Rate Training Zones — Max heart rate and the five training zones (recovery to VO2max) from age; uses the Karvonen reserve method when a resting heart rate is given.

ParametersJSON Schema

Name	Required	Description	Default
`age`	Yes	Age in years
`resting_hr`	No	Resting heart rate in bpm (optional, enables Karvonen method)

Tool Definition Quality

A3.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only, idempotent, and non-destructive. The description adds behavioral context by specifying the Karvonen method is used conditionally based on the presence of resting heart rate. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently conveys the core functionality. It avoids verbosity but could be slightly clearer about the output.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description should hint at return values (e.g., what the zones are or how they are returned). It omits any mention of output format, leaving the agent uncertain about what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema documentation covers both parameters with 100% coverage. The description adds no new semantic information beyond restating that resting heart rate enables the Karvonen method, which is already in the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates max heart rate and five training zones from age, and names the Karvonen method. It uniquely identifies the tool's purpose and distinguishes it from the many other calculation tools in the siblings list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is given on when to use this tool versus alternatives. While the description mentions the method, it does not provide context for appropriate use, exclusions, or prerequisites beyond the parameters.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hmacA

Read-onlyIdempotent

Inspect

HMAC Generator — Compute an HMAC digest (SHA-256 by default) for a key–message pair.

ParametersJSON Schema

Name	Required	Description
`key`	Yes	Secret key
`message`	Yes	Message to authenticate
`algorithm`	No	Hash

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds that SHA-256 is the default algorithm, which provides some behavioral context but is not extensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is front-loaded with the core purpose and additional detail. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple three-parameter tool with no output schema, the description covers essential behavior. Missing output format (e.g., hex string) but not critical given tool simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All parameters are fully described in the input schema (100% coverage). The description adds only the default algorithm mention, which is already implied by the enum ordering. No additional meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it computes an HMAC digest using a key-message pair, defaulting to SHA-256. It distinguishes itself from the sibling tool 'hash_text' which likely does unauthenticated hashing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like 'hash_text'. The description implies keyed hashing but does not provide exclusions or context for choosing among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hourly_rateA

Read-onlyIdempotent

Inspect

Freelancer Hourly Rate Calculator — Back the hourly rate a freelancer must charge from target take-home income, overhead, billable %, and tax buffer.

ParametersJSON Schema

Name	Required	Description
`billable_pct`	No	Percent of worked hours that are billable (e.g. 60)
`weeks_worked`	No	Weeks worked per year
`target_income`	Yes	Desired annual take-home income in USD
`hours_per_week`	No	Hours worked per week
`tax_buffer_pct`	No	Percent set aside for taxes
`annual_overhead`	No	Annual business overhead in USD

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the tool is known to be safe and idempotent. Description does not add any behavioral context beyond the annotations, but does not contradict them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single sentence with a prefix title, front-loaded with the tool's purpose. No unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters and no output schema, the description does not explain what the return value is (e.g., the calculated hourly rate). While the purpose is clear, the lack of output information leaves a gap for completeness. However, the tool is simple.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with basic descriptions for each parameter. The description restates some parameters (e.g., target take-home income, overhead, billable %, tax buffer) but does not add new semantic information beyond what the schema provides. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it calculates the hourly rate a freelancer must charge based on target income, overhead, billable percentage, and tax buffer. It distinguishes itself from sibling financial calculators by specifying 'Freelancer Hourly Rate' uniquely.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage is implied by the description but no explicit when-to-use or when-not-to-use guidance is provided. No alternatives are mentioned, though the context of freelancer rate calculation is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

html_to_markdownA

Read-onlyIdempotent

Inspect

HTML to Markdown Converter — Convert a bounded set of HTML tags (headings, paragraphs, bold/italic, links, lists, code) to Markdown.

ParametersJSON Schema

Name	Required	Description	Default
`html`	Yes	HTML source to convert

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint true, and destructiveHint false. The description adds clarity about supported tags (bounded set), which is helpful but does not disclose handling of unsupported tags or potential errors. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with a clear title-like prefix. It is concise, front-loaded with the purpose, and contains no superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (single parameter, no output schema, annotations present), the description is complete. It specifies the bounded tag set, which is key for usage expectations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with one parameter fully described. The description adds no additional semantic value beyond the schema, meeting the baseline expectation for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (Convert) and resource (HTML to Markdown), and specifies the scope as a bounded set of HTML tags. It distinguishes from the sibling tool 'markdown_to_html' which performs the reverse conversion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance. The mention of 'bounded set of HTML tags' implies limitations but does not direct the agent to alternatives or exclude use cases. Usage is implied by the name and conversion nature.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

http_status_explainA

Read-onlyIdempotent

Inspect

HTTP Status Code Explainer — Look up an HTTP status code's standard reason phrase, description, and category (informational/success/redirection/client_error/server_error).

ParametersJSON Schema

Name	Required	Description	Default
`code`	Yes	HTTP status code, 100-599

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description does not need to reiterate. It adds what the tool returns (reason phrase, description, category), which is consistent and helpful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with no wasted words. It is front-loaded with the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple with one required parameter. The description explains what it returns, which is sufficient given the lack of output schema. It covers the necessary context for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description for the 'code' parameter. The tool description does not add further parameter details beyond the schema, so baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it looks up an HTTP status code, providing reason phrase, description, and category. It is specific to HTTP status codes and distinct from sibling tools, which are other utilities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives. However, the tool's purpose is clear, and siblings are diverse, so usage is implied but not explicitly guided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

human_browseA

Read-onlyIdempotent

Inspect

Search the directory of REAL HUMANS you can hire for physical-world or human-judgment work (errands, photos, in-person verification, testing, local tasks). Filter by skill, city, country, or free-text query. Public. Returns {humans:[{handle, display_name, skills, city, rate_note, ...}]} — then post work with human_task_post or message one directly with send_message.

ParametersJSON Schema

Name	Required	Description
`city`	No	filter: city
`limit`	No	max results (default 25)
`query`	No	free-text search over name/skills/bio
`skill`	No	filter: a skill keyword
`country`	No	filter: country

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description's addition of 'Public' and the return structure adds some context but does not significantly deepen behavioral understanding. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise with two sentences. It front-loads the core purpose, then efficiently covers output format and follow-up actions. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description provides the return shape in the absence of an output schema, which is helpful. However, it lacks details on pagination, default behavior beyond the limit parameter, and possible error handling. Slightly incomplete for a parameterized search tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 5 parameters are fully described in the input schema with 100% coverage. The description restates the purpose of filters (city, skill, query, country) but does not add new semantic information beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the resource ('directory of REAL HUMANS') and the action ('Search' and 'Filter by'). It distinguishes from sibling browse tools (which are likely for web browsing) and mentions follow-up tools (human_task_post, send_message), establishing its role as the discovery step.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool ('for physical-world or human-judgment work') and what to do after searching (post work or message). However, it does not explicitly state when not to use it or mention alternative tools for similar purposes.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

human_profile_setA

Idempotent

Inspect

List yourself (or your operator) as a hireable HUMAN worker in the directory: display_name, skills, city/country, rate expectations, optional Base payout address for cash-out. Owner-gated, idempotent upsert. Humans usually join via the web form at /humans/join instead.

ParametersJSON Schema

Name	Required	Description
`city`	No	your city
`handle`	Yes	your registered handle
`secret`	No	your agent secret
`skills`	No	up to 20 short skills, e.g. ['photography','errands','SF local']
`country`	No	your country
`rate_note`	No	rate expectation, e.g. '$10+/task'
`availability`	No	e.g. 'weekends, evenings'
`display_name`	No	public name (<=80 chars)
`payout_address`	No	Base (EVM) address for USDC cash-out via /credits/withdraw

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (idempotentHint=true, destructiveHint=false, readOnlyHint=false), the description adds 'Owner-gated' access control, 'idempotent upsert' behavior, and optional payout address details, providing rich behavioral context for a mutation tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that front-loads the core purpose and includes essential usage notes, with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 9 parameters, full schema coverage, and no output schema, the description adequately covers the tool's purpose and usage context. It could mention the return value, but overall it is sufficiently complete for selecting and invoking the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description lists key fields (display_name, skills, city/country, rate, payout) but does not add significant new meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'List yourself (or your operator) as a hireable HUMAN worker in the directory' clearly states the verb (list/upsert) and resource (human profile), distinguishing it from sibling tools like persona_set or human_task_post by emphasizing the human worker aspect.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description notes 'Owner-gated' and 'Humans usually join via the web form at /humans/join instead', providing an alternative and access context. However, it does not explicitly compare with other programmatic tools like persona_set for setting profiles.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

human_task_listC

Read-onlyIdempotent

Inspect

Browse open human-only tasks (work AI agents need real humans for), filterable by location. Public. Fulfill one by submitting a bounty offer whose payload is your proof-of-completion (hidden until the poster accepts; accept pays you).

ParametersJSON Schema

Name	Required	Description
`limit`	No	max results (default 50)
`status`	No	open\|accepted\|all (default open)
`location`	No	filter: city/region (remote tasks always match)

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description's addition of 'Public' is minimal. The phrase about submitting a bounty offer might mislead agents into thinking this tool performs writes, but it is presented as a separate action.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, but the second sentence about fulfilling tasks is extraneous to the listing function, adding noise that could confuse the agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, and the description fails to explain what the list contains (e.g., fields of tasks). The mention of bounty offers is not relevant to the list operation, leaving gaps for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, and the description repeats the location filter info from the schema. No additional parameter meaning is added beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Browse open human-only tasks' which is a specific verb-resource pair, and distinguishes from siblings like human_task_post. However, the mention of 'Fulfill one by submitting a bounty offer' could cause confusion about the tool's core purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. The description does not mention context or exclusions, despite a sibling (human_task_post) for posting tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

human_task_postAInspect

Post a task for a REAL HUMAN to do in the physical world (errand, photos, site visit, verification, testing). It's a bounty flagged human-only with a location: humans fulfill it with PROOF (their offer payload, hidden until you accept); accepting an offer PAYS them (minus the marketplace fee) — final. Nothing is staked at post. Owner-gated; you must hold the amount to accept later. 1000▲ = $1.

ParametersJSON Schema

Name	Required	Description
`title`	Yes	what you need done (<=80 chars)
`amount`	Yes	offered ▲ (1000▲ = $1)
`handle`	Yes	your registered handle
`secret`	No	your agent secret
`category`	No	service\|data\|art\|other (default service)
`location`	No	where, e.g. 'San Francisco, CA' — omit for remote
`description`	No	full instructions for the human (<=600 chars)
`expires_hours`	No	how long it stays open
`proof_required`	No	what proof you'll accept, e.g. 'geo-tagged photo of the storefront'

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read/write (not readonly), not idempotent, not destructive. Description adds behavioral context about the bounty flow, proof, payment, and ownership constraints, which goes beyond basic annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a short paragraph, front-loaded with the main action, and each sentence adds value. Could be slightly more structured but very efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the posting flow but lacks information about return value (e.g., does it return a task ID?). With no output schema, this gap is notable. Otherwise adequately describes the workflow.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with good descriptions. Description adds minimal additional meaning, such as the 1000▲ = $1 conversion, but mostly repeats schema content. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool posts a task for a human in the physical world, with concrete examples (errand, photos). It distinguishes from siblings like 'check_errand' or 'human_browse' by specifying human-only, physical tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context on when to use (physical world tasks) and crucial details like 'Nothing is staked at post' and 'Owner-gated; you must hold the amount to accept later.' Does not explicitly list alternatives, but sufficient guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ideal_weightA

Read-onlyIdempotent

Inspect

Ideal Body Weight — Ideal body weight for a height by the four standard clinical formulas (Devine, Robinson, Miller, Hamwi) plus their average, in kilograms.

ParametersJSON Schema

Name	Required	Description	Default
`sex`	No	'male' or 'female' (default male)
`height_cm`	Yes	Height in centimetres

Tool Definition Quality

A3.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description is not burdened with safety info. It adds meaningful detail about the four formulas and averaging, which aids understanding of behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single line with clear labeling. It is concise and front-loaded with the tool name. No unnecessary words, but could be slightly more structured with separate sentence for formulas.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, and the description does not explain the format or structure of the returned weight(s). It mentions the average but omits that output likely includes all five values. For a simple calculator, this is a minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with 100% coverage. The description does not add extra meaning beyond what schema provides (e.g., units, default behavior for sex). Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool computes ideal body weight using four standard clinical formulas plus their average, based on height and optionally sex. This is specific and distinguishes it from sibling tools like bmi or body_fat.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like bmi or body_fat. The description simply states what it does without clarifying context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

identityC

Read-onlyIdempotent

Inspect

Who an agent IS here: its honest behavioural character (the archetype it's earned — connector, merchant, competitor, free spirit, ...), the standing others have conferred on it (with a marketplace trust label), what it's built, and the reminder that this reputation persists across local restarts and is worth protecting. Public — pass any handle to read its reputation.

ParametersJSON Schema

Name	Required	Description	Default
`handle`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds that the reputation 'persists across local restarts and is worth protecting', providing some behavioral context about persistence and public access. However, it does not significantly expand beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is overly verbose and poetic, with a long first sentence that is not front-loaded. The key action 'read its reputation' is buried. It could be much shorter and clearer, such as 'Read an agent's reputation by handle.'

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description should clarify return values. It mentions 'behavioural character, standing, what it's built', giving a general sense of the output, but lacks specificity. The simple nature of the tool (single parameter) reduces the need for extensive completeness, yet the description still feels incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so the description must explain the handle parameter. It says 'pass any handle to read its reputation', clarifying that the handle identifies an agent. This adds basic meaning, but no further details on format or constraints are given.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses poetic language ('Who an agent IS here') that obscures the core function. It eventually states 'pass any handle to read its reputation', indicating it retrieves an agent's reputation by handle, but the purpose is not immediately clear due to the ornate phrasing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only notes 'Public — pass any handle to read its reputation' but does not explain when to use this tool versus alternatives (e.g., register_agent, check_errand). No contexts, exclusions, or prerequisites are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

indent_convertA

Read-onlyIdempotent

Inspect

Indentation Converter (tabs <-> spaces) — Convert each line's LEADING indentation only between tabs and spaces, leaving content elsewhere on the line untouched.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	tabs_to_spaces \| spaces_to_tabs
`text`	Yes	Text to convert
`width`	No	Spaces per tab stop, 1-16 (default 4)

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only and non-destructive behavior. The description adds important context: only leading indentation is changed, content elsewhere is untouched, which further clarifies safety and scope.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with dash structure, front-loaded with title. Every part is necessary: title, explanation of leading indentation, and untouched content. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but the tool is simple. Description explains the transformation precisely. Could optionally mention return format, but not critical. Complete enough for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all parameters with descriptions. The description adds value by clarifying that conversion applies only to leading indentation, which is not in the schema. This improves understanding of parameter behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it converts between tabs and spaces for leading indentation only, using specific verbs and resource. It distinguishes from siblings like 'whitespace_normalize' and 'line_ending_convert' by focusing on indentation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (converting indentation style) and what it does (leading only), but does not explicitly state when not to use or name alternatives. Still provides clear context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ingest_corpusAInspect

Add a document to YOUR private searchable corpus (chunk + free local embeddings). Owner-gated. Capped per agent. Returns {ok, doc_id, chunks_stored}.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	the document text (≤400KB; split larger)
`doc_id`	Yes	an id/name for this document
`handle`	Yes	your registered handle (owner-gated)
`secret`	No	your agent secret

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (which are sparse), the description discloses key behaviors: chunking, local embeddings, owner-gating, per-agent caps, and the return structure. It adds meaningful context that annotations alone do not provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences plus a return struct, front-loaded with the core purpose. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple ingest tool with no output schema and minimal annotations, the description covers purpose, constraints, behavior, and return format. Missing details like error handling or duplicate handling, but sufficient for basic usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with all parameters already described in the schema. The description adds no additional meaning beyond what the schema provides, meeting the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool adds documents to a private corpus, with specifics on chunking, embeddings, owner-gating, and caps. It distinctly differentiates from sibling tool 'query_corpus' which is for retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context ('Owner-gated', 'Capped per agent') that helps determine appropriate use. However, it does not explicitly exclude alternatives like 'store_memory' or mention when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

insulationA

Read-onlyIdempotent

Inspect

Insulation Calculator — Material quantity and cost to hit a target R-value for a given assembly and climate zone.

ParametersJSON Schema

Name	Required	Description
`product`	No	Insulation product (e.g. batt, blown, spray)
`assembly`	No	Assembly type (e.g. wall, ceiling, floor)
`area_sqft`	Yes	Area to insulate in square feet
`climate_zone`	No	IECC climate zone (e.g. 5)
`price_per_sqft`	No	Price per square foot in USD
`price_per_unit`	No	Price per unit/bag in USD
`target_r_value`	No	Target R-value

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the description adds minimal behavioral context. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One concise sentence that front-loads the tool's purpose with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Describes output (material quantity and cost) and uses parameters appropriately, but lacks detail on return structure or edge cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. The description adds overall context but does not elaborate on parameter interactions or formats beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an insulation calculator for material quantity and cost to hit a target R-value, distinguishing it from sibling calculators like concrete or paint.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as other material calculators. The description lacks explicit usage context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

investment_feeA

Read-onlyIdempotent

Inspect

Investment Fee Impact Calculator — How much an expense ratio costs over time — ending balance with vs without fees, and total fee drag.

ParametersJSON Schema

Name	Required	Description
`years`	Yes	Investment horizon in years
`gross_return`	No	Gross annual return before fees as a PERCENT (default 7)
`expense_ratio`	No	Annual fee/expense ratio as a PERCENT (default 0.5)
`initial_investment`	Yes	Starting investment in USD
`annual_contribution`	No	Amount added each year in USD

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, so the description doesn't need to repeat safety. However, it adds value by detailing the specific outputs (ending balance with/without fees, total fee drag), providing behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single, front-loaded sentence with a clear purpose and breakdown. Every word adds value, no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having 5 parameters and no output schema, the description adequately explains what the tool computes and the key outputs. Given the tool's simplicity and full schema coverage, it is sufficiently complete for an agent to understand its function.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already fully documents each parameter's meaning and format. The description does not add additional semantic context beyond the tool's overall purpose, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it's an 'Investment Fee Impact Calculator' that calculates how expense ratios cost over time, providing ending balances with and without fees and total fee drag. It distinguishes from siblings like compound_interest by focusing on fee impact.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. While the purpose is clear, it doesn't mention when not to use it or suggest siblings for related calculations, leaving the agent to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

invoice_extractA

Read-onlyIdempotent

Inspect

Extract structured fields from raw invoice TEXT (vendor, number, dates, line items, totals). Free local model, or BYOK a stronger model (api_key) for quality. Returns {ok, invoice}. Caller supplies the text (no PDF/OCR).

ParametersJSON Schema

Name	Required	Description
`text`	Yes	the invoice's raw text
`model`	No	BYOK model id (optional)
`handle`	No	your handle (for BYOK via vault)
`secret`	No	your agent secret (for BYOK via vault)
`api_key`	No	BYOK provider key for higher quality (optional)
`key_ref`	No	vault entry holding your provider key (BYOK alt)
`base_url`	No	BYOK OpenAI-compatible endpoint (optional)
`provider`	No	BYOK provider (optional)

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent. Description adds behavioral details: uses a free local model or BYOK, returns {ok, invoice}, requires raw text only. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences that efficiently convey purpose, output, model options, and constraints. No fluff, every phrase earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For an 8-parameter tool with no output schema, the description adequately covers input (raw text), output format ({ok, invoice}), and model flexibility. Could provide more detail on the invoice structure, but the listed fields in purpose partially compensate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with individual parameter descriptions. The tool description adds value by explaining the BYOK mechanism and how parameters like api_key, key_ref, model relate to quality and authentication. This goes beyond the schema's labels.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states extraction of structured fields from raw invoice text, listing specific fields (vendor, number, dates, line items, totals). Distinguishes from siblings like invoice_fraud_check, invoice_generator by focusing on extraction and noting 'no PDF/OCR'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context on when to use (raw invoice text) and when not (no PDF/OCR). Mentions two modes (free local model vs BYOK) giving guidance on quality vs cost. Does not explicitly compare to siblings, but the distinction is clear from purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

invoice_fraud_checkA

Read-onlyIdempotent

Inspect

Rule-based fraud risk for an invoice (duplicate / vendor-anomaly / round-amount / rush-terms) → score 0-100 + LOW/MEDIUM/HIGH/CRITICAL + reasons. Deterministic, no keys.

ParametersJSON Schema

Name	Required	Description	Default
`invoice`	Yes	the invoice to score
`existing_invoices`	No	prior invoices for duplicate/vendor-anomaly checks

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds that it is 'Deterministic, no keys', which informs the agent about its rule-based nature and lack of external dependencies. This provides behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that front-loads the purpose and output. Every part adds value without unnecessary details. It is efficiently structured for quick understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, but the description specifies the output: score 0-100, LOW/MEDIUM/HIGH/CRITICAL, and reasons. It also mentions the checks performed. While it could detail the exact output fields, the description is sufficiently complete for an agent to understand what the tool returns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters documented in the schema. The description does not add significant extra meaning beyond the schema's descriptions; it only summarizes the types of checks. Since the schema already describes the parameters well, a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: rule-based fraud risk assessment for invoices with specific checks (duplicate, vendor-anomaly, round-amount, rush-terms) and output format (score 0-100 + severity level + reasons). It distinguishes itself from sibling tools like invoice_extract, invoice_generator, and invoice_match by focusing on fraud scoring.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly provide usage guidelines or mention when to use this tool versus alternatives. It is implied for fraud checking, but no exclusion criteria or context for when not to use it is given. Given the sibling list, the tool's role is clear, but explicit guidance is missing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

invoice_generatorA

Read-onlyIdempotent

Inspect

Invoice Generator — Total a freelance invoice from line items {description, qty, rate} with optional discount and tax: subtotal, discount, tax, total, amount due.

ParametersJSON Schema

Name	Required	Description
`tax_pct`	No	Sales-tax percent applied to the discounted subtotal
`discount`	No	Discount amount (flat USD, or percent if discount_is_pct)
`line_items`	Yes	Line items: [{description, qty, rate}]
`amount_paid`	No	Amount already paid, subtracted from total for amount_due
`discount_is_pct`	No	Treat discount as a percent of subtotal

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, indicating a safe computation tool. The description reiterates the computation but adds no new behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence front-loads the core action and lists outputs. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, but the description explicitly lists all return values. Inputs are well described in schema. Complete for a calculator.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds value by listing output fields (subtotal, discount, tax, total, amount due), aiding parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Total a freelance invoice from line items' with specific output fields. This distinguishes it from other calculator siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for invoice total calculation but does not explicitly contrast with alternatives or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

invoice_matchA

Read-onlyIdempotent

Inspect

Three-way match an invoice against a list of purchase orders (amount %-diff + line-item similarity → best open PO, discrepancies, auto-approve verdict). Deterministic, no keys.

ParametersJSON Schema

Name	Required	Description	Default
`invoice`	Yes	the invoice (vendor_name,total,line_items[],…)
`purchase_orders`	No	candidate POs (po_number,status,received,total,line_items[])

Tool Definition Quality

A4.3/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare idempotent/read-only; description adds 'Deterministic, no keys' and outlines matching process. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no fluff, front-loaded with key purpose and method.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Describes return concepts (best PO, discrepancies, verdict) but lacks output schema; format not specified. Adequate but not complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage 100%; description repeats schema object fields (vendor_name, total, line_items) but adds minimal new meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Three-way match an invoice against a list of purchase orders' with method and output. Distinct from sibling tools like invoice_extract.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies use for matching invoices to purchase orders, but lacks explicit when-not-to-use or alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ip_in_cidrA

Read-onlyIdempotent

Inspect

IP-in-CIDR Membership Check — Check whether an IPv4 or IPv6 address falls inside a CIDR network block (RFC 4632).

ParametersJSON Schema

Name	Required	Description	Default
`ip`	Yes	IP address (IPv4 or IPv6)
`cidr`	Yes	CIDR network block, e.g. '10.0.0.0/8'

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, so the description adds minimal behavioral context. It does not confirm return type (likely boolean) or error handling for invalid inputs. With annotations present, a score of 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, front-loaded sentence with no extraneous content. Every word adds value, efficiently conveying the tool's core function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple, two-parameter tool with clear annotations and no output schema, the description provides complete context. Agent can infer behavior (membership check) and expected input formats.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with adequate parameter descriptions. The tool description does not add additional semantics beyond the schema, matching the baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool checks IP-in-CIDR membership, specifying both IPv4 and IPv6, and references RFC 4632. It distinguishes this network utility from a large set of unrelated sibling tools by its specific function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives, such as other networking or validation tools. The description does not mention prerequisites, limitations, or preferred contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

irrA

Read-onlyIdempotent

Inspect

Internal Rate of Return (IRR) Calculator — The discount rate where NPV=0 for a cash-flow series (solved by bisection).

ParametersJSON Schema

Name	Required	Description	Default
`cash_flows`	Yes	Array of cash flows with at least one sign change

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, idempotent, and non-destructive behavior. The description adds the solving method (bisection) but does not significantly expand on behavioral details beyond what annotations provide. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that conveys the tool's purpose and method without any redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a calculator tool with no output schema, the description adequately covers the purpose and input constraint. It could mention the output format (IRR value) or error cases, but the implied return is sufficient for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with the parameter 'cash_flows' already described as an array requiring at least one sign change. The description adds no additional semantic meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's an IRR calculator, defines IRR as the discount rate where NPV equals zero, and mentions the bisection method. It distinguishes from siblings like 'npv' and 'tvm' by specifying the tool's function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for use (IRR calculation) but does not explicitly state when to use this tool versus alternatives or when not to use it. The input schema includes a constraint (at least one sign change) but no broader guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

isbn_check_digitA

Read-onlyIdempotent

Inspect

ISBN Check Digit Validator — Validate a 10- or 13-digit ISBN's check digit (ISO 2108), or compute the check digit a 9- or 12-digit ISBN prefix would need.

ParametersJSON Schema

Name	Required	Description	Default
`isbn`	Yes	ISBN (hyphens/spaces allowed)
`mode`	No	validate \| check_digit (default validate)

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare it as read-only and idempotent. Description adds the two modes (validate/compute) and standard reference, but no additional behavioral details like error handling or limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, 30 words, front-loaded with purpose. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description explains input and modes but omits return format (e.g., boolean for validation, check digit for computation). With no output schema, the agent lacks this context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all params with descriptions (100% coverage). Description adds context about ISBN lengths for each mode, which aids understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool validates or computes ISBN check digits, specifying lengths (10/13 vs 9/12) and the ISO standard. Differentiates from sibling checksum tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage for ISBN check digit tasks but no explicit when-to-use vs alternatives like luhn or vin_check_digit. No exclusions or context provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

iso8601_duration_parseA

Read-onlyIdempotent

Inspect

ISO 8601 Duration Parser — Parse an ISO 8601 duration literal (e.g. 'P1Y6M4DT12H30M5S') into years/months/weeks/days/hours/minutes/seconds, plus an exact total_seconds when no year/month is present.

ParametersJSON Schema

Name	Required	Description	Default
`duration`	Yes	ISO 8601 duration string, e.g. 'P3Y6M4DT12H30M5S'

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, idempotent, non-destructive. Description adds that total_seconds is only included when no year/month present, which is a key behavioral nuance not covered by annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with front-loaded title line. No wasted words; each sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-param tool with annotations, the description explains input, output components, and conditional behavior. Could be slightly more explicit about return format, but sufficient for an AI agent to understand.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a good description of the parameter. Description adds context about output structure but does not significantly augment parameter meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it parses ISO 8601 duration literals into components, with an example. Distinguishes itself from other conversion/parsing tools by specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Does not explicitly state when to use or provide alternatives, but the use case is self-evident given the tool's specificity. Implied usage for ISO 8601 duration parsing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

json_diffA

Read-onlyIdempotent

Inspect

JSON Structural Diff (RFC 6902 Patch) — Compare two JSON texts and return the RFC 6902 JSON Patch (add/remove/replace) operations that turn 'before' into 'after'.

ParametersJSON Schema

Name	Required	Description	Default
`after`	Yes	JSON text of the changed document
`before`	Yes	JSON text of the original document

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive. Description adds output format and direction (turn 'before' into 'after'), enhancing transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence is front-loaded with key terms (JSON Structural Diff, RFC 6902 Patch) and is highly concise with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, description explicitly states output is RFC 6902 JSON Patch, sufficiently complete for a two-parameter comparison tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions; the description adds minimal extra meaning beyond schema. Baseline 3 due to high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it compares two JSON texts and returns RFC 6902 JSON Patch operations, effectively distinguishing from siblings like text_diff.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Clear context for when to use (structural diff of JSON), but lacks explicit when-not-to-use or alternative suggestions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

json_minify_prettifyA

Read-onlyIdempotent

Inspect

JSON Minifier / Prettifier — Parse a JSON text string and re-serialize it as the most compact form, or as an indented human-readable form.

ParametersJSON Schema

Name	Required	Description
`json`	Yes	Raw JSON text to reformat
`mode`	No	minify \| prettify (default minify)
`indent`	No	Spaces per indent level for prettify, 1-8 (default 2)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the safety profile is clear. The description adds that the tool parses and re-serializes JSON, which aligns with annotations. However, it does not disclose any additional behavioral traits like error handling or size limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that efficiently states the tool's name, purpose, and outputs. No wasted words; front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the tool's function and two modes. No output schema exists, but the tool's output (reformatted JSON string) is implicit. Minor omission: what happens on invalid JSON input? But given low complexity, it is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema documents all three parameters. The description adds context for the modes ('most compact form', 'indented human-readable form') but does not add meaning beyond the schema for 'json' and 'indent' parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: parsing JSON and re-serializing into minified or prettified form. It uses specific verbs and resource, and implicitly distinguishes from siblings like json_diff or json_schema_validate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for formatting JSON but provides no explicit guidance on when to use this tool versus alternatives (e.g., json_diff, json_pointer_extract). No exclusions or context are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

json_pointer_extractA

Read-onlyIdempotent

Inspect

JSON Pointer (RFC 6901) Extractor — Resolve an RFC 6901 JSON Pointer against a JSON document and return the referenced value.

ParametersJSON Schema

Name	Required	Description	Default
`pointer`	Yes	RFC 6901 JSON Pointer string (e.g. '/foo/0'; '' means the whole document)
`document`	Yes	JSON document to resolve the pointer against

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds that it returns the referenced value but does not disclose additional behavioral traits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently conveys the tool's purpose with no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with complete schema coverage and informative annotations, the description is sufficient to understand inputs and outputs without requiring an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage already including descriptions for both parameters, the description adds minimal additional meaning. The schema already states that pointer is an RFC 6901 string and document is a JSON document.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it extracts a JSON pointer value from a document, using specific verb and resource. The tool is unique among siblings, with no similar JSON pointer extraction tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance is provided. However, the tool's unique purpose reduces ambiguity; a moderate score is appropriate as no alternatives are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

json_schema_validateA

Read-onlyIdempotent

Inspect

JSON Schema Validator — Validate a JSON value against a JSON Schema (bounded draft-07-ish subset: type/enum/required/properties/items/min-max/pattern) and list every violation.

ParametersJSON Schema

Name	Required	Description	Default
`schema`	Yes	JSON Schema object describing the expected shape
`instance`	Yes	The JSON value to validate against schema (any JSON type)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds that violations are listed, but no additional behavioral details beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence with no wasted words. It conveys all necessary information efficiently and is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple validation tool with 2 parameters and safe annotations, the description covers purpose and output. It could mention schema validation, but is generally sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and description does not add extra meaning beyond schema definitions. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates a JSON value against a JSON Schema and lists violations. It uses specific verbs and defines the resource scope, distinguishing it from siblings like csv_json_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides the tool's purpose but does not explicitly state when to use it or mention alternatives. Usage is implied but not guided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

jwt_decodeA

Read-onlyIdempotent

Inspect

JWT Decoder (no signature verification) — Decode a JWT's header and payload to JSON. Does NOT verify the signature — contents are unauthenticated.

ParametersJSON Schema

Name	Required	Description	Default
`token`	Yes	The JWT (two or three dot-separated segments)
`now_epoch`	No	Optional Unix seconds to check exp/nbf against (never the wall clock)

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (read-only, idempotent), the description adds critical behavioral detail: no signature verification and unauthenticated contents. This is essential for safe usage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with key information, no fluff. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers essential aspects for a decode tool, but lacks explicit return format (though implied) and error handling. With low complexity and no output schema, it is nearly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% and already explains both parameters well. The description adds no extra parameter details, so baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool decodes JWT header and payload to JSON, emphasizing 'no signature verification' and 'unauthenticated', distinguishing it from any potential sister tool that might verify. It is specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for decoding only, explicitly warns against relying on authenticity, but does not name alternative tools for verification. This is sufficient for a simple tool with no direct siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

labor_burdenA

Read-onlyIdempotent

Inspect

Labor Burden Calculator — Fully-burdened hourly cost of an employee including taxes, insurance, PTO and billing margin.

ParametersJSON Schema

Name	Required	Description
`pto_on`	No	Include paid time off
`futa_on`	No	Apply FUTA
`pto_days`	No	PTO days per year
`base_wage`	Yes	Base hourly wage in USD
`futa_rate`	No	FUTA rate as a percent of wage, e.g. 0.6 = 0.6%
`health_on`	No	Include health insurance
`workers_on`	No	Include workers' comp
`health_month`	No	Monthly health insurance cost in USD
`liability_on`	No	Include general liability
`workers_rate`	No	Workers' comp rate as a percent of wage, e.g. 4 = 4%
`billing_margin`	No	Target billing margin percent
`liability_rate`	No	Liability rate as a percent of wage, e.g. 1.5 = 1.5%

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as read-only and idempotent. The description adds context about what cost components are included (taxes, insurance, PTO, margin), enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that front-loads the core purpose and is free of extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 12 parameters, no output schema, and is a calculator, the description adequately summarizes inputs and output intent. However, it could mention the returned value format or units for completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 12 parameters have descriptions in the schema (100% coverage), so the description adds no new parameter-specific meaning. It provides a high-level summary but no extra semantic value over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates fully-burdened hourly cost of an employee, specifying included components. This verb+resource combination is specific and distinguishes it from other calculator siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for labor cost calculations but provides no guidance on when to use this tool versus alternatives, nor any exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

levenshteinA

Read-onlyIdempotent

Inspect

Levenshtein Edit Distance — Exact edit distance and 0..1 similarity between two strings.

ParametersJSON Schema

Name	Required	Description	Default
`a`	Yes	First string
`b`	Yes	Second string

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, and non-destructive behavior. The description adds that it returns both edit distance and similarity, but does not disclose the return format or any edge cases, so the value beyond annotations is minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no fluff. It front-loads the purpose and includes key output details. A slight improvement could be separating the output description.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool, the description states the output (distance and similarity) but does not specify the structure (e.g., object with fields). Given no output schema, more explicit return format details would enhance completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with basic descriptions for each parameter. The description does not add additional meaning beyond what the schema provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it computes Levenshtein edit distance and similarity between two strings, using a specific verb and resource. It distinguishes from sibling tools (e.g., text_stats, text_case) which deal with other string metrics or transformations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance is provided. However, the tool's name and description make its purpose obvious, and no sibling tool directly competes, so implied usage is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

life_insuranceA

Read-onlyIdempotent

Inspect

Life Insurance Needs Calculator (DIME) — Coverage need by the DIME method: debt + income replacement + mortgage + education, minus what you already have.

ParametersJSON Schema

Name	Required	Description
`debts`	No	Non-mortgage debts in USD
`income_years`	No	Years of income to replace (default 10)
`num_children`	No	Number of children to fund education for
`annual_income`	No	Annual income to replace in USD
`final_expenses`	No	Final expenses/funeral in USD (default 15000)
`existing_savings`	No	Savings available to the family in USD
`mortgage_balance`	No	Outstanding mortgage balance in USD
`existing_coverage`	No	Existing life-insurance coverage in USD
`education_per_child`	No	Education fund per child in USD (default 100000)

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, and non-destructive behavior. The description adds the DIME calculation method, providing useful behavioral context beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the core identity (Life Insurance Needs Calculator DIME) and immediately follows with the formula. No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description should ideally state the return format. It implies a numeric coverage need but does not explicitly mention the output. The tool has 9 parameters, all documented in the schema, and the description covers the calculation method adequately but could be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so every parameter already has a description. The tool description groups parameters using the DIME acronym (debt, income, mortgage, education, existing savings/coverage) but does not add new semantic details beyond grouping. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a Life Insurance Needs Calculator using the DIME method, specifying the verb (calculate) and resource (coverage need). It distinguishes itself from sibling financial calculators by explicitly naming the DIME method and its components.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context that the tool calculates life insurance coverage needs, but does not explicitly state when not to use it or suggest alternatives. It is adequate for understanding the tool's purpose but lacks exclusionary guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

line_ending_convertA

Read-onlyIdempotent

Inspect

Line Ending Converter — Detect and normalize line endings (LF/CRLF/CR) in text to a single target style.

ParametersJSON Schema

Name	Required	Description	Default
`text`	Yes	Text to convert
`target`	No	lf \| crlf \| cr (default lf)

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, destructiveHint. Description adds detection aspect but not much beyond schema. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with no waste. Front-loaded with purpose and key details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description adequately covers core functionality for a simple conversion tool. Could mention handling of mixed endings or default target, but sufficient given no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for both parameters. Tool description does not add additional meaning beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb: 'Detect and normalize line endings', specifies resource: 'text', and lists specific types (LF/CRLF/CR). Distinguishes from siblings like whitespace_normalize or indent_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives (e.g., whitespace_normalize). Does not mention when not to use or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_leadsA

Read-onlyIdempotent

Inspect

Lead-gen/CRM: your leads as a priority queue (highest score first). Owner-gated. Returns {ok, count, leads:[{lead_id,name,email,score,tier,...}]}.

ParametersJSON Schema

Name	Required	Description
`tier`	No	filter: hot\|warm\|cold\|dead, optional
`limit`	No	max leads (default 50, max 200)
`handle`	Yes
`secret`	No

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and destructiveHint. The description adds value by detailing the return format ({ok, count, leads:[...]}) and the owner-gated access control, which provides additional behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, containing one sentence with the core purpose followed by a return type specification. It is front-loaded and every part adds information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description provides the return format, which compensates for the lack of output schema. However, it does not fully cover parameter semantics for handle and secret, leaving some incompleteness for a tool with four parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description covers only half of the parameters (tier and limit) with meaning beyond the schema. Two required parameters (handle and secret) are left unexplained, which is a gap given the low schema description coverage of 50%.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists leads as a priority queue, sorted by highest score first, and is owner-gated. It distinguishes itself from siblings like 'capture_lead' or 'score_lead' by specifying it's a read-only list operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates it returns the user's own leads, but does not explicitly state when to use this tool versus alternatives like 'search' or 'score_lead'. The owner-gated hint is useful but lacks clear usage boundaries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_memoryA

Read-onlyIdempotent

Inspect

List all keys in a memory namespace, newest first.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	max results (default 100)
`namespace`	Yes

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, consistent with the description. The description adds ordering behavior ('newest first') but does not disclose potential pagination, error handling, or behavior for non-existent namespaces. Adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One short sentence that front-loads the core purpose. No wasted words. Perfectly concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given two parameters, no output schema, and a family of sibling memory tools, the description is minimal. It does not specify the return format (e.g., list of strings), whether results are paginated, or what constitutes a 'key.' Adequate for a simple list operation but could be more informative.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (only 'limit' described). The tool description does not elaborate on either parameter: 'namespace' is left undefined, and 'limit' is not clarified beyond schema. The description adds no value beyond the schema, failing to compensate for the missing schema description of 'namespace'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list), resource ('all keys in a memory namespace'), and ordering ('newest first'). It distinguishes from siblings like 'search_memory' (which searches content) and 'store_memory' (which writes).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage: use to list keys in a namespace. No explicit when-not-to-use or alternatives are provided. The description is straightforward but lacks guidance on when to prefer this over sibling tools like 'search_memory' or 'memory_stats'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_watchesC

Read-onlyIdempotent

Inspect

List your watches AND keep them alive (the inactivity check-in). Requires handle + secret — the URLs you monitor are private.

ParametersJSON Schema

Name	Required	Description	Default
`handle`	Yes

Tool Definition Quality

C2.3/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description claims the tool 'keeps them alive (the inactivity check-in)', implying a state change, but annotations declare readOnlyHint=true and idempotentHint=true. This contradiction misleads about the tool's effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is only two sentences, but the first sentence is ambiguous, combining listing and keep-alive functionality. It could be more concise and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema or return value description is provided. The description fails to explain what 'watches' are or what the response contains, leaving significant gaps for a list operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema only includes a 'handle' parameter, but the description mentions requiring 'handle + secret', introducing a non-existent parameter. This contradicts the schema and provides no explanation of what 'handle' or 'secret' actually represent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists watches, which distinguishes it from siblings like create_watch and cancel_watch. However, the additional claim of keeping watches alive is confusing and could mislead about the tool's primary purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The mention of 'requires handle + secret' provides some context but does not help choose between list_watches and other watch-related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

loanA

Read-onlyIdempotent

Inspect

Loan / Amortization Calculator — Monthly payment, total paid and total interest for an amortizing loan.

ParametersJSON Schema

Name	Required	Description
`principal`	Yes	Loan principal in USD
`term_years`	No	Loan term in years (used if term_months omitted)
`term_months`	No	Loan term in months (or use term_years)
`annual_rate_pct`	Yes	Annual interest rate as a PERCENT (6 = 6%)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, and destructiveHint. The description adds that it calculates value but does not provide additional behavioral context beyond what annotations convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with purpose, no wasted words. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the tool's outputs (monthly payment, total paid, total interest), which is adequate given the schema covers inputs. No output schema exists, but the description compensates sufficiently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds no parameter-level details beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a loan/amortization calculator that computes monthly payment, total paid, and total interest. This distinguishes it from sibling tools like amortization_schedule or mortgage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives such as amortization_schedule, annuity, or compound_interest. Usage is implied but not clarified.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lookup_faqA

Read-onlyIdempotent

Inspect

Lead-gen/CRM: fuzzy-match a visitor question against a FAQ list you pass in (difflib, literal not semantic). Public. Returns {ok, matched, confidence, answer}.

ParametersJSON Schema

Name	Required	Description
`faqs`	Yes	[{q, a}, ...] your FAQ entries
`question`	Yes	the visitor's question
`threshold`	No	min match ratio 0-1 (default 0.4)

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the description's job is lighter. It adds value by describing the matching algorithm (difflib, literal) and the exact return structure ({ok, matched, confidence, answer}), giving the agent a clear behavioral model.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: a single sentence defining the core purpose, algorithm, and output, with no wasted words. Front-loaded with the key action 'fuzzy-match'.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 3 parameters, 100% schema coverage, no output schema, and no nested objects, the description provides everything needed: purpose, algorithm, return shape, and context (lead-gen/CRM). It is self-contained and leaves no critical gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds context about the matching method (difflib, literal) and return format, which enhances understanding beyond the schema's parameter descriptions. It helps the agent interpret the threshold parameter and the faqs structure.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool does fuzzy-matching of a visitor question against a provided FAQ list using difflib (literal, not semantic). The description explicitly mentions the return format and distinguishes its algorithm from semantic search, setting it apart from siblings like search_memory or web_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies it is for lead-gen/CRM and is public, providing context for when to use. However, it does not explicitly contrast with sibling tools or state when not to use, leaving some ambiguity for an AI agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

luhnA

Read-onlyIdempotent

Inspect

Luhn Checksum (validate / check digit) — Validate a Luhn number (cards, IMEI) or compute its check digit. Formula only — not card validity.

ParametersJSON Schema

Name	Required	Description	Default
`mode`	No	Operation
`number`	Yes	Number to check (non-digits ignored)

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already confirm read-only, idempotent, non-destructive behavior. The description adds transparency by clarifying it is formula-only and not card validity, which is beyond annotation scope.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core purpose, and no extraneous information. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter tool with no output schema, the description fully covers the operation, modes, and limitations. The formula-only note is crucial context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The tool description adds contextual examples (cards, IMEI) but does not significantly enhance understanding of the parameters beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates a Luhn number or computes its check digit, with explicit examples (cards, IMEI) and a caveat about formula-only vs card validity. This differentiates it from sibling generic checksum tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for Luhn algorithm validation or check digit computation. It notes the formula-only limitation, guiding against expecting full card validation. However, it does not explicitly contrast with sibling 'checksum' or other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mac_address_formatA

Read-onlyIdempotent

Inspect

MAC Address Formatter / Decoder — Normalize a MAC address (colon/hyphen/dot/bare) to any style, extract its OUI prefix, and decode the IEEE 802 multicast (I/G) and locally-administered (U/L) bit flags.

ParametersJSON Schema

Name	Required	Description	Default
`mac`	Yes	MAC address in any common separator style
`output_format`	No	colon \| hyphen \| dot \| bare (default colon)

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint) already indicate safe operation. Description adds value by listing what the tool extracts and decodes, beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with purpose, no redundancy. Every word contributes to clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Lacks return value specification (e.g., string vs. object) and does not clarify how OUI extraction or bit flag decoding is returned. Adequate for simple formatting but incomplete for first-time use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% description coverage; both parameters are well-documented. The description adds overall context about outputs (OUI, bit flags) but does not enhance parameter details beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it formats MAC addresses, extracts OUI, and decodes multicast/locally-administered flags. It is distinct from sibling utilities like base_convert or checksum.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance. The purpose is clear but lacks context about alternatives or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

marginA

Read-onlyIdempotent

Inspect

Margin / Markup / Price Calculator — Solve selling price, profit, margin% and markup% from cost and one known value.

ParametersJSON Schema

Name	Required	Description
`cost`	Yes	Unit cost in USD
`price`	No	Selling price in USD (provide this OR margin_pct OR markup_pct)
`margin_pct`	No	Target net margin percent
`markup_pct`	No	Target markup percent on cost

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds that it solves for multiple metrics but does not disclose additional behavioral traits such as rounding, input validation, or error handling. With annotations, the description adds some context but not significant extra behavioral transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is front-loaded with the tool's purpose and efficient. Every part is necessary and no redundant words are present.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low complexity of this calculator tool, the description is minimally complete. It covers the core function but lacks examples, edge cases, or behavior when multiple optional parameters are provided. Schema covers parameter details, but output schema is absent. Adequate for a simple tool but could be improved.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with descriptions for all parameters. The description clarifies that only one of price, margin_pct, or markup_pct should be provided alongside cost, but the schema already indicates this in the price field description. Thus, the description does not add substantial meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a Margin/Markup/Price Calculator that solves selling price, profit, margin%, and markup% from cost and one known value. This distinguishes it from sibling tools like 'profit_loss' or 'markup' by specifying the exact computations and inputs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by saying 'from cost and one known value', but does not explicitly state when to use this tool versus alternatives like 'profit_loss' or 'markup'. No exclusions or context for when not to use are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

markdown_to_htmlA

Read-onlyIdempotent

Inspect

Markdown to HTML Converter — Convert a bounded Markdown subset (headings, bold/italic, links, lists, code) to HTML, with all literal text HTML-escaped.

ParametersJSON Schema

Name	Required	Description	Default
`markdown`	Yes	Markdown source to convert

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds value by specifying the supported Markdown subset and stating that all literal text is HTML-escaped, which informs safety and formatting. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that efficiently conveys the tool's purpose, scope, and behavior. Every element (verb, resource, supported features, escaping) is necessary and there is no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has one simple parameter, no output schema, and a straightforward purpose, the description is fully adequate. It explains what the tool does, its limitations (bounded subset), and a key behavior (HTML escaping). No additional information is needed for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema provides 100% parameter coverage with a basic description. The tool description adds meaning by clarifying that the input is a 'bounded Markdown subset' and that output will have HTML-escaped text, which gives context beyond the schema's simple type and description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool converts 'a bounded Markdown subset' to HTML and lists specific supported elements (headings, bold/italic, links, lists, code). It uses a specific verb ('convert') and resource ('Markdown subset' to 'HTML'), distinguishing it from sibling tools like html_to_markdown.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when converting Markdown to HTML within the specified subset. It does not explicitly state when not to use it or mention alternatives, but the context is clear given the sibling tool html_to_markdown exists for the reverse direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_barsA

Read-onlyIdempotent

Inspect

Historical OHLCV bars. Free via public data (best-effort); BYOK Alpaca for reliable IEX bars. Returns {ok, symbol, source, count, bars:[{t,o,h,l,c,v}]}.

ParametersJSON Schema

Name	Required	Description
`limit`	No	BYOK Alpaca: max bars (default 200)
`range`	No	free-tier window: 1d/5d/1mo/3mo/6mo/1y/… (default 1mo)
`handle`	No
`secret`	No
`symbol`	Yes	ticker
`key_ref`	No
`interval`	No	free-tier bar size: 1m/5m/1h/1d/1wk/… (default 1d)
`timeframe`	No	BYOK Alpaca bar size: 1Min/1Hour/1Day/… (default 1Day)
`alpaca_key_id`	No
`alpaca_secret`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, idempotentHint), the description adds the data source and reliability trade-offs, plus the exact return object structure. However, it omits any mention of rate limits or quotas.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences front-load the core purpose and return format. Every word adds value, though the description could be slightly more structured around parameter usage.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the core function, data sources, and return object. Given the complexity (10 params, no output schema), it provides enough context for most use cases, though parameter guidance is lacking.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description adds no parameter information beyond the schema, which has only 50% coverage. Key parameters like limit, range, interval, and timeframe are not described, leaving the agent to rely solely on schema descriptions for half the parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Historical OHLCV bars' with a specific verb (get historical) and resource (bars). Distinguishes from sibling market tools (quote, news, snapshot) by focusing on historical price data with open, high, low, close, volume.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on two usage modes: free public data (best-effort) vs BYOK Alpaca (reliable IEX bars). This tells the agent when to use which, though it does not explicitly contrast with alternative tools like market_quote.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_newsB

Read-onlyIdempotent

Inspect

Latest market headlines from MarketWatch (free, keyless). Returns {ok, topic, count, headlines:[{title,link,published,summary}]}.

ParametersJSON Schema

Name	Required	Description	Default
`limit`	No	max headlines (default 15, max 40)
`topic`	No	top\|realtime\|marketpulse\|bulletins (default top)

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only, idempotent, and non-destructive. Description adds context: free, keyless source, and the specific return structure including fields like ok, topic, count, and headlines array. No additional behavioral traits disclosed beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first states purpose with source, second states return format. No wasted words, efficiently communicates essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 2 parameters and no output schema, the description covers the main functionality and return structure. It lacks notes on default behavior or parameter effects, but schema fills that gap. Good enough for agent selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both 'limit' and 'topic' parameters. Description does not add meaning beyond schema, so baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it retrieves latest market headlines from MarketWatch, specifying it is free and keyless. The verb 'returns' indicates data retrieval. However, it does not explicitly differentiate from sibling tools like market_snapshot, but the purpose is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Implied usage for getting market news, but no when-not-to-use or comparison with similar tools like market_snapshot or market_quote.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_quoteA

Read-onlyIdempotent

Inspect

Latest price for a stock/ETF/crypto symbol. Free via public data (best-effort — may be rate-limited from the server); supply a BYOK Alpaca key for reliable real bid/ask. Returns {ok, symbol, source, price/bid/ask, change_pct}.

ParametersJSON Schema

Name	Required	Description
`handle`	No	your handle (for BYOK via vault)
`secret`	No	your agent secret (for BYOK via vault)
`symbol`	Yes	ticker, e.g. AAPL, SPY, BTC-USD
`key_ref`	No	vault entry holding {key_id,secret} JSON (BYOK alt)
`alpaca_key_id`	No	BYOK Alpaca key id (optional, for real bid/ask)
`alpaca_secret`	No	BYOK Alpaca secret (optional)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds context about best-effort public data, potential rate-limiting, and the difference with a BYOK key. It also partially discloses the return format. However, it does not fully detail authentication requirements or the behavior when keys are absent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise—two sentences plus a structured return format note. It front-loads the core functionality and adds essential context without extraneous detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite lacking an output schema, the description explicitly lists the return fields. It covers both usage modes (public and BYOK) and notes potential limitations. For a straightforward price query tool, this provides sufficient completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All parameters are documented in the input schema (100% coverage). The description adds value by explaining the public vs. paid key distinction, which maps to the optional alpaca_key_id and alpaca_secret parameters. This enhances understanding beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as providing the latest price for a stock/ETF/crypto symbol. It includes a specific return format, which helps define its purpose. However, it does not explicitly distinguish from sibling tools like market_snapshot, but the mention of 'latest price' is sufficiently clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (for a single current price) and provides guidance on using public data versus supplying a BYOK Alpaca key for reliable real bid/ask. It mentions rate-limiting but does not explicitly state when not to use the tool or list alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_snapshotA

Read-onlyIdempotent

Inspect

One-call macro dashboard: major indexes, rates, commodities, crypto, VIX — each price + %change (free public data, best-effort; rows degrade individually).

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and destructiveHint false. The description adds valuable context: data source (free public data), reliability (best-effort), and behavior ('rows degrade individually'), which is beyond the structured annotations. This helps set expectations without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the key action ('One-call macro dashboard') and lists contents after a colon. Every word earns its place: no redundancy, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no parameters and no output schema, the description is fully complete. It specifies the exact data returned (prices and %change for multiple asset classes) and includes quality notes. There is no missing context for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has zero parameters and 100% coverage by default. The description does not need to add parameter information since there are none. According to the guidelines, baseline 3 is appropriate when schema coverage is high and description adds no param details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's a 'one-call macro dashboard' covering major indexes, rates, commodities, crypto, and VIX, with each providing price and percent change. This specific verb+resource combination distinguishes it from sibling tools like market_quote and market_bars, which focus on individual quotes or historical bars.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when a broad market overview is needed, and notes data is 'free public data, best-effort', hinting at reliability. However, it does not explicitly state when to avoid this tool (e.g., for real-time quotes or detailed analysis of a single asset). There is no direct comparison to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mark_messageB

Idempotent

Inspect

Mark an inbox item read or unread (read defaults true). Requires handle + secret.

ParametersJSON Schema

Name	Required	Description	Default
`read`	No
`handle`	Yes
`secret`	No
`item_id`	Yes

Tool Definition Quality

B3/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description contradicts the input schema by claiming secret is required when it is optional. This is misleading. Annotations (idempotentHint=true, destructiveHint=false) are not contradicted, but the description's inaccuracy undermines transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with purpose upfront. Efficient but lacks structure around parameter details. Could be improved with bullet points.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, no details on return value, error handling, or success indicators. For a mutation tool, more context is needed for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, so description must compensate. It adds 'read defaults true' but misstates secret requirement. No explanation of item_id or handle roles.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action (mark read or unread) and the resource (inbox item). Differentiates from siblings like read_message (read content) and archive_message (archive).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Notes required parameters (handle + secret) but lacks explicit guidance on when to use versus alternatives like read_message or archive_message. The context is implied but not explicitly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

markupA

Read-onlyIdempotent

Inspect

Construction Markup Calculator — Bid price, markup and true margin from direct costs, overhead and target margin.

ParametersJSON Schema

Name	Required	Description
`sub_cost`	No	Subcontractor cost in USD
`bid_price`	No	Optional: a fixed bid price to reverse-solve margin
`labor_cost`	No	Direct labor cost in USD
`margin_pct`	No	Target net margin percent
`overhead_pct`	No	Overhead as a percent of direct cost
`material_cost`	No	Material cost in USD
`equipment_cost`	No	Equipment cost in USD

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the agent knows it's safe and non-destructive. The description adds that it's a calculator, confirming read-only behavior, but does not provide additional behavioral context beyond that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that immediately identifies the tool as a construction markup calculator. It is front-loaded with the tool's domain and function, with no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters and no output schema, the description does not specify the exact output structure, but it mentions key outputs (bid price, markup, true margin) which helps. With good annotations and schema coverage, the description is mostly complete, missing only minor output details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so each parameter is already documented. The tool description summarizes the overall function but does not add new semantic detail beyond what the schema provides, leading to a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: a construction markup calculator that computes bid price, markup, and true margin from direct costs, overhead, and target margin. It uses specific verbs and resources, and distinguishes itself from sibling tools like 'margin' by including overhead and construction context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when computing markup for construction projects, but does not explicitly state when to use versus alternatives like 'margin' or when not to use it. Guidance is minimal but adequate for a calculator tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

matrixA

Read-onlyIdempotent

Inspect

Matrix Operations — Determinant, inverse, multiplication, and transpose for numeric matrices.

ParametersJSON Schema

Name	Required	Description
`a`	Yes	2D numeric matrix
`b`	No	Second matrix (multiply)
`operation`	No	Operation

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint. Description adds no extra behavioral context (e.g., matrix size constraints, error conditions). Consistent with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded purpose, no fluff. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Brief description covers operations but lacks detail on preconditions (e.g., square matrices for inverse/determinant) and return values (no output schema). Adequate but not comprehensive for a math tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. Description does not add meaning beyond the schema (e.g., no details on parameter preconditions). Baseline score appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states matrix operations (determinant, inverse, multiplication, transpose) on numeric matrices. Distinct from sibling tools; no confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance. Implied by name and operations but lacks alternatives or context relative to other math tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_statsA

Read-onlyIdempotent

Inspect

Show your memory usage: total entries, total bytes, namespace count, TTL'd count, pinned count, quota remaining, per-namespace breakdown. Registered handle + secret required.

ParametersJSON Schema

Name	Required	Description	Default
`handle`	Yes
`secret`	No

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, so the description does not need to restate those. The description adds valuable behavioral context by specifying that authentication is required and listing the exact information that will be returned.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with two sentences. The first sentence presents the core functionality and output upfront, and the second adds an essential prerequisite. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only stats tool with 2 parameters and no output schema, the description covers the purpose, output, and authentication. It could be more complete by explaining error cases or the registration process, but it is largely sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description adds meaning by indicating both handle and secret are needed for authentication. However, it doesn't detail the format or purpose of each parameter beyond that, leaving some ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Show' and resource 'memory usage', listing detailed statistics. It clearly distinguishes from sibling tools like 'search_memory' or 'list_memory' by focusing on aggregate stats rather than individual entries.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states the prerequisite of a registered handle and secret, guiding the agent on what authentication is needed. However, it does not explicitly state when to use this tool over alternatives or specify when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mime_from_extensionA

Read-onlyIdempotent

Inspect

MIME Type from File Extension — Look up a file extension (or filename) in a curated IANA media-type table and return its MIME type and top-level category.

ParametersJSON Schema

Name	Required	Description	Default
`filename`	No	Filename to extract the extension from (e.g. 'report.pdf')
`extension`	No	Extension directly, with or without a leading dot (e.g. 'pdf' or '.pdf')

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds context that the tool returns both MIME type and top-level category and mentions the use of a curated IANA table, implying reliability. It does not disclose error handling for unknown extensions, but overall adds useful behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that starts with a clear purpose phrase 'MIME Type from File Extension' followed by the action. Every word is informative, and there is no redundancy or padding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a straightforward lookup tool without an output schema, the description sufficiently specifies the input options and the output (MIME type and top-level category). It also indicates the source (IANA table), providing completeness for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds significant meaning beyond the schema by clarifying that both 'filename' and 'extension' parameters are valid, and provides concrete examples like 'report.pdf', 'pdf', and '.pdf'. This helps the agent choose and format input correctly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'look up' and the resource 'file extension... in a curated IANA media-type table,' and specifies the output (MIME type and top-level category). While the purpose is unambiguous, it does not explicitly distinguish from sibling tools, though no obvious sibling conflicts exist.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for looking up MIME types from file extensions but provides no explicit guidance on when to use versus alternatives or any exclusions. For a simple lookup tool, this is adequate but lacks proactive guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

modularA

Read-onlyIdempotent

Inspect

Modular Arithmetic (pow / inverse / gcd) — Modular exponentiation, modular inverse, and greatest-common-divisor computations.

ParametersJSON Schema

Name	Required	Description
`a`	No	Operand a (inverse/gcd)
`b`	No	Operand b (gcd)
`op`	Yes	Operation
`base`	No	Base (pow)
`modulus`	No	Modulus (pow/inverse)
`exponent`	No	Exponent (pow)

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, indicating safe, idempotent computation. The description adds no behavioral context beyond listing operations; it does not disclose edge cases, error handling, or return behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no redundancy, immediately communicates the tool's purpose and supported operations.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool is a computation tool with stable annotations (read-only, idempotent), the description is mostly complete. It lacks explicit return value description, but for modular arithmetic, the return is typically the computed number, which is reasonably inferred.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema itself documents all parameters. The description adds no extra meaning beyond naming the operations; it does not clarify parameter relationships, constraints, or defaults. Baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs modular arithmetic operations: modular exponentiation, modular inverse, and gcd. It specifies the verb 'computations' and the resources (modular arithmetic operations), distinguishing it from siblings that are also math tools but with different focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like gcd_lcm or other arithmetic tools. The description does not provide context for selecting this tool over siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

morse_code_convertA

Read-onlyIdempotent

Inspect

Morse Code Encoder / Decoder — Encode text (A-Z, 0-9) to International Morse Code, or decode Morse back to text.

ParametersJSON Schema

Name	Required	Description	Default
`mode`	No	encode \| decode (default encode)
`text`	Yes	Text to encode, or Morse to decode

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description accurately reflects the read-only nature of the tool, aligning with annotations (readOnlyHint, idempotentHint). However, it adds no additional behavioral context beyond what is already implied by the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently conveys the tool's purpose and usage. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is adequate for the tool's simplicity. It covers main functionality but lacks edge case handling (e.g., unsupported characters, spacing). Given no output schema, additional return value details would be helpful but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds value by specifying supported character range (A-Z, 0-9) not present in the schema, clarifying input constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a Morse Code encoder/decoder specifying supported characters (A-Z, 0-9) and the bidirectional conversion. It uniquely identifies the tool's function among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (encoding text or decoding Morse) but does not explicitly mention when not to use it or provide alternatives. However, the context is clear enough for most scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mortgageA

Read-onlyIdempotent

Inspect

Mortgage Payment Calculator — Monthly principal+interest, PMI, taxes, insurance and full amortization for a home loan.

ParametersJSON Schema

Name	Required	Description
`down`	No	Alias for down_payment (USD)
`rate`	No	Alias for annual_rate (DECIMAL, 0.07 = 7%)
`term`	No	Alias for term_years
`price`	No	Alias for home_price
`years`	No	Alias for term_years
`down_pct`	No	Down payment as a PERCENT of home_price (e.g. 20)
`pmi_rate`	No	Annual PMI rate as a decimal
`home_price`	Yes	Purchase price in USD
`term_years`	No	Loan term in years (default 30)
`annual_rate`	Yes	Interest rate as a DECIMAL (0.07 = 7%), not a percent
`monthly_hoa`	No	Monthly HOA dues in USD
`annual_taxes`	No	Annual property tax in USD
`down_payment`	No	Down payment in USD
`interest_rate`	No	Alias for annual_rate (DECIMAL, 0.07 = 7%)
`purchase_price`	No	Alias for home_price
`annual_insurance`	No	Annual homeowners insurance in USD
`pmi_ltv_threshold`	No	LTV above which PMI applies (default 0.80)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, indicating it's a safe, read-only calculation. The description adds context about what the calculator includes (PMI, taxes, etc.) but does not disclose any additional behavioral traits such as output format or assumptions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is front-loaded and concise. Every part adds value, with no extraneous words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (17 parameters, no output schema), the description adequately covers the tool's purpose and key inputs but lacks details on the output format (e.g., monthly payment breakdown, amortization table). It is mostly complete for a calculator tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters documented. The description does not add new semantic information beyond the schema; it only provides a high-level summary. The baseline of 3 is appropriate given full schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a mortgage payment calculator that computes monthly principal+interest, PMI, taxes, insurance, and full amortization. It uses a specific verb ('calculate') and resource ('mortgage payment'), and distinguishes itself from siblings by listing the specific components it handles.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for calculating mortgage payments but does not provide explicit guidance on when to use this tool versus alternatives like 'amortization_schedule' or 'loan'. No when-not-to-use or exclusion criteria are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mortgage_points_breakevenA

Read-onlyIdempotent

Inspect

Mortgage Points Breakeven Calculator — Should you buy mortgage discount points? Monthly savings, points cost in dollars, and the breakeven month.

ParametersJSON Schema

Name	Required	Description
`points`	Yes	Number of discount points being bought
`base_rate`	Yes	Interest rate WITHOUT points, as a DECIMAL (0.07 = 7%)
`term_years`	No	Loan term in years (default 30)
`loan_amount`	Yes	Loan principal in USD
`cost_pct_per_point`	No	Cost per point as a percent of loan amount (default 1.0)
`rate_reduction_per_point`	No	Rate reduction per point, as a decimal (default 0.0025 = 0.25%)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as read-only and idempotent. The description adds value by specifying the calculated outputs (monthly savings, points cost, breakeven month), providing behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that conveys the purpose and outputs without unnecessary words. It is highly concise and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity as a financial calculator and the lack of an output schema, the description adequately covers what the tool returns and its context. It is complete enough for an agent to understand usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add additional meaning to individual parameters beyond what the schema provides. It mentions outputs that relate to parameters but no detailed parameter-level enhancement.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: calculating breakeven for mortgage discount points, including monthly savings, cost, and breakeven month. It distinguishes from generic siblings like 'mortgage' and 'breakeven' by specifying the exact outputs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context ('Should you buy mortgage discount points?') and provides clear outputs, indicating when to use this tool. However, it does not explicitly state when not to use it or list alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mulchA

Read-onlyIdempotent

Inspect

Mulch / Ground-Cover Calculator — Mulch volume for a bed from area and depth — cubic feet, cubic yards, and 2/3 cu ft bag counts, with an optional waste factor and a depth-options table.

ParametersJSON Schema

Name	Required	Description
`depth_in`	No	Mulch depth in inches (default 3)
`width_ft`	Yes	Bed width in feet
`length_ft`	Yes	Bed length in feet
`waste_pct`	No	Optional waste/overage percent (e.g. 10)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds behavioral details like optional waste factor and depth-options table, which go beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence front-loaded with name and purpose, no wasted words. Every part adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description lists outputs and mentions depth-options table. With 4 parameters fully documented in schema and annotations, it provides adequate context, though 'depth-options table' is not fully explained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters. Description adds context by specifying default depth (3 inches), optional waste factor, and output units, enriching meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates mulch volume from area and depth, with specific outputs (cubic feet, yards, bag counts). It differentiates from sibling calculators like concrete or fertilizer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for mulch/ground-cover volume calculation but does not explicitly state when to use or not use vs alternatives. No exclusion criteria or alternative tool names are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

nato_phonetic_spellA

Read-onlyIdempotent

Inspect

NATO Phonetic Alphabet Spell-Out — Spell text (A-Z) out using the ICAO/NATO phonetic alphabet (Alpha, Bravo, Charlie, ...), or decode a spelled-out string back to letters.

ParametersJSON Schema

Name	Required	Description	Default
`mode`	No	encode \| decode (default encode)
`text`	Yes	Text to spell out, or phonetic words to decode

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as read-only and idempotent. The description adds behavioral context by explaining the bidirectional nature (encode/decode) and the specific alphabet used, complementing the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences that front-load the purpose and cover both modes. No extraneous words, making it efficient and scannable for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter tool with no output schema, the description is adequate but lacks details like case sensitivity, handling of non-alphabetic characters, or output format. This limits completeness despite the low complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so the baseline is 3. The description adds meaning by explaining that 'text' is the input to spell out or decode, and 'mode' selects direction. This matches the schema descriptions without providing extra detail beyond the overall purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool encodes text using the ICAO/NATO phonetic alphabet and decodes phonetic strings back to letters. It distinguishes from siblings like 'phonetic_encode' by specifying the NATO/ICAO standard, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly defines the two modes (encode/decode) and provides context for when each is appropriate. However, it does not explicitly mention when to avoid using this tool versus alternatives like 'morse_code_convert' or 'phonetic_encode', which could be clearer.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

net_worthB

Read-onlyIdempotent

Inspect

Net Worth Calculator — Total assets minus liabilities, plus liquid net worth and debt-to-asset ratio.

ParametersJSON Schema

Name	Required	Description
`cash`	No	Cash and bank balances in USD
`vehicles`	No	Vehicle value in USD
`auto_loans`	No	Auto-loan balance in USD
`investments`	No	Taxable investment/brokerage balances in USD
`other_debts`	No	Other debts in USD
`real_estate`	No	Real-estate value in USD
`other_assets`	No	Other assets in USD
`student_loans`	No	Student-loan balance in USD
`credit_card_debt`	No	Credit-card debt in USD
`mortgage_balance`	No	Outstanding mortgage balance in USD
`retirement_accounts`	No	Retirement account balances (401k/IRA) in USD

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description does not need to restate that. It adds context about what the tool computes (liquid net worth, debt-to-asset ratio), which is useful but not required beyond annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that conveys the core purpose without fluff. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description partially compensates by listing the computed metrics. However, it does not specify which inputs are assets vs liabilities or how to interpret the results, leaving some ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 11 parameters. The description adds only a high-level summary of the calculation, not parameter-specific details. Baseline 3 is appropriate since the schema already documents each parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates net worth as assets minus liabilities, plus liquid net worth and debt-to-asset ratio. It is specific and informative, but does not explicitly differentiate from sibling financial tools like 'budget' or 'mortgage', so it loses a point.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. An agent would not know why to choose 'net_worth' over 'budget' or 'financial_ratios', for example.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

normal_probA

Read-onlyIdempotent

Inspect

Normal Distribution Probability — CDF, survival, z-score, and percentile queries for any normal distribution.

ParametersJSON Schema

Name	Required	Description
`a`	No	Lower bound for P(a<X<b)
`b`	No	Upper bound for P(a<X<b)
`x`	No	Point for P(X<=x), P(X>x), z-score
`mean`	No	Distribution mean (default 0)
`std_dev`	No	Standard deviation > 0 (default 1)
`percentile`	No	0..100 -> value at that percentile

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering safety and side effects. The description adds no additional behavioral context beyond what annotations already provide, which is acceptable but not additive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that immediately conveys the tool's purpose. It is front-loaded and contains no extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 optional parameters and no output schema, the description lacks guidance on how parameters combine to produce specific outputs (e.g., when to use a and b vs x). It covers high-level capabilities but is not fully self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. The description does not add new semantic information beyond the schema; it merely lists query types without mapping them to parameter combinations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it computes normal distribution probabilities including CDF, survival, z-score, and percentiles, which is a specific verb-resource combination. Among siblings like 'statistics' or 'percentile', this tool is uniquely identifiable as normal-distribution-specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No when-to-use or when-not-to-use guidance is provided. The description does not differentiate from general statistical tools or specify when the normal distribution assumption is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

npvA

Read-onlyIdempotent

Inspect

Net Present Value (NPV) Calculator — NPV of a cash-flow series at a discount rate (index 0 = initial outlay).

ParametersJSON Schema

Name	Required	Description	Default
`cash_flows`	Yes	Array of cash flows; index 0 is t=0 (often negative)
`discount_rate_pct`	Yes	Discount rate as a PERCENT (10 = 10%)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds minimal behavioral context (index 0 convention) but does not elaborate on error handling, output format, or other traits beyond the schema and annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence that efficiently conveys the tool's purpose and a key detail. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple calculator with two parameters and annotations covering safety, the description is contextually complete. It explains the core functionality and the critical index 0 convention, though it omits potential edge cases or return value details (no output schema needed).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers both parameters with descriptions, achieving 100% coverage. The tool description reiterates the index 0 convention but adds little new meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an NPV calculator, specifying the inputs (cash flows and discount rate) and noting the convention that index 0 is the initial outlay. This distinguishes it from sibling tools like irr or tvm.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternatives. It assumes the user understands NPV, but for an AI agent, mentioning when to use npv instead of irr or other financial tools would be helpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

number_to_ordinalA

Read-onlyIdempotent

Inspect

Number to English Ordinal — Format a non-negative integer with its English ordinal-indicator suffix (1 -> '1st', 11 -> '11th', 21 -> '21st').

ParametersJSON Schema

Name	Required	Description	Default
`n`	Yes	Non-negative integer to format

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (idempotent, read-only), description adds specific behavior: suffix formatting, non-negative integer requirement, and examples.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with front-loaded purpose and examples; no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple one-parameter tool with no output schema, the description fully explains behavior and constraints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters, but description adds meaning by explaining the output effect and constraints (non-negative).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool formats a non-negative integer to its English ordinal suffix, with examples. Distinguishes from siblings like number_to_words and roman.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use vs alternatives, but purpose is clear enough for a simple tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

number_to_wordsA

Read-onlyIdempotent

Inspect

Number to Words — Spell an integer, or a currency amount (check-writing), in English words.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Output style
`amount`	No	Decimal amount to spell as currency (currency mode)
`number`	No	Integer to spell (integer mode)

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds the detail 'check-writing' style for currency mode, which is useful context. However, it does not disclose limitations (e.g., range of numbers, handling of negative values) or output format beyond the schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that immediately conveys the tool's function. Every word is necessary and no filler exists. It is front-loaded with the title and a clear verb-object structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with three optional parameters and no output schema, the description is adequate but leaves some gaps. It does not explicitly state that the output is a string, nor describe edge cases or limitations. Given low complexity, the description is acceptable but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all three parameters (mode, amount, number). The tool description adds no additional meaning beyond the schema, meeting the baseline expectation for high-coverage schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool spelle an integer or currency amount in English words, using a specific verb 'Spell' and concrete resources. Among over 100 sibling tools, this is the only one for number-to-words conversion, making its purpose unique and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit usage context or alternatives are provided, but the action is self-explanatory (spell a number in English). The description implies when to use—whenever a textual representation of a number is needed—but lacks 'when not to use' guidance or mention of similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

one_rep_maxA

Read-onlyIdempotent

Inspect

One-Rep Max (1RM) Estimator — Estimate one-rep max from a weight x reps set (Epley + Brzycki) plus a percentage-of-1RM training table with load and rep targets.

ParametersJSON Schema

Name	Required	Description
`reps`	Yes	Reps performed at that weight
`unit`	No	Weight unit label, e.g. 'lb' or 'kg' (default 'lb')
`weight`	Yes	Weight lifted

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint true, idempotentHint true, destructiveHint false. Description adds that it uses specific formulas and includes a training table, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is front-loaded with the main purpose and includes key details. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description explains output comprises a 1RM estimate and a training table. Annotations cover safety. Fairly complete for a calculator tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description doesn't add meaning beyond the schema, so baseline score 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool estimates one-rep max from weight x reps using Epley and Brzycki formulas, and produces a training table. It distinguishes itself from siblings, which are mostly non-fitness tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or alternatives. The description implies fitness training context but doesn't guide when to prefer this over other tools. Adequate but no exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

options_chainB

Read-onlyIdempotent

Inspect

Options contracts for a symbol — BYOK Alpaca only (requires an options-enabled key). Returns {ok, symbol, count, contracts:[{symbol,type,strike,expiration,open_interest}]}.

ParametersJSON Schema

Name	Required	Description
`type`	No	call\|put (optional)
`limit`	No	max contracts (default 100)
`handle`	No
`secret`	No
`symbol`	Yes	underlying ticker
`key_ref`	No
`expiration`	No	YYYY-MM-DD filter (optional)
`alpaca_key_id`	No
`alpaca_secret`	No

Tool Definition Quality

B3.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the safety profile is set. The description adds value by specifying the return structure and the authentication requirement (BYOK Alpaca key), which provides useful behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two sentences, no fluff. It front-loads the purpose and immediately provides the return format. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of 9 parameters, many with no schema descriptions, and no output schema, the description is too sparse. It does not explain the output fields in detail nor clarify the authentication parameters, making it incomplete for an agent to reliably use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 44%, and the description does not add any parameter meaning beyond what is already in the schema. Several parameters like handle, secret, key_ref, alpaca_key_id, alpaca_secret are undocumented in both schema and description, leaving ambiguity for the agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns options contracts for a symbol and specifies the BYOK Alpaca requirement. It distinguishes from sibling tools like market_quote because it is specifically for options contracts. However, it could more explicitly differentiate from other potential options tools if any existed.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates it is only for BYOK Alpaca with an options-enabled key, which provides some usage context. However, it does not explicitly state when not to use this tool or suggest alternatives, leaving the agent to infer from the constraint.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

paintA

Read-onlyIdempotent

Inspect

Paint Calculator — Gallons of paint and number of coats for a room from wall dimensions, openings and coverage.

ParametersJSON Schema

Name	Required	Description
`coats`	No	Number of coats (default 2)
`width`	Yes	Room width in feet
`height`	Yes	Wall height in feet
`length`	Yes	Room length in feet
`openings_sqft`	No	Total area of doors/windows to subtract, in sqft
`include_ceiling`	No	Include the ceiling area
`coverage_per_gal`	No	Square feet covered per gallon (default ~350)

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, so the description does not need to reiterate safety. The description adds no further behavioral context (e.g., no mention of calculations being deterministic or stateless). It does not contradict annotations, so a baseline score is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence that front-loads the key purpose. Every word is necessary, and there is no redundancy or fluff. It is efficiently structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description hints at outputs (gallons, coats) but does not detail return format, precision, or edge cases (e.g., no coverage provided). Given the absence of an output schema, additional context would be beneficial for an agent to interpret results fully. However, for a simple calculator, it is minimally adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with each parameter having a clear description. The description provides a high-level summary but adds no extra meaning beyond what the schema already conveys. Thus, it meets the baseline but does not enhance understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it calculates gallons and number of coats from room dimensions, openings, and coverage. The verb 'calculates' and resource 'paint' with specific outputs (gallons, coats) make the purpose precise. It implicitly distinguishes from sibling calculators like 'board_feet' or 'concrete' by focusing on paint.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives no guidance on when to use this tool versus alternatives. While the name suggests paint-related tasks, explicitly stating when not to use it (e.g., for other materials) or mentioning similar tools would improve clarity. The lack of any usage context necessitates a lower score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

password_entropyA

Read-onlyIdempotent

Inspect

Password Entropy Calculator — Entropy bits and strength from password length and character pool.

ParametersJSON Schema

Name	Required	Description
`digits`	No	Include digits 0-9 (10)
`length`	Yes	Password length in characters
`symbols`	No	Include symbols (~32)
`lowercase`	No	Include lowercase a-z (26)
`uppercase`	No	Include uppercase A-Z (26)
`charset_size`	No	Explicit character pool size (overrides flags below)

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, clearly establishing the tool as a safe, read-only computation. The description adds that it calculates entropy and strength but does not elaborate on any behavioral nuances beyond what annotations convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, front-loaded sentence that packs the core function with no extraneous words. Every part earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a calculator with 6 parameters (all documented in schema) and no output schema, the description could be more complete by explaining the return format or clarifying how the flags interact with charset_size. It suffices minimally but has room for improvement.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; each parameter has a clear description in the input schema. The tool description adds no extra meaning beyond summarizing the purpose, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states the tool calculates password entropy bits and strength from length and character pool. The verb 'calculate' is implied, and the resource is clearly identified as password entropy, distinguishing it from other calculation tools among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, nor any preconditions or when not to use it. The description is purely functional, leaving the agent to infer context from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

password_policy_checkA

Read-onlyIdempotent

Inspect

Password Policy Checker — Evaluate a password against configurable composition and blocklist rules (length, upper/lower/digit/symbol, common-password list) and list any violations.

ParametersJSON Schema

Name	Required	Description
`password`	Yes	Password to evaluate
`max_length`	No	Maximum allowed length (default: no cap)
`min_length`	No	Minimum required length (default 8)
`require_digit`	No	Require at least one digit (default true)
`require_lower`	No	Require at least one lowercase letter (default true)
`require_upper`	No	Require at least one uppercase letter (default true)
`require_symbol`	No	Require at least one non-alphanumeric symbol (default false)
`disallow_common`	No	Reject passwords on a common-password blocklist (default true)

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, indicating safe, idempotent evaluation. The description adds no behavioral details beyond listing rule types, which is consistent but not additive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with a dash for emphasis, front-loading the tool's purpose. It is concise and informative without redundancy, though it could be slightly more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 8 parameters and no output schema. The description fails to explain the return format (e.g., list of violation strings or a structured object). Given the complexity, this is a significant gap that reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description mentions rule categories (length, upper/lower/digit/symbol, blocklist) but does not add meaning beyond what the schema provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates a password against configurable rules and lists violations. It specifies the verb (evaluate), resource (password), and distinctive features (composition and blocklist rules), distinguishing it from similar tools like password_entropy.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for password policy checking but does not explicitly state when to use this tool versus alternatives (e.g., password_entropy). No exclusions or when-not scenarios are provided, leaving the agent to infer from context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

paverA

Read-onlyIdempotent

Inspect

Paver Calculator — Paver count, base material and cost for a patio/walkway, including cutouts and waste.

ParametersJSON Schema

Name	Required	Description
`shape`	Yes	Area shape
`width`	No	Width in feet
`length`	No	Length in feet
`pattern`	No	Laying pattern
`diameter`	No	Diameter for circular area in feet
`waste_pct`	No	Waste allowance percent
`paver_size`	No	Named paver size
`outer_width`	No	Outer width for L-shape in feet
`cutout_width`	No	Cutout width in feet
`outer_length`	No	Outer length for L-shape in feet
`base_depth_in`	Yes	Base material depth in inches
`cutout_length`	No	Cutout length in feet
`paver_width_in`	No	Paver width in inches
`paver_length_in`	No	Paver length in inches
`price_per_paver`	No	Price per paver in USD

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context (includes cutouts and waste) beyond annotations which already declare it as a safe, idempotent read-only operation. No contradiction; the description complements annotations adequately.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single efficient sentence that covers the tool's purpose and scope without wasted words. However, it could be slightly more structured or front-load key information about output.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 15 parameters (some conditional on shape) and no output schema, the description is insufficient. It does not explain how shape determines required parameters or detail the output format (e.g., list of values). This leaves gaps for correct tool invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all 15 parameters described. The description adds no additional parameter-level detail beyond what the schema provides, resulting in baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a paver calculator for counting pavers, calculating base material, and cost for a patio or walkway, including cutouts and waste. This specific verb-resource combination distinguishes it from sibling tools like concrete, asphalt, and other construction calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for paver estimation but does not explicitly state when to use versus alternatives (e.g., concrete calculator) or provide prerequisites. It lacks guidance on when not to use, such as needing irregular shapes not covered.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

percentageB

Read-onlyIdempotent

Inspect

Percentage Calculator — Percent-of, percent change, is-what-percent, increase/decrease and reverse-percent.

ParametersJSON Schema

Name	Required	Description
`base`	No	Base value (percent_of/increase/decrease)
`mode`	Yes	Which calculation to run
`part`	No	Part value (is_what_percent)
`total`	No	Total after the percent was added (reverse_percent)
`whole`	No	Whole value (is_what_percent)
`percent`	No	Percent value (percent_of/increase/decrease/reverse_percent)
`to_value`	No	Ending value (percent_change)
`from_value`	No	Starting value (percent_change)

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only and idempotent; description does not add behavioral context beyond confirming calculation operations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence containing all key information; could be more structured but efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite high schema coverage, the description lacks explanation of parameter relationships across modes, making it less complete for a multi-mode calculator.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% parameter descriptions; description adds no extra semantic value beyond listing mode names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it is a percentage calculator with specific operations, distinguishing it from sibling calculation tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives; only lists modes without context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

percentileA

Read-onlyIdempotent

Inspect

Percentile Calculator — Value at a given percentile, or the percentile rank of a value, over a dataset.

ParametersJSON Schema

Name	Required	Description
`value`	No	A value (returns its percentile rank)
`numbers`	Yes	Non-empty numeric dataset
`percentile`	No	Percentile 0..100 (returns the value there)

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, and destructiveHint. The description adds that the tool operates in two modes depending on provided parameters, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that conveys both modes of operation. It is front-loaded but could benefit from slight restructuring for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is present, so the description should explain the return format (e.g., a number, a rank). It does not, and it also omits constraints like 'numbers must be non-empty' (though schema says that). Minor but notable gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. The description clarifies the dual semantics: 'value' returns its percentile rank, 'percentile' returns the value at that percentile, adding significant meaning over the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it computes either the value at a given percentile or the percentile rank of a value, using a dataset. It distinctly sets it apart from sibling tools like 'percentage' or 'statistics'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for percentile calculations but does not explicitly contrast with alternatives like 'statistics' or give when-to-use/when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

persona_getA

Read-onlyIdempotent

Inspect

Read your agent's current persona. Owner-gated. Returns {ok, exists, persona_name, persona_instructions}.

ParametersJSON Schema

Name	Required	Description	Default
`handle`	Yes	your registered handle
`secret`	No	your agent secret

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark it as read-only and idempotent. Description adds value by specifying the return structure ({ok, exists, persona_name, persona_instructions}) and the owner-gated access. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences front-load purpose and provide essential context (owner-gated, return format). Every sentence contributes meaning with zero waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read tool with full schema coverage and annotations, the description adequately covers behavior and return type. Lacks details on error conditions (e.g., invalid handle) but is otherwise complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters (handle, secret) with descriptions ('your registered handle', 'your agent secret'). Description adds no further meaning beyond what's in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Read your agent's current persona' with a specific verb and resource. The addition of 'Owner-gated' further clarifies access constraints. Distinguishes from sibling persona_set by implication of reading vs. setting.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context that the tool is owner-gated, indicating it should be used by the authenticated agent only. Does not explicitly exclude alternatives like persona_set for write operations, but for a read tool this is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

persona_setA

Idempotent

Inspect

Set your agent's durable persona (voice/character). Owner-gated. Once set, it shapes your coach_* replies when you pass your handle. Returns {ok, persona_name}.

ParametersJSON Schema

Name	Required	Description
`handle`	Yes	your registered handle (owner-gated)
`secret`	No	your agent secret
`persona_name`	No	a name for the persona (<=80 chars)
`persona_instructions`	No	how you want to sound/behave (<=2000 chars)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate idempotent and non-destructive mutation. The description adds beyond annotations by stating the persona is 'durable', shapes 'coach_* replies', and returns a specific object. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences deliver core action, access control, effect, and return value without redundancy. Front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers key aspects: setting persona, access control, effect on replies, and return format. It could mention overwriting behavior or error conditions, but is sufficient for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so all parameters are well-described in the schema. The description does not add new parameter information beyond what is already in the schema. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'set' and resource 'agent's durable persona'. It specifies the effect on coach_* replies, differentiating it from sibling tools like persona_get. The purpose is unambiguous and specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'Owner-gated' and 'when you pass your handle', providing context for when to use. It implies the tool is used to set persona before using coach_* tools. However, it does not explicitly state when not to use or mention alternatives like persona_get for retrieval.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pet_calorieA

Read-onlyIdempotent

Inspect

Pet Daily Calorie Calculator — Daily calorie needs for a dog or cat (RER/MER from body weight + life stage) plus cups-per-day at common food energy densities.

ParametersJSON Schema

Name	Required	Description
`species`	No	'dog' or 'cat' (default 'dog')
`weight_kg`	Yes	Body weight in kilograms
`life_stage`	No	neutered_adult \| intact_adult \| weight_loss \| weight_gain \| active \| puppy_kitten \| senior (default neutered_adult)

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only and idempotent. The description adds value by disclosing the underlying formulas (RER/MER) and that it also computes cups-per-day at common food energy densities. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the purpose. It is concise and avoids unnecessary words, though it could be split for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of an output schema, the description partially covers what is returned (cups-per-day and calorie needs) but does not specify the exact format or units. For a three-parameter tool with no nested objects, this is adequate but not complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description does not add additional meaning beyond what the schema already provides for the three parameters; it only summarizes the overall purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates daily calorie needs (RER/MER) for dogs and cats using body weight and life stage, and also provides cups-per-day. It uses specific veterinary terminology and distinguishes itself from the many other calculators in the sibling list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when pet calorie calculations are needed, but it does not explicitly state when to use or not use this tool, nor does it mention alternative tools for other animals or more specific dietary needs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

phonetic_encodeA

Read-onlyIdempotent

Inspect

Phonetic Encoder (Soundex / simplified Metaphone) — Encode a word with the classic American Soundex algorithm, or a simplified Metaphone-style phonetic key.

ParametersJSON Schema

Name	Required	Description	Default
`mode`	No	soundex \| metaphone (default soundex)
`word`	Yes	Word to encode

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the tool is clearly safe and deterministic. The description adds value by specifying the algorithms used, but does not disclose details like output format or edge case behavior (e.g., handling of non-alphabetic characters).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the tool's purpose. Every word is informative, with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description does not explain the output format (e.g., a fixed-length code for Soundex) or constraints (e.g., single word input). Given no output schema, this omission reduces completeness. However, for a simple encoding tool, the basic purpose is adequately covered.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds extra context by naming the algorithms ('classic American Soundex', 'simplified Metaphone-style'), which goes beyond the schema's 'soundex | metaphone' enumeration and clarifies the algorithm variants.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool's function: encoding a word using two phonetic algorithms (Soundex and simplified Metaphone). It uses a specific verb ('Encode') and resource ('word'), and distinguishes itself from sibling tools like hash_text or encoding by mentioning the specific algorithms.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., hash_text, levenshtein). It also does not explain when to choose soundex over metaphone. The only hint is the mention of 'classic American Soundex' and 'simplified Metaphone-style' which implies context but lacks explicit usage instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pomodoro_plannerA

Read-onlyIdempotent

Inspect

Pomodoro Focus Planner — Lay out a Pomodoro focus/break schedule from a session count or work-minute budget: focus time, break time, sessions and wall-clock end.

ParametersJSON Schema

Name	Required	Description
`sessions`	No	Number of focus sessions (overrides total_work_minutes)
`focus_len`	No	Focus session length in minutes (default 25)
`long_break`	No	Long break length in minutes (default 15)
`short_break`	No	Short break length in minutes (default 5)
`cycles_per_long`	No	Focus sessions between long breaks (default 4)
`total_work_minutes`	No	Total focus budget in minutes (used if sessions omitted)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and idempotentHint, so the tool is safe and deterministic. The description adds behavioral context: it plans a schedule without executing any actions, and explicitly lists outputs (focus time, break time, sessions, wall-clock end). This exceeds the annotation info without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single compact sentence that packs essential information without verbosity. It front-loads the purpose and immediately conveys the key aspects (input modes, output elements). Every word contributes to clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the moderate complexity of 6 parameters, no output schema, and safety annotations, the description adequately covers the tool's function. It explains the two input paths and expected outputs. A minor gap is the lack of mention about the return format (e.g., structured schedule), but overall it is sufficiently complete for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for all 6 parameters, setting baseline at 3. The description adds value by explaining the two primary modes ('session count' vs 'work-minute budget') and summarizing the outputs, which helps clarify parameter relationships beyond individual schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: a Pomodoro Focus Planner that generates a schedule from a session count or work-minute budget. It uses a specific verb 'lay out' and defines the resource as a focus/break schedule. Even without explicit sibling comparison, the tool's domain (Pomodoro planning) is distinct from the listed siblings which are mostly mathematical or financial tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions two input modes (session count or work-minute budget) but does not provide guidance on when to choose one over the other or when not to use this tool. It lacks explicit usage context or alternatives, though the siblings are not directly comparable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

port_range_parseA

Read-onlyIdempotent

Inspect

Port Range Parser / IANA Classifier — Parse a comma-separated port/port-range expression into normalized ranges and classify them against IANA's well-known/registered/dynamic bands (RFC 6335).

ParametersJSON Schema

Name	Required	Description	Default
`expr`	Yes	Comma-separated ports and/or ranges, e.g. '80,443,8000-9000'

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds value by specifying that it classifies against IANA bands (RFC 6335), which is beyond the annotations. No contradictions detected.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core purpose. Every word is necessary; no filler. It efficiently communicates both parsing and classification aspects.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a parsing tool with one parameter and no output schema, the description adequately covers what the tool does (parse and classify). It could benefit from specifying the output format, but given the simplicity, it is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description adds meaning beyond the schema by explicitly stating that the output includes classification into 'well-known/registered/dynamic bands', which the schema's param description for 'expr' does not convey.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a 'Port Range Parser / IANA Classifier' with a specific verb ('Parse') and resource ('port/port-range expression'). It distinguishes itself from siblings like 'cidr' or 'ip_in_cidr' by focusing on port ranges and IANA classification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for parsing and classifying port ranges but does not explicitly state when to use this tool versus alternatives. No exclusions or alternative tools are mentioned, leaving some ambiguity about context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

post_blueskyAInspect

⚠ POSTS PUBLICLY & IRREVERSIBLY: post to Bluesky via YOUR handle + app-password (BYOK; create at Settings→App Passwords, NOT your login). Max 300 chars. dry_run to preview.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	the post text (max 300)
`handle`	No	your WingMan handle (rate limiting)
`dry_run`	No	preview, don't post
`bsky_handle`	Yes	your Bluesky handle, e.g. you.bsky.social
`app_password`	Yes	an app password (not your account password)

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds critical behavioral context beyond annotations: posts are public and irreversible, uses BYOK authentication, max 300 chars, and dry_run for preview. This enriches the agent's understanding significantly.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with a front-loaded warning. No wasted words; every segment adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers input constraints, behavior, and authentication. No output schema exists, so some return value information might be missing, but for a posting tool with dry_run, it's fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the authentication parameters (bsky_handle, app_password) and the dry_run parameter's purpose, going beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'post' and resource 'Bluesky', with explicit details about authentication and constraints. It clearly distinguishes from sibling tools like post_discord and post_telegram by naming the platform.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the setup requirements (app password) and preview mode (dry_run). While it doesn't explicitly state when not to use, the tool name and platform specificity make the usage context clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

post_discordAInspect

⚠ POSTS PUBLICLY & IRREVERSIBLY: send a message to a Discord channel via YOUR webhook URL (BYOK). Rate-limited. Set dry_run=true to preview without posting. Returns {ok, posted}.

ParametersJSON Schema

Name	Required	Description
`handle`	No	your handle (for per-agent rate limiting)
`content`	Yes	the message to post (max 3000)
`dry_run`	No	preview what would post, don't publish
`webhook_url`	Yes	your Discord channel webhook URL (discord.com)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: irreversible public posting, rate limiting, preview option, and return format. The annotations already indicate it's not read-only, but the description fills in specific consequences.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, front-loading the critical warning, and uses a single sentence for the main action. Every element earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 4 parameters and no output schema, the description covers core aspects: action, caution, rate limit, preview, and return. Could briefly mention error handling, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds minimal extra meaning beyond the schema (e.g., 'preview without posting' for dry_run). No additional context for other parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('send a message'), the resource ('Discord channel via webhook URL'), and distinguishes from siblings (e.g., post_bluesky, post_telegram) by specifying the platform and authentication method.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides practical guidance: setting dry_run=true to preview, and notes rate limiting. However, it doesn't explicitly state when to avoid this tool or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

post_telegramAInspect

⚠ POSTS PUBLICLY & IRREVERSIBLY: send a message to a Telegram chat via YOUR bot token (BYOK). Rate-limited. dry_run to preview. Returns {ok, posted}.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	the message to send (max 3000)
`handle`	No	your handle (rate limiting)
`chat_id`	Yes	target chat id (or @channelusername)
`dry_run`	No	preview, don't send
`bot_token`	Yes	your Telegram bot token

Tool Definition Quality

A4.2/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description clearly discloses that actions are public and irreversible, rate-limited, and that a dry_run option exists to preview. This adds significant behavioral context beyond the annotations (readOnlyHint, idempotentHint, destructiveHint are all false).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single dense sentence that front-loads critical warnings and key features. While efficient, it could be slightly more structured for readability, but it earns its place without fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters (all documented in schema) and no output schema, the description covers essential behavioral aspects: irreversibility, public posting, rate limiting, dry-run preview, and expected return structure. No significant gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the schema already explains each parameter. The description reiterates that dry_run is for preview and mentions bot_token as BYOK, but adds minimal new semantic meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with a clear warning and explicitly states the action: 'send a message to a Telegram chat via YOUR bot token (BYOK).' It distinguishes this tool from sibling tools like post_bluesky or post_discord by naming the platform and the authentication method.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for sending Telegram messages but does not explicitly state when to use this tool versus alternatives like other post_* tools. No when-not-to-use or exclusion criteria are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

prime_factorsB

Read-onlyIdempotent

Inspect

Prime Factorization — Prime factorization, primality, divisor count/sum and Euler totient of an integer.

ParametersJSON Schema

Name	Required	Description	Default
`n`	Yes	Integer to factor (2 .. 10^15)

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and destructiveHint. The description adds context about the breadth of computations (primality, divisor functions). However, it does not detail output format or potential edge cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that front-loads the tool's purpose. It is efficient but could be structured to separate the listed functions for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, and the description does not specify what the tool returns (e.g., a single result or all computations). This leaves ambiguity for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single parameter 'n', with a clear description and range. The tool description does not add extra meaning beyond the schema, meeting the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool performs prime factorization and related number theory functions (primality, divisor count/sum, Euler totient). It effectively distinguishes from sibling tools like gcd_lcm and modular, which cover different areas.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description lists multiple functions but does not specify conditions for use or when other tools might be more appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

profit_lossB

Read-onlyIdempotent

Inspect

Profit & Loss (Income Statement) Calculator — Gross profit, EBITDA, operating income, taxes, net income and margins.

ParametersJSON Schema

Name	Required	Description
`cogs`	No	Cost of goods sold in USD
`taxes`	No	Explicit tax amount in USD (overrides tax_rate_pct)
`revenue`	Yes	Total revenue in USD
`amortization`	No	Amortization in USD
`depreciation`	No	Depreciation in USD
`other_income`	No	Other income in USD (can be negative)
`tax_rate_pct`	No	Tax rate percent applied to pretax income
`interest_expense`	No	Interest expense in USD
`operating_expenses`	No	Operating expenses (SG&A etc.) in USD

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the safety profile is clear. The description adds the list of computed outputs (gross profit, EBITDA, etc.), which is useful but minimal. No disclosure of assumptions, rounding, or edge cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single-sentence description is concise and front-loaded with the tool name and type. However, it reads more like a label than a full explanation, leaving no room for nuance. It earns its place but is slightly too brief for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 9 parameters and no output schema, the description minimally states what outputs to expect (gross profit, EBITDA, etc.). It doesn't explain the calculation order, optional parameter interactions, or output structure. Adequate for a straightforward calculator but incomplete for thorough understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% description coverage for all 9 parameters. The description provides no additional detail beyond what is in the schema, so it meets the baseline but adds no extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description uses 'Profit & Loss (Income Statement) Calculator' and lists specific financial metrics (Gross profit, EBITDA, etc.), clearly indicating the tool computes an income statement. However, it does not explicitly differentiate from sibling financial tools like 'financial_ratios' or 'margin', missing a chance to clarify its unique scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. With many sibling financial calculators (e.g., breakeven, roi, margin), the description provides no context for selection, leaving the agent to infer usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

quadraticA

Read-onlyIdempotent

Inspect

Quadratic Equation Solver — Roots and discriminant of ax^2 + bx + c = 0 (real or complex).

ParametersJSON Schema

Name	Required	Description
`a`	Yes	Coefficient a
`b`	No	Coefficient b
`c`	No	Coefficient c

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate safe, read-only, idempotent behavior. Description adds no further behavioral disclosure beyond the formula.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, no redundancy, front-loaded with purpose and result.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with annotations, description is adequate. Missing explicit output format but 'roots and discriminant' implies what's returned.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions are minimal. The tool description clarifies the equation context and that it provides roots and discriminant, adding meaning beyond parameter names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it solves quadratic equations for roots and discriminant, with a specific formula. No sibling tool directly competes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or alternatives mentioned. Usage is implied by the tool's name and description, but guidance is absent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

query_corpusA

Read-onlyIdempotent

Inspect

Semantic search over YOUR ingested corpus (free local embeddings, cosine top-k), with optional free-LLM synthesis. Owner-gated. Returns {ok, matches[], answer?}.

ParametersJSON Schema

Name	Required	Description
`k`	No	how many matches (1-8, default 4)
`query`	Yes	what to search for
`handle`	Yes	your registered handle
`secret`	No	your agent secret
`synthesize`	No	also generate a cited answer (free LLM)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint=true, idempotentHint=true, destructiveHint=false), the description adds context about the algorithm, optional synthesis, and output structure. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (two sentences plus return type) and front-loaded with the core action. Every piece of information is relevant and efficiently presented.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, no output schema) and presence of annotations, the description adequately covers core functionality, access control, and return format. It could mention the default value of 'k' but schema covers that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds minimal value over the schema's parameter descriptions. It mentions 'free-LLM synthesis' which aligns with the 'synthesize' parameter, but overall does not significantly enhance understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool as semantic search over the user's ingested corpus, specifying the mechanism (free local embeddings, cosine top-k) and optional synthesis. It distinguishes from sibling tools like 'search' (web) and 'search_memory' by focusing on the proprietary corpus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'Owner-gated', indicating access control, but does not explicitly state when to use this tool versus alternatives. However, the unique functionality (semantic search over corpus) is clearly conveyed, making usage context apparent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

query_string_convertA

Read-onlyIdempotent

Inspect

Query-String Parse / Build — Parse a URL query string into params, or build a query string from params (x-www-form-urlencoded, RFC 3986).

ParametersJSON Schema

Name	Required	Description
`mode`	No	parse \| build (default parse)
`params`	No	{key: value\|array-of-values} to build (mode=build)
`query_string`	No	Query string to parse (mode=parse)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly and idempotent behavior. Description adds encoding standards (RFC 3986, x-www-form-urlencoded) but does not disclose any additional behavioral traits beyond what is in annotations. Contradiction=false.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that immediately identifies the tool's dual purpose with no extraneous words. Front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple converter with no output schema and good annotations, the description is nearly complete. It lacks explicit mention of error handling or behavior when both params and query_string are provided, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for all three parameters. The description adds minimal value beyond the schema, naming the two modes but not elaborating on the expected format of params or query_string. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb 'Parse' and 'Build' with specific resource 'URL query string' and mentions encoding standards. It distinguishes this tool from sibling converters like url_parse_normalize which handle full URLs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies usage for query string conversion but does not explicitly state when to use this vs alternatives like url_parse_normalize or other formatting tools. No exclusions or prerequisites provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

raised_bedA

Read-onlyIdempotent

Inspect

Raised-Bed Soil Calculator — Soil volume for one or more raised beds (cu ft / cu yd / bag counts) plus a standard 60/30/10 topsoil-compost-aeration mix breakdown.

ParametersJSON Schema

Name	Required	Description
`beds`	No	Number of identical beds (default 1)
`width_ft`	Yes	Bed width in feet
`height_in`	No	Bed height/depth in inches (default 10)
`length_ft`	Yes	Bed length in feet

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool as read-only, idempotent, and non-destructive. The description adds behavioral context by specifying the computation (volume and mix breakdown), which is consistent with annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence that is well-structured and front-loaded. Every word serves a purpose, with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, complete schema coverage, and annotations, the description provides all necessary context. It explains the output (volume and mix) despite lacking an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so all parameters are already described in the schema. The description adds overall context but does not enhance parameter meaning beyond what is in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates soil volume for raised beds, including specific units (cu ft / cu yd / bag counts) and a mix breakdown. It uses a specific verb ('calculates') and distinguishes it from sibling calculators like 'mulch' or 'fertilizer'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when calculating soil for raised beds, which is clear given the context. It does not explicitly mention when not to use it versus other calculators, but the distinct domain (gardening) provides adequate guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rate_limitA

Read-onlyIdempotent

Inspect

Rate Limit Advisor — Remaining capacity, wait time, and burst headroom for a sliding-window rate limit.

ParametersJSON Schema

Name	Required	Description
`used`	No	Calls already used this window
`limit`	Yes	Calls allowed per window
`planned_calls`	No	Calls you plan to make
`window_seconds`	Yes	Window length in seconds

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as read-only, idempotent, and non-destructive. The description adds behavioral context by specifying the computed outputs (remaining capacity, wait time, burst headroom), providing value beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that efficiently conveys the tool's function without any fluff or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that there is no output schema, the description adequately hints at the return values (remaining capacity, wait time, burst headroom). The input schema is fully documented. The tool is simple, and the description covers the core functionality.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All parameters have descriptions in the schema (100% coverage). The description does not add additional meaning or detail about the parameters, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to advise on remaining capacity, wait time, and burst headroom for a sliding-window rate limit. It uses a specific verb (Advisor) and resource (rate limit), and is distinct from all sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance is provided. The purpose implies usage when one needs rate limit analysis, but alternatives or exclusions are not mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ratioA

Read-onlyIdempotent

Inspect

Ratio & Proportion Calculator — Simplify a ratio to lowest terms, or solve a:b = c:x for x.

ParametersJSON Schema

Name	Required	Description
`a`	No	a
`b`	No	b
`c`	No	c (solve mode)
`mode`	No	simplify or solve

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds the mode behavior but no further behavioral context like input constraints or error handling. With annotations present, the description does not contradict and adds minimal extra transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence of 17 words, no redundancy, front-loaded with purpose. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, and the description does not mention return value format. For a simple calculator tool this may be acceptable, but an agent would benefit from knowing what is returned (e.g., simplified ratio or solved value).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% but parameter descriptions are minimal. The description adds relational context: it explains how parameters a, b, c, and mode interact for simplification or solving proportions. This adds value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool simplifies ratios or solves proportions, using specific verbs (simplify, solve) and resource (ratio). This distinguishes it from sibling tools like percentage or unit_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (simplify or solve a proportion) but does not explicitly state when not to use or provide alternatives. It relies on the two-mode distinction to guide selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_memory_changesA

Read-onlyIdempotent

Inspect

Incremental sync: returns memory entries that have been created, updated, or deleted since the given timestamp. Scoped to namespaces your handle has explicitly written to (privacy model). Registered handle + secret required.

ParametersJSON Schema

Name	Required	Description
`limit`	No	max results (default 50, max 200)
`since`	Yes	ISO 8601 timestamp
`handle`	Yes
`secret`	No
`namespace`	No	optional filter to one namespace

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds value by clarifying the privacy model (scoped to own namespaces) and that it returns creations, updates, and deletions. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: purpose, scope, auth requirements. Front-loaded with the essential sync capability. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a sync tool with 5 parameters and no output schema, the description covers the core purpose, privacy constraints, and authentication. It lacks return format or error handling, but is sufficient for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 60%. The description mentions 'since' and 'namespace filter', but does not explain handle/secret beyond 'required'. It adds some context but relies on schema for parameter details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'returns memory entries that have been created, updated, or deleted since the given timestamp', specifying the verb (returns), resource (memory entries), and scope (incremental sync). This distinguishes it from siblings like list_memory or search_memory.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It mentions the tool is for incremental sync, scoped to namespaces the user has written to, and requires registration. This gives context on when to use, though it does not explicitly exclude alternatives or mention when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_messageA

Read-onlyIdempotent

Inspect

Open one inbox item by id ('m'=mail, 'e'=event) and mark it read. Requires handle + secret (it's your private inbox).

ParametersJSON Schema

Name	Required	Description	Default
`handle`	Yes
`secret`	No
`item_id`	Yes

Tool Definition Quality

A3.5/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states the tool 'mark[s] it read', a write operation, yet annotations declare readOnlyHint=true, a direct contradiction. This severely undermines the agent's understanding of the tool's side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose, no wasted words. Every sentence adds critical information efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, parameters, and prerequisites, but lacks return value information and fails to resolve the behavioral contradiction with annotations. For a simple tool, it is adequate but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Despite 0% schema description coverage, the description adds meaningful context: it specifies the item_id format ('m<n>' for mail, 'e<n>' for event) and identifies the authentication parameters (handle, secret). This compensates for the schema's lack of documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Open one inbox item by id and mark it read') and the resource ('inbox item'). It also specifies the id format, making the tool's purpose explicit and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes the prerequisite 'Requires handle + secret' and notes it's a private inbox, but does not provide explicit guidance on when to use this tool versus alternatives (e.g., check_inbox, send_message). Usage context is implied but not fully articulated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rebarA

Read-onlyIdempotent

Inspect

Rebar Calculator — Total rebar length, bar count and cost for a grid from slab dimensions and spacing.

ParametersJSON Schema

Name	Required	Description
`width`	Yes	Slab width in feet
`length`	Yes	Slab length in feet
`lap_pct`	No	Lap/overlap allowance percent
`bar_size`	No	Rebar size designation (e.g. #4, #5)
`spacing_in`	No	Grid spacing in inches
`cost_per_lf`	No	Cost per linear foot in USD

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and idempotentHint, and the description adds context about the specific outputs (length, bar count, cost), going beyond the structured data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single informative sentence that front-loads the tool's purpose, but could be slightly more structured with bullet points or clearer separation of outputs.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple calculator with 6 parameters and no output schema, the description sufficiently covers what the tool does and its inputs, though it omits mention of required vs optional parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 100% parameter description coverage, so the description adds minimal extra meaning beyond stating that inputs are slab dimensions and spacing, which is already implied.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states it is a rebar calculator computing total length, bar count, and cost from slab dimensions and spacing, clearly distinguishing it from sibling tools like concrete or asphalt calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for rebar grid calculations but provides no explicit guidance on when to use this tool versus alternatives such as concrete or paint calculators, which are common siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_memoriesA

Read-onlyIdempotent

Inspect

Search both recall notes AND memory entries for content related to your query. Uses LLM re-ranking for relevance. Registered handle + secret required.

ParametersJSON Schema

Name	Required	Description
`limit`	No	max results (default 5, max 10)
`query`	Yes	natural-language recall query
`handle`	Yes
`secret`	No

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond the annotations by stating the tool uses LLM re-ranking for relevance and requires handle+secret authentication. This is consistent with the readOnlyHint and idempotentHint annotations, and there is no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, consisting of two clear sentences that front-load the primary purpose and key features. Every sentence adds value without unnecessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the description covers the tool's purpose and key behaviors, it lacks details about the output format (e.g., list of results with scores) and ordering. Given no output schema, slightly more completeness would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 50% schema description coverage, the description compensates by clarifying that handle and secret are for registration and authentication, which is not present in the schema. The query and limit parameters are already well-described in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool searches both recall notes and memory entries using an LLM re-ranking for relevance. It distinguishes itself from sibling tools like search_memory and search_memory_facts by combining the two sources, though it doesn't explicitly contrast them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions that a registered handle and secret are required, implying authentication is needed. However, it does not provide explicit guidance on when to use this tool versus other memory-related search tools, nor does it state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

regex_testerA

Read-onlyIdempotent

Inspect

Regex Tester (timeout-guarded) — Test a regex pattern against text (findall/search/fullmatch/split/sub) using a timeout-guarded engine so a catastrophic pattern fails fast instead of hanging.

ParametersJSON Schema

Name	Required	Description
`mode`	No	findall \| search \| fullmatch \| split \| sub (default findall)
`text`	Yes	Text to match against
`flags`	No	List of flag letters: i, m, s, x
`pattern`	Yes	Regex pattern to test
`timeout_ms`	No	Timeout guard in milliseconds, 0-2000 (default 200)
`replacement`	No	Replacement text (mode=sub only)

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate safe, idempotent, non-destructive behavior. The description adds the timeout-guarded behavior, which is a key behavioral trait beyond annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence, front-loaded with key info (name, timeout guard, modes). No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given annotations cover safety, schema covers params, and no output schema needed, the description is complete. It mentions timeout behavior, which is critical for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. The description lists modes but doesn't add significant new meaning beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: testing regex patterns against text with multiple modes (findall, search, etc.) and highlights the timeout-guarded engine. It distinguishes itself from siblings, as no other sibling tool is regex-related.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for regex testing with a safety guard against catastrophic patterns. While it doesn't explicitly state when not to use or alternatives, the sibling list contains no other regex tool, making the context clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

register_agentAInspect

Claim a durable handle (your identity here) without leaving MCP — returns your secret ONCE (folded into a memory_seed). Save it: it's the key to act as you and to resume your whole self later. If the handle is taken you get a free suggestion; pass auto_suffix=true to claim it outright. via attributes who invited you.

ParametersJSON Schema

Name	Required	Description
`bio`	No	optional — a short public bio
`via`	No	optional — the handle that invited you
`model`	No	optional — your model family
`handle`	Yes	2–32 chars, alphanumeric/-/_/. only
`operator`	No	optional — who runs you
`auto_suffix`	No	if the handle is taken, claim the suggested variant automatically

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate not read-only/not idempotent; description adds key behavior: secret returned only once ('folded into a memory_seed') and need to save it. No contradiction, but could elaborate on duplicate call behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences and a third fragment, no fluff. Efficiently conveys key info but could be slightly more structured (e.g., separating return value). Still concise and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Explains return value (secret once), handle conflict handling, and 'via' purpose. Missing details on other parameters' impact and exact response format. Adequate for a registration tool with 6 params.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters, so baseline is 3. Description adds meaning for 'auto_suffix' and 'via', but other parameters (bio, model, operator) rely on schema descriptions. Neutral value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Claim a durable handle (your identity here) without leaving MCP — returns your secret ONCE'. It distinguishes from sibling 'resume' by focusing on initial registration, not resuming a session.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides specific guidance: 'If the handle is taken you get a free suggestion; pass auto_suffix=true to claim it outright' and explains 'via attributes who invited you'. Lacks explicit when-not-to-use or alternatives, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

regressionA

Read-onlyIdempotent

Inspect

Linear Regression (least squares) — Best-fit slope, intercept, r^2 and an optional prediction from paired x/y data.

ParametersJSON Schema

Name	Required	Description
`x`	Yes	Array of x values (length >= 2)
`y`	Yes	Array of y values (same length as x)
`predict_x`	No	Optional x to predict y for

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and destructiveHint. The description adds that the tool uses least squares and returns slope, intercept, r^2, and prediction, providing useful context beyond the structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that immediately identifies the tool's method and outputs, with no wasted words. It is concise and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description lists the expected outputs (slope, intercept, r^2, prediction) which is adequate for a computational tool. It could mention error handling or constraints, but overall it is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for all three parameters. The description adds slight context about 'optional prediction' but does not significantly improve upon the schema's parameter documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies 'Linear Regression (least squares)' with specific outputs (slope, intercept, r^2, optional prediction), distinguishing it from sibling statistical tools like 'statistics' or 'cagr'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states when to use: fitting a best-fit line to paired x/y data with an optional prediction. However, it does not explicitly mention when not to use or compare to alternatives like polynomial regression.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rent_vs_buyA

Read-onlyIdempotent

Inspect

Rent vs Buy Calculator — Compare the N-year net cost of buying (mortgage, tax, upkeep, minus equity) vs renting, and find the breakeven year.

ParametersJSON Schema

Name	Required	Description
`years`	No	How many years you'll stay (default 7)
`home_price`	Yes	Purchase price in USD
`down_payment`	No	Down payment in USD
`monthly_rent`	Yes	Monthly rent for a comparable place in USD
`mortgage_rate`	No	Mortgage rate as a PERCENT (default 6)
`rent_inflation`	No	Annual rent increase as a PERCENT (default 3)
`loan_term_years`	No	Mortgage term in years (default 30)
`closing_cost_pct`	No	Closing costs as a PERCENT of price (default 3)
`maintenance_rate`	No	Annual maintenance as a PERCENT of value (default 1)
`selling_cost_pct`	No	Selling costs as a PERCENT of sale price (default 6)
`home_appreciation`	No	Annual home appreciation as a PERCENT (default 3)
`property_tax_rate`	No	Annual property tax as a PERCENT of value (default 1.1)

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds that it calculates net cost and breakeven year, but does not disclose additional traits like default assumptions or output format. The annotations carry the safety burden, so the description adds moderate value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence front-loaded with the tool's purpose. Every word earns its place; no redundant or unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 12 parameters and no output schema, the description provides a high-level overview but lacks details on return values (e.g., what the breakeven year output looks like). It is adequate but not fully complete for precise invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add meaning beyond the schema descriptions; it only mentions 'N-year net cost' and 'breakeven year' without elaborating on individual parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a 'Rent vs Buy Calculator' and specifies its function: comparing N-year net cost of buying vs renting and finding the breakeven year. This distinguishes it from sibling finance tools like 'mortgage' or 'roi'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for rent vs buy comparison but does not explicitly state when to use it versus alternatives like 'mortgage' or 'npv' for other financial analyses. No when-not-to-use or alternative suggestions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

request_handoffAInspect

Stuck at a human-only wall (OAuth login, CAPTCHA, email/SMS verify, a manual 'click to confirm')? Park it: a human operator clears the wall and you get unblocked via an inbox notification + optional callback. Returns a handoff_id to poll. Low-friction (no secret needed for an unregistered handle); 5/min.

ParametersJSON Schema

Name	Required	Description
`url`	No	the wall URL a human should open
`task`	Yes	what's blocked (required)
`handle`	No
`secret`	No	your agent secret, if using handle
`context`	No	anything the operator needs (session id, what you've tried)
`ttl_seconds`	No	auto-expire if unresolved (default 48h, max 7d)
`callback_url`	No	optional webhook on resolve

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behavioral traits: returns a handoff_id for polling, uses inbox notification and optional callback, has auto-expiration (via ttl_seconds), and rate limit (5/min). Annotations are all false, so the description carries the burden effectively, though it could mention failure modes.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with two sentences, no wasted words. It front-loads the problem and solution, making it easy for an agent to quickly grasp the tool's purpose and behavior.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 parameters, no output schema), the description covers the core flow, return value, and constraints (rate limit, expiration). It lacks details on polling mechanics and handle/secret interaction, but it's mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 86% (6 of 7 parameters described), so the baseline is 3. The description adds no further meaning beyond the schema's parameter descriptions; it only mentions the return value (handoff_id) but not individual parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with concrete examples (OAuth login, CAPTCHA, email/SMS verify, manual click) and clearly states the tool's function: requesting a human operator to clear a human-only wall. It distinguishes itself from siblings by its unique purpose, with no similar tools in the list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool (when stuck at a human-only wall) and provides contextual cues like rate limit and low-friction operation. It does not explicitly mention when not to use it or name alternative tools, but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

researchA

Read-onlyIdempotent

Inspect

One-call web research: searches the web, renders the top hits in the real browser, and returns a GROUNDED, CITED answer ({answer, sources:[{n,title,url}]}). Falls back to the rendered sources if synthesis is unavailable. Free. Pass handle for governed tiers.

ParametersJSON Schema

Name	Required	Description
`query`	Yes	the question to research
`handle`	No	your registered handle (governs powerful tiers)
`max_pages`	No	pages to read + cite (1-5, default 3)

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint, idempotentHint, destructiveHint. Description adds that it searches web and renders top hits in the real browser (so read-only but impactful), and discloses fallback mechanism and return format. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences plus return format. No wasted words. Front-loaded with main purpose and key differentiators.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description defines return shape {answer, sources:[{n,title,url}]}. Covers fallback. Could mention error handling or rate limits but adequate for a 3-param tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage 100% with descriptions for query, handle, max_pages. Description adds value by explaining handle is for governed tiers and max_pages default/range already in schema. Extra context 'Free. Pass handle' clarifies parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'One-call web research' that searches, renders top hits, and returns a grounded cited answer. Distinguishes from siblings like web_search (which likely only returns search results) and browse tools (which navigate but don't synthesize).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions 'One-call' for quick research and fallback behavior ('Falls back to the rendered sources if synthesis is unavailable'). Also notes 'Free. Pass handle for governed tiers.' Lacks explicit when-not-to-use but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resolve_focusC

Read-onlyIdempotent

Inspect

Close one of your open threads (finished or dropped) so it stops showing in /resume. Requires handle + secret.

ParametersJSON Schema

Name	Required	Description	Default
`handle`	Yes
`secret`	No
`focus_id`	Yes

Tool Definition Quality

C2.3/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description contradicts the annotation 'readOnlyHint=true' by describing a mutation (closing a thread). This is a serious inconsistency that undermines agent decision-making. Beyond the contradiction, only basic effect is mentioned.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise (one sentence) but contains an inaccuracy about required parameters. While brevity is good, the error reduces clarity. A sentence that accurate and front-loaded would score higher.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 3 undocumented parameters, no output schema, and no annotation support for behavioral traits, the description is incomplete. It fails to explain parameter roles, return values, error behavior, or idempotency implications. The contradiction further degrades completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must compensate. It only mentions 'handle + secret' but omits 'focus_id' entirely, leaving all three parameters unexplained. The schema names are not self-explanatory (e.g., 'handle' could be user handle or thread handle).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (close) and resource (open thread), and the effect (stops showing in /resume). It distinguishes from sibling tools like 'set_focus' and 'resume'. However, it inaccurately states 'Requires handle + secret' while the schema makes 'secret' optional.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides minimal context (for finished or dropped threads) but gives no guidance on when to use this tool versus alternatives (e.g., 'set_focus'), nor when not to use it. No explicit when-to or when-not-to instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resumeA

Read-onlyIdempotent

Inspect

Cold-start recovery: restore your WHOLE self in ONE call — identity + standing, the notes past instances left, unread inbox, what's waiting, live watches, pending errands, and the artifacts you host. The first call a fresh instance with no memory should make. Registered handle + secret required.

ParametersJSON Schema

Name	Required	Description	Default
`handle`	Yes
`secret`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive. Description adds specifics about what is restored and the requirement for a registered handle and secret, enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with clear purpose. Slightly verbose in listing all restored items, but each item is relevant. No unnecessary sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description adequately conveys what the tool returns (identity, inbox, etc.) and when it should be used. Sibling tools do not overlap significantly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 0% description coverage. Description mentions parameters ('Registered handle + secret') but adds little detail beyond what the property names imply. However, it clarifies that both are needed for authentication, providing minimal added value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly describes the tool as a cold-start recovery that restores identity, inbox, watches, and more in one call. Distinct from siblings; no other tool offers this comprehensive resume functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states it's the first call a fresh instance should make, providing clear context for when to use. Does not exclude alternatives but effectively communicates its primary use case.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

retirementA

Read-onlyIdempotent

Inspect

Retirement Savings Calculator — Project your balance at retirement, or solve the monthly contribution needed to hit a target.

ParametersJSON Schema

Name	Required	Description
`inflation`	No	Annual inflation as a PERCENT (default 0)
`current_age`	Yes	Current age in years
`annual_return`	No	Expected annual return as a PERCENT (default 7)
`retirement_age`	Yes	Target retirement age in years
`target_balance`	No	Desired balance at retirement, USD (solve mode)
`current_savings`	No	Current retirement savings in USD
`monthly_contribution`	No	Monthly contribution in USD (project mode)

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate it is read-only and idempotent. The description adds the dual-mode behavior but does not disclose assumptions (e.g., compounding frequency) or edge cases. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the tool name and concisely states the two functions. No extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the purpose is clear, the description lacks details about assumptions (e.g., annual compounding, default values) and does not explain the output. For a tool with 7 parameters, more guidance on mode-specific parameters would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are well-described. The description mentions the two modes which hint at how parameters like 'target_balance' and 'monthly_contribution' are used, but does not add significant extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a retirement savings calculator with two distinct modes: projecting balance and solving for contribution. This is specific and distinguishes it from sibling financial tools like 'tvm' or 'annuity'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool: for projecting retirement balance or solving for contributions. It does not explicitly exclude alternatives or compare to siblings, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

retry_backoffA

Read-onlyIdempotent

Inspect

Retry Backoff Schedule — Exponential backoff delays per attempt with optional jitter and per-attempt cap.

ParametersJSON Schema

Name	Required	Description
`factor`	No	Backoff multiplier (default 2)
`attempts`	Yes	Number of attempts 1..50
`max_delay`	No	Cap per-attempt delay (seconds)
`base_delay`	Yes	Base delay seconds

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool read-only, idempotent, and non-destructive. The description adds details about optional jitter and per-attempt cap, which are behavioral traits beyond annotations. No edge cases are disclosed, but the safety profile is well-covered.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with key context, no filler. Every word adds value while remaining highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, and the description does not specify the return format (e.g., list of delays). For a simple computation tool, this is a minor gap. Parameters are fully described, but the output is left implied.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and parameter descriptions are clear. The description adds the context of exponential backoff but does not further explain individual parameters beyond what the schema provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description specifies it computes exponential backoff delays per attempt with optional jitter and per-attempt cap, clearly defining the verb (compute/retry backoff) and resource (schedule). No sibling tools cover retry logic, so it is well-differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies use for computing backoff schedules but does not state when to use or not use, nor mention any alternatives. With no similar sibling tools, the lack of explicit guidance is acceptable but not ideal.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

roiB

Read-onlyIdempotent

Inspect

ROI & Annualized Return Calculator — Return on investment, gain and (with a holding period) annualized ROI.

ParametersJSON Schema

Name	Required	Description
`years`	No	Holding period in years (for annualized ROI)
`final_value`	Yes	Final/exit value in USD
`initial_investment`	Yes	Initial investment in USD

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the description's 'calculator' label is consistent. However, the description adds no behavioral details beyond what annotations provide, such as output format or handling of edge cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence that conveys the core purpose without extraneous words. It is well-structured and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description should more thoroughly explain what the tool returns (e.g., ROI as percentage, gain in USD, annualized ROI). It mentions gain and annualized ROI but lacks detail on return format and constraints (e.g., years must be positive). For a simple calculator, it is adequate but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for each parameter. The description adds minimal semantics beyond the schema, merely summarizing the output. Baseline score of 3 is appropriate as the schema already does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an ROI and annualized return calculator, specifying it calculates return on investment and gain, with optional annualized ROI for a holding period. This is a specific verb-resource combination. However, it does not explicitly differentiate from similar financial calculators like CAGR or compound interest, which are in the sibling list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites, exclusions, or context for usage, leaving the agent to infer from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

romanA

Read-onlyIdempotent

Inspect

Roman Numeral Converter — Convert an integer (1..3999) to Roman numerals or back.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Direction
`roman`	No	Roman numeral string (to_int)
`number`	No	Integer 1..3999 (to_roman)

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true. The description adds the range constraint (1..3999) but does not disclose other traits like error handling or return format. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the tool name and function, with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple conversion tool with no output schema, the description is complete: it explains the two modes, valid input ranges, and that it converts both directions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for each parameter. The description adds no additional meaning beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool converts integers (1..3999) to Roman numerals and back. The verb 'convert' and resource 'integer/Roman numerals' are specific, and the tool is easily distinguished from siblings like base_convert or unit_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use: for converting between integers and Roman numerals. It provides the valid range. However, it does not explicitly state when not to use or mention alternatives, but given the specificity, this is a minor gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

routing_numberA

Read-onlyIdempotent

Inspect

Bank Routing Number Validator — Validate a US ABA bank routing number's checksum, or compute the check digit for an 8-digit partial. Complements the checksum tool (IBAN/ISBN), which doesn't cover this scheme.

ParametersJSON Schema

Name	Required	Description	Default
`mode`	No	validate \| check_digit (default validate)
`number`	Yes	Routing number (9 digits to validate, or 8 digits for check_digit)

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds behavioral context about the checksum algorithm and modes, which is consistent and helpful, though not extensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with purpose and followed by sibling context. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's function and relation to siblings. Without an output schema, it could mention return values, but the simple nature of validation makes it adequate. A slight deduction for not describing output format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with detailed parameter descriptions. The description adds no new parameter-level information beyond what the schema provides, so baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool validates a US ABA routing number's checksum or computes the check digit, and distinguishes it from the sibling checksum tool by specifying the scheme coverage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly notes when to use this tool vs the checksum tool, and mentions the two operational modes (validate vs check_digit), providing clear usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rrule_expandA

Read-onlyIdempotent

Inspect

Recurring Date (RRULE-lite) Expander — Expand a bounded recurrence rule (FREQ/INTERVAL/COUNT/UNTIL/BYDAY) into concrete occurrence datetimes.

ParametersJSON Schema

Name	Required	Description
`freq`	Yes	DAILY \| WEEKLY \| MONTHLY \| YEARLY
`byday`	No	Weekly only: list of weekday codes, e.g. ['MO','WE','FR']
`count`	No	Number of occurrences to return (default 10, max 200)
`until`	No	Stop generating after this ISO 8601 datetime
`dtstart`	Yes	Start datetime, ISO 8601
`interval`	No	Repeat every N freq-units (default 1)

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint. The description adds the qualifier 'bounded' and lists parameter constraints but does not elaborate on edge cases, timezone handling, or return format. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, using a single sentence that is front-loaded with the tool's purpose. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 6 parameters and no output schema, the description covers the essential behavior and input constraints. However, it could explicitly mention that the output is a list of datetime strings to be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description reiterates parameter names but does not add new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb (expand) and resource (bounded recurrence rule) and specifies the output (concrete occurrence datetimes). It distinguishes from sibling tools like cron_next and date_add by focusing on RRULE-lite patterns.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance is provided. The purpose is implied but alternatives are not mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rule_of_72A

Read-onlyIdempotent

Inspect

Rule of 72 Doubling-Time Calculator — Years to double at a rate (72/70/69.3), or the rate needed to double.

ParametersJSON Schema

Name	Required	Description	Default
`years`	No	Target doubling years (gives required rate)
`annual_rate_pct`	No	Annual growth rate percent (gives doubling years)

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate safe, read-only, idempotent behavior. The description adds context about the rule variants but does not disclose limitations like approximation accuracy or constraints on input ranges.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, informative sentence with no wasted words. Front-loaded with tool purpose and key details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema; description does not explain return format, rounding, or which exact formula is used. For a simple calculator, this is a minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions cover both parameters (years and annual_rate_pct), so no additional meaning is needed. The description provides context but does not add new semantics beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes doubling time or required rate using the rule of 72, and includes variants (72/70/69.3). It distinguishes from siblings like compound_interest and tvm.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for quick doubling-time estimation but lacks explicit guidance on when to use versus alternatives such as compound_interest or tvm. No when-not-to-use conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

runwayA

Read-onlyIdempotent

Inspect

Startup Cash Runway Calculator — Months of runway and net monthly burn from cash on hand, revenue and expenses.

ParametersJSON Schema

Name	Required	Description
`cash_on_hand`	Yes	Cash in the bank in USD
`monthly_revenue`	No	Monthly revenue in USD
`monthly_expenses`	No	Monthly operating expenses in USD
`monthly_net_burn`	No	Optional explicit net burn (overrides expenses-revenue)

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, which cover safety traits. The description adds little beyond stating the calculation purpose, but does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that immediately states the tool's purpose and key inputs. No unnecessary words, making it highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description only hints at what is returned (months of runway and net burn). It does not specify the return format or units, which is adequate for a simple calculator but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so each parameter is already described in the schema. The description repeats that revenue and expenses are used, but adds no new semantic meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates months of runway and net monthly burn from cash on hand, revenue, and expenses. This is specific and distinguishes it from sibling tools like cac_ltv or profit_loss.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. There is no mention of prerequisites, scenarios, or exclusions, leaving the agent to infer usage from context alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

saas_metricsA

Read-onlyIdempotent

Inspect

SaaS MRR / ARR Metrics Calculator — Ending MRR, ARR, net new MRR, gross churn, net revenue retention and quick ratio.

ParametersJSON Schema

Name	Required	Description
`new_mrr`	No	New MRR from new customers, USD
`churned_mrr`	No	Churned MRR, USD
`starting_mrr`	Yes	MRR at the start of the period, USD
`expansion_mrr`	No	Expansion/upgrade MRR, USD
`contraction_mrr`	No	Contraction/downgrade MRR, USD

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only, idempotent, and non-destructive. The description adds value by specifying the computed metrics, which annotations do not cover. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, information-dense sentence that lists all key output metrics. It is concisely front-loaded with the tool's identity, containing no superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite lacking an output schema, the description lists the computed metrics, which provides reasonable completeness for a calculator tool. However, it does not specify the return format or whether values are returned as a breakdown, which leaves some ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter having a clear description (e.g., 'New MRR from new customers, USD'). The tool description does not add additional parameter-level meaning beyond what the schema already provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a 'SaaS MRR / ARR Metrics Calculator' and lists specific metrics (Ending MRR, ARR, net new MRR, gross churn, etc.), making the purpose unambiguous and distinct from sibling tools like cac_ltv or cagr.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks explicit guidance on when to use this tool versus alternatives. While the name and listed metrics imply usage for SaaS calculations, no exclusions or comparisons to siblings are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

safe_noteA

Read-onlyIdempotent

Inspect

SAFE / Convertible Note Calculator — Conversion price and shares for a SAFE with a cap and/or discount.

ParametersJSON Schema

Name	Required	Description
`investment`	Yes	Investment amount in USD
`discount_pct`	No	Discount percent off the round price
`valuation_cap`	No	Valuation cap in USD
`pre_round_shares`	No	Pre-round fully-diluted shares (for cap price)
`round_price_per_share`	Yes	Priced round's price per share in USD

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, non-destructive. Description adds minimal context beyond being a calculator, but aligns with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with purpose, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Does not specify output format or explain conversion logic; adequate for a simple calculator but could be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with descriptions for all 5 parameters. Description adds marginal value by hinting at 'cap and/or discount,' but not needed beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool is a SAFE/Convertible Note Calculator that computes conversion price and shares, distinguishing it from sibling financial calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives; only states what it does without context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sales_taxA

Read-onlyIdempotent

Inspect

Sales Tax Calculator — Add tax to a net amount, or extract tax from a tax-inclusive total.

ParametersJSON Schema

Name	Required	Description
`amount`	Yes	Amount in USD
`inclusive`	No	True if amount already includes tax
`tax_rate_pct`	Yes	Tax rate as a percent, e.g. 8.25

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds context by specifying the two calculation modes (add/extract) but does not disclose further behavioral traits like rounding or precision. It does not contradict the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the tool's purpose and scope. Every word is necessary, with no wasted verbiage.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, full schema coverage, and annotations, the description is complete enough for an agent to select and use it. It covers the two key scenarios (add/extract) and does not require explanation of return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions for amount, inclusive, and tax_rate_pct. The description implicitly explains the 'inclusive' parameter but does not add significant meaning beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Sales Tax Calculator — Add tax to a net amount, or extract tax from a tax-inclusive total.' It uses a specific verb (add/extract) and resource (sales tax), and distinguishes from other financial tools like tax_bracket.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context, explaining when to add tax (net amount) and when to extract tax (tax-inclusive total). However, it does not explicitly state when not to use it or mention alternatives, though no direct alternative exists among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

savings_goalA

Read-onlyIdempotent

Inspect

Savings Goal Calculator — Months to reach a savings target at a given monthly amount, or the monthly amount needed for a fixed horizon.

ParametersJSON Schema

Name	Required	Description
`annual_return`	No	Expected annual return as a PERCENT (default 0)
`target_amount`	Yes	Savings target in USD
`target_months`	No	Months to reach the goal (solve-contribution mode)
`current_savings`	No	Amount already saved in USD
`monthly_contribution`	No	Monthly contribution in USD (time-to-goal mode)

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, so the description adds only the dual-mode behavior. There's no contradiction, and the description doesn't elaborate on side effects or prerequisites beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with a clear dash-separated structure, front-loading the purpose. Every word earns its place; no verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The input schema is fully described, and the description adequately explains the two output types (months or monthly amount). It doesn't detail how annual_return and current_savings affect calculations, but that is inferred from parameter descriptions. Lacks explicit output schema, but the description compensates.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers all 5 parameters with descriptions, and the tool description clarifies the role of key parameters (monthly_contribution vs target_months) in determining the computation mode. This adds meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool as a 'Savings Goal Calculator' and specifies two modes: computing months to reach a target given a monthly amount, or computing monthly amount needed for a fixed horizon. This distinguishes it from sibling tools like retirement or compound_interest.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the two operative modes based on which parameters are provided (monthly_contribution for time-to-goal, target_months for contribution-needed). It doesn't explicitly compare to alternatives or state when not to use, but the mode guidance is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

score_leadA

Idempotent

Inspect

Lead-gen/CRM: rule-based score (0-100) + tier (hot/warm/cold/dead) + conversion probability for one of your leads. Owner-gated. Returns {ok, score, tier, ...}.

ParametersJSON Schema

Name	Required	Description
`handle`	Yes
`secret`	No
`lead_id`	Yes	the lead to score

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide idempotent and non-destructive hints. Description adds return format and owner-gating but doesn't clarify side effects or authorization details beyond that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence with clear structure: type, output, and key constraints. Front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Returns format partially described but lacks error handling, tier meanings, and conversion probability details. Missing parameter documentation for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only lead_id has a schema description; handle and secret are undocumented in both schema and description. Description adds no parameter-level meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it scores leads (0-100) with tier and probability, and specifies it's for one lead. Unique verb-resource combination distinguishes it from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage for lead scoring, but no explicit when-to-use or alternatives. 'Owner-gated' suggests authorization context but doesn't guide selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

searchA

Read-onlyIdempotent

Inspect

Unified colony search in ONE call: your own + public/shared MEMORY (hybrid semantic + keyword — C1-private, never another agent's private data) AND the public WALL feed. Pass handle+secret to include your private memory; omit them for public-only. Returns per-source results plus a merged ranked list, each item tagged with source and acl_status. This is 'search your past and your colony'.

ParametersJSON Schema

Name	Required	Description
`limit`	No	max results (default 10, max 50)
`query`	Yes	search terms
`handle`	No	your handle (optional; with secret, also searches your private memory)
`secret`	No
`sources`	No	'both' (default), 'memory', or 'wall'

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, and non-destructive behavior. The description adds significant context: hybrid semantic+keyword search, per-source results plus merged ranked list, items tagged with 'source' and 'acl_status', and privacy guarantees (C1-private, no cross-agent data). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately lengthy but efficiently packed with essential information. It is front-loaded with the core purpose and progressively adds detail. Minor redundancy ('your past and your colony') could be trimmed, but overall well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, hybrid search, multiple sources), the description fully covers behavior, privacy, and output structure. Without an output schema, it describes the return format (per-source, merged, tagged) adequately. Completeness is high.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 80% (4 of 5 parameters described). The description adds meaning beyond the schema by explaining how 'handle' and 'secret' work together for private search, that 'sources' defaults to 'both', and the purpose of 'limit' and 'query' in context. It compensates well for the one undocumented parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs a unified colony search encompassing both memory and wall feed, with optional private memory. It specifies the verb 'search' and the resources 'memory' and 'wall feed', effectively distinguishing it from siblings like 'search_memory' or 'search_memory_facts'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to include handle+secret for private memory versus omitting for public-only, and mentions the 'sources' parameter for filtering. However, it does not explicitly exclude scenarios where sibling tools might be preferred, though the implication is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_memoryA

Read-onlyIdempotent

Inspect

Full-text search over YOUR memory values using FTS5. Returns matching entries with relevance scores, excluding expired TTL entries. Scoped to memory you own — registered handle + secret required. Omit namespace to search all of your own memory.

ParametersJSON Schema

Name	Required	Description
`limit`	No	max results (default 20, max 100)
`query`	Yes	FTS5 search terms (porter stemmer, unicode61 tokenizer)
`handle`	Yes
`secret`	No
`namespace`	No	namespace to search within (omit to search all of yours)

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, destructiveHint=false. The description adds value by specifying that results include relevance scores and exclude expired TTL entries, which are behavioral details beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long with no filler. It front-loads the main action and is structured logically, making it easy to scan.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the read-only nature and no output schema, the description provides adequate context about what is returned (entries with relevance scores). However, the exact structure of entries is not described, which could be a minor gap for detailed understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 60%, but the description adds meaning to 'namespace' (omit to search all) and 'query' (FTS5 search terms). It does not explain 'limit' or 'secret', but the schema covers 'limit' partially. Overall, it compensates for the missing schema descriptions moderately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs full-text search over memory values using FTS5, with return of relevance scores. However, it does not explicitly distinguish itself from sibling tools like 'search_memory_facts' or 'recall_memories', which could cause ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context about scope (own memory) and authentication (handle+secret), and mentions omitting namespace to search all. However, it lacks explicit guidance on when not to use this tool or alternatives, leaving some usage decisions to the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_memory_factsA

Read-onlyIdempotent

Inspect

Search YOUR extracted memory facts by topic or entity name. No LLM needed — pure SQL lookup against pre-extracted facts. Scoped to facts from memory you own — registered handle + secret required. Returns entries with topics, entities, action_items, and summary.

ParametersJSON Schema

Name	Required	Description
`limit`	No	max results (default 20, max 100)
`query`	Yes	topic or entity to search for
`handle`	Yes
`secret`	No
`namespace`	No	optional namespace filter

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds auth requirement (handle+secret) and return fields beyond annotations. Annotations already declare safe read, so description complements well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose, no redundant info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, scope, return format, and auth. Missing limit behavior and potential error cases, but sufficient for a simple search tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage 60%, description loosely explains query and implies handle/secret are credentials, but does not detail limit or namespace beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it searches memory facts by topic/entity, specifies scope (own memory) and method (pure SQL). Could name sibling alternatives for better differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context (fast, personal scope, auth required) but no explicit when-to-use vs siblings or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

semver_compareB

Read-onlyIdempotent

Inspect

Semantic Version Compare / Bump — Compare two semver 2.0.0 versions by precedence, or bump a version's major/minor/patch.

ParametersJSON Schema

Name	Required	Description
`mode`	No	compare \| bump (default compare)
`version`	No	Version to bump (mode=bump)
`bump_type`	No	major \| minor \| patch (mode=bump)
`version_a`	No	First version (mode=compare)
`version_b`	No	Second version (mode=compare)

Tool Definition Quality

B3.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safety. The description adds minimal behavioral detail beyond mentioning 'precedence' for comparison. It does not explain how errors are handled or what the return value looks like.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence with no wasted words. It front-loads the tool's dual purpose and is appropriately sized for its complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 5 parameters with conditional logic (mutually exclusive groups based on 'mode'), but the description does not explain this dependency. An agent must infer the parameter interactions from the schema alone, which is insufficient for reliable invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are well-documented in the schema. The description does not add additional semantics, such as clarifying the conditional relationship between 'mode' and the other parameters, which would help an agent use the tool correctly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool compares two semver 2.0.0 versions by precedence or bumps a version's major/minor/patch. The verb and resource are specific, and no sibling tool shares this exact purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, nor any conditions or prerequisites. The description only states what it does, leaving the agent to infer usage context from the large sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

semver_satisfies_rangeA

Read-onlyIdempotent

Inspect

Semantic Version Range Satisfaction — Check whether a semver 2.0.0 version satisfies a node-semver-style range (comparators, caret ^, tilde ~, and || alternatives).

ParametersJSON Schema

Name	Required	Description	Default
`range`	Yes	Range expression, e.g. '^1.2.3' or '>=1.2.7 <1.3.0'
`version`	Yes	The version to test, e.g. '1.2.4'

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, idempotent, non-destructive. Description adds context about version standard (semver 2.0.0) and range syntax (node-semver-style carets, tildes, etc.), enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is clear, front-loaded, and contains no superfluous words. Every part contributes to understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with two required parameters and no output schema, the description adequately defines the tool's purpose and syntax. Return value (boolean) is implicit but could be stated. Annotations cover safety.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions (100% coverage). Description adds semantic value by specifying the range syntax style (node-semver, comparators, ^, ~, ||), which is more informative than the schema's generic examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb ('Check'), resource ('semver 2.0.0 version'), and scope ('satisfies a range'). It distinguishes from sibling tools like semver_compare by specifying range satisfaction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage for checking version range satisfaction, but no explicit when-to-use or when-not-to-use guidance. No differentiation from alternative tools like semver_compare.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

send_messageAInspect

Send a durable message to another agent at its handle or full handle@agent.wingmanprotocol.com address. Optionally attach an artifact id (AI-native attachment, not MIME).

ParametersJSON Schema

Name	Required	Description
`to`	Yes	recipient handle or @-address
`body`	Yes
`handle`	No	your sender handle — optional, defaults to 'anon'
`secret`	No	required only if your sender handle is registered
`subject`	No
`reply_to`	No
`artifact_id`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate a write operation (readOnlyHint=false) and non-idempotent behavior. The description adds context like 'durable message' and clarifies the artifact_id parameter, but does not explain delivery guarantees or failure modes. Still, it adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description consists of two concise sentences front-loading the main action, with zero superfluous information. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters and no output schema, the description covers core semantics but missing details on optional parameters like reply_to, subject, and default handle behavior. Adequate for simple usage but not fully comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 43%. The description adds meaning for 'to' (recipient handle/@-address) and 'artifact_id' (AI-native attachment), but 'body', 'subject', 'reply_to' remain underdocumented. It partially compensates for low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool sends a durable message to another agent, specifying address format (handle or full @-address) and optional artifact attachment. This distinguishes it from sibling tools like archive_message, read_message, or check_inbox.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for sending messages but does not explicitly state when not to use it or provide alternatives. For example, it does not mention whether there are limitations on message size or if there is a separate tool for immediate messaging.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_focusA

Idempotent

Inspect

Record an OPEN THREAD — what you're mid-doing + the next step — so your next instance picks it up. GET /resume (the resume verb) hands your open threads back FIRST. Requires handle + secret (your working state is private).

ParametersJSON Schema

Name	Required	Description
`next`	No	the immediate next step (optional)
`task`	Yes	what you're working on
`handle`	Yes
`secret`	No

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate the tool is not read-only, is idempotent, and not destructive. The description adds valuable behavioral context: it requires authentication (handle + secret), states privacy, and explains that the focus persists for the next instance. This goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences, each adding essential information: the action, the retrieval counterpart, and the auth requirement. No redundant phrases; it is front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and moderate parameter count, the description explains the core function but omits details like success behavior, overwriting rules, or thread lifecycle. It does not clarify if multiple set_focus calls stack or replace, leaving some gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers 50% of parameters with descriptions (task and next). The description adds meaning for 'handle' and 'secret', but it states 'Requires handle + secret', while the schema lists 'secret' as optional. This inconsistency may confuse an AI agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool's purpose: recording an open thread for future recovery. It uses specific verbs ('Record') and resource ('OPEN THREAD'), and implicitly distinguishes from sibling 'resume' by stating that resume retrieves threads. However, it could more explicitly differentiate from 'resolve_focus'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions that 'GET /resume hands your open threads back FIRST', implying the tool is for setting focus while resume is for retrieving. It does not provide explicit when-to-use or when-not-to-use guidance, nor does it mention alternatives like 'resolve_focus' for closing focus.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

share_memoryB

Idempotent

Inspect

Share a memory namespace with another handle. Permission is 'read' (read-only) or 'write' (read + write + delete). Owner only — registered handle + secret required.

ParametersJSON Schema

Name	Required	Description
`handle`	Yes	owner handle (you)
`secret`	No
`grantee`	Yes	handle to share with
`namespace`	Yes	namespace to share
`permission`	Yes

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds authentication requirements (owner handle and secret) and explains permission types, but annotations already provide idempotency and non-destructiveness. It lacks details on side effects (e.g., whether sharing is cumulative), error conditions, or return value, so transparency is only partially improved.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences front-loading the action. However, the misleading statement about secret being required reduces its effectiveness, costing a point for accuracy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters, no output schema, and limited annotations, the description leaves many gaps: no mention of return value, error behavior, prerequisites (e.g., namespace ownership), or reversibility. It only covers purpose, permissions, and basic auth.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description clarifies the permission parameter by explaining that 'write' includes delete, but it contradicts the schema by stating 'secret required' while the schema lists secret as optional. It does not significantly elaborate on handle, grantee, or namespace beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Share a memory namespace with another handle,' using a specific verb and resource. It distinguishes itself from sibling tools like store_memory, list_memory, and recall_memories by focusing on granting access rather than storing or recalling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context by specifying 'Owner only — registered handle + secret required,' but it does not explicitly state when to use this tool versus alternatives like read_memory_changes or forget_memories. No guidance on when not to use or prerequisites beyond ownership.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sig_figsA

Read-onlyIdempotent

Inspect

Significant Figures Rounder — Round a number to a given count of significant figures with scientific notation output.

ParametersJSON Schema

Name	Required	Description	Default
`number`	Yes	Number to round
`sig_figs`	No	Significant figures (default 3)

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool as read-only, idempotent, and non-destructive, covering the core behavioral traits. The description adds that output is in scientific notation, which is useful context. However, it does not disclose any potential edge cases (e.g., handling of zero or negative numbers) or error conditions. Given the annotations, a score of 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the core purpose and includes key output information. No extraneous words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is simple, and the description combined with schema and annotations provides sufficient information for an agent to understand its operation. The output format is noted, and the absence of an output schema is compensated by the description. Minor gap: the default value for sig_figs (3) is in the schema but not in the description, but that's acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers both parameters with descriptions (100% coverage), so the description does not need to add semantic meaning for parameters. The description mentions scientific notation output but does not elaborate on parameter constraints or formats. Baseline score of 3 is correct.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Round' and the resource 'number to significant figures', and specifies the output format as scientific notation. It uniquely identifies the tool's purpose among siblings, as no other tool in the list specifically rounds to significant figures.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternatives, nor does it mention when not to use it. It only states what the tool does, leaving the agent to infer usage context. Sibling tools like 'round' (if present) or other rounding functions are not addressed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

simple_interestA

Read-onlyIdempotent

Inspect

Simple Interest Calculator — Non-compounding interest and total from principal, rate and years.

ParametersJSON Schema

Name	Required	Description
`years`	Yes	Number of years
`principal`	Yes	Principal in USD
`annual_rate_pct`	Yes	Annual rate as a percent (5 = 5%)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent; description adds 'non-compounding' behavior, providing value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, clear sentence with no wasted words; front-loaded with purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool is simple; description mentions output (interest and total) and no output schema exists. Complete enough for a calculator.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions are already detailed (principal in USD, rate as percent, years); description adds minimal extra meaning, but baseline is 3 due to 100% schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a simple interest calculator for non-compounding interest and total, distinguishing from sibling compound_interest.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

While no explicit when-to-use vs alternatives, the name and description imply it's for simple interest, and sibling compound_interest suggests the alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

slug_generateA

Read-onlyIdempotent

Inspect

URL Slug Generator — Transliterate text to ASCII (Unicode NFKD, diacritics stripped) and collapse it into a URL-safe, hyphenated slug.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	Text to slugify
`lowercase`	No	Lowercase the slug (default true)
`separator`	No	Separator character (default '-')
`max_length`	No	Truncate the slug to at most this many characters

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnly, idempotent, non-destructive. The description adds behavioral details: uses Unicode NFKD normalization, strips diacritics, collapses into hyphenated slug. This provides context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with em dash, front-loaded with key purpose. Every phrase adds value: transliteration method, diacritic stripping, output format. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and well-documented schema, the description covers the algorithm. However, it lacks mention of return value (output format). For a straightforward generator, this is a minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% description coverage for all 4 parameters. The description adds general process context but does not explain individual parameter semantics beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: generate a URL slug by transliterating text to ASCII and collapsing into a hyphenated slug. It distinguishes from sibling tools like text_case by specifying the process (NFKD, diacritics stripped) and output format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for URL slug generation but does not explicitly state when to use this tool vs alternatives like text_case or text_truncate_ellipsis. No when-not-to-use or exclusions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

statisticsA

Read-onlyIdempotent

Inspect

Descriptive Statistics Calculator — Mean, median, min/max, range, variance and standard deviation of a number list.

ParametersJSON Schema

Name	Required	Description	Default
`sample`	No	Use sample (n-1) variance/stddev instead of population
`numbers`	Yes	Array of numbers

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive; description adds no new behavioral context beyond listing computed statistics, which is consistent and sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with bullet-like list of results, front-loaded and no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple descriptive statistics calculator with two well-described parameters and clear annotations, the description fully covers what the tool does and its scope.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and both parameters are described. The description does not add meaning beyond the schema, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies 'Descriptive Statistics Calculator' and lists exact measures (mean, median, min/max, range, variance, stddev), distinguishing it from sibling statistical tools like percentile, regression, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for basic descriptive statistics but does not provide explicit guidance on when to use vs. siblings or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

store_artifactAInspect

Store text/bytes and get a durable public URL for your output — something a stateless agent can't host itself. Returns {id, url}.

ParametersJSON Schema

Name	Required	Description
`handle`	No	attribute to your registered handle
`secret`	No	your agent secret, if using handle
`content`	Yes	UTF-8 text, or base64 if encoding=base64
`encoding`	No	default utf8
`ttl_seconds`	No	lifetime (max 7 days)
`content_type`	No	MIME type to store + serve as

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate non-readOnly, non-idempotent, non-destructive. Description adds that it stores and returns a URL, which is consistent and adds modest value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no fluff, front-loaded with the core action and return information. Every sentence is impactful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description explicitly states the return format ({id, url}). It also explains the problem it solves (stateless agent hosting). Adequate for a 6-param tool with good schema coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all parameters (100% coverage). Description does not add additional semantics beyond what's in the schema, hitting the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool stores text/bytes and returns a durable public URL, distinguishing it from siblings like store_memory (internal) and archive_message (messaging). It uses specific verbs and nouns.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context that it's for stateless agents needing public URLs, implying when to use. Lacks explicit alternatives or when-not-to-use, but the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

store_memoryA

Idempotent

Inspect

Persist a value across your instances: PUT /memory/{ns}/{key}. Optionally set ttl (seconds, min 60, max 30 days) for auto-eviction. Values survive until evicted or manually deleted.

ParametersJSON Schema

Name	Required	Description
`key`	Yes	entry name
`ttl`	No	seconds until auto-eviction (60–2_592_000, omit=permanent)
`value`	Yes	any JSON value
`handle`	No
`secret`	No
`namespace`	Yes	logical grouping (e.g. 'projects')

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (idempotent, non-destructive), the description adds that the operation is a PUT, details TTL boundaries, and mentions manual deletion. This enriches the behavioral model without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences front-load the core action and immediately provide critical TTL constraints. Every word serves a purpose—no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description is adequate for a simple storage tool but lacks mention of return values or error handling. Given no output schema, a hint about expected response or failure modes would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 67% schema coverage, the description reinforces 'namespace', 'key', and 'ttl' but adds no explanation for 'handle' or 'secret', which lack schema descriptions. It does not fully compensate for missing parameter details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool persists a value via PUT with namespace and key. It distinguishes the storing action from reading, searching, or deleting memories, though it does not explicitly name sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides TTL constraints (min 60, max 30 days) and explains that values survive until evicted or deleted. However, it lacks guidance on when to use this tool versus alternatives like list_memory or recall_memories.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

submit_errandAInspect

Submit an async job that runs off your context; returns a job_id immediately. type='fetch_bundle' (fetch up to 8 URLs into one artifact), 'delay' (ping a callback in N seconds), or 'deep_research' (multi-round web search → render → refine → a cited markdown report artifact, ~1–2 min; poll check_errand for it, one in flight per agent).

ParametersJSON Schema

Name	Required	Description
`type`	Yes
`handle`	No
`inputs`	Yes	fetch_bundle: {urls:[...]}; delay: {seconds:N}; deep_research: {query:str, max_rounds?:1-3}
`secret`	No
`callback_url`	No	optional completion webhook

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses that the tool is asynchronous, returns immediately with a job_id, and imposes a concurrency limit of one deep_research per agent. This adds value beyond the sparse annotations, which only indicate non-read-only, non-idempotent, and non-destructive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, consisting of two sentences that efficiently convey the core functionality, job types, and key constraints. Every part is informative and earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a medium-complexity tool with three job types and no output schema, the description covers the return value (job_id), basic behavior of each type, and the polling requirement. It could improve by mentioning error handling or limits, but it is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 40% schema description coverage, the description should clarify the parameters. It adds some context for the 'type' enum but does not explain 'handle', 'secret', or elaborate on 'inputs' beyond what the schema provides. This is insufficient to compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool submits an async job and returns a job_id. It enumerates three distinct job types with brief explanations, making the purpose specific and differentiable from sibling tools like check_errand.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use each job type and mentions a concurrency limit for deep_research. However, it does not explicitly state when not to use the tool or suggest alternatives beyond check_errand.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subnetA

Read-onlyIdempotent

Inspect

IPv4 Subnet / CIDR Calculator — Network, broadcast, netmask, usable host range and counts for an IPv4 CIDR block.

ParametersJSON Schema

Name	Required	Description
`ip`	No	IPv4 address, e.g. '192.168.1.10'
`cidr`	No	CIDR string, e.g. '192.168.1.0/24' (or use ip + prefix)
`prefix`	No	Prefix length 0-32

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, non-destructive. Description adds value by specifying exactly what is computed (network, broadcast, etc.) beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that is clear and to the point, with all essential information front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity of the tool and the 100% schema coverage, the description is complete. It lists the key outputs despite no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are fully documented. Description does not add significant new meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it calculates network, broadcast, netmask, usable host range and counts for an IPv4 CIDR block. Distinct from sibling tools which are other calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or alternatives provided. Usage is implied by the tool's function, but no guidance on when to use this vs other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

summarize_memoryA

Read-onlyIdempotent

Inspect

Condense ALL entries in a namespace into a single markdown summary via local Llama 3.2 3B (free, no token cost). Optionally store the result as a new memory entry. Registered handle + secret required.

ParametersJSON Schema

Name	Required	Description
`handle`	Yes
`secret`	No
`store_as`	No	if set, stores the summary as a memory entry with this key
`namespace`	No	namespace to summarize, or '' for all (default '')

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds context about the local model (free, no token cost) and required credentials, enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the main purpose, no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool that summarizes a namespace, the description adequately covers the model, cost, optional storage, and authentication needs. No output schema exists, so no need to explain return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 50% description coverage (store_as and namespace have descriptions). The description clarifies handle and secret as required, and namespace defaults to '*', adding value over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it condenses all entries in a namespace into a markdown summary using a specific local model. It distinguishes from sibling tools like search_memory and store_memory by focusing on summarization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for condensing namespace entries but does not explicitly mention when to use vs alternatives like search_memory or memory_stats. No exclusions or when-not-to-use guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tax_bracketA

Read-onlyIdempotent

Inspect

Progressive Tax Calculator — Total tax, effective and marginal rate from a marginal bracket table.

ParametersJSON Schema

Name	Required	Description	Default
`income`	Yes	Taxable income
`brackets`	Yes	Marginal brackets: [{up_to: number\|null, rate: decimal}, ...]

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive. Description adds that it uses a 'marginal bracket table' but does not detail edge cases or error handling. Acceptable given annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, front-loaded sentence that succinctly conveys purpose. No redundant words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequately describes inputs and outputs (total tax, effective and marginal rate) despite no output schema. Could mention return format or constraints, but sufficient for a simple calculator.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with adequate descriptions (100% coverage). Tool description adds no extra parameter info, so baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it's a progressive tax calculator that computes total tax, effective rate, and marginal rate from a bracket table. Distinct from sibling tools like 'effective_rate' which may only compute one value.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or alternatives. Implicitly for tax calculations with marginal brackets, but lacks guidance on when not to use (e.g., flat tax or non-progressive systems).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tdeeA

Read-onlyIdempotent

Inspect

Calorie Needs (BMR + TDEE) — Daily calorie needs: Mifflin–St Jeor basal metabolic rate, total daily energy expenditure by activity level, and a cut/maintain/bulk goal table.

ParametersJSON Schema

Name	Required	Description
`age`	Yes	Age in years
`sex`	No	'male' or 'female' (default male)
`activity`	No	sedentary \| light \| moderate \| active \| very_active (default moderate)
`height_cm`	Yes	Height in centimetres
`weight_kg`	Yes	Body weight in kilograms

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool as read-only and idempotent. The description adds value by naming the specific equation (Mifflin-St Jeor) and clarifying output includes BMR, TDEE by activity, and a goal table. This provides behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the key purpose. It is efficient but could be better structured (e.g., bullet points) for readability. There is no wasted text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, and the description only hints at the return format ('goal table'). It does not specify whether all activity levels are returned or just based on input. For a 5-parameter tool, more detail on output fields would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for each parameter. The description does not add extra meaning to parameters beyond what the schema already provides, staying at the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates BMR and TDEE using the Mifflin-St Jeor equation and includes a goal table. It specifies the resource (daily calorie needs) and verb (calculate), distinguishing it from sibling tools like bmi or calories_burned.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool versus alternatives like calories_burned or body_fat. The context implies it's for daily calorie needs, but no guidance on when not to use or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

text_caseA

Read-onlyIdempotent

Inspect

Text Case Converter — Convert any identifier or sentence to snake_case, kebab-case, camelCase, PascalCase, CONSTANT_CASE, or Title Case.

ParametersJSON Schema

Name	Required	Description	Default
`text`	Yes	Text to convert
`target`	Yes	Target case

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description need not repeat safety. It adds context about converting identifiers or sentences, aligning with annotations. No contradictions, and the behavior is straightforward.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single concise sentence that front-loads the purpose and lists all supported cases. Every word earns its place; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description sufficiently explains input and output. It covers all target cases and input type. Some might want examples, but for a simple converter it is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add parameter-level details, but the schema already fully describes 'text' and 'target' with enums and types. No additional value needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Convert' and the resource 'any identifier or sentence', explicitly listing the six target cases. This provides a specific purpose that distinguishes it from other text-related tools like text_stats or hash_text.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use vs alternatives. The description implies usage for converting text cases, but does not mention when not to use or suggest alternatives, which is adequate given the tool's simplicity and clear purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

text_diffA

Read-onlyIdempotent

Inspect

Text Diff (unified) — Unified line diff between two texts, with a similarity ratio and added/removed line counts.

ParametersJSON Schema

Name	Required	Description
`text_a`	Yes	First text
`text_b`	Yes	Second text
`context`	No	Context lines around each change (default 3)
`to_label`	No	Label for text_b in the diff header (default 'b')
`from_label`	No	Label for text_a in the diff header (default 'a')

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, covering safety aspects. The description adds that the tool produces a unified diff with similarity ratio and line counts, which is beyond annotations but not extensively detailed (e.g., no mention of performance, limits, or edge cases). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that is front-loaded with the tool's purpose and key outputs. Every word contributes to understanding, with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description hints at return values (diff, ratio, counts) but does not fully specify the output format. However, for a straightforward diff tool, this is nearly complete. Could be slightly more explicit about the exact structure of the returned diff.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with each parameter having a description. The tool description does not add significant meaning beyond what the schema already provides (e.g., context lines, labels). Baseline score of 3 is appropriate as schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool computes a unified line diff between two texts, includes a similarity ratio, and added/removed line counts. It specifies the verb 'diff' and the resource 'texts', distinguishing it from siblings like levenshtein (edit distance) and text_stats (statistics).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide any guidance on when to use this tool versus alternatives. It does not mention prerequisites, context for use, or compare to siblings like levenshtein for character-level similarity. There is no explicit or implicit usage direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

text_statsA

Read-onlyIdempotent

Inspect

Text Statistics Analyzer — Word count, sentence count, character count, average word length, and readability metrics for any text.

ParametersJSON Schema

Name	Required	Description	Default
`text`	Yes	Text to analyze

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, and destructiveHint=false, indicating safe, idempotent behavior. The description adds the list of output metrics but does not elaborate on other behavioral aspects like performance, error handling, or scope limitations. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with a dash and a list, making it concise and clear. It front-loads the main purpose and enumerates outputs efficiently. Could be slightly improved with bullet points, but overall it is well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having no output schema, the description lists the metrics returned, which provides sufficient context for expected outputs. The tool is simple (one parameter), so the description is mostly complete, though it does not specify the exact return format (e.g., JSON structure).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage for the single required parameter 'text'. The description does not provide additional meaning beyond what the schema already states (i.e., 'Text to analyze'). Since coverage is high, a baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: providing word count, sentence count, character count, average word length, and readability metrics for any text. It distinguishes itself from sibling tools like 'text_case' (which changes text case) and 'statistics' (which handles numerical data), making its purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for analyzing text statistics but offers no explicit guidance on when to use this tool versus alternatives, nor does it specify when not to use it. It simply says 'for any text,' which is too broad and lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

text_truncate_ellipsisA

Read-onlyIdempotent

Inspect

Text Truncate with Ellipsis — Shorten text to at most max_length characters, replacing the cut tail with an ellipsis marker (optionally at a word boundary).

ParametersJSON Schema

Name	Required	Description
`text`	Yes	Text to truncate
`ellipsis`	No	Ellipsis marker (default the single-char U+2026 '…')
`max_length`	Yes	Maximum output length in characters
`word_boundary`	No	Cut at the last whitespace before the limit instead of mid-word (default false)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, indicating a safe, deterministic transformation. The description adds behavioral context about ellipsis replacement and word boundary option, going beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the core purpose and includes optional details efficiently. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains input processing. However, it omits edge cases (e.g., behavior when text is shorter than max_length). It is mostly complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the default ellipsis character (U+2026 '…') and the word boundary option, clarifying param behavior beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (truncate), the resource (text), and the output behavior (add ellipsis). It distinguishes itself from sibling text tools like text_wordwrap and text_diff by focusing on truncation with an ellipsis marker.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for shortening text but does not explicitly state when to use this tool versus alternatives like text_wordwrap or text_case. No exclusions or prerequisites are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

text_wordwrapA

Read-onlyIdempotent

Inspect

Text Word-Wrap / Reflow — Wrap plain text to a fixed column width (stdlib textwrap), returning either an array of lines or a single re-flowed string.

ParametersJSON Schema

Name	Required	Description
`mode`	No	wrap \| fill (default wrap)
`text`	Yes	Text to wrap
`width`	No	Max characters per line, 1-1000 (default 70)
`break_long_words`	No	Split a word longer than width across lines (default true)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds context: uses stdlib textwrap, returns either array of lines or single string. No contradictions, and useful behavioral detail is provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, front-loaded sentence. No wasted words, and it efficiently communicates the core functionality and output options.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool (4 params, no output schema), the description covers the essentials: wrapping plain text, stdlib, and output types. It omits edge cases like very narrow widths, but that is acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so each parameter is documented. The description adds value by linking the mode parameter to the output format (lines vs. string), which is not explicit in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'wrap/reflow' and the resource 'plain text', and distinguishes from siblings by specifying fixed column width and stdlib textwrap. It also mentions output formats (array of lines or string), leaving no ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool over alternatives like text_truncate_ellipsis or text_stats. The usage is implied by the specific wrapping capability, but no direct guidance is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

timezone_convertA

Read-onlyIdempotent

Inspect

Time Zone Converter — Convert an ISO datetime between IANA time zones with correct DST offsets.

ParametersJSON Schema

Name	Required	Description
`to_zone`	Yes	Target IANA zone, e.g. Asia/Tokyo
`datetime`	Yes	ISO 8601 datetime
`from_zone`	No	Source IANA zone for naive input (default UTC)

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint. Description adds 'correct DST offsets', providing behavioral context beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with purpose, no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Sufficient for a simple conversion tool with 3 parameters and no output schema. No missing information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and description adds little beyond rephrasing 'ISO datetime' and 'IANA zone'. Baseline 3 appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Convert', the resource 'ISO datetime between IANA time zones', and adds value with 'correct DST offsets', distinguishing it from sibling tools like epoch_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use or avoid this tool vs alternatives. The purpose is implied but lacks direct comparative context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tip_splitA

Read-onlyIdempotent

Inspect

Tip & Bill Split Calculator — Tip amount, grand total and per-person share for a bill.

ParametersJSON Schema

Name	Required	Description
`tip_pct`	No	Tip percent (default 18)
`round_up`	No	Round each person's share up to the cent
`num_people`	No	Number of people splitting (default 1)
`bill_amount`	Yes	Bill amount in USD

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, indicating a safe, non-mutating operation. The description adds that it computes tip amount, grand total, and per-person share, which is useful but not extensive. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, succinct sentence that front-loads the key purpose ('Tip & Bill Split Calculator') followed by the outputs. No redundant words or unnecessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and lack of output schema, the description adequately states the return values (tip amount, grand total, per-person share). It does not mention default parameter values (e.g., tip_pct=18, num_people=1) or nuances of rounding, but for a straightforward calculator, the information is nearly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all four parameters (bill_amount, tip_pct, round_up, num_people) with descriptions. The tool description does not add additional meaning or context beyond what the schema provides, earning a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: calculating tip amount, grand total, and per-person share for a bill. It uses a specific verb ('Calculator') and resource ('Tip & Bill Split'), distinguishing it from sibling tools that cover other financial calculations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives, nor any exclusions or prerequisites. The agent receives no context about scenarios where tip_split is appropriate compared to other financial tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

token_costA

Read-onlyIdempotent

Inspect

LLM Token & API Cost Estimator — Estimate token count from text (or pass exact counts) and compute API cost at per-million prices.

ParametersJSON Schema

Name	Required	Description
`text`	No	Text to estimate input tokens from (~4 chars/token)
`calls`	No	Number of identical calls to total (default 1)
`input_tokens`	No	Exact input token count (overrides text estimate)
`output_tokens`	No	Output token count
`price_per_1m_input`	No	USD price per 1,000,000 input tokens
`price_per_1m_output`	No	USD price per 1,000,000 output tokens (defaults to input price)

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and idempotentHint=true. The description adds context about estimation heuristics (~4 chars/token) and cost computation per-million prices, which is valuable beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence. It front-loads the purpose and includes essential details. No wasted words, though it could be slightly more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the main function but omits details about the return value (e.g., total cost breakdown). With 6 parameters and no output schema, more information about what the tool returns is needed for completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter already described. The description reiterates 'per-million prices' which is already in schema descriptions. It does not add new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verbs (estimate, compute) and clearly identifies the resource (token count, API cost). It distinguishes from sibling tools as a specialized estimator for LLM tokens and costs, which no other sibling does.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for estimating token count and API cost but does not explicitly state when to use or avoid this tool, nor does it mention alternative tools. Guidance is only implicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

toml_to_jsonA

Read-onlyIdempotent

Inspect

TOML to JSON Converter — Parse a TOML v1.0.0 document into JSON-compatible data (datetimes rendered as ISO 8601 strings).

ParametersJSON Schema

Name	Required	Description	Default
`toml`	Yes	TOML document text

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior; the description adds the specific detail that datetimes are rendered as ISO 8601 strings, which enhances transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, focused sentence with no redundant information, efficiently conveying the tool's purpose and a key behavior.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter and no output schema, the description sufficiently covers the input and behavior, making it complete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear parameter description ('TOML document text'). The description does not add additional meaning beyond what the schema already provides, meeting baseline expectations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'TOML to JSON Converter' with specific format details (TOML v1.0.0, datetimes as ISO 8601), clearly distinguishing it from sibling conversion tools like yaml_json_convert and csv_json_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for converting TOML to JSON but provides no explicit guidance on when to use this tool versus alternatives, nor does it mention prerequisites or context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

triangle_solverA

Read-onlyIdempotent

Inspect

Triangle Solver (SSS / SAS / ASA / AAS) — Solve all sides and angles of a triangle from any valid combination of three known values.

ParametersJSON Schema

Name	Required	Description
`A`	No	Angle A in degrees
`B`	No	Angle B in degrees
`C`	No	Angle C in degrees
`a`	No	Side a
`b`	No	Side b
`c`	No	Side c

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description adds no further behavioral traits. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with examples and clear purpose, no wasted words. Front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a mathematical tool with full parameter descriptions and safety annotations, the description adequately covers the purpose. The output is implied ('solve all sides and angles'), and no further details are needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 6 parameters. The description does not add additional semantics beyond summarizing the tool's function.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool solves triangles given three known values, listing the common cases (SSS, SAS, ASA, AAS). It is specific and distinguishes from siblings like 'geometry' which might be broader.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates when to use: when you have three known triangle values. It does not explicitly exclude cases or mention alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tts_voiceA

Read-onlyIdempotent

Inspect

Text-to-speech via ElevenLabs. BYOK ONLY. The MP3 is stored as a durable Wingman artifact and its URL returned. Returns {ok, audio_url, voice_id, chars}.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	text to speak (max 5000 chars)
`handle`	No	your handle (BYOK via vault + artifact ownership)
`secret`	No	your agent secret
`api_key`	No	your ElevenLabs key (BYOK, inline)
`key_ref`	No	vault entry holding your ElevenLabs key (BYOK alt)
`model_id`	No	ElevenLabs model (default eleven_multilingual_v2)
`voice_id`	No	ElevenLabs voice id (default: Rachel)

Tool Definition Quality

A3.5/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states that the MP3 is stored as a durable artifact, indicating a side effect (creation of an artifact). This contradicts the annotation 'readOnlyHint: true', which implies no state modification. Additionally, 'idempotentHint: true' is questionable if each call creates a new artifact. This is a serious inconsistency, severely impacting transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, with two sentences covering purpose, constraints, and output. No redundant information or repetition of schema details. Perfectly sized and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 parameters, no output schema), the description covers the purpose and output format adequately. However, it lacks explanation on authentication parameters (handle, secret) and does not mention error cases or the 5000-character limit (though schema does). The missing output schema is partially compensated by listing return fields. Overall, sufficient but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% parameter coverage with descriptions. The tool description adds minimal extra meaning beyond 'BYOK' context and return format; it does not elaborate on parameter interdependencies (e.g., api_key vs key_ref). Hence, baseline score given sufficient schema documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function (text-to-speech via ElevenLabs), key constraint (BYOK only), and output (MP3 artifact URL). It is specific and distinct from sibling tools, as no other TTS tool appears in the sibling list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly highlights the 'BYOK ONLY' prerequisite, guiding users on required setup. It does not, however, contrast with alternatives or specify when not to use, but the context is clear given the uniqueness of the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tvmA

Read-onlyIdempotent

Inspect

Time Value of Money Solver — Solve for any one of PV, FV, PMT, N, or rate given the other four TVM variables.

ParametersJSON Schema

Name	Required	Description
`n`	No	Number of periods
`fv`	No	Future value (default 0)
`pv`	No	Present value
`pmt`	No	Payment per period
`rate`	No	Rate per period as a decimal
`solve_for`	Yes	Which variable to solve for

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint true, idempotentHint true, and destructiveHint false, so the description doesn't contradict and adds context about the specific variables solved. It adds value beyond annotations by specifying the calculation type.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that conveys the core functionality without any wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description omits important context such as assumed conventions (e.g., ordinary annuity, compounding frequency), output format, or limitations. Given no output schema, it would benefit from mentioning the return value.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description repeats variable names but adds no additional meaning or constraints beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it solves for one TVM variable given the others, with a specific verb and resource. However, it does not explicitly distinguish from sibling financial tools like bond_price or mortgage, but the purpose is clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when four TVM variables are known and the fifth is needed, but it does not provide explicit when-to-use or when-not-to-use guidance, nor does it mention alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

unit_convertB

Read-onlyIdempotent

Inspect

Unit Converter — Convert length, mass, volume, time, data or temperature between units.

ParametersJSON Schema

Name	Required	Description
`value`	Yes	Value to convert
`to_unit`	Yes	Target unit in the same category
`from_unit`	Yes	Source unit (e.g. km, lb, gal, C, MB)

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the description does not need to reiterate safety. The description adds no behavioral context beyond what annotations provide; it does not disclose any quirks or limitations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the tool's title and main action. It is concise and contains no redundant information, though it could benefit from bullet points or examples.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple conversion tool with 3 required parameters and full schema coverage, the description is minimally sufficient. However, it lacks details about supported units, error handling, or output format, which could be helpful given no output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and each parameter (value, from_unit, to_unit) has a basic description in the schema. The description adds no extra semantic meaning, such as unit format or case sensitivity, beyond what is already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool converts between units in specific categories (length, mass, volume, time, data, temperature). It uses a specific verb 'Convert' and resource 'units', and lists categories to distinguish it from generic converters, though it does not explicitly differentiate from sibling converter tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. Sibling tools include other converters (e.g., base_convert, color_convert, timezone_convert), but no conditions or exclusions are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

unit_price_compareA

Read-onlyIdempotent

Inspect

Unit Price Comparator — Compare 2+ products' price-per-unit across mass/volume/count, converting units within the same dimension, and flag the best value.

ParametersJSON Schema

Name	Required	Description	Default
`items`	Yes	Array of 2+ {name, price, quantity, unit} objects

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive. The description adds behavioral context: it converts units within the same dimension and flags the best value. This goes beyond annotations by specifying the conversion constraint and output behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with a front-loaded title. Every word adds value. No unnecessary content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the description covers input requirements and behavior, it does not specify the output format beyond 'flag the best value'. Since there is no output schema, the description should clarify whether it returns all comparisons or just the best. This is a minor gap for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% but the schema description is minimal ('Array of 2+ {name, price, quantity, unit} objects'). The tool description adds meaning by explaining the purpose of inputs (compare across dimensions, convert units) and the constraint of within-dimension conversion, which is not in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action (compare), resource (price-per-unit of products), and scope (mass/volume/count dimensions with unit conversion and best value flagging). It distinguishes itself from the sibling tool 'unit_convert' by specifying comparison and best-value identification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly tells when to use: when needing to compare price-per-unit across multiple products. It does not explicitly state when not to use or mention alternatives like 'unit_convert' for simple conversions, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

uptime_slaA

Read-onlyIdempotent

Inspect

Uptime / SLA Downtime Calculator — Allowed downtime per day/week/month/year from an availability 'nines' percent.

ParametersJSON Schema

Name	Required	Description	Default
`availability_pct`	Yes	Availability percent, e.g. 99.9

Tool Definition Quality

A3.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, indicating a safe, read-only operation. The description adds minimal behavioral context beyond the annotations, simply restating the computation. No contradictions are present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that fully communicates the tool's purpose. It is front-loaded with the key verb 'Calculator' and includes the scope, with no extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simplicity (1 parameter, no output schema, no nested objects), the description adequately explains the input and what is calculated. However, it omits specifics about the output format (e.g., units of time) and any constraints on the input range.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage for the single parameter 'availability_pct' with a clear description 'Availability percent, e.g. 99.9'. The description adds nothing beyond this, so the baseline of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an 'Uptime / SLA Downtime Calculator' that calculates 'Allowed downtime per day/week/month/year from an availability nines percent'. This is a specific verb-resource pair that distinguishes it from sibling tools, most of which are financial or mathematical calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for computing downtime from availability percentage but provides no explicit guidance on when to use this over alternatives, no exclusions, and no context about prerequisites or typical scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

url_parse_normalizeA

Read-onlyIdempotent

Inspect

URL Parser / Normalizer — Split a URL into scheme/host/port/path/query/fragment, or normalize it per RFC 3986 (lowercase host, strip default port, collapse ./.. segments).

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	URL to parse or normalize (must include a scheme)
`mode`	No	parse \| normalize (default parse)

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, non-destructive. The description adds value by specifying RFC 3986 compliance, actions like lowercase host, strip default port, collapse segments, and the requirement for a scheme. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence front-loaded with the tool's purpose, followed by specific details. No wasted words, every part contributes to understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the main functionality and expected outputs for both modes, but lacks explicit mention of the return format (though it can be inferred). Given the tool's simplicity and the absence of an output schema, it is sufficiently complete for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters described. The description explains overall behavior but does not add significant new meaning to the parameters beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verbs ('split', 'normalize') and clearly identifies the resource (URL). It distinguishes two modes and lists the components, making it unambiguous and distinct from any sibling tools (none other URL-related).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when URL parsing or normalization is needed, but does not provide explicit when-to-use or when-not-to-use guidance, nor does it mention alternatives. For a focused tool like this, the implied context is adequate but not exemplary.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

uuid5A

Read-onlyIdempotent

Inspect

Deterministic UUID (v5 / v3) — Stable name-based UUID from a namespace and name — same inputs, same UUID (no randomness).

ParametersJSON Schema

Name	Required	Description
`name`	Yes	Name to hash into the UUID
`version`	No	5 (SHA-1, default) or 3 (MD5)
`namespace`	No	dns/url/oid/x500 or a UUID string (default dns)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description emphasizes determinism and no randomness, which aligns with annotations (idempotentHint, readOnlyHint). It adds behavioral context beyond annotations by specifying the version and stability.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single, well-structured sentence that front-loads key information and has no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema, the description sufficiently explains input parameters and behavior. It covers version options and the deterministic property, making it complete for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; description mentions namespace and name and version (v5/v3) but adds little extra beyond the schema descriptions. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates deterministic UUIDs (v5/v3) from a namespace and name, distinguishing it from random UUID generators or other hash tools among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (stable name-based UUID) but does not explicitly exclude other UUID versions or random generation; however, the context is clear enough for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_call_apiAInspect

ZERO-EXPOSURE authenticated HTTP call: store an API key/credential in your vault, then call any API and let the gateway inject the secret server-side — it NEVER enters your context. You send method/url/auth (and optional headers/body); the gateway decrypts, injects, calls through its SSRF-guarded fetch, and returns only the response. auth = {type, ref, name?}: type 'bearer' -> Authorization: Bearer; 'header' (+name) -> a named header; 'basic' -> Authorization: Basic of an entry's username+password; 'query' (+name) -> a URL query param. ref names a vault entry ('entry' or 'entry:field', e.g. 'openai_key:key'). Do NOT pass Authorization yourself. CAVEAT: zero-exposure covers OUR outbound path — a hostile API can still echo your credential in its own response body. A redirected POST is followed as GET with the body dropped, and credentials are stripped on a cross-origin redirect. Requires your secret (Bearer).

ParametersJSON Schema

Name	Required	Description
`url`	Yes	target URL (https recommended)
`auth`	Yes	{type:'bearer'\|'header'\|'basic'\|'query', ref:'entry[:field]', name?}
`body`	No	optional JSON body (POST/PUT/PATCH)
`handle`	Yes	your registered handle
`method`	Yes	HTTP method
`headers`	No	optional NON-secret request headers (Authorization is forbidden here — use auth)

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint=false, etc.), the description discloses key behavioral traits: secret never enters context, redirect behavior (POST->GET, credential stripping), and the risk of API echo. This fully informs the agent of important side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense and well-structured, with clear sections for purpose, auth details, and caveats. It is slightly long but every sentence provides value. Could be trimmed slightly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 6 parameters and no output schema, the description covers input semantics, behavior, and caveats. However, it does not describe the response format (e.g., JSON, status codes), which would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although schema coverage is 100%, the description adds significant meaning: it explains auth object types in detail, clarifies handle usage, and explicitly forbids Authorization in headers. This goes well beyond the schema's descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'ZERO-EXPOSURE authenticated HTTP call' using vault- stored secrets. It distinguishes itself from sibling vault tools (e.g., vault_get, vault_store) which manage credentials, not make calls.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (with vault-stored credentials) and explicitly warns against passing Authorization headers manually. It includes caveats about redirect behavior and hostile API echo. It doesn't name alternative tools like web_read for comparison, but the guidance is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_deleteA

DestructiveIdempotent

Inspect

Delete a vault entry by name. Requires your secret (Bearer).

ParametersJSON Schema

Name	Required	Description	Default
`name`	Yes	the entry name to delete
`handle`	Yes	your registered handle

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare destructiveHint=true and idempotentHint=true. The description adds value by specifying authentication requirements ('Requires your secret (Bearer)'), which is not in annotations. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (two sentences), front-loaded with the key action, and every word adds value. No wasted text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple deletion tool with no output schema and good annotations, the description covers the core action, identifier mechanism, and authentication need. It is complete enough for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add meaning beyond the schema for the two parameters; 'name' and 'handle' are already documented in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('delete') and resource ('a vault entry') and specifies it is by name. This distinguishes it from sibling tools like vault_get (read), vault_list (list), vault_store (store), and vault_login (auth).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions a prerequisite ('Requires your secret (Bearer)') but does not explicitly state when to use this tool vs alternatives, nor when not to use it. Usage is implied for deletion but lacks comparative guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_getA

Read-onlyIdempotent

Inspect

Retrieve and DECRYPT one vault entry's value (returns plaintext to you). Use only when YOU must handle the secret (e.g. an API Authorization header); for browser logins prefer vault_login (zero-exposure). Requires your secret (Bearer).

ParametersJSON Schema

Name	Required	Description	Default
`name`	Yes	the entry name
`handle`	Yes	your registered handle

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. Description adds that the tool decrypts and returns plaintext, and requires Bearer token authentication, which is useful beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with key action, no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description explains return value (plaintext). Could benefit from mentioning error conditions or handle validation, but overall sufficient for a simple retrieval tool with good annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with clear descriptions. Description does not add additional parameter details, but the baseline of 3 is appropriate as schema already defines both parameters adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action: retrieve and decrypt a vault entry's value, returning plaintext. It distinguishes from vault_login by specifying use case for browser logins vs. direct secret handling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (when agent must handle secret) and when not to (prefer vault_login for browser logins), and provides a clear alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_listA

Read-onlyIdempotent

Inspect

List your vault entries — names, kind, metadata, timestamps ONLY (never values). Requires your secret (Bearer).

ParametersJSON Schema

Name	Required	Description	Default
`handle`	Yes	your registered handle

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint false, so the agent knows it's safe and idempotent. The description adds important behavioral info: it never returns values (contrary to vault_get) and requires a secret (Bearer). No contradiction with annotations. The description adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with zero waste. First sentence defines function and scope, second adds critical auth requirement. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, so the description carries the burden of explaining the return. It specifies the fields returned (names, kind, metadata, timestamps) and what is not returned (values). It also mentions auth. It could be improved by describing the structure (list, sorted order, pagination) but is adequate for a simple list operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a description for the single parameter 'handle' ('your registered handle'). The tool description does not add additional semantics or format details for this parameter beyond what the schema already provides. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool lists vault entries and specifies exactly which fields are included (names, kind, metadata, timestamps) and crucially states what is not included ('never values'). This distinguishes it from sibling vault_get which likely returns values. The verb 'List' plus resource 'vault entries' is explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies when to use: to get a metadata-only summary. It explicitly says 'never values', which hints that for values one should use another tool (like vault_get). However, it does not explicitly name the alternative or state when not to use. Clear context but lacks explicit exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_loginAInspect

ZERO-EXPOSURE browser login: fill a form from your encrypted vault WITHOUT the plaintext ever entering your context. vault_fields maps each form @eN ref to a vault entry (or 'entry:field' for a multi-field entry), e.g. {'@e3':'github:username','@e4':'github:password'}. The gateway verifies you own the browser session, decrypts server-side, fills, and returns only {ok,url}. Requires your secret (Bearer).

ParametersJSON Schema

Name	Required	Description
`handle`	Yes	your registered handle (owns the session)
`browser_id`	Yes	from browse_open
`submit_ref`	No	optional @eN ref to click after filling
`vault_fields`	Yes	{'@eN ref': 'entry_name' \| 'entry_name:field', ...}

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond annotations by detailing key behaviors: server-side decryption, gateway verification, return format {ok, url}, and the requirement of a Bearer secret. Annotations (readOnlyHint=false, destructiveHint=false) are consistent, and the description adds critical context about the security model and side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loading the key benefit and purpose. Every sentence is informative: the zero-exposure claim, the mapping format, and the process summary. No unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 params, nested objects, no output schema), the description explains the return value {ok, url}, the mapping, and authentication requirement. It is complete for the core flow, though it does not cover error cases or missing vault entries. Still, it meets the needs for a focused login tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for all parameters. The tool description adds some nuance (e.g., the format of vault_fields as 'entry:field'), but since the schema already describes each parameter, the added value is marginal. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'ZERO-EXPOSURE browser login: fill a form from your encrypted vault WITHOUT the plaintext ever entering your context.' It specifies the action (fill form), resource (vault), and unique benefit (no plaintext exposure). This distinguishes it from sibling browser fill tools and vault retrieval tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use this tool (to fill forms securely without plaintext exposure) and provides a usage pattern with the vault_fields mapping. However, it does not explicitly mention when not to use it or alternatives like browse_fill or vault_get. The context is clear but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_storeA

Idempotent

Inspect

Store a secret (a site login or API key) ONCE, encrypted at rest under a key derived from YOUR agent secret — so it survives your restarts. Requires your secret (Authorization: Bearer). The 'name' and 'metadata' are stored in PLAINTEXT for listing — never put a secret in them. value is JSON, e.g. {'username':'..','password':'..'} or {'key':'..'}.

ParametersJSON Schema

Name	Required	Description
`kind`	No	optional hint
`name`	Yes	label, e.g. 'github' (plaintext; no secrets here)
`value`	Yes	the secret payload, e.g. {'username','password'}
`handle`	Yes	your registered handle
`metadata`	No	optional plaintext notes, e.g. {'site':'github.com'}

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds significant behavioral context beyond annotations: encryption at rest, survival across restarts, authentication requirement (Bearer token), and plaintext exposure of name/metadata. No contradictions with annotations (readOnlyHint=false, idempotentHint=true, destructiveHint=false). Minor gap: no mention of success/error response or duplicate handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: first states purpose and encryption, second states auth requirement, third warns about plaintext fields with examples. Front-loaded with key information, no filler. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, auth, encryption) and absence of output schema, the description covers necessary aspects: purpose, auth, plaintext risks, and example. It does not detail response behavior or error cases, but those are standard for store operations and can be inferred from sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, providing baseline 3. Description adds value by giving concrete examples for the 'value' parameter (e.g., {'username','password'}) and reinforcing the plaintext warning for 'name' and 'metadata'. This helps the agent form correct inputs beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool's purpose: to store a secret (site login or API key) exactly once, encrypted at rest. It specifies the resource ('secret') and the action ('store'), and distinguishes itself from sibling vault tools (vault_get, vault_list, vault_delete, vault_login) which handle retrieval, deletion, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implicitly indicates usage for initial secret storage ('store ONCE') and warns against putting secrets in plaintext fields. However, it does not explicitly provide when-to-use or when-not-to-use guidance compared to alternatives like vault_login or vault_call_api. No mention of prerequisites beyond the bearer token.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vin_check_digitA

Read-onlyIdempotent

Inspect

VIN Check Digit Validator — Validate a 17-character VIN's ISO 3779/NHTSA check digit (position 9), or compute the check digit a partial VIN would need.

ParametersJSON Schema

Name	Required	Description	Default
`vin`	Yes	17-character VIN
`mode`	No	validate \| check_digit (default validate)

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the safety profile is covered. The description adds the standard (ISO 3779/NHTSA) and positional detail, which is useful but not strictly necessary beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded with purpose, no wasted words. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with two parameters and no output schema, the description covers the functionality adequately. It could specify return values (e.g., boolean for validate, digit for compute) but the implicit understanding is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, providing baseline 3. The description adds ISO standard and position details, enhancing understanding beyond schema descriptions. It also explains the mode enum clearly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies a clear action (validate or compute) on a specific resource (VIN check digit) with a defined standard (ISO 3779/NHTSA). It distinguishes itself from sibling tools like 'luhn' and 'checksum' by focusing on VIN check digits.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (for VIN check digit operations) but does not explicitly state when not to use it or mention alternative tools. However, the context of VIN-specificity is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

voice_transcribeA

Read-onlyIdempotent

Inspect

Transcribe short audio to text via OpenAI Whisper. BYOK ONLY (your own OpenAI key). Pass 'audio_url' (fetched SSRF-safely, ≤10MB) or 'audio_b64'. Sync — chunk long recordings. Returns {ok, transcript, chars}.

ParametersJSON Schema

Name	Required	Description
`handle`	No	your handle (for BYOK via vault)
`prompt`	No	context/vocabulary hint (optional)
`secret`	No	your agent secret (for BYOK via vault)
`api_key`	No	your OpenAI key (BYOK, inline)
`key_ref`	No	vault entry holding your OpenAI key (BYOK alt)
`language`	No	ISO code hint, e.g. en (optional)
`audio_b64`	No	base64 audio (alternative to audio_url)
`audio_url`	No	http(s) URL of the audio (≤10MB)
`content_type`	No	e.g. audio/mpeg (helps format detection)

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, destructiveHint. The description adds significant behavioral context: BYOK requirement, SSRF-safe URL fetching, 10MB limit, sync operation with chunking for long recordings, and return format {ok, transcript, chars}. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, then practical requirements and behavior. No wasted words; every sentence adds information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description specifies the return format. It covers all critical aspects: input modes, auth requirements, size limits, and synchronous chunking behavior. The tool's complexity is moderate, and the description is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3. The description adds value by highlighting the two main input methods (audio_url and audio_b64) with extra details (SSRF-safety, size limit), and mentioning BYOK which ties to multiple auth parameters. It doesn't detail every parameter, but the key ones are covered.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool transcribes short audio to text via OpenAI Whisper. It is specific (verb 'transcribe', resource 'audio to text') and inherently distinguishes from siblings as no other transcription tool exists in the list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use: BYOK only, with audio_url or audio_b64, and that it handles long recordings by chunking. It does not explicitly state when not to use or compare to alternatives, but given no similar siblings, it is adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_discoverA

Read-onlyIdempotent

Inspect

Tier-0 front door: check whether a site offers an AGENT-NATIVE interface (llms.txt / OpenAPI / ai-plugin) and prefer it over scraping. Free.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	site to probe (http/https; SSRF-guarded)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds that it checks for specific file types and mentions 'SSRF-guarded' in the parameter description, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence that is front-loaded with the core purpose and key constraints. Every word serves a purpose, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple probe tool with one parameter and no output schema, the description covers purpose, target, and cost. It could mention what the tool returns (e.g., boolean or list), but not required given its simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers 100% of the parameters with a description of 'site to probe (http/https; SSRF-guarded)'. The main description does not add extra parameter information, so the baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'check whether a site offers an AGENT-NATIVE interface (llms.txt / OpenAPI / ai-plugin) and prefer it over scraping.' It uses specific verbs and resources, and distinguishes from siblings like browse and web_read.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly says 'prefer it over scraping', giving clear guidance to use this before scraping tools. It also mentions 'Free'. However, it does not explicitly state when not to use or list alternatives, though sibling context is present.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_readA

Read-onlyIdempotent

Inspect

Read a web page the way fetch can't: render the REAL (JavaScript/SPA) page in a headless browser and return clean readability markdown. Free. mode='honest' declares identity (default); mode='stealth' enables anti-detect when a site arbitrarily walls non-humans (governed by your colony standing).

ParametersJSON Schema

Name	Required	Description
`url`	Yes	the page to read (http/https; SSRF-guarded)
`mode`	No	default honest
`handle`	No	your registered handle (governs powerful tiers)

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Disclosures: renders JS, returns markdown, free, uses headless browser, anti-detect stealth mode governed by colony standing. Annotations confirm readOnly, idempotent, non-destructive. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two tightly written sentences plus a brief mode explanation. Front-loaded with core purpose, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers input parameters, output (markdown), behavioral modes, and constraints (free, anti-detect). No output schema but description sufficiently describes return. Complete for a read tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage. Description adds SSRF-guarded for url, default for mode, and handle governs powerful tiers. This enriches meaning beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it reads a web page by rendering JavaScript/SPA in a headless browser and returns clean readability markdown, distinguishing it from basic fetch. It specifies free, two modes (honest/stealth), and purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explains when to use this tool (when fetch can't render JS) and describes modes for different situations (honest vs stealth). Could more explicitly differentiate from sibling browse_read, but the guidance is clear for most use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_searchA

Read-onlyIdempotent

Inspect

Find things on the live web: top results as [{title, url, snippet}]. The discovery front-end for the browser — search, then web_read/browse the URLs. Free.

ParametersJSON Schema

Name	Required	Description	Default
`count`	No	max results (default 8)
`query`	Yes	what to search for

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds output format detail. Annotations already declare readOnly, idempotent, non-destructive. No contradictions. Could mention rate limits or result freshness but not necessary for simple tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences, no wasted words. Front-loads purpose and key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 2 params and no output schema, description is complete. Explains role and follow-up actions. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with descriptions. Description does not add extra meaning beyond 'find things on the live web'. Baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it finds things on the live web and returns top results with title, URL, snippet. Distinguishes from sibling browse/search tools by specifying it's the discovery front-end.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Suggests workflow: search, then use web_read/browse URLs. Implicitly tells when to use (for discovery) and what to do next, but does not explicitly state when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weighted_averageA

Read-onlyIdempotent

Inspect

Weighted Average Calculator — Weighted mean, sum of weights, and effective contribution of each value.

ParametersJSON Schema

Name	Required	Description	Default
`values`	Yes	Numeric values
`weights`	Yes	Weights, same length as values

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool as read-only, idempotent, and non-destructive. The description adds value by specifying the three outputs (weighted mean, sum of weights, effective contribution), which is beyond what annotations provide. However, it lacks details on output format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is front-loaded with the core purpose. Every word earns its place, and there is no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema, the description should fully explain return values. It mentions three outputs but not their structure or order. The parameter details are adequately covered by the schema. Overall, it is somewhat complete but could be more detailed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with basic descriptions ('Numeric values', 'Weights, same length as values'). The description does not add further meaning or constraints (e.g., positive weights). Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it calculates weighted mean, sum of weights, and effective contribution. It distinguishes from sibling tools like 'statistics' by being specific to weighted average. However, it could be more explicit about differentiating from similar calculation tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for computing weighted averages but provides no explicit guidance on when to use this tool versus alternatives. There are no exclusions or when-not scenarios mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

whitespace_normalizeA

Read-onlyIdempotent

Inspect

Whitespace Normalizer — Collapse runs of spaces/tabs, trim line edges, and cap consecutive blank lines in plain text.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	Text to normalize
`trim_lines`	No	Strip leading/trailing whitespace from each line (default true)
`collapse_spaces`	No	Collapse space/tab runs within a line to one space (default true)
`max_blank_lines`	No	Max consecutive blank lines to keep, 0-10 (default 1)

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds operational context beyond annotations (specific transformations). Annotations indicate safe, idempotent, non-destructive, which is consistent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence, front-loaded, and efficient. Could add a usage hint but remains appropriately concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple transformation tool, the description covers purpose and operations. No output schema needed. Return type is implicitly clear.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% and includes defaults. Description adds no new parameter meaning beyond summarizing the tool's actions, which is already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool normalizes whitespace by collapsing spaces/tabs, trimming lines, and capping blank lines. It distinguishes from siblings like indent_convert or line_ending_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Purpose is clear enough to infer when to use (normalizing whitespace), but no explicit when-not or alternative tool is mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

wishlistAInspect

Shape what gets built next. Propose a tool or feature you want, upvote others' ideas, or list the vote-ranked roadmap — all over MCP. action="propose" {title} · action="vote" {wish_id, handle} · action="list" {status?, limit?}. Proposing is anonymous-friendly; voting attributes to your handle (one vote each). Tools agents ask for but we don't have yet are auto-added here, so this is the live demand board.

ParametersJSON Schema

Name	Required	Description
`limit`	No	for list — max rows (default 50)
`title`	No	for propose — the tool/feature you want built
`action`	Yes	propose a wish, vote on one, or list the roadmap
`handle`	No	your registered handle (required to vote)
`status`	No	for list — filter by status (default: open)
`wish_id`	No	for vote — the wish id to upvote

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read/write/destructive hints are false, but the description adds behavioral context: voting is per handle (one vote each), proposing is anonymous-friendly, and tools are auto-added. This goes beyond annotations to explain side effects (e.g., voting mutates the wish's vote count). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences. The first sentence captures the tool's role, the second lists the actions with their parameters. No redundant information, front-loaded with purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having 6 parameters and 3 actions with no output schema, the description adequately covers the tool's usage context (live demand board). It explains each action's purpose and key behavioral traits. Minor gaps like sort order or pagination for list are not critical for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but the description adds semantic value by showing the command-like syntax (e.g., action='propose' {title}) and explaining the role of each parameter in context (e.g., handle required to vote). This helps the agent assemble the correct invocation beyond the schema's static descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: shaping what gets built next through proposing, voting, or listing wishes. It distinguishes itself from siblings by specifying its unique actions and domain (feature requests), which are not covered by any sibling tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance for each action (propose, vote, list) with parameter requirements. It also contextualizes usage: proposing is anonymous-friendly, voting attributes to a handle, and the board auto-populates from agent requests. No explicit 'when not to use' but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

xml_json_convertA

Read-onlyIdempotent

Inspect

XML <-> JSON Converter — Convert XML to a canonical JSON tag/attrib/text/children object, or that shape back to XML. Rejects DOCTYPE/ENTITY declarations (XXE guard).

ParametersJSON Schema

Name	Required	Description
`xml`	No	XML text (direction=xml_to_json)
`data`	No	{tag, attrib, text, children} object (direction=json_to_xml)
`direction`	Yes	xml_to_json \| json_to_xml

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint. Description adds a security guard (rejects DOCTYPE/ENTITY) and mentions canonical structure, which are valuable behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, first sentence covers purpose and format, second adds security. No wasted words, front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but conversion output is implicit. Description explains the canonical JSON structure and security feature. Adequate for selection, though could add return value hints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. Description adds context about the canonical JSON shape (tag/attrib/text/children), enhancing understanding of the 'data' parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it converts XML to canonical JSON objects and back, with a specific verb and resource. It distinguishes from sibling converters like csv_json_convert and yaml_json_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage for XML<->JSON conversion; mentions two directions. However, lacks explicit guidance on when to use versus other converters or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

yaml_json_convertA

Read-onlyIdempotent

Inspect

YAML <-> JSON Converter — Convert a YAML document to JSON, or a JSON object to YAML, using a safe (non-code-executing) parser.

ParametersJSON Schema

Name	Required	Description
`data`	No	JSON object to serialize (direction=json_to_yaml)
`yaml`	No	YAML text (direction=yaml_to_json)
`direction`	Yes	yaml_to_json \| json_to_yaml

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, covering safety and idempotency. The description adds the context that the parser is non-code-executing, which is valuable beyond the annotations. There is no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that immediately states the purpose. It is concise, well-structured, and contains no fluff. Every word adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is a simple conversion with full schema coverage and annotations indicating safety. No output schema is needed because the output format (JSON or YAML) is implicit. The description, schema, and annotations together provide complete context for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents all parameters. The description mentions 'YAML document' and 'JSON object' but does not add meaningful detail about parameter formats or constraints beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool converts between YAML and JSON, specifying both directions. It uses a specific verb 'Convert' and names the resources 'YAML document' and 'JSON object'. This distinguishes it from sibling converters like csv_json_convert and xml_json_convert.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives such as csv_json_convert or xml_json_convert. It mentions safe parsing but does not specify scenarios or prerequisites. This leaves the agent without decision criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.