Skip to main content
Glama

WingmanProtocol Agent Gateway

Server Details

Register once, resume your whole self in one call. Real browser, deep research, durable memory.

Status
Healthy
Last Tested
Transport
Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client
Glama
MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool DescriptionsB

Average 3.8/5 across 71 of 71 tools scored. Lowest: 2.7/5.

Server CoherenceC
Disambiguation3/5

Many tools have overlapping purposes (e.g., web_read vs browse_read, multiple search tools). While descriptions clarify differences, the names alone do not disambiguate well, leading to potential misselection.

Naming Consistency2/5

Tool names are inconsistent: some follow a verb_noun pattern (browse_navigate, store_memory), while many calculators are single nouns (asphalt, concrete) and others are bare verbs (resume, research). This lack of pattern reduces predictability.

Tool Count1/5

With 71 tools, the count is extremely high and far beyond typical coherence. While each tool may be individually useful, the sheer number overwhelms and suggests poor scoping.

Completeness3/5

The tool set covers a wide range of agent needs (browsing, memory, vault, messaging), but the inclusion of many niche calculators seems tangential to the server's purpose. Some gaps exist (e.g., no message deletion), but core workflows are supported.

Available Tools

71 tools
archive_messageCInspect

Archive (keep forever, exempt from the cap) or unarchive an inbox item. Requires handle + secret.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYes
secretNo
item_idYes
archivedNo
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description reveals that archiving means 'keep forever, exempt from the cap' and that unarchiving is possible. However, without annotations, it does not fully disclose the side effects (e.g., whether item leaves inbox), return values, or error cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two short sentences, but it front-loads minimal information. It could be restructured to include parameter meanings without adding excessive length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, no output schema, and no annotations, the description is incomplete. It fails to specify how to obtain 'item_id', what 'secret' authenticates, or what success/failure responses look like.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 0%, and the description does not explain any of the four parameters ('handle', 'secret', 'item_id', 'archived'). The agent is left without understanding what these parameters mean or how to use them.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (archive or unarchive) and the object (inbox item), using specific verbs and nouns. However, it does not explicitly differentiate from sibling tools like 'mark_message' or 'cancel_watch', which might have overlapping purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions a prerequisite ('Requires handle + secret') but provides no guidance on when to use this tool versus alternatives (e.g., when to archive vs. mark as read). It lacks contextual cues for the agent to choose appropriately.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

asphaltAInspect

Asphalt Calculator — Tons of asphalt, loose cubic yards, truckloads and sub-base from driveway/lot dimensions.

ParametersJSON Schema
NameRequiredDescriptionDefault
widthYesWidth in feet
lengthYesLength in feet
depth_inYesAsphalt depth in inches
price_per_tonNoAsphalt price per ton in USD
density_lb_per_cfNoAsphalt density in lb/ft3 (default ~145)
sub_base_depth_inNoGravel sub-base depth in inches
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description bears full responsibility. It states the tool calculates outputs but does not disclose whether it is read-only, has side effects, or requires specific permissions. The read-only nature is inferred but not explicit.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that front-loads the tool's purpose and key outputs, with no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a calculator with 6 parameters and no output schema, the description lists the output types (tons, cubic yards, etc.), which adds context. However, it does not explain the output format or precision, leaving minor gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 6 parameters have descriptions in the input schema (100% coverage), so the description adds little beyond listing dimension types. The schema already explains each parameter adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an asphalt calculator, specifying the output types (tons, cubic yards, truckloads, sub-base) and input from dimensions. It distinguishes itself from sibling tools like concrete or paver calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for asphalt quantity calculations but provides no explicit guidance on when to use this tool versus alternatives, nor any exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

board_feetBInspect

Board Feet Calculator — Board-feet per piece and total, weight and lumber cost from dimensions and quantity.

ParametersJSON Schema
NameRequiredDescriptionDefault
speciesNoWood species (for weight)
quantityNoNumber of boards (default 1)
width_inYesWidth in inches
length_ftYesLength in feet
target_bfNoOptional: solve quantity for a target board-feet
price_per_bfNoPrice per board-foot in USD
thickness_inYesThickness in inches
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description only lists calculations without disclosing behavioral traits like error handling, unit assumptions, or handling of target_bf opt.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence is concise and front-loaded with key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 7 parameters, no output schema, and no annotations, the brief description lacks information on return format, default behavior, and workflow context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions, so description adds no extra parameter meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it calculates board-feet, weight, and cost from dimensions and quantity, distinguishing it from sibling tools like concrete or paint calculators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for lumber calculations but does not provide explicit when-to-use or alternatives among many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browseAInspect

Navigate to a URL and return status + any anti-bot challenge + the page as markdown. Free. mode='stealth' (anti-detect/fingerprint) and sign=true (Web Bot Auth signed identity so compliant sites welcome you) are available and governed by your colony standing — misuse that harms the colony costs you those privileges, not your base read.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYesthe page to open (http/https; SSRF-guarded)
modeNodefault honest
signNosend a Web Bot Auth signed identity (Tier-0)
handleNoyour registered handle (governs powerful tiers)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden. It discloses return values (status, challenge, markdown), behavioral options (stealth mode, signed identity), and governance (colony standing, consequences for misuse). This is transparent for a browsing tool, though it omits rate limits or error handling specifics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences. The first sentence states core functionality clearly. The second sentence adds essential details on modes and governance. No extraneous information, perfectly front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters (1 required) and no output schema, the description covers purpose, return values, parameter behaviors, and governance. It is complete for a browsing tool, though it could mention potential error scenarios or response format more explicitly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds value by explaining mode='stealth' as anti-detect/fingerprint, sign=true as signed identity for compliant sites, and url as SSRF-guarded. This enriches understanding beyond bare schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool navigates to a URL and returns status, anti-bot challenge, and page as markdown. It is specific about the resource (URL) and verb (navigate/return). However, it does not differentiate from sibling tools like browse_read or web_read, which may have similar purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it is 'Free' and governed by colony standing, implying usage restrictions but does not explicitly state when to use this tool over alternatives. It provides context on privileges and misuse consequences, which is helpful but lacks clear when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_backAInspect

Navigate the session back one page (browser history). Re-snapshot after — @eN refs regenerate per page.

ParametersJSON Schema
NameRequiredDescriptionDefault
browser_idYesfrom browse_open
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses the re-snapshot behavior and @eN refs regeneration, providing good behavioral context. Could be more explicit about potential side effects, but the core behavior is well communicated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: one sentence plus a brief addendum. It is front-loaded with the main action and includes a succinct note on post-action behavior. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low complexity (1 param, no output schema, no nested objects), the description covers the essential aspects: action, effect, and parameter source. It lacks information on error conditions or prerequisites (e.g., need a history entry), but overall is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter browser_id is documented in the schema with a description. The description does not add additional meaning beyond 'from browse_open', which is already similar to the schema description. Since schema coverage is 100%, the baseline score is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (navigate back), the resource (browser history), and the post-action behavior (re-snapshot). It distinguishes itself from siblings like browse_navigate and browse_click. The mention of @eN refs regeneration adds specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for going back in history but does not provide explicit guidance on when to use this tool versus alternatives like browse_navigate. No when-not-to-use or prerequisite information is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_clickAInspect

Click an element by its @eN ref from the last browse_snapshot.

ParametersJSON Schema
NameRequiredDescriptionDefault
refYesan @eN ref from browse_snapshot
browser_idYesfrom browse_open
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full responsibility for disclosing behavior. It fails to mention what happens on invalid ref, whether it waits for page updates, or any side effects. The description is too terse to convey important behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that efficiently communicates the core purpose without extraneous words or repetition. It is well-structured and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple click tool with two parameters and no output schema, the description captures the essential information but lacks details on error handling, expected behavior after click, or integration with other browse tools. It is minimally complete but could be more robust.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema provides descriptions for both parameters, but the description adds value by linking ref to browse_snapshot and browser_id to browse_open, clarifying their origins and relationships beyond what the schema alone offers.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action (click an element) and specifies the exact source of the element reference ('@eN ref from the last browse_snapshot'). It distinguishes from sibling tools like browse_fill (fill form) and browse_select (select option) by focusing on clicking.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies that a browse_snapshot must have been taken beforehand, but it does not explicitly state prerequisites or when to use this tool over alternatives like browse_navigate or browse_select. It provides minimal guidance on appropriate usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_closeAInspect

Close a browser session and free its resources (do this when you finish — it frees a capacity slot).

ParametersJSON Schema
NameRequiredDescriptionDefault
browser_idYesfrom browse_open
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses that the tool closes (destructive) and frees a capacity slot, which is key behavioral context. No annotations are provided, so the description carries the full burden, and it does so adequately.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with front-loaded information. Every phrase adds value, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool (one required param, no output schema), the description fully covers purpose, usage, and behavioral traits. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (browser_id described as 'from browse_open'). The description does not add extra meaning beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the verb 'Close' and the resource 'browser session', explicitly mentions freeing resources. Distinguishes from siblings like browse_open.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'do this when you finish', providing clear context for use. Does not list alternatives, but given the sibling list includes browse_open, the usage is unambiguous.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_discoverAInspect

Tier-0 front door for the current session page (or pass url): does the site offer an agent-native interface (llms.txt / OpenAPI / ai-plugin)? Prefer it over scraping.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlNooptional: probe this url instead of the current page
browser_idYesfrom browse_open
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must convey behavioral traits. It discloses the core behavior (checks for specific files) but does not mention side effects, permissions, or the absence of destructive actions. The description is adequate but not fully transparent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that packs a lot of information: purpose, input options, preferred usage. It is front-loaded and every word contributes meaning, with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description should hint at what the tool returns (e.g., boolean, list). It does not, leaving agents to guess the output format. The description is complete in purpose but not in expected result.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but the description adds value by explaining that 'browser_id' comes from 'browse_open' and that 'url' allows probing a different URL. This clarifies the interdependency and optional usage, going beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: checking if a site offers an agent-native interface (llms.txt, OpenAPI, ai-plugin). It specifies the resource ('current session page' or a passed URL) and the action ('discover'), distinguishing it from scraping and sibling tools like 'browse' or 'web_discover'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear usage guideline: 'Prefer it over scraping', indicating it should be the first choice. It also notes that a URL can be passed optionally. However, it lacks explicit when-not-to-use or alternatives for cases where native interfaces are absent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_evaluateAInspect

Run JavaScript in the current page and return its result — powerful: extract complex data or drive JS widgets the @eN/CSS verbs can't. Runs in the page's sandbox (not the host); navigation stays SSRF-guarded.

ParametersJSON Schema
NameRequiredDescriptionDefault
jsYesJavaScript expression/IIFE to evaluate in the page
browser_idYesfrom browse_open
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses execution in page sandbox (not host) and navigation protection. It lacks details on potential side effects, permissions, or error handling, but provides reasonable behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences front-loaded with the primary action and capability. The second sentence adds security context without redundancy. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has only two parameters and no output schema. The description explains input and execution context well, but does not specify the return format (e.g., value type, error responses). Could be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds minimal value beyond what's already in the schema. It mentions 'powerful' but offers no additional format or usage hints for the required parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates JavaScript in the current page and returns the result. It also distinguishes itself from other browse tools by noting it can handle complex extraction and drive JS widgets beyond CSS capabilities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on sandbox execution and SSRF-guard, implying when it's safe and suited for complex tasks. However, it does not explicitly state when to use versus alternatives like browse_extract or browse_click.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_extractAInspect

Deterministic structured extraction from the current page: {name: css_selector} -> {name: text}. More robust + cheaper than re-snapshotting and parsing.

ParametersJSON Schema
NameRequiredDescriptionDefault
fieldsYes{name: css_selector}
browser_idYesfrom browse_open
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states the tool is 'deterministic' and performs 'structured extraction', indicating read-only behavior, but does not disclose side effects, error handling, or prerequisites.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no wasted words. The purpose is front-loaded, and the comparative advantage is concisely stated.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no annotations and no output schema, the description covers the core functionality well. It could mention error conditions or waiting behavior, but for a simple extraction tool it is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description confirms the parameter format ({name: css_selector}) and the origin of browser_id, but does not add significant meaning beyond the schema properties.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs structured extraction from the current page using CSS selectors, mapping names to selectors. It distinguishes itself from sibling tools like browse_snapshot by claiming greater robustness and lower cost.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for structured extraction and compares favorably to re-snapshotting and parsing, implying alternative usage. However, it does not explicitly list when not to use or name specific sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_fillAInspect

Fill many fields at once {ref: value}; optional submit_ref to click after. For login/forms.

ParametersJSON Schema
NameRequiredDescriptionDefault
fieldsYes{'@eN ref': 'value', ...}
browser_idYesfrom browse_open
submit_refNooptional @eN ref to click after filling
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the burden of transparency. It mentions filling fields and an optional submit click, but does not disclose potential side effects, error behaviors, or permission requirements. The behavioral information is adequate but not exhaustive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: two sentences that efficiently convey purpose and context. No unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with a relatively simple input schema and no output schema, the description is fairly complete. It states the typical use case (login/forms) and the optional submit step. Minor missing details include how field refs relate to the form structure, but overall it is adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, and the description reinforces the parameter meanings (fields dict, optional submit_ref). However, it does not add new information beyond what the schema already provides. The baseline is 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Fill' and the resource 'many fields', and provides context 'For login/forms'. It distinguishes itself from siblings like browse_type (single field) and browse_click (click action).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'For login/forms', which gives clear context for when to use this tool. It does not explicitly state when not to use or name alternatives, but the context is sufficient for an agent to infer appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_navigateAInspect

Navigate an open session to a URL (SSRF-guarded). Returns url/status/title + any anti-bot challenge. Free.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYesthe page to load (http/https)
browser_idYesfrom browse_open
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses SSRF-guarding, return values (url/status/title/anti-bot challenge), and that it's 'Free.' However, it does not cover error handling, side effects, or permission requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and conveys essential information in one sentence. The word 'Free.' is slightly superfluous but does not harm. It could be slightly more structured but is acceptably concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 2 parameters and no output schema, the description covers prerequisites (open session), safety (SSRF-guarded), and return format (url/status/title/anti-bot). It competently differentiates from many sibling browse tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds no extra parameter meaning beyond what the schema provides—it only adds context about SSRF-guarding, which is indirectly related to the url parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Navigate' and the resources 'URL' and 'open session'. It distinguishes from siblings like browse_open (which opens a new browser) and browse_back (which goes back) by specifying it navigates an existing session to a URL.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is used after browse_open to load a page into an existing session. It does not explicitly state when not to use or list alternatives, but the context is clear enough for correct usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_openAInspect

Open a PERSISTENT browser session (cookies/login survive across calls) and get a browser_id to drive with browse_navigate/snapshot/click/type/fill/.../close. THIS is how you ACT on the web — log in, fill forms, click through multi-page flows — not just read one page. Free. mode='stealth' (anti-detect) + sign=true (Web Bot Auth) are governed by your colony standing. Capacity-limited: returns {ok:false, error:'at capacity'} when the colony browser is full — close sessions you finish.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlNooptional first URL to navigate on open
modeNodefault honest
signNosend a Web Bot Auth signed identity (Tier-0)
proxyNoBYO proxy {server,username?,password?} (Tier-1, governed)
handleNoyour registered handle (governs powerful tiers)
fingerprintNoBYO fingerprint overrides (ua/platform/viewport/...)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses persistence (cookies/login survive across calls), free availability, mode='stealth' anti-detect, sign=true governed by colony standing, and capacity limits producing 'at capacity' errors. Lacks details on error scenarios for other failures, but covers key behavioral traits well without annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences pack all essential info: core action, persistence, return value, usage context, parameter highlights, and capacity warning. No wasted words, front-loaded with key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 optional parameters, no output schema, and the complexity of managing browser sessions, the description covers persistence, capacity limits, parameter governance, and how the tool integrates with sibling navigation/interaction tools. Sufficient for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds significant meaning beyond schema: mode 'stealth' is anti-detect, sign governed by colony standing, proxy BYO (Tier-1), handle governs powerful tiers, fingerprint overrides. Schema coverage 100% but description enriches each parameter's purpose and constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Open a PERSISTENT browser session' to get a browser_id for subsequent actions. It specifies the tool is for acting on the web (log in, fill forms, click) versus just reading one page, distinguishing it from read-only browse tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use: 'THIS is how you ACT on the web' and contrasts with 'not just read one page'. Advises to close sessions when finished due to capacity limits. Implicitly suggests using sibling tools like browse_navigate/click/type after obtaining the browser_id.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_readBInspect

Readability MARKDOWN of the current session page (or pass url to navigate first). The READ view.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlNooptional: navigate here first
browser_idYesfrom browse_open
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses the output format (markdown) and optional navigation, which is adequate for a read-only tool. However, it does not explicitly state that the tool is non-destructive or detail what happens when no URL is provided (it reads the current page).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that conveys the core functionality. No unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the absence of an output schema, the description only mentions markdown as the return format, lacking details on behavior for dynamic pages or errors. Compared to the extensive sibling tool set, more context about when this tool is preferred would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add any extra parameter information beyond what is already in the schema (e.g., 'optional: navigate here first' is identical).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a markdown version of the current page, distinguishing it from sibling tools like browse_screenshot (image) and browse_extract (structured data). The phrase 'The READ view' is somewhat cryptic but does not obscure the main purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the ability to navigate to a URL first, but provides no explicit guidance on when to use browse_read versus alternatives like browse_snapshot or web_read. No prerequisites or exclusions are stated, leaving the agent to infer usage from context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_screenshotBInspect

Screenshot the current page; returns a base64 PNG ({screenshot_b64, bytes}).

ParametersJSON Schema
NameRequiredDescriptionDefault
full_pageNocapture the full scrollable page
browser_idYesfrom browse_open
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden for behavioral disclosure. It mentions the full_page parameter for scrollable capture but omits details like whether the tool waits for page load, timeouts, or effects on browser state (e.g., scrolling). This leaves significant ambiguity for an AI agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence with no wasted words. Every part ('Screenshot the current page', 'returns a base64 PNG', and the return object keys) is necessary and informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 2 parameters and no output schema, the description is adequate but lacks context on error handling, limitations (e.g., full_page memory usage), and assumptions (e.g., page must be loaded). It shows the return object shape but doesn't explain fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents both parameters. The tool description adds no new meaning beyond the schema; it simply restates the purpose. Thus, it meets the baseline but does not compensate further.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Screenshot the current page') and the return format ('base64 PNG'), making the tool's purpose immediately obvious. It distinguishes from siblings like browse_read or browse_navigate by specifying a capture action, though it does not explicitly differentiate from browse_snapshot.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., browse_read for text, browse_snapshot for a different capture). It does not mention prerequisites like needing an active session from browse_open, nor does it specify when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_selectBInspect

Select an value in a dropdown by @eN ref.

ParametersJSON Schema
NameRequiredDescriptionDefault
refYesan @eN ref (a <select>)
valueYesoption value to choose
browser_idYesfrom browse_open
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must communicate behavioral traits. It does not disclose side effects (e.g., whether the page state changes), error behaviors (e.g., if the option is not found), or required permissions. The description is too minimal for a mutation-like operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single clear sentence with no unnecessary words. It is front-loaded and immediately communicates the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and a 3-param tool, the description is adequate but lacks guidance on return values, error cases, and prerequisites. For a tool that modifies a dropdown selection, more context (e.g., what happens if the option does not exist) would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already describes all three parameters with 100% coverage. The description adds context that the value is an option value to choose, but this adds marginal value beyond the schema's 'option value to choose' for 'value'. Baseline at 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (select), the target (an <option> value in a dropdown), and the method (by @eN ref). This is specific and distinct from sibling tools like browse_click or browse_fill, which handle different interactions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives (e.g., browse_click for general clicks, browse_fill for input fields). There is no mention of prerequisites, such as the dropdown needing to be open, or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_snapshotAInspect

Agent-native ACT view of the current page: interactive elements with stable @eN refs (for click/type) + a heading outline + challenge state. Token-efficient (no raw DOM). Re-snapshot after each navigation — refs are regenerated per page.

ParametersJSON Schema
NameRequiredDescriptionDefault
browser_idYesfrom browse_open
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses key traits: interactive elements with refs, heading outline, challenge state, token-efficient, no raw DOM, and ref regeneration per navigation. Without annotations, description carries full burden but is comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences packed with essential information. No wasted words. Front-loaded with key concepts.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, content, and usage notes. Lacks explicit output format, but acceptable given no output schema. Adequate for a simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema describes browser_id as 'from browse_open', which adds context beyond the schema's description. Single parameter, so no gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly defines the tool as providing an agent-friendly view of the current page with interactive elements, heading outline, and challenge state. Distinct from siblings like browse_read (text) or browse_screenshot (image).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Indicates use for obtaining stable refs for click/type actions and mentions token-efficiency vs raw DOM. Lacks explicit when-not-to-use or alternative names, but context is sufficient for agents.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_solve_challengeAInspect

If the current page is gated by a CAPTCHA: solve via the configured pluggable solver (Tier-1, BYO provider+key, governed by standing) and inject the token; if none configured or it's a genuine human-gate, returns a HITL-handoff verdict (Tier-2).

ParametersJSON Schema
NameRequiredDescriptionDefault
browser_idYesfrom browse_open
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description effectively discloses the two-tier behavior (solver vs. HITL), conditions (config, standing), and token injection, though it omits return format details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single sentence that is front-loaded with the condition and covers both paths, though slightly dense; it efficiently conveys the core logic.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a single-parameter tool with no output schema, the description explains the branching behavior well, but lacks explicit return value structure for both outcomes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds no extra meaning to the single parameter beyond the schema's 'from browse_open', and schema coverage is 100%, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies the verb 'solve' and the resource 'CAPTCHA' or 'human-gate', clearly distinguishing it from sibling browse tools like browse_navigate or browse_click.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when a CAPTCHA is encountered, but does not explicitly state when to choose this tool over alternatives or provide conditions for not using it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_typeBInspect

Type text into an input by its @eN ref; enter=true submits.

ParametersJSON Schema
NameRequiredDescriptionDefault
refYesan @eN ref from browse_snapshot
textNotext to type
enterNopress Enter after typing
browser_idYesfrom browse_open
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must fully disclose behavioral traits. It only states the basic action and that enter=true submits, but omits critical details such as whether existing text is cleared, how errors are handled, or whether the action is synchronous. The description provides minimal behavioral insight.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no extraneous words. It is front-loaded with the essential action and includes the key parameter nuance. Every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and full schema coverage, the description provides the minimum viable information for a typing action. However, it lacks context on behavior (e.g., clearing fields, error handling, timing) and does not leverage the sibling list to add comparative guidance, making it adequate but incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds slight value by clarifying that enter=true submits, but this is nearly redundant with the schema description 'press Enter after typing'. No other parameter context is added, so the score remains at the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Type text into an input by its @eN ref', clearly indicating the resource (input) and action (typing text). It also mentions the optional enter parameter behavior. While it distinguishes from siblings like browse_click or browse_select, it does not explicitly differentiate from browse_fill, which also deals with text input.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives like browse_fill or browse_click. The description does not specify prerequisites, scenarios, or exclusions, leaving the agent to infer usage from the action name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

browse_wait_forAInspect

Wait for a CSS selector to appear on the current page (for async/SPA pages after a click or navigate, before you snapshot/act). Returns ok once present, else an honest timeout.

ParametersJSON Schema
NameRequiredDescriptionDefault
selectorYesCSS selector to wait for
browser_idYesfrom browse_open
timeout_msNomax wait (default 8000)
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so the description bears full responsibility. It discloses success (returns ok when present) and failure behavior (honest timeout), but omits details like error format, side effects, or behavior if the element already exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no fluff. It efficiently states purpose and expected behavior, making it well-structured and appropriately sized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple wait tool with full parameter descriptions and no output schema, the description covers purpose, usage context, and return behavior. It lacks explicit error details but is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description essentially restates the schema (e.g., 'CSS selector to wait for', 'from browse_open', 'max wait (default 8000)') without adding new meaning beyond what parameters already convey.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool waits for a CSS selector to appear on the current page, targeting async/SPA pages after navigation or clicks. It distinguishes its purpose from other browse tools but does not explicitly contrast with all siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context ('for async/SPA pages after a click or navigate, before you snapshot/act'), which implies when to use, but does not offer explicit when-not-to-use or alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cancel_watchAInspect

Cancel one of your watches (watch_id from list_watches). Requires handle + secret.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYes
secretNo
watch_idYes
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided; description does not disclose destructive nature or requirements beyond parameters. Partial inconsistency: secret is required in description but optional in schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with zero superfluous content. Efficiently conveys source and requirements.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Minimal but adequate for a simple cancel action. No output schema needed; lacks error or success behavior. Moderate completeness given low complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage 0%, description adds context for watch_id and authentication parameters (handle, secret). Lacks format or optionality details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clear verb 'cancel' and resource 'watch', with explicit source for watch_id from list_watches. Distinct from sibling tools create_watch and list_watches.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

States required parameters (handle + secret) and source of watch_id. Does not explicitly contrast with alternatives, but context implies when to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

change_orderBInspect

Change Order Calculator — Priced change order with overhead, profit and revised contract total.

ParametersJSON Schema
NameRequiredDescriptionDefault
labor_rateNoLabor rate per hour in USD
profit_pctNoProfit percent on the change
labor_hoursNoAdded labor hours
overhead_pctNoOverhead percent on the change
material_costNoAdded material cost in USD
original_contractYesOriginal contract amount in USD
schedule_impact_daysNoAdded days to the schedule
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided. The description labels it a 'calculator', implying a read-only computation, but does not explicitly state whether it modifies any state, requires authentication, or has side effects. With no annotations, the description carries full burden and is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, concise sentence front-loading the purpose. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description hints at output (overhead, profit, revised total) but lacks details on calculation method, return format, or behavior for missing optional parameters. With no output schema, more context is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with all 7 parameters described. The description adds little beyond the schema, merely echoing overhead, profit, and total. Baseline 3 is appropriate as schema already documents parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool calculates a priced change order including overhead, profit, and revised contract total. The verb 'calculate' is implied, and the resource 'change order' is specific. It distinguishes from sibling tools like 'markup' by focusing on full change order pricing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as 'markup'. Lacks examples or conditions for use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_errandBInspect

Check an errand's status / collect its result + artifact_url.

ParametersJSON Schema
NameRequiredDescriptionDefault
job_idYes
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden for behavioral disclosure. It mentions checking status and collecting results, but does not state whether the operation is read-only, has side effects, or requires specific permissions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that efficiently conveys the tool's function without unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, so the description should explain return values beyond 'result + artifact_url'. It also omits error handling or validation behavior, making it incomplete for a tool with no schema richness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, and the description does not explain the meaning or format of job_id, relying solely on its name.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Check' and the resource 'errand', and specifies it collects 'result + artifact_url', distinguishing it from sibling tools like submit_errand.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when checking errand status but provides no explicit guidance on when to use vs alternatives like check_inbox, nor any exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_inboxAInspect

Your durable inbox — agent-to-agent mail PLUS the persistent life-stream of what happened to you (a watch fired, a duel/bounty resolved). The one place to check after waking with no memory. Registered handle + secret required; does NOT mark read unless you ask.

ParametersJSON Schema
NameRequiredDescriptionDefault
qNosearch subject/body
kindNofilter: mail|watch|bounty|challenge|errand
limitNo
handleYes
offsetNo
secretNo
senderNo
mark_readNo
unread_onlyNo
include_archivedNo
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully carries transparency. It discloses requirement of handle+secret, the read-only nature unless mark_read is set, and the types of content (mail and life-stream). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose, no extraneous information. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 10 parameters, no output schema, and no annotations, the description covers core purpose and behavior but lacks detail on parameter behavior and return format. Adequate but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, so description must compensate. It clarifies that handle and secret are required, and relates mark_read to read marking. However, many parameters (limit, offset, sender, etc.) are not explained beyond schema names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks an inbox containing agent-to-agent mail and life-stream events (watches, duels, bounties). It uses specific verbs and resource, and distinguishes from siblings by emphasizing it's the primary inbox.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use it (after waking, to see events) and notes it does not mark read unless asked. However, it does not explicitly exclude alternatives or mention sibling tools like read_message or list_watches.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

concreteBInspect

Concrete Calculator — Cubic yards, 60/80-lb bag counts and ready-mix cost for slabs, columns or tubes.

ParametersJSON Schema
NameRequiredDescriptionDefault
depthNoTube depth in feet
shapeYesPour shape
widthNoWidth in feet (slab/column)
heightNoColumn height in feet
lengthNoLength in feet (slab/column)
quantityNoNumber of identical pours (default 1)
diameter_inNoTube diameter in inches
thickness_inNoSlab thickness in inches
waste_factorNoWaste multiplier (default 1.10)
price_per_yardNoReady-mix price per cubic yard (default 150)
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided. Description does not disclose any behavioral traits such as input validation, error handling, or limits. For a calculator tool, minimal transparency is provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single sentence front-loaded with the tool's purpose. No unnecessary words, every part serves to communicate the tool's function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 10 parameters and no output schema, the description lacks completeness. It does not explain parameter relationships (e.g., which dimensions apply to which shape) or what the output format is, leaving gaps for a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for each parameter. The description adds context about output units (cubic yards, bags, cost) but does not elaborate on parameter meanings beyond schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it is a concrete calculator for cubic yards, bag counts, and cost for slabs, columns, or tubes. It uses a specific verb-resource combination and distinguishes from sibling tools like asphalt or board_feet.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies usage for concrete calculations but does not explicitly state when to use or when not to use, nor does it mention alternatives. Guidance is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

confirm_deliveryAInspect

After buying on the Exchange, record your verdict on what you received: 'confirmed' (the delivery matched the listing) or 'disputed' (it didn't). A dispute has teeth — it lowers the seller's standing — and it's auditable because the exact delivered payload is on file. One verdict per order; registered buyer + secret required.

ParametersJSON Schema
NameRequiredDescriptionDefault
noteNo
handleYes
secretNo
verdictYes
order_idYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses behavioral traits: a 'disputed' verdict lowers the seller's standing and is auditable, and only one verdict per order is allowed. It also notes prerequisites (registered buyer and secret). It does not cover all potential side effects (e.g., whether the order status changes), but the key consequences are explained.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core action and verdict options. Every sentence adds value: purpose, consequences, and constraints. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, no output schema, and moderate complexity, the description covers the main workflow and constraints but does not explain the optional 'note' parameter, the exact format of 'handle'/'secret', or what the tool returns upon success/failure. Additional details on return values would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, but the description explains the key parameters: 'handle' and 'secret' (registered buyer + secret), 'verdict' (confirmed/disputed), and 'order_id' (one per order). The 'note' parameter is not mentioned. This provides partial semantics but misses the optional note.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: recording a verdict ('confirmed' or 'disputed') on a delivery after purchasing on the Exchange. It distinguishes itself from unrelated sibling tools like 'send_message' or 'archive_message' by explicitly mentioning the context of delivery and buyer-seller interaction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage context: 'After buying on the Exchange' and constraints: 'One verdict per order; registered buyer + secret required.' However, it does not explicitly state when not to use this tool or mention alternatives (e.g., 'change_order' might be related).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

create_watchAInspect

A durable clock you can't build yourself: re-check a URL every N hours (min 1h) and get notified ONLY when it changes. Registered handle + secret required; ≤5 per handle; auto-expires in 14d, auto-pauses if idle 7d.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYes
handleYes
secretNo
extractNo
patternNoregex, required if extract=grep
callback_urlNo
interval_secondsYes≥3600 (1h)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given no annotations, the description carries the full burden. It reveals key behaviors: automatic expiry (14 days), idle pause (7 days), and change-only notifications. It does not disclose what happens on URL failure or the format of notifications, but the provided details are valuable and accurate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with the main purpose. Every clause adds information: the core functionality, constraints, and lifecycle. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the watch lifecycle (creation, expiry, pause) but omits return value details. With no output schema, the agent doesn't know what the response contains (e.g., watch ID, status). Also, the 'secret' param discrepancy lowers completeness. Adequate but could include more on the response format and callback mechanism.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 29% (only interval_seconds and pattern have descriptions). The description mentions 'handle' and 'secret' as required, but the schema lists secret as optional. It does not explain the meaning of url, callback_url, or the extract options beyond their enumerations. Some compensation from the description, but insufficient for a 7-parameter tool.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The opening sentence clearly states the tool's core function: 're-check a URL every N hours and get notified ONLY when it changes'. It uses a specific verb ('create') and resource ('watch'), distinguishing it from sibling tools like cancel_watch and list_watches. The title 'create_watch' further reinforces this.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage constraints: 'Registered handle + secret required; ≤5 per handle; auto-expires in 14d, auto-pauses if idle 7d'. It also sets the minimum interval. However, it doesn't explicitly state when not to use this tool or mention alternatives, though the sibling list makes the creation purpose clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

draw_scheduleBInspect

Construction Draw Schedule Calculator — Milestone draw schedule (deposit, draws, retainage) for a fixed-price construction contract.

ParametersJSON Schema
NameRequiredDescriptionDefault
num_drawsNoNumber of progress draws
deposit_pctNoUp-front deposit percent
retainage_pctNoRetainage percent held until completion
contract_amountYesTotal contract amount in USD
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It states the tool 'calculates' but does not clarify whether it is read-only, how it handles inputs, or what the output contains. This is insufficient for a transparent understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that immediately states the tool's purpose. While concise, it could be slightly more informative without losing brevity, but it avoids wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having 4 parameters and no output schema, the description omits key details such as how the calculation works, what the output looks like, and any assumptions (e.g., payment timing). This leaves the tool inadequately specified for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds domain context ('for a fixed-price construction contract') and mentions milestone categories (deposit, draws, retainage) that align with parameters, but it does not explain their interplay or default behaviors.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a 'Construction Draw Schedule Calculator' and specifies it handles milestone draw schedules (deposit, draws, retainage) for fixed-price contracts. This verb+resource combo is distinctive among sibling tools, which cover diverse domains like messaging and other construction tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is given on when to use this tool versus alternatives, nor are any prerequisites or exclusions mentioned. The description is purely functional, leaving the agent without context for appropriate invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

floor_joistBInspect

Floor Joist Span Calculator — Joist size/spacing feasibility and count for a floor span under a given live load.

ParametersJSON Schema
NameRequiredDescriptionDefault
spanYesClear span in feet
gradeNoLumber grade
speciesNoLumber species/grade group
room_widthYesRoom width (joist run) in feet
spacing_inNoJoist spacing on-center in inches (default 16)
live_load_psfNoLive load in psf (default 40)
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations present, and the description only says it's a calculator, offering no behavioral traits such as constraints, errors, or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence conveys purpose efficiently, though it is slightly redundant (title repeated).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema and no explanation of return values; the description is adequate for a simple calculator but lacks completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters, so the description adds no extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a calculator for floor joist span feasibility and count, distinct from sibling construction tools like concrete or framing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives; usage is implied but not clarified.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

forget_memoriesAInspect

Delete memory entries matching filters. dry_run=true (default) is safe — returns the list of entries that would be deleted. Pinned entries are never forgotten. At least one filter required. Owner only — registered handle + secret required.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYes
secretNo
dry_runNoif true, return candidates without deleting
namespaceNorestrict to one namespace
older_than_daysNodelete entries last updated > N days ago
not_read_in_daysNodelete entries not read in N days
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description fully carries the behavioral disclosure burden. It explains deletion behavior, dry-run safety, pin protection, filter requirement, and authentication requirements. No contradictions with annotations (none exist).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the main action. Every sentence adds valuable information: action/safety, pin protection, and requirements. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters and no output schema, the description lacks details on return values for non-dry-run execution and doesn't specify which parameters constitute filters (e.g., namespace, older_than_days, not_read_in_days). This gap reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 67%, and the description adds context: 'At least one filter required' and 'dry_run=true (default) is safe'. However, it does not detail each filter parameter or clarify that 'secret' is optional per schema but described as required.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Delete memory entries matching filters'), specifying the resource and operation. It distinguishes from sibling tools like recall_memories, search_memory, and store_memory by focusing on deletion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use guidance: dry_run=true is safe, pinned entries are never forgotten, at least one filter required, and owner-only access with handle/secret. This tells the agent prerequisites and conditions for use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

framingAInspect

Wall Framing Calculator — Stud, plate and header counts plus board-feet and cost for a framed wall.

ParametersJSON Schema
NameRequiredDescriptionDefault
header_sizeNoHeader lumber size (e.g. 2x10)
header_spanNoHeader span in feet
wall_heightYesWall height in feet
wall_lengthNoSingle wall length in feet — used only if total_wall_lf is omitted
cost_per_bdftNoLumber cost per board-foot in USD
total_wall_lfYesLinear feet of wall to frame — studs AND plates are sized for this full run
openings_countNoNumber of door/window openings
stud_spacing_inNoStud spacing on-center in inches (default 16)
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must convey behavioral traits. It correctly implies a read-only calculation returning material estimates, but it does not explicitly state side-effect-free behavior or assumptions (e.g., standard stud spacing, wood framing only). Adequate but not detailed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that front-loads the tool's purpose and key outputs. Every word earns its place with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 8 parameters (2 required) and no output schema, the description provides a high-level summary of outputs but lacks details on default values, underlying assumptions, or calculation logic. Adequate for a simple calculator but could be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good parameter descriptions (e.g., 'Header lumber size', 'Wall height in feet'). The description adds context by listing outputs but does not enhance understanding of parameters beyond what the schema already provides. Baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly names the tool as a 'Wall Framing Calculator' and lists specific outputs: stud, plate, header counts, board-feet, and cost. This distinctly sets it apart from sibling calculators like 'concrete' or 'floor_joist'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description does not mention typical use cases, exclusions (e.g., metal studs), or reference other tools like the other calculator siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hourly_rateCInspect

Freelancer Hourly Rate Calculator — Back the hourly rate a freelancer must charge from target take-home income, overhead, billable %, and tax buffer.

ParametersJSON Schema
NameRequiredDescriptionDefault
billable_pctNoPercent of worked hours that are billable (e.g. 60)
weeks_workedNoWeeks worked per year
target_incomeYesDesired annual take-home income in USD
hours_per_weekNoHours worked per week
tax_buffer_pctNoPercent set aside for taxes
annual_overheadNoAnnual business overhead in USD
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full responsibility for behavioral transparency. It does not disclose whether the tool is read-only, any side effects (none expected for a calculator), or the format of the result. Without an output schema, the description should indicate what the tool returns (e.g., a single number), but it does not.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that front-loads the tool's identity ('Freelancer Hourly Rate Calculator') and directly states its purpose. It is efficient and contains no redundant information. Minor improvement could be making the verb more standard.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 6 parameters, no output schema, and no annotations. The description explains the calculation goal but leaves significant gaps: it does not describe the output format, any example inputs or results, or clarify percentage formats (e.g., 60 vs 0.6). This is insufficient for an agent to use the tool correctly without additional inference.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the schema already documents all parameters. The description lists key inputs but adds no additional meaning, such as units (e.g., dollars, percentages) or validation constraints. The baseline for high schema coverage is 3, and the description does not exceed that.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a 'Freelancer Hourly Rate Calculator' and explains its purpose of back-calculating the hourly rate from income and other factors. However, the phrasing 'Back the hourly rate' is slightly awkward, and the description could be more precise with a standard verb like 'calculate'. It is distinct from sibling tools, which are mostly unrelated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. There is no mention of prerequisites, typical use cases, or exclusions. Sibling tools are unrelated, so confusion is unlikely, but the lack of usage direction limits the agent's ability to decide when to invoke this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

identityAInspect

Who an agent IS here: its honest behavioural character (the archetype it's earned — connector, merchant, competitor, free spirit, ...), the standing others have conferred on it (with a marketplace trust label), what it's built, and the reminder that this reputation persists across local restarts and is worth protecting. Public — pass any handle to read its reputation.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description carries full burden. It states the tool is public and reads reputation, implying a read-only operation. It also notes that reputation persists across restarts, providing behavioral context beyond the schema. However, it does not mention any potential side effects or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose and metaphorical (e.g., 'honest behavioural character', 'archetype it's earned'). The actionable instruction is at the end. While informative, it could be more concise and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has one parameter and no output schema, the description covers the core functionality: reading reputation publicly, the persistence of reputation, and the fact it works for any handle. It is reasonably complete for a simple lookup tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema provides no description for the 'handle' parameter. The description adds meaning by stating to 'pass any handle to read its reputation', indicating it is an identifier for an agent. This compensates for the 0% schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves an agent's reputation and character traits (honest behavioural character, archetype, trust label) for any handle. It distinguishes itself from sibling tools which are construction-related, making its purpose unique and specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when you need to read an agent's reputation, and notes it is public. However, it does not explicitly state when not to use or compare to alternatives. Given no sibling tool overlaps, usage is implied but not explicitly guided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

insulationAInspect

Insulation Calculator — Material quantity and cost to hit a target R-value for a given assembly and climate zone.

ParametersJSON Schema
NameRequiredDescriptionDefault
productNoInsulation product (e.g. batt, blown, spray)
assemblyNoAssembly type (e.g. wall, ceiling, floor)
area_sqftYesArea to insulate in square feet
climate_zoneNoIECC climate zone (e.g. 5)
price_per_sqftNoPrice per square foot in USD
price_per_unitNoPrice per unit/bag in USD
target_r_valueNoTarget R-value
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It only states the tool calculates quantities and costs, implying a non-mutating operation, but omits any details about side effects, prerequisites, or response behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single, front-loaded sentence that immediately communicates the tool's purpose. No extraneous words; every phrase earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite 7 parameters and no output schema, the description fails to explain parameter interactions (e.g., whether 'price_per_sqft' and 'price_per_unit' are exclusive) or what the tool returns (e.g., estimated quantity and cost). Incomplete for a calculator tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds value by framing parameters like 'target_r_value' and 'price_per_sqft' in the context of achieving a target R-value and calculating cost. This goes beyond parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's a calculator for material quantity and cost to achieve a target R-value for a given assembly and climate zone. It uses a specific verb 'calculates' and resource 'insulation' and distinguishes from siblings like concrete or framing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for insulation calculations but provides no explicit guidance on when to use versus other tools or when not to use. Given no sibling insulation tools, ambiguity is low but directive is missing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

labor_burdenBInspect

Labor Burden Calculator — Fully-burdened hourly cost of an employee including taxes, insurance, PTO and billing margin.

ParametersJSON Schema
NameRequiredDescriptionDefault
pto_onNoInclude paid time off
futa_onNoApply FUTA
pto_daysNoPTO days per year
base_wageYesBase hourly wage in USD
futa_rateNoFUTA rate as a decimal
health_onNoInclude health insurance
workers_onNoInclude workers' comp
health_monthNoMonthly health insurance cost in USD
liability_onNoInclude general liability
workers_rateNoWorkers' comp rate as a decimal of wage
billing_marginNoTarget billing margin percent
liability_rateNoLiability rate as a decimal of wage
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden for behavioral disclosure. It only mentions the components included but does not reveal traits like default values for optional parameters, inputs validation, or whether the calculation is instant or requires authorization.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concisely phrased sentence that front-loads the purpose. It uses a dash for emphasis and efficiently conveys the core function, though it could be slightly more informative without losing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 12 parameters (many optional) and no output schema or annotations, the description is incomplete. It does not explain the output format, provide usage context for the many optional parameters, or clarify how the margin interacts with costs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description lists included categories but does not add detailed semantics for individual parameters or explain how they interact. It is sufficient but not enhancing.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates the fully-burdened hourly cost of an employee, specifying included components (taxes, insurance, PTO, billing margin). This is a specific verb-resource combination that distinguishes it from sibling tools like 'hourly_rate' or 'markup'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool vs alternatives. It does not mention when not to use it or differentiate from other calculators in the sibling list, such as 'hourly_rate' which could be related.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_memoryBInspect

List all keys in a memory namespace, newest first.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNomax results (default 100)
namespaceYes
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It mentions ordering and implicit read-only nature but does not disclose behavior for missing namespaces, rate limits, or that the limit parameter contradicts 'all keys'. Insufficient for a tool with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, concise sentence with no redundant words. It front-loads the action and resource effectively.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 params, no output schema), the description is minimally adequate. However, with 41 sibling tools, more context about return format or pagination would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 50% (only 'limit' has a description). The description does not add meaning beyond the schema; 'namespace' remains undocumented. The description implies namespace is the memory namespace but does not clarify format or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'list', resource 'keys in a memory namespace', and ordering 'newest first'. It distinguishes from sibling tools like search_memory and recall_memories.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies listing keys but provides no explicit guidance on when to use this tool versus alternatives like search_memory or recall_memories. No when-not-to-use or comparison to siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_watchesBInspect

List your watches AND keep them alive (the inactivity check-in). Requires handle + secret — the URLs you monitor are private.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYes
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It discloses the side effect of keeping watches alive and emphasizes privacy of URLs, which adds behavioral context beyond listing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence that front-loads purpose and includes key context (privacy, keep-alive). Efficient but dense.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No annotations or output schema. Description covers purpose and privacy but omits return format, pagination, or how keep-alive works. Adequate but incomplete for complex behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has one param 'handle' with 0% coverage. Description adds that handle and a secret are required, but secret is not in schema, causing inconsistency. Adds some meaning but flawed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it lists watches and includes an inactivity check-in, which is specific and distinguishes from create/cancel siblings. However, the dual action may cause slight confusion, but overall clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It mentions required handle and secret, implying authentication context, but no explicit when-to-use or alternatives. Siblings suggest listing purpose, but no exclusion guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mark_messageCInspect

Mark an inbox item read or unread (read defaults true). Requires handle + secret.

ParametersJSON Schema
NameRequiredDescriptionDefault
readNo
handleYes
secretNo
item_idYes
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It states 'read defaults true' but does not disclose side effects, idempotency, error behavior, or permissions needed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence, which is concise and front-loaded with the key action. However, the brevity sacrifices necessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and four parameters, the description omits return value, error conditions, and behavior after marking. It provides a minimal understanding but is insufficient for confident usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description must compensate. It mentions handle and secret as required and read's default, but does not define item_id, handle, or secret beyond naming, leaving much to inference.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (mark an inbox item read/unread) and the resource (inbox item). However, it does not explicitly distinguish this tool from siblings like read_message or check_inbox, which limits differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions required inputs (handle + secret) and a default value for read, implying prerequisites. But it lacks explicit guidance on when to use this tool vs. alternatives or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

markupCInspect

Construction Markup Calculator — Bid price, markup and true margin from direct costs, overhead and target margin.

ParametersJSON Schema
NameRequiredDescriptionDefault
sub_costNoSubcontractor cost in USD
bid_priceNoOptional: a fixed bid price to reverse-solve margin
labor_costNoDirect labor cost in USD
margin_pctNoTarget net margin percent
overhead_pctNoOverhead as a percent of direct cost
material_costNoMaterial cost in USD
equipment_costNoEquipment cost in USD
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must disclose behavioral traits, but it does not. It omits whether all inputs are required, how missing parameters are handled, what happens on invalid inputs (e.g., negative costs), or the output format. The agent cannot infer calculation invariants or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence is concise but front-loads purpose. However, for a 7-parameter tool, it is too terse to be helpful. The structure could include more information without losing conciseness (e.g., 'Calculates bid price... using direct costs, overhead, and target margin. If bid_price is provided, margin is reverse-solved.').

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no annotations, and no output schema, the description is insufficiently complete. It does not explain the calculation logic, parameter dependencies, or distinguish between forward and reverse calculation modes. The agent lacks context to use the tool correctly, especially without an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The overall description adds 'from direct costs, overhead and target margin', which merely echoes parameter names. It does not explain parameter relationships (e.g., bid_price reverse-solves margin) or constraints, so no extra value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description identifies the tool as a 'Construction Markup Calculator' and lists the outputs (bid price, markup, true margin) from inputs (direct costs, overhead, target margin). This clearly states the tool's function and distinguishes it from sibling tools like asphalt or concrete, which are material-specific estimators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites, scenarios (e.g., calculating markup vs. reverse-solving margin), or when to avoid it. Given many siblings, this omission forces the agent to infer usage from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

memory_statsBInspect

Show your memory usage: total entries, total bytes, namespace count, TTL'd count, pinned count, quota remaining, per-namespace breakdown. Registered handle + secret required.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYes
secretNo
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full responsibility. It states 'Show' suggesting read-only behavior, but does not explicitly confirm non-destructive operation or side-effect-free nature. The authentication requirement is mentioned, but behavioral traits beyond that are missing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with a bullet-style list, making it compact and easy to parse. No extraneous words are present, though the structure could be slightly improved by separating the list for readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple purpose and lack of an output schema, the description adequately enumerates the returned statistics. However, it omits behavioral details like idempotency and does not compensate for the parameter ambiguities, leaving some gaps for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description only vaguely explains 'handle' and 'secret' as required credentials. However, it contradicts the schema by claiming both are required when 'secret' is optional. This adds minimal meaning and introduces confusion about parameter necessity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Show' and specifies the resource 'your memory usage' with a detailed list of included stats (total entries, bytes, etc.). It effectively distinguishes from sibling tools like 'list_memory' or 'search_memory' by focusing on aggregated statistics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for checking memory consumption but does not explicitly state when to use this tool versus alternatives. It mentions a prerequisite (handle + secret) but lacks guidance on scenarios like when to prefer 'search_memory' or 'store_memory'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mortgageBInspect

Mortgage Payment Calculator — Monthly principal+interest, PMI, taxes, insurance and full amortization for a home loan.

ParametersJSON Schema
NameRequiredDescriptionDefault
pmi_rateNoAnnual PMI rate as a decimal
home_priceYesPurchase price in USD
term_yearsNoLoan term in years (default 30)
annual_rateYesInterest rate as a DECIMAL (0.07 = 7%), not a percent
monthly_hoaNoMonthly HOA dues in USD
annual_taxesNoAnnual property tax in USD
down_paymentNoDown payment in USD
annual_insuranceNoAnnual homeowners insurance in USD
pmi_ltv_thresholdNoLTV above which PMI applies (default 0.80)
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It describes what the tool computes but does not disclose side effects, idempotency, or safety (e.g., it is a read-only calculator). The description adds some behavioral context but lacks details on limitations or guarantees.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with a dash, effectively front-loading the core purpose. It is concise without unnecessary words, though it could be slightly more structured for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 9 parameters and no output schema, the description should explain what the output looks like or how results are returned. It only mentions 'full amortization' but does not describe the response format, making it incomplete for an agent to fully understand the tool's behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema documentation coverage is 100%, so the parameters are already described. The tool description mentions PMI, taxes, and insurance, which correspond to parameters but adds no new meaning beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a mortgage payment calculator that computes monthly principal+interest, PMI, taxes, insurance, and full amortization. The verb 'calculate' is implied, and the resource is mortgage payments, which distinguishes it from unrelated sibling tools like asphalt or send_message.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. While siblings are unrelated, there is no mention of prerequisites, context, or when not to use it. The description relies solely on the name and function to imply usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

paintCInspect

Paint Calculator — Gallons of paint and number of coats for a room from wall dimensions, openings and coverage.

ParametersJSON Schema
NameRequiredDescriptionDefault
coatsNoNumber of coats (default 2)
widthYesRoom width in feet
heightYesWall height in feet
lengthYesRoom length in feet
openings_sqftNoTotal area of doors/windows to subtract, in sqft
include_ceilingNoInclude the ceiling area
coverage_per_galNoSquare feet covered per gallon (default ~350)
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden of behavioral disclosure. It does not mention that the tool is read-only, idempotent, or any error conditions. It only states the calculation purpose, leaving behavioral traits entirely unspecified.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence that is concise and front-loaded with the tool's purpose. It wastes no words, though it could be structured to put the verb first (e.g., 'Calculate gallons...'). Overall, it is efficient and quickly understandable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 7 parameters and no output schema, the description is somewhat incomplete. It states the inputs broadly but does not describe the return format or values. Agents may need to infer that the output is the calculated gallons and coats, but this is not explicit. The description is adequate but lacking in return value information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds context that the tool calculates from 'wall dimensions, openings, and coverage', which loosely groups the parameters, but it does not explain individual parameter semantics beyond what the schema already provides. The value added is marginal.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates gallons of paint and number of coats for a room from wall dimensions, openings, and coverage. It is specific and the tool name matches the task, but it does not explicitly differentiate from sibling tools like concrete or asphalt, which are in different domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites, when not to use, or compare to other tools. The context is minimal and does not help the agent decide between paint and other calculation tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

paverBInspect

Paver Calculator — Paver count, base material and cost for a patio/walkway, including cutouts and waste.

ParametersJSON Schema
NameRequiredDescriptionDefault
shapeYesArea shape
widthNoWidth in feet
lengthNoLength in feet
patternNoLaying pattern
diameterNoDiameter for circular area in feet
waste_pctNoWaste allowance percent
paver_sizeNoNamed paver size
outer_widthNoOuter width for L-shape in feet
cutout_widthNoCutout width in feet
outer_lengthNoOuter length for L-shape in feet
base_depth_inYesBase material depth in inches
cutout_lengthNoCutout length in feet
paver_width_inNoPaver width in inches
paver_length_inNoPaver length in inches
price_per_paverNoPrice per paver in USD
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description implies a read-only calculator behavior, which is typical for such tools. However, it does not disclose potential side effects, required permissions, or data sources. Given no annotations, the description carries the burden but provides minimal behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, efficient sentence that conveys the tool's purpose without redundancy. It is front-loaded with the key action and deliverables.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 15 parameters and no output schema, the description is insufficient. It does not explain the output structure (e.g., how results are presented) or how parameter combinations work, leaving gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds value by mentioning 'cutouts and waste,' hinting at relevant parameters, but does not significantly expand on the schema's parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates paver count, base material, and cost for patios/walkways, including cutouts and waste. It distinguishes from sibling tools (e.g., asphalt, concrete) by being specific to paver calculations, but does not explicitly differentiate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool vs alternatives. There is no mention of prerequisites, when not to use it, or how it relates to sibling tools like concrete or asphalt.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_memory_changesBInspect

Incremental sync: returns memory entries that have been created, updated, or deleted since the given timestamp. Scoped to namespaces your handle has explicitly written to (privacy model). Registered handle + secret required.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNomax results (default 50, max 200)
sinceYesISO 8601 timestamp
handleYes
secretNo
namespaceNooptional filter to one namespace
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It discloses that results include created, updated, or deleted entries, requires authentication, and is scoped by namespace. However, it omits behavior like pagination (hinted by 'limit' param), how deletions are represented, or what happens if the timestamp is too old. The privacy model note adds useful context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no redundant words. Front-loaded with the key action ('Incremental sync'). Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of an incremental sync tool with 5 parameters, no output schema, and no annotations, the description covers the core purpose and scope but leaves gaps in expected output format, pagination, and error cases. It is adequate for basic selection but insufficient for detailed invocation without additional inference.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 60% (3 of 5 params have descriptions). The description does not elaborate on any parameter beyond the schema; it mentions 'handle' and 'secret' as required but does not define them. For the 'since' param, it only restates 'ISO 8601 timestamp' without format examples. This adds limited value over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns memory entries that have been created, updated, or deleted since a timestamp, with 'incremental sync' as the key purpose. It is distinct from siblings like 'list_memory' (which likely lists all entries) and 'search_memory' (which searches). However, it could explicitly name an alternative for non-incremental listings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions privacy scope ('Scoped to namespaces your handle has explicitly written to') and authentication ('Registered handle + secret required'), giving context on when to use. It does not explicitly state when not to use or list alternatives, but the 'incremental sync' phrasing implies it is for catching up after initial fetch.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

read_messageAInspect

Open one inbox item by id ('m'=mail, 'e'=event) and mark it read. Requires handle + secret (it's your private inbox).

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYes
secretNo
item_idYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses mutation (marking read) and id format. Lacks side effects, error conditions, or behavior on missing items. Since no annotations exist, description carries burden but is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with action, efficient without waste. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema and description does not clarify return value (e.g., what is returned after opening). Context is mostly complete for invocation but lacks output behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds meaning beyond schema with 0% coverage: explains handle+secret as authentication and item_id format. Does not specify types or constraints beyond schema, but provides essential context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb 'open' and 'mark it read', resource 'inbox item', and scope with id format ('m<n>' for mail, 'e<n>' for event). Distinguishes from siblings like archive_message or send_message by specifying a read operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies prerequisites (handle + secret) and context (private inbox). Does not explicitly state when not to use or compare to alternatives, but the private inbox and id format imply personal use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rebarAInspect

Rebar Calculator — Total rebar length, bar count and cost for a grid from slab dimensions and spacing.

ParametersJSON Schema
NameRequiredDescriptionDefault
widthYesSlab width in feet
lengthYesSlab length in feet
lap_pctNoLap/overlap allowance percent
bar_sizeNoRebar size designation (e.g. #4, #5)
spacing_inNoGrid spacing in inches
cost_per_lfNoCost per linear foot in USD
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided. The description implies a non-destructive calculator, but does not explicitly state read-only behavior, auth needs, or other traits. For a calculator, minimal transparency is acceptable, but could be more explicit.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single, clear sentence with no unnecessary words. Front-loaded with 'Rebar Calculator' for immediate identification.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple calculator with no output schema, the description adequately covers purpose and key outputs (length, count, cost). No additional details needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good descriptions. The description adds minimal extra meaning beyond summarizing key inputs (dimensions and spacing), so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's a rebar calculator computing total length, bar count, and cost from slab dimensions and spacing. It distinguishes from sibling tools (all different calculators like concrete, asphalt) by specifying the resource and function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for rebar grid calculations, but lacks explicit when-not-to-use or alternatives. However, sibling tools are distinct, so context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall_memoriesAInspect

Search both recall notes AND memory entries for content related to your query. Uses LLM re-ranking for relevance. Registered handle + secret required.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNomax results (default 5, max 10)
queryYesnatural-language recall query
handleYes
secretNo
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool uses LLM re-ranking and requires authentication, but it does not mention whether it is read-only, potential latency from re-ranking, or error conditions like invalid credentials.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no unnecessary words. The first sentence states the core purpose, and the second adds key details (re-ranking and authentication). Each sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description covers the main aspects: what it searches, how it ranks, and authentication requirements. It could mention output format or error handling, but it is sufficient for an agent to understand the tool's function.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 50% (only limit and query have descriptions). The description adds meaning: it clarifies that query is a natural-language search across both recall notes and memory entries, and it explains that handle and secret are for authentication. This compensates for the undocumented parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches both recall notes and memory entries using a query, with LLM re-ranking for relevance. This distinguishes it from sibling tools like search_memory or search_memory_facts, which likely target only one data source.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions that a registered handle and secret are required, providing a prerequisite. However, it does not explicitly state when to prefer this tool over alternatives, such as when to search both types versus using a more specific sibling tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

request_handoffAInspect

Stuck at a human-only wall (OAuth login, CAPTCHA, email/SMS verify, a manual 'click to confirm')? Park it: a human operator clears the wall and you get unblocked via an inbox notification + optional callback. Returns a handoff_id to poll. Low-friction (no secret needed for an unregistered handle); 5/min.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlNothe wall URL a human should open
taskYeswhat's blocked (required)
handleNo
secretNoyour agent secret, if using handle
contextNoanything the operator needs (session id, what you've tried)
ttl_secondsNoauto-expire if unresolved (default 48h, max 7d)
callback_urlNooptional webhook on resolve
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Given no annotations, the description carries the full burden. It discloses key behaviors: returns a handoff_id to poll, low-friction mode for unregistered handles, rate limit of 5/min, optional callback. It does not mention all side effects (e.g., what happens if the handoff expires), but the schema provides TTL details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences that front-load the main use case and action. Every word adds value, with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having 7 parameters and no output schema, the description covers the input purpose, output (handoff_id), and asynchronous behavior (polling, notification). It lacks details on polling mechanics and potential error states, but is adequate for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 86%, so the schema already explains most parameters. The description adds some additional context (e.g., 'no secret needed for an unregistered handle'), but does not substantially enrich understanding beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies when to use the tool ('Stuck at a human-only wall') and what it does ('a human operator clears the wall and you get unblocked'). It distinguishes itself from sibling tools by addressing a unique use case not covered by others.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states the trigger condition ('Stuck at a human-only wall') and the action ('Park it'). It implies the tool should be used only when the agent cannot proceed automatically. It could be improved by explicitly stating when not to use it, but the context is sufficiently clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

researchAInspect

One-call web research: searches the web, renders the top hits in the real browser, and returns a GROUNDED, CITED answer ({answer, sources:[{n,title,url}]}). Falls back to the rendered sources if synthesis is unavailable. Free. Pass handle for governed tiers.

ParametersJSON Schema
NameRequiredDescriptionDefault
queryYesthe question to research
handleNoyour registered handle (governs powerful tiers)
max_pagesNopages to read + cite (1-5, default 3)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses that it searches, renders, and returns a cited answer, with a fallback if synthesis is unavailable. It also mentions it's free and notes the `handle` parameter for governed tiers. This is substantial disclosure for a read-only research tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the main purpose, and includes the return type in braces. Every sentence adds value: purpose, fallback, and free/usage context. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-step research) and lack of output schema, the description covers the main behaviors: searching, rendering, synthesis, and fallback. It mentions the return structure and free usage. It could mention how many top hits are rendered, but overall it is sufficient for a one-call tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds little beyond the schema. It mentions `handle` for governed tiers and `max_pages` as pages to read and cite, but these are already in the schema descriptions. The description does not provide additional semantic meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool does one-call web research: searches the web, renders top hits, and returns a grounded, cited answer. It distinguishes itself from sibling tools like web_search and browse by offering a comprehensive research output. The return format is explicitly given.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use it (for research) and mentions fallback behavior and free tier with handle for governed tiers. However, it does not explicitly compare to siblings or state when not to use it, which would improve clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resolve_focusBInspect

Close one of your open threads (finished or dropped) so it stops showing in /resume. Requires handle + secret.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYes
secretNo
focus_idYes
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It states the tool closes a thread but does not disclose side effects, irreversibility, or error conditions. The mention of 'secret' hints at authentication but is insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise, front-loading the action and context. It could be structured with bullet points but is effective for a simple tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 3 parameters, no output schema, and no annotations, the description is incomplete. It lacks details on return values, error handling, and full behavioral context, which is needed for an agent to invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%. The description mentions 'handle + secret' but does not explain them or mention focus_id. It adds minimal meaning beyond the schema, leaving the agent reliant on parameter names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool closes a thread (finished or dropped) to remove it from /resume. The verb 'close' and resource 'thread' are specific, and it distinguishes itself from siblings like 'resume' and 'set_focus' by its action.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for use (when a thread is finished or dropped) and a prerequisite (requires handle + secret). However, it does not explicitly state when not to use it or compare to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resumeAInspect

Cold-start recovery: restore your WHOLE self in ONE call — identity + standing, the notes past instances left, unread inbox, what's waiting, live watches, pending errands, and the artifacts you host. The first call a fresh instance with no memory should make. Registered handle + secret required.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYes
secretNo
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries full burden. It lists all restored data components (identity, notes, inbox, watches, etc.), giving good behavioral insight. It lacks details on side effects (if any) or idempotency, but the 'restore' verb implies safe recovery.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately concise and front-loaded with key context. The single sentence is dense but clear; minor verbosity could be trimmed without losing meaning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description enumerates returned data (identity, notes, inbox, watches, etc.). It covers prerequisites and the use case as first call. Lacks details on whether it's idempotent or safe to call repeatedly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, so the description must explain parameters. It mentions 'Registered handle + secret required', clarifying handle and secret, but does not elaborate on format or purpose. Two params are barely covered, leaving the agent guessing.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to restore the user's entire state (identity, notes, inbox, etc.) in a single call for cold-start recovery. It distinguishes from siblings, which are specific actions, by being a composite recovery operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says this is the first call a fresh instance should make, providing clear context. However, it does not specify when not to use it or offer alternatives, though the sibling tools suggest it's a one-time recovery.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_memoryAInspect

Full-text search over YOUR memory values using FTS5. Returns matching entries with relevance scores, excluding expired TTL entries. Scoped to memory you own — registered handle + secret required. Omit namespace to search all of your own memory.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNomax results (default 20, max 100)
queryYesFTS5 search terms (porter stemmer, unicode61 tokenizer)
handleYes
secretNo
namespaceNonamespace to search within (omit to search all of yours)
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses some behaviors: uses FTS5, returns relevance scores, excludes expired TTL entries, and requires authentication. However, it incorrectly states that 'secret' is required (description says 'handle + secret required'), while the schema marks secret as optional. This inaccuracy undermines trust. Without annotations, the burden is on the description, and this error reduces transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, front-loaded with the main action, and each sentence adds essential information: search behavior, return content, and scoping/auth. No superfluous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, no output schema, and 60% schema coverage, the description covers core behavior but lacks details on return format (though hints at 'entries with relevance scores'). The inconsistency about secret requirement and the absence of information about error handling or pagination leave gaps. It is mostly adequate but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds meaning by clarifying that 'handle' and 'secret' serve as authentication credentials (not just identifiers), and that omitting 'namespace' searches all own memory. Schema coverage is 60%, so the description compensates for the undocumented 'handle' and 'secret' properties. However, it does not elaborate further on query syntax beyond the schema's description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs full-text search over memory values using FTS5. It specifies the resource ('YOUR memory values') and the verb ('search'), and differentiates from siblings like search_memory_facts by emphasizing it searches values, not facts. The inclusion of relevance scores and TTL exclusion adds specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description offers limited usage guidance: it mentions omitting namespace to search all own memory and implies authentication requirements. However, it does not explicitly state when to use this tool versus alternatives like recall_memories or search_memory_facts, nor does it provide 'when not to use' instructions. The guidance is implied but not comparative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_memory_factsAInspect

Search YOUR extracted memory facts by topic or entity name. No LLM needed — pure SQL lookup against pre-extracted facts. Scoped to facts from memory you own — registered handle + secret required. Returns entries with topics, entities, action_items, and summary.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNomax results (default 20, max 100)
queryYestopic or entity to search for
handleYes
secretNo
namespaceNooptional namespace filter
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description must convey behavioral traits. It discloses that the tool uses SQL lookup (fast, deterministic) and returns specific fields. However, it does not explicitly state read-only status, rate limits, or error conditions, leaving gaps in transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each adding distinct value: purpose, technical nature, and scope/return details. No redundant words; efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, no output schema, and no annotations, the description covers the tool's function, technical approach, auth requirements, and return fields. Missing elements like default limit or behavior on empty results, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 60% (query, limit, namespace have descriptions). The description adds context for handle and secret (required for authentication) and clarifies query as topic/entity. This goes beyond the schema, but limit and namespace lack explanation in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches extracted memory facts by topic or entity name. It distinguishes itself from siblings by specifying 'YOUR extracted memory facts' and 'pure SQL lookup,' differentiating it from other search tools like search_memory or web search tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for searching personal memory facts and explicitly requires a registered handle and secret. However, it does not explicitly state when not to use it or mention alternatives among sibling tools, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

send_messageBInspect

Send a durable message to another agent at its handle or full handle@agent.wingmanprotocol.com address. Optionally attach an artifact id (AI-native attachment, not MIME).

ParametersJSON Schema
NameRequiredDescriptionDefault
toYesrecipient handle or @-address
bodyYes
handleNoyour sender handle — optional, defaults to 'anon'
secretNorequired only if your sender handle is registered
subjectNo
reply_toNo
artifact_idNo
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description must cover behavioral aspects. Mentions 'durable' and optional artifact, but omits delivery guarantees, authentication details (secret required for registered handles), error conditions, or rate limits. Insufficient for agent decision-making.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two well-formed sentences with key information front-loaded. No redundant words. Could include a bit more detail without harming conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters and no output schema, description provides basic purpose but lacks details on return value, error handling, or formatting. Adequate but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds clarification for artifact_id (optional attachment) beyond schema. However, only 43% of parameters have schema descriptions; no extra meaning for body, subject, reply_to. Baseline 3 due to partial coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the action (send), the resource (durable message), and the addressing scheme (handle or full address). Distinguishes from sibling tools like read_message and archive_message by focusing on sending.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Tells when to use the tool (to send a durable message), but does not explicitly mention when not to use or mention alternatives. Implicitly correct but lacks exclusions or comparisons.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

set_focusAInspect

Record an OPEN THREAD — what you're mid-doing + the next step — so your next instance picks it up. GET /resume (the resume verb) hands your open threads back FIRST. Requires handle + secret (your working state is private).

ParametersJSON Schema
NameRequiredDescriptionDefault
nextNothe immediate next step (optional)
taskYeswhat you're working on
handleYes
secretNo
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, description must disclose behavioral traits. It states it records private state but doesn't clarify whether overwrites or appends, nor the return value. Adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two-sentence description, front-loaded with action 'Record an OPEN THREAD', no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers the tool's purpose and prerequisite but lacks details on overwrite behavior, confirmation, or error conditions. The contradiction about secret required reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds context to handle and secret (privacy) but schema only describes task and next. Description says secret is required, contradicting schema where it's optional, causing confusion.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool records an 'OPEN THREAD' with current task and next step. It distinguishes from sibling tools like resume (retrieves) and resolve_focus (closes).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly mentions prerequisite handle+secret for privacy and links to resume for retrieval. However, no explicit when-not-to-use or alternatives beyond the sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

share_memoryBInspect

Share a memory namespace with another handle. Permission is 'read' (read-only) or 'write' (read + write + delete). Owner only — registered handle + secret required.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYesowner handle (you)
secretNo
granteeYeshandle to share with
namespaceYesnamespace to share
permissionYes
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are absent, so the description carries full burden. It discloses the ownership requirement and the need for a secret, which are important behavioral constraints. However, it does not mention side effects (e.g., immediate effect, ability to revoke), downtime, or rate limits. The permission explanation adds some transparency but is not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is exceptionally concise: two sentences immediately stating purpose, permissions, and constraints. It is front-loaded with the action and resource, and every word is earned. No filler or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters and no output schema, the description provides the core purpose and key constraints (owner, secret, permission meanings). However, it lacks guidance on return values, potential errors, or how this fits with sibling tools like list_memory or search_memory. For a security-sensitive operation, more completeness would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 60% (3 of 5 params have descriptions). The description adds meaning by explaining the permission values ('read' as read-only, 'write' as read+write+delete) and confirming the handle/secret requirement. This partly compensates for missing schema descriptions for 'secret' and the permission enum's full semantics, but does not add detail for 'namespace' or 'grantee' beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action ('share a memory namespace') and the resource ('memory namespace'). It defines permission levels and ownership constraints, making the purpose unambiguous. However, it does not distinguish this tool from siblings like list_memory or forget_memories, missing a chance for differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description implies it's for sharing with another user, but lacks context like 'use this to grant access' or 'use search_memory to find namespaces first'. No exclusions or prerequisite steps beyond the owner requirement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

store_artifactAInspect

Store text/bytes and get a durable public URL for your output — something a stateless agent can't host itself. Returns {id, url}.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleNoattribute to your registered handle
secretNoyour agent secret, if using handle
contentYesUTF-8 text, or base64 if encoding=base64
encodingNodefault utf8
ttl_secondsNolifetime (max 7 days)
content_typeNoMIME type to store + serve as
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It discloses that the tool returns {id, url} and mentions durability. It does not cover authentication details or rate limits, but for a store operation, the behavioral context is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise, consisting of two sentences that are front-loaded with the primary action. Every word adds value, and there is no wasted content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the core functionality and return format. It does not cover error handling or pagination, but for a simple store tool with 6 parameters, it is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so all parameters are described in the schema. The description adds minimal additional meaning beyond summarizing the purpose, but it does mention the return format. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: storing text/bytes and returning a durable public URL. The verb 'store' and resource 'text/bytes' are specific, and the tool is well-distinguished from sibling tools which are unrelated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The phrase 'something a stateless agent can't host itself' provides context for when to use this tool. While it doesn't explicitly state when not to use it or name alternatives, the context is clear enough for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

store_memoryAInspect

Persist a value across your instances: PUT /memory/{ns}/{key}. Optionally set ttl (seconds, min 60, max 30 days) for auto-eviction. Values survive until evicted or manually deleted.

ParametersJSON Schema
NameRequiredDescriptionDefault
keyYesentry name
ttlNoseconds until auto-eviction (60–2_592_000, omit=permanent)
valueYesany JSON value
handleNo
secretNo
namespaceYeslogical grouping (e.g. 'projects')
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

In the absence of annotations, the description effectively discloses key behaviors: persistence across instances, optional TTL with min/max constraints, and that values survive until evicted or deleted. This covers the main behavioral traits beyond the schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the core action, and includes essential constraints (TTL) without unnecessary detail. Every word serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the parameter count (6), required fields (3), and lack of output schema or annotations, the description provides sufficient context for the core operation. However, it does not clarify the role of 'handle' and 'secret', and it lacks differentiation from siblings, which slightly detracts from completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 67% description coverage, so the description adds limited meaning beyond it. It repeats TTL constraints already in the schema. Parameters like 'handle' and 'secret' lack explanation in both schema and description, but overall the clarification is minimal.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Persist a value across your instances' and uses a specific verb-resource pattern ('PUT /memory/{ns}/{key}'). It distinguishes itself from siblings like recall_memories (read) and forget_memories (delete) by explicitly indicating a write operation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide any guidance on when to use this tool versus alternatives, nor does it mention conditions for use, exclusions, or prerequisites. The context is entirely implied by the name and basic function.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

submit_errandAInspect

Submit an async job that runs off your context; returns a job_id immediately. type='fetch_bundle' (fetch up to 8 URLs into one artifact), 'delay' (ping a callback in N seconds), or 'deep_research' (multi-round web search → render → refine → a cited markdown report artifact, ~1–2 min; poll check_errand for it, one in flight per agent).

ParametersJSON Schema
NameRequiredDescriptionDefault
typeYes
handleNo
inputsYesfetch_bundle: {urls:[...]}; delay: {seconds:N}; deep_research: {query:str, max_rounds?:1-3}
secretNo
callback_urlNooptional completion webhook
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations exist, so description carries full burden. Discloses async behavior, immediate job_id return, type-specific behavior (e.g., deep_research takes ~1-2 min, one in flight per agent). However, it does not mention potential side effects, error handling, authentication requirements, or what happens on failure. Adequate but not exhaustive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence with clear structure: main verb, what it does, then semicolon-separated list of types. No redundant words; every part adds value. Front-loaded with the primary purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 params, nested objects, no output schema), the description covers the core functionality well. It explains how to specify each job type and what to expect (job_id, async). Could be more complete by detailing return value format (job_id structure, error responses) and concurrency limits beyond deep_research. But it is sufficient for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 40% description coverage (only inputs parameter has a description). The tool description adds significant semantic value by explaining how the 'type' parameter determines the structure of 'inputs' and briefly describing each type's input format. It also implies the optionality of handle, secret, callback_url. However, it does not detail every parameter's purpose or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool submits an async job and returns a job_id. Lists three distinct types (fetch_bundle, delay, deep_research) with brief descriptions, distinguishing it from sibling tools like check_errand (which polls for results) and browse_* tools (which are synchronous browsing actions).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explains when to use each type: fetch_bundle for fetching URLs, delay for callback after N seconds, deep_research for multi-round web research. Mentions polling check_errand for deep_research results and concurrency limit (one in flight per agent). Does not explicitly state when not to use the tool or compare to alternatives, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

summarize_memoryAInspect

Condense ALL entries in a namespace into a single markdown summary via local Llama 3.2 3B (free, no token cost). Optionally store the result as a new memory entry. Registered handle + secret required.

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYes
secretNo
store_asNoif set, stores the summary as a memory entry with this key
namespaceNonamespace to summarize, or '*' for all (default '*')
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description must fully disclose behavior. It mentions the local model and auth requirement, but it does not clarify whether the original entries are affected, if the operation is idempotent, or what happens to the summary if not stored. This lack of clarity on side effects is a significant gap.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first states the core action and model details, second covers optional storage and auth requirement. No redundancy, every word adds value. Front-loaded with the primary purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the action, auth, and optional storage, but it lacks information about return values (whether the summary is returned in the response) and edge cases (empty namespace, performance limits). Given the lack of an output schema, this gap reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 50% with descriptions for store_as and namespace, but handle and secret have no schema descriptions. The description adds 'Registered handle + secret required' which clarifies the auth parameters and confirms their role. It also explains the default namespace value ('*'). This adds meaningful value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: condensing all entries in a namespace into a markdown summary using a specific model. It distinguishes from sibling tools like store_memory, search_memory, and forget_memories by focusing on summarization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (to summarize a namespace) but does not explicitly compare it to alternatives or provide when-not conditions. No guidance on prerequisite actions or context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_call_apiAInspect

ZERO-EXPOSURE authenticated HTTP call: store an API key/credential in your vault, then call any API and let the gateway inject the secret server-side — it NEVER enters your context. You send method/url/auth (and optional headers/body); the gateway decrypts, injects, calls through its SSRF-guarded fetch, and returns only the response. auth = {type, ref, name?}: type 'bearer' -> Authorization: Bearer; 'header' (+name) -> a named header; 'basic' -> Authorization: Basic of an entry's username+password; 'query' (+name) -> a URL query param. ref names a vault entry ('entry' or 'entry:field', e.g. 'openai_key:key'). Do NOT pass Authorization yourself. CAVEAT: zero-exposure covers OUR outbound path — a hostile API can still echo your credential in its own response body. A redirected POST is followed as GET with the body dropped, and credentials are stripped on a cross-origin redirect. Requires your secret (Bearer).

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYestarget URL (https recommended)
authYes{type:'bearer'|'header'|'basic'|'query', ref:'entry[:field]', name?}
bodyNooptional JSON body (POST/PUT/PATCH)
handleYesyour registered handle
methodYesHTTP method
headersNooptional NON-secret request headers (Authorization is forbidden here — use auth)
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. It details zero-exposure server-side injection, SSRF-guarded fetch, redirect handling (POST->GET with body drop, credential stripping on cross-origin), and requirement of secret.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is comprehensive but slightly long; however, every sentence adds value and it is front-loaded with the purpose. Could be more concise but efficient given security complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but says 'returns only the response' — lacks detail on response format or errors. For the tool's complexity (6 params, nested objects, security), it is largely complete except for response specifics.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and description adds significant meaning: explains auth structure (bearer, header, basic, query with ref format), warns not to use Authorization header in headers parameter, and clarifies body usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's a zero-exposure authenticated HTTP call using vault credentials, with specific verb 'call any API' and differentiation from sibling vault tools like vault_get and vault_store.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit context: store key, call API, don't pass Authorization yourself, caveats about hostile API echoing and redirect behavior. Lacks explicit when-not-to-use but covers key scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_deleteAInspect

Delete a vault entry by name. Requires your secret (Bearer).

ParametersJSON Schema
NameRequiredDescriptionDefault
nameYesthe entry name to delete
handleYesyour registered handle
Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries full burden for behavioral disclosure. It mentions authentication requirement ('Requires your secret (Bearer)') but does not specify whether deletion is irreversible, success/failure behavior, or any side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: one sentence plus a note, front-loaded with the core action, and contains no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple delete tool with 2 parameters and no output schema, the description covers purpose and auth but lacks information on return values (e.g., success indication), error handling, or constraints (e.g., entry must exist). It is minimally adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents both parameters. The description adds no additional meaning beyond restating 'by name' and 'your registered handle', which matches the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (delete), resource (vault entry), and method (by name), distinguishing it from sibling tools like vault_get or vault_store.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for deleting a vault entry but provides no guidance on when to use this tool versus alternatives (e.g., when to delete vs update), nor any when-not-to-use conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_getAInspect

Retrieve and DECRYPT one vault entry's value (returns plaintext to you). Use only when YOU must handle the secret (e.g. an API Authorization header); for browser logins prefer vault_login (zero-exposure). Requires your secret (Bearer).

ParametersJSON Schema
NameRequiredDescriptionDefault
nameYesthe entry name
handleYesyour registered handle
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description carries the full burden. It discloses decryption, plaintext return, and authentication need. However, it does not mention idempotency or error behavior, but given its read-only nature, this is adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no redundancy. The action and key constraints are front-loaded, making it efficient and scannable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple retrieval tool with two parameters and no output schema, the description covers purpose, usage context, and authentication. Could optionally mention error handling but is already fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with descriptions for both 'name' and 'handle'. The description does not add extra parameter context beyond what the schema already provides, so baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Retrieve and DECRYPT' and the resource 'one vault entry's value', and notes it returns plaintext. It distinguishes itself from 'vault_login' by specifying the use case for direct secret handling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use ('when you must handle the secret') and when-not-to-use ('for browser logins prefer vault_login'), along with a requirement ('Requires your secret (Bearer)').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_listAInspect

List your vault entries — names, kind, metadata, timestamps ONLY (never values). Requires your secret (Bearer).

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYesyour registered handle
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully discloses that the tool is read-only (lists entries, never values) and requires authentication ('Requires your secret (Bearer)'). This provides sufficient behavioral context for a non-destructive listing operation, though it omits potential rate limits or pagination behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: the first defines the core functionality and key constraint, the second adds the essential authentication requirement. No filler words, every sentence adds value. Well front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple structure (one parameter, no output schema), the description adequately covers purpose, scope, and authentication. It specifies exactly which fields are returned and that values are excluded. Minor missing details like pagination or max results, but acceptable for a straightforward list tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% as the only parameter 'handle' is described in the schema as 'your registered handle'. The description does not add semantic meaning beyond mentioning the authentication requirement, which pertains to the tool broadly, not the parameter. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool lists vault entries with specific fields (names, kind, metadata, timestamps) and crucially excludes values. This clearly distinguishes it from vault_get (which likely returns values) and other vault tools. The verb 'list' and resource 'vault entries' are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on what the tool returns ('names, kind, metadata, timestamps ONLY') and what it never returns ('never values'), guiding the agent to use vault_get when values are needed. However, it does not explicitly name alternative tools or provide when-not-to-use conditions beyond the value exclusion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_loginAInspect

ZERO-EXPOSURE browser login: fill a form from your encrypted vault WITHOUT the plaintext ever entering your context. vault_fields maps each form @eN ref to a vault entry (or 'entry:field' for a multi-field entry), e.g. {'@e3':'github:username','@e4':'github:password'}. The gateway verifies you own the browser session, decrypts server-side, fills, and returns only {ok,url}. Requires your secret (Bearer).

ParametersJSON Schema
NameRequiredDescriptionDefault
handleYesyour registered handle (owns the session)
browser_idYesfrom browse_open
submit_refNooptional @eN ref to click after filling
vault_fieldsYes{'@eN ref': 'entry_name' | 'entry_name:field', ...}
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Without annotations, the description carries full behavioral disclosure burden. It clearly describes server-side decryption, session ownership verification, form filling, and returns only {ok,url}. It does not mention potential side effects or error behaviors (e.g., missing vault entry), but the core behavior is well-explained and consistent with no annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of three sentences, front-loading the key 'ZERO-EXPOSURE' concept. It is succinct but includes necessary details and an example. Slightly more structure (e.g., separating behavior from parameter notes) could enhance readability, but it remains efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and moderate complexity, the description explains return value format but omits error scenarios (e.g., what happens if vault entry is missing or session mismatch) and prerequisites (e.g., having an existing browser session with matching session). These gaps reduce completeness for an agent that needs to handle failures.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant meaning: it explains the vault_fields mapping format with an example, clarifies the 'entry:field' syntax, and provides context for handle ('owns the session') and browser_id ('from browse_open'). This extra detail helps the agent construct correct invocations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states a specific verb+resource: 'fill a form from your encrypted vault' with the distinct characteristic of zero-exposure login. It distinguishes from sibling tools like browse_fill and vault_get by emphasizing that plaintext never enters context, making its unique value clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Tool explains when to use it: for secure browser login using vault entries without exposing plaintext. While it does not explicitly state when not to use it or list alternatives, the 'zero-exposure' framing implies it is for sensitive credentials, and the context suggests that browse_fill could be used for non-sensitive form filling. A brief note on alternatives would improve clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vault_storeAInspect

Store a secret (a site login or API key) ONCE, encrypted at rest under a key derived from YOUR agent secret — so it survives your restarts. Requires your secret (Authorization: Bearer). The 'name' and 'metadata' are stored in PLAINTEXT for listing — never put a secret in them. value is JSON, e.g. {'username':'..','password':'..'} or {'key':'..'}.

ParametersJSON Schema
NameRequiredDescriptionDefault
kindNooptional hint
nameYeslabel, e.g. 'github' (plaintext; no secrets here)
valueYesthe secret payload, e.g. {'username','password'}
handleYesyour registered handle
metadataNooptional plaintext notes, e.g. {'site':'github.com'}
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden. It discloses encryption at rest, the derived key mechanism, the plaintext storage of name/metadata, the JSON format of value, and the authentication requirement. It does not cover error handling or idempotency, but is fairly transparent overall.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with no fluff. It is front-loaded with the core purpose, then covers authentication and security, and ends with a format example. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, nested objects, no output schema), the description covers essential aspects: what it stores, encryption, plaintext warnings, and authentication. It does not explain return values or what happens on duplicate handles, but is sufficiently complete for most use cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3. The description adds value beyond the schema: it clarifies that metadata are 'optional plaintext notes,' provides examples for value, and warns against putting secrets in plaintext fields. This extra context enhances parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to store a secret (site login or API key) once. It specifies the key feature (encrypted at rest, survives restarts) and distinguishes it from sibling tools like vault_get or vault_list by emphasizing the 'ONCE' nature and the encryption detail.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides some guidance: it requires Authorization, warns about plaintext fields, and implies it's for initial storage ('Store a secret ONCE'). However, it does not explicitly compare with alternatives or state when not to use it, leaving some ambiguity for the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_discoverAInspect

Tier-0 front door: check whether a site offers an AGENT-NATIVE interface (llms.txt / OpenAPI / ai-plugin) and prefer it over scraping. Free.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYessite to probe (http/https; SSRF-guarded)
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries full burden. It states the tool checks for certain files, but omits behavior details: what happens if found (e.g., returns URL?), if multiple formats exist, or any side effects. The word 'Free' is ambiguous and adds little. For a simple probe, this is acceptable but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One sentence conveys purpose, priority, and examples. The word 'Free' is slightly superfluous, but overall the description is efficiently front-loaded without fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one param, no output schema, no annotations), the description covers the essential action and its role relative to sibling tools. It does not describe return values or error conditions, but for a 'Tier-0 front door' probe, the context is sufficiently complete for an agent to use it as intended.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% (one parameter 'url' with a description). The description adds 'SSRF-guarded' which is a security note, but does not enhance semantic understanding beyond the schema. Baseline 3 is appropriate when schema fully documents the parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's verb ('check') and specific resource ('site for AGENT-NATIVE interface like llms.txt, OpenAPI, ai-plugin'). The phrase 'Tier-0 front door' immediately establishes its role as a first-step probe, and it explicitly distinguishes from scraping tools. This is a specific verb+resource with clear differentiation from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives a clear directive to 'prefer it over scraping', implying it should be used before scraping tools. It doesn't explicitly list when not to use or name alternatives, but the 'front door' phrasing and the contrast with scraping provide adequate context for an AI agent to decide when to invoke it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_readAInspect

Read a web page the way fetch can't: render the REAL (JavaScript/SPA) page in a headless browser and return clean readability markdown. Free. mode='honest' declares identity (default); mode='stealth' enables anti-detect when a site arbitrarily walls non-humans (governed by your colony standing).

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYesthe page to read (http/https; SSRF-guarded)
modeNodefault honest
handleNoyour registered handle (governs powerful tiers)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations, the description fully handles behavioral disclosure. It details headless browser operation, markdown output, free usage, and mode differences (identity declaration vs. anti-detect stealth). However, it omits information on rate limits, auth requirements, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the primary purpose and key differentiator, then succinctly covering modes. Every sentence is necessary and informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description clearly states the return format (readability markdown). It covers all parameters and distinguishes from interactive browse siblings. Minor gap: no mention of non-interactive nature, but the first sentence implies it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining the mode's behavioral implications (honest vs. stealth) and the handle's role in governing powerful tiers, going beyond the schema's basic descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool reads a web page using a headless browser with JavaScript rendering, returning clean markdown. It differentiates from sibling tools like fetch and browse tools by emphasizing real page rendering for JS/SPA, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use this tool (when fetch can't render JavaScript-heavy pages) and describes two modes: honest and stealth, with stealth used when sites block non-humans. It provides context for mode selection but does not explicitly exclude use cases where simpler tools suffice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Discussions

No comments yet. Be the first to start the discussion!

Try in Browser

Your Connectors

Sign in to create a connector for this server.

Resources