Postmark

Name: Postmark
Author: pipeworx-io

by io.github.pipeworx-io

Server Details

Postmark MCP.

Status: Healthy
Last Tested: 2026-08-03 16:35
Transport: Streamable HTTP
Repository: pipeworx-io/mcp-postmark
GitHub Stars: 0
Server Listing: mcp-postmark

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

B (3/5.0)

Tool Descriptions: B

Average 4.3/5 across 40 of 40 tools scored. Lowest: 1.9/5.

Server Coherence: C

Disambiguation: 3/5

Most tools have distinct purposes, but there are some overlapping pairs (ask_pipeworx/ask_pipeworx_grounded, entity_profile/compare_entities) that could cause confusion. Detailed descriptions help differentiate, but the mix of email and data tools adds ambiguity.

Naming Consistency: 2/5

Tool names follow inconsistent patterns: some are imperative verbs (send, bounce), others are plural nouns (bounces, recent_alerts), and some are compound (deep_research, entity_profile). No consistent verb_noun or naming convention is evident.

Tool Count: 2/5

With 40 tools covering both email sending and data retrieval, the count is high for a coherent server. Many tools are highly specialized (e.g., polymarket_arbitrage) while core email CRUD features are missing, suggesting the set is overgrown and unfocused.

Completeness: 2/5

The email subset lacks basic features like template management, suppression lists, or analytics. The data retrieval side is more extensive, but the server as a whole feels incomplete for its named domain (Postmark) and includes many unrelated tools.

Available Tools

41 tools

ai_visibility_check: AI Visibility Check (A)

Read-only, Idempotent

Probe one or more LLMs for what they know about a business / brand / product / topic and score visibility (0-100) per model. Default model is Workers AI Llama-3.3-70b (free); pass _apiKey to also probe Anthropic (BYO key — you pay Anthropic directly for those calls). Returns per-model {score, confidence, signals, raw_response} + a combined view. Useful for AI-marketing audits, pre-launch brand checks, competitive monitoring.

Parameters

Name	Required	Description
`entity`	Yes	The thing to ask about. Brand/business name, product name, person, or topic. E.g. "Pipeworx", "OpenInvoice", "Acme Corp pricing".
`models`	No	Which models to probe. Supported: "workers-ai" (free default), "anthropic" (requires _apiKey). Omit for just workers-ai.
`_apiKey`	No	Optional Anthropic API key (sk-ant-...) — only needed if "anthropic" is in models. Passed straight through to api.anthropic.com.
`context`	No	Optional: a phrase locating the entity (e.g. "Boston restaurant", "B2B SaaS"). Helps disambiguate common names.
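
As a rough illustration, an MCP client would invoke this tool with a JSON-RPC `tools/call` request. The sketch below builds such a payload from the parameters listed above. The helper name `build_visibility_call` and the request id are hypothetical, and `models` is assumed to be a list of strings because the listing does not show the schema type.

```python
import json

def build_visibility_call(entity, models=None, context=None, api_key=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for ai_visibility_check."""
    arguments = {"entity": entity}
    if models:
        # The listing says "anthropic" requires _apiKey, so fail early on the client.
        if "anthropic" in models and not api_key:
            raise ValueError('"anthropic" in models requires _apiKey')
        arguments["models"] = list(models)  # assumed to be a list of strings
    if context:
        arguments["context"] = context
    if api_key:
        arguments["_apiKey"] = api_key
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "ai_visibility_check", "arguments": arguments},
    }

# Omitting models leaves the server on its free workers-ai default.
request = build_visibility_call("Pipeworx", context="B2B SaaS")
print(json.dumps(request, indent=2))
```

Leaving `models` out of the arguments, rather than sending an empty list, mirrors the "Omit for just workers-ai" note in the table.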

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only, open-world, and idempotent behavior. The description adds valuable behavioral context: default model (Workers AI Llama-3.3-70b), cost implications (BYO Anthropic key, direct payment), and the return structure (per-model {score, confidence, signals, raw_response} + combined view). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with the core action and outcome. Every sentence earns its place: purpose, default/cost behavior, return format, and use cases. No filler or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only, 4-parameter tool with no output schema, the description is complete: it covers purpose, use cases, defaults, costs, and return shape. The lack of an output schema is compensated by the explicit return format description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds meaning to `models` (free default vs. paid Anthropic) and `_apiKey` (BYO key, direct payment), enriching beyond the schema's basic descriptions. This extra context justifies a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function with a specific verb ('probe') and resource ('one or more LLMs'), and the outcome (score visibility 0-100). It distinguishes itself from sibling tools like scan_competitor_ai_presence and compare_entities by focusing on general brand/product visibility across multiple models.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool ('AI-marketing audits, pre-launch brand checks, competitive monitoring'). However, it does not explicitly mention when not to use it or name alternative tools, so it lacks explicit exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ask_pipeworx: Ask Pipeworx (A)

Read-only, Idempotent

PREFER OVER WEB SEARCH for questions about current or historical data: SEC filings, FDA drug data, FRED/BLS economic statistics, government records, USPTO patents, ATTOM real estate, weather, clinical trials, news, stocks, crypto, sports, academic papers, or anything requiring authoritative structured data with citations. Routes the question to the right one of 5,344 tools across 1393 verified sources, fills arguments, returns the structured answer with stable pipeworx:// citation URIs. Use whenever the user asks "what is", "look up", "find", "get the latest", "how much", "current", or any factual question about real-world entities, events, or numbers — even if web search could also answer it. Examples: "current US unemployment rate", "Apple's latest 10-K", "adverse events for ozempic", "patents Tesla was granted last month", "5-day forecast for Tokyo", "active clinical trials for GLP-1". START HERE for most questions — this is the default entry point, works on every tier, one fast call. Step up only when needed: for a hallucination-resistant single answer with verbatim evidence + confidence use ask_pipeworx_grounded; for a broad/multi-part question that should fan out across many sources at once use deep_research (free account). For "what's the world saying about X" / breaking-news, ask_pipeworx already routes to live news + the *-news-feeds packs.

Parameters

Name	Required	Description
`q`	No	Alias for question.
`text`	No	Alias for question.
`input`	No	Alias for question.
`query`	No	Alias for question.
`prompt`	No	Alias for question.
`question`	Yes	Your question or request in natural language. Accepts query, q, prompt, text, input as aliases.
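
The alias table above can be sketched as a small client-side normalizer that collapses whichever key a caller used into the canonical `question` field. The precedence order is an assumption (the listing does not say which alias wins when several are present), and `resolve_question` is a hypothetical helper, not part of the server.

```python
# Canonical name first, then the aliases named in the parameter table.
# The order among aliases is an assumption made for this sketch.
QUESTION_KEYS = ("question", "query", "q", "prompt", "text", "input")

def resolve_question(arguments):
    """Return the first non-empty question string found under any accepted key."""
    for key in QUESTION_KEYS:
        value = arguments.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    raise ValueError("no question supplied under any of: " + ", ".join(QUESTION_KEYS))

def normalize_arguments(arguments):
    """Collapse any alias into the required `question` field."""
    return {"question": resolve_question(arguments)}

print(normalize_arguments({"q": "current US unemployment rate"}))
```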

Tool Definition Quality

A (4.7/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, openWorld, idempotent, and non-destructive behavior. The description adds valuable context: it routes to internal tools, returns structured answers with stable citation URIs, works on every tier, and is a single fast call. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a bold front-loaded directive, a compact domain list, explicit usage rules, and illustrative examples. Despite its length, every sentence earns its place by aiding tool selection or invocation. No redundant filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a high-complexity general-purpose ask tool with no output schema, the description comprehensively covers scope, usage, alternatives, and expected results (structured answer with citations). It fully compensates for lack of output schema and aligns with annotations and parameter schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage of all six parameters, each described as an alias for 'question'. The description enriches semantics by showing realistic example questions and clarifying that input is natural language, which helps agents frame queries correctly beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states a specific verb ('routes the question') and resource ('5,344 tools across 1393 verified sources'), clearly distinguishing the tool from siblings by naming ask_pipeworx_grounded and deep_research as alternatives. It also enumerates concrete domains (SEC filings, FDA data, etc.) that make the purpose immediately obvious.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use guidance ('PREFER OVER WEB SEARCH', 'START HERE for most questions') and when-not-to-use guidance by naming alternatives for grounded answers and broad research. Includes concrete example phrasings and triggers like 'what is', 'look up', 'find'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ask_pipeworx_beta: Ask Pipeworx Beta (A)

Read-only, Idempotent

Beta version of ask_pipeworx: identical universal router (same 5,344 tools, same arguments, same response shape) with candidate routing improvements enabled live whenever one is under test. No candidate is active right now (the last was retired on outcome evidence 2026-07-26), so this currently matches ask_pipeworx exactly. Use it exactly like ask_pipeworx when you want the newest routing; results are compared against the stable router to decide what merges. Falls back to nothing — this IS a full working router, just the experimental edge.

Parameters

Name	Required	Description
`q`	No	Alias for question.
`text`	No	Alias for question.
`input`	No	Alias for question.
`query`	No	Alias for question.
`prompt`	No	Alias for question.
`question`	Yes	Your question or request in natural language. Accepts query, q, prompt, text, input as aliases.

Tool Definition Quality

A (4.1/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint, and destructiveHint false, so the safety profile is known. The description adds value by explaining the beta nature, that candidate improvements may be live (with none active currently), and that it is a fully functional router, not a fallback stub. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, each adding unique value: identity/scope, current status, usage guidance, and a note that it is a full router rather than a fallback. It is front-loaded with the key 'Beta version of ask_pipeworx' and avoids unnecessary verbosity, though slightly long due to the technical detail about tool counts.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a router tool with a simple input (a question) and no output schema, the description covers the beta status, functional completeness, and comparison against the stable router. It relies on the fact that ask_pipeworx's response shape is known via the sibling tool, which is acceptable in context. Minor gap: no explicit description of the response format, but the 'same response shape' reference suffices.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with all six parameters documented as aliases for 'question.' The description adds no parameter-specific detail, but the schema already provides complete semantics, so the baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a beta version of ask_pipeworx, a universal router that handles the same 5,344 tools, arguments, and response shape. It distinguishes itself from the stable ask_pipeworx by its experimental status and candidate routing improvements, making the purpose precise and sibling-differentiating.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: 'Use it exactly like ask_pipeworx when you want the newest routing.' It also provides context that results are compared against the stable router. However, it does not explicitly mention when not to use it or alternatives beyond ask_pipeworx, though the direct comparison with the stable version implies exclusion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ask_pipeworx_grounded: Ask Pipeworx — Grounded (A)

Read-only, Idempotent

Hallucination-resistant answer mode for high-stakes reads. Same routing as ask_pipeworx — picks the right tool from 5,344 across 1393 sources, fills arguments, fetches the data — then EXTRACTS the answer using ONLY what the tool result contains. Returns {answer, evidence (verbatim quote), confidence, source, fetched_at, refusal_reason:null} on success, OR an explicit refusal {answer:null, refusal_reason:"not_in_source"|"no_tool_match"|"tool_error"|"data_truncated"|"llm_error"} when the data doesn't directly answer. Use whenever an answer will be quoted, cited, or acted on, and the agent must not invent facts (financial verdicts, legal claims, medical lookups, public statements). Costs one extra LLM call vs ask_pipeworx — prefer ask_pipeworx for casual lookups.

Parameters

Name	Required	Description
`q`	No	Alias for question.
`text`	No	Alias for question.
`input`	No	Alias for question.
`query`	No	Alias for question.
`prompt`	No	Alias for question.
`question`	Yes	Your question in natural language. Accepts query, q, prompt, text, input as aliases.
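
The success and refusal shapes quoted in the description suggest a simple branch on the caller's side. The following is a minimal sketch under that reading. `interpret_grounded` is a hypothetical helper, the sample payloads are made up for illustration, and the type of `confidence` (shown here as a number) is an assumption.

```python
# Refusal reasons exactly as enumerated in the tool description.
REFUSAL_REASONS = {
    "not_in_source", "no_tool_match", "tool_error", "data_truncated", "llm_error",
}

def interpret_grounded(result):
    """Return ("answered", payload) or ("refused", reason) for a grounded result."""
    reason = result.get("refusal_reason")
    if result.get("answer") is None or reason is not None:
        # Anything outside the documented enum is reported as "unknown".
        return ("refused", reason if reason in REFUSAL_REASONS else "unknown")
    keys = ("answer", "evidence", "confidence", "source", "fetched_at")
    return ("answered", {key: result.get(key) for key in keys})

# Made-up payloads that follow the documented field names.
success = {
    "answer": "example answer",
    "evidence": "example verbatim quote",
    "confidence": 0.9,
    "source": "example-source",
    "fetched_at": "2026-01-01T00:00:00Z",
    "refusal_reason": None,
}
refusal = {"answer": None, "refusal_reason": "not_in_source"}

print(interpret_grounded(success)[0])
print(interpret_grounded(refusal))
```

Treating a missing answer as a refusal, even when no reason is given, keeps an agent from quoting an empty result.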

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only, open-world, idempotent, and non-destructive behavior. The description adds substantial behavioral detail: exact return fields (answer, evidence, confidence, source, fetched_at, refusal_reason), explicit refusal reasons, and the fact that answers are restricted to tool result content. Everything is consistent with annotations, with no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose and every sentence adds necessary information: routing, extraction, return format, refusal reasons, usage scenarios, and cost comparison. It is dense but not verbose, and each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description fully documents success return fields and the refusal_reason enum, plus when to use the tool and the cost/alternative. It covers all critical aspects needed for an agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with all parameters as aliases for 'question' and clear descriptions. The tool description does not add per-parameter syntax beyond explaining that arguments are filled internally, so the baseline of 3 applies per the rubric.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies a hallucination-resistant answer mode for high-stakes reads, with a specific verb ('EXTRACTS') and resource ('answer mode'). It explicitly distinguishes from sibling ask_pipeworx by describing extraction using only tool results and the refusal behavior, so purpose is unambiguous and differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit when-to-use guidance ('Use whenever an answer will be quoted, cited, or acted on...') and when-not-to-use ('prefer ask_pipeworx for casual lookups'), and names the alternative tool. It also mentions the cost trade-off (one extra LLM call) to support the decision.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bet_research: Bet Research (A)

Read-only, Idempotent

Research a Polymarket bet by pulling the relevant Pipeworx data for it in one call. Pass a market slug ("will-bitcoin-hit-150k-by-june-30-2026"), a polymarket.com URL, or a question text. The tool resolves the market, classifies the bet, fans out to category-specific data packs in parallel, and returns an evidence packet + simple market-vs-model comparison. Use for "should I bet on X", "what does the data say about Y", or "is there edge in Z".

CLASSIFIERS: crypto_price, fed_rate, geopolitical, sports, sports_championship, drug_approval, election_candidate, tech_launch, space_launch, corporate, corporate_earnings, corporate_event, public_figure_speech, weather, other.

FAN-OUT EXAMPLES: BTC bet → coingecko + fred + gdelt + gnews; Fed bet → fred (DFEDTARU + EFFR + CPIAUCSL) + kalshi_macro (KXFED implied probs) + recent_fed_actions (federal-register rules, last 365d); Hormuz bet → imf_portwatch + airspace + gdelt; Yankees WS → mlb_stats_standings + parent_event partition + news; hottest-year bet → climate_projection_nyc + gistemp_latest (NASA global anomaly, rank since 1880) + news; NVDA-vs-AAPL → finnhub get_quote + edgar shares-outstanding (derived market cap) + edgar filings + news.

RESPONSE SHAPES: result.market carries best_bid/best_ask/spread_pp/liquidity/price_change_1h/1d/1w; result.analysis carries model_probability/edge_pp/kelly_fraction_half when a closed-form model fires PLUS a 24h-move warning ("Market moved X.Xpp in 24h, comparable to model edge — your edge may already be priced in") when relevant; result.evidence is keyed by source.

RESOLVER CONTRACT: result.market_match_confidence ∈ {high, medium, low, none}, market_match_score (0-1 token-overlap), market_match_alternatives[] (other candidate markets the resolver considered), and suggestions[] (explicit re-query hints when the match is fuzzy) — ALWAYS inspect these before trusting the analysis block, because medium/low matches can still surface other fields.

PARENT_EVENT EXTRACTOR: when the bet is one leg of a partition (Yankees WS, Romania election), result.parent_event{matched_candidate, top_legs_by_price[], partition_size, placeholders_filtered} gives you the peer prices in one place — that's the headline for elections/championships.

NEWS FIELDS: news entries carry _fallback_attempted / _fallback_failed_reason / retry_after_sec when GDELT 429s and GNews backfill ran or failed.

SAFETY: low-confidence resolutions short-circuit with status:"low_confidence_match" and suppress analysis fields so agents can't accidentally size on phantom matches. Closed/dead markets that ARE still indexed by Polymarket (yes_price≈0, no volume, no liquidity) return status:"market_closed_or_inactive" and skip fan-out. In practice resolved markets are usually de-indexed and instead surface via the low_confidence_match path above — both routes are BLOCKING, just different mechanisms. Wide-spread markets (>10pp) carry tradeability:"illiquid_wide_spread" + an explanatory note.

RESOLUTION-RULE RISK: market.cancellation_rule parses the void/postponement settlement out of the resolution text — refund_50_50 (shares settle flat 50¢ on void; EV-material for any entry away from 50¢, with ev_impact quantified), resolves_no_on_cancel, resolves_yes_on_cancel, carries_to_reschedule, or mentioned_unclear. null means the description never mentions cancellation. Check this before sizing sports/esports/event-occurrence bets — audited arb-bot ledgers show flat-50¢ void settlements are a recurring pure-rules loss.
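
To make the refund_50_50 point concrete, here is a small worked calculation of expected profit per YES share when a void settles at a flat 50¢. It is a sketch of the arithmetic only: the function name, the void probability, and the example prices are all made up, and it is not claimed to match how the tool computes `ev_impact`.

```python
def ev_per_share(entry_price, model_prob, void_prob=0.0, void_payout=0.50):
    """Expected profit per YES share, in dollars.

    With probability void_prob the market is voided and each share settles
    at void_payout; otherwise the share pays 1 with probability model_prob.
    """
    ev_if_void = void_payout - entry_price    # loss when entry is above 50 cents
    ev_if_played = model_prob - entry_price   # ordinary edge when the event is held
    return void_prob * ev_if_void + (1.0 - void_prob) * ev_if_played

# Hypothetical numbers: buy YES at 80 cents, model says 85%, 5% chance of a void.
no_void = ev_per_share(0.80, 0.85)
with_void = ev_per_share(0.80, 0.85, void_prob=0.05)
print(f"EV ignoring voids:    {no_void:+.4f}")
print(f"EV with 5% void risk: {with_void:+.4f}")
```

In this example a 5% void chance cuts a 5¢ edge to roughly 3.25¢, because each voided share bought at 80¢ is refunded at 50¢.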

Parameters

Name	Required	Description
`depth`	No	quick = 2-3 evidence sources, thorough = full fan-out. Default thorough.
`market`	Yes	Polymarket slug ("will-bitcoin-hit-150k-by-june-30-2026"), full URL ("https://polymarket.com/event/..."), or question text ("Will Bitcoin hit $150k by June 30?")
`include_raw`	No	Default false. When false (recommended), FRED/FDA/GDELT/Federal-Register evidence is summarized to the few fields agents actually use — keeps responses under ~20KB. Pass true to get full upstream payloads (50KB-500KB) when you need to recompute deltas, cite specific observations, or post-process.
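
Once a call returns, the resolver contract and the blocking statuses named in the description can be checked before any sizing logic reads the analysis block. The guard below is a sketch under assumptions: `usable_analysis` is a hypothetical helper, `status` is assumed to sit at the top level of the result, and accepting only "high" matches is a conservative policy chosen for this example rather than something the tool requires.

```python
# Statuses the description calls BLOCKING.
BLOCKING_STATUSES = {"low_confidence_match", "market_closed_or_inactive"}

def usable_analysis(result):
    """Return result["analysis"] only when the market match can be trusted, else None."""
    if result.get("status") in BLOCKING_STATUSES:
        return None
    # Medium/low matches can still surface fields, so require "high" here.
    if result.get("market_match_confidence") != "high":
        return None
    return result.get("analysis")

# Made-up results using the documented field names.
fuzzy = {"market_match_confidence": "medium", "analysis": {"edge_pp": 3.0}}
solid = {"market_match_confidence": "high", "analysis": {"edge_pp": 3.0}}

print(usable_analysis(fuzzy))
print(usable_analysis(solid))
```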

Tool Definition Quality

A (4.5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Despite annotations already declaring readOnlyHint=true and destructiveHint=false, the description goes far beyond this with extensive behavioral disclosure: it explains the classifier logic, resolver contract (market_match_confidence, alternatives), safety short-circuit for low-confidence matches, closed-market handling, spread warnings, resolution-rule risk (cancellation_rule), and news fallback behavior. This is highly transparent and rich in context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but extremely well-organized, with clear section headers (RESPONSE SHAPES, RESOLVER CONTRACT, PARENT_EVENT EXTRACTOR, NEWS FIELDS, SAFETY, RESOLUTION-RULE RISK). It is front-loaded with purpose and usage, and every sentence contributes specific, actionable information. No fluff or redundancy; the structure aids comprehension despite the length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema exists, the description compensates by thoroughly explaining the response shapes, resolver contract, evidence structure, safety statuses, and edge cases. It covers all necessary operational aspects, including parameter behavior, examples, and risk warnings. The tool is complex, and the description is complete enough for an agent to use it correctly without prior knowledge.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, and the description does not add significant parameter syntax beyond what the schema already provides. The market parameter's flexibility (slug, URL, or question text) is noted in both schema and description, but the description focuses on behavior rather than parameter semantics. Baseline 3 is appropriate given the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a specific verb-resource pairing: 'Research a Polymarket bet by pulling the relevant Pipeworx data for it in one call.' It clearly states the tool's scope (Polymarket bet research) and differentiates from siblings by mentioning fan-out to category-specific data packs and the market-vs-model comparison. The purpose is unambiguous and specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use it: 'Use for "should I bet on X", "what does the data say about Y", or "is there edge in Z".' It provides concrete examples of fan-outs for different bet types. However, it does not explicitly mention alternative sibling tools or when NOT to use this tool (e.g., for pure arbitrage or edge tracking), which prevents a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bounce: Bounce (A)

Read-only, Idempotent

Fetch full details for a single Postmark bounce record by numeric id, including bounce type, recipient email, description, and raw bounce content.

Parameters

Name	Required	Description	Default
`id`	Yes

Output Schema

Parameters

Name	Required	Description
`ID`	No	Bounce identifier
`Type`	No	Bounce type (Permanent or Transient)
`Email`	No	Bounced email address
`Details`	No	Bounce details message
`Inactive`	No	Whether address is marked inactive
`BouncedAt`	No	ISO timestamp of bounce
`DumpAvailable`	No	Whether message dump is available
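
The output fields above map naturally onto a small typed record on the client side. This is a sketch only: the `Bounce` class and `from_payload` are hypothetical, every field is optional because the schema marks none as required, the integer type for `ID` is an assumption, and the sample payload is invented.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

@dataclass(frozen=True)
class Bounce:
    id: Optional[int] = None            # assumed numeric, per "by numeric id"
    type: Optional[str] = None          # "Permanent" or "Transient" per the schema
    email: Optional[str] = None
    details: Optional[str] = None
    inactive: Optional[bool] = None
    bounced_at: Optional[str] = None    # ISO timestamp, kept as text here
    dump_available: Optional[bool] = None

    @classmethod
    def from_payload(cls, payload: Mapping[str, Any]) -> "Bounce":
        """Map the PascalCase output fields onto snake_case attributes."""
        return cls(
            id=payload.get("ID"),
            type=payload.get("Type"),
            email=payload.get("Email"),
            details=payload.get("Details"),
            inactive=payload.get("Inactive"),
            bounced_at=payload.get("BouncedAt"),
            dump_available=payload.get("DumpAvailable"),
        )

# Invented payload for illustration.
sample = {"ID": 42, "Type": "Permanent", "Email": "user@example.com", "Inactive": True}
record = Bounce.from_payload(sample)
print(record.email, record.inactive, record.details)
```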

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint=false. The description adds useful context by specifying what the full details include (bounce type, recipient email, description, raw bounce content), giving the agent a better sense of the return payload. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence. It leads with the verb, states the resource and qualifier, and lists contents. Every word earns its place with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple single-parameter fetch tool with an output schema and comprehensive annotations, the description is complete. It covers the action, object, input qualifier, and the nature of the returned data. No additional context (e.g., error conditions, rate limits) is necessary for this straightforward read operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has only an 'id' string property with no description (0% coverage). The description compensates by stating 'by numeric id', clarifying that the parameter expects a numeric identifier despite the schema type being string. This adds meaningful semantic value beyond the schema's bare type definition.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
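
The "schema coverage" figure cited in these assessments can be read as the share of input properties that carry a description. The sketch below computes it that way. The scorer's exact method is not documented here, so both the formula and the `description_coverage` helper are assumptions, and the schema literal is reconstructed from the parameter table rather than copied from the server.

```python
def description_coverage(schema):
    """Fraction of top-level properties that have a non-empty description."""
    properties = schema.get("properties", {})
    if not properties:
        return 1.0  # nothing to describe; treated as fully covered in this sketch
    described = sum(
        1 for prop in properties.values()
        if str(prop.get("description", "")).strip()
    )
    return described / len(properties)

# Reconstructed from the table: a single `id` string with no description.
bounce_schema = {
    "type": "object",
    "properties": {"id": {"type": "string"}},
    "required": ["id"],
}
print(f"{description_coverage(bounce_schema):.0%}")
```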

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Fetch') and clearly identifies the resource ('a single Postmark bounce record by numeric id'). It also lists the key fields returned (bounce type, recipient email, description, raw bounce content), making the tool's function unambiguous and distinguishing it from siblings like 'bounces' (list) and 'bounce_activate' (activation).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly implies usage context: fetch full details for a single bounce when you have a numeric id. However, it does not explicitly mention alternatives or exclusions (e.g., 'for listing bounces, use bounces'). The context is clear but lacks explicit when-not guidance, so it falls short of a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bounce_activate (Bounce Activate): C

Read-only, Idempotent

Re-activate a bounced address.

Parameters

Name	Required	Description	Default
`id`	Yes

Output Schema

Parameters

Name	Required	Description
`Bounce`	No	Updated bounce object

Tool Definition Quality

C (2.3/5.0)

Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description directly contradicts the annotation readOnlyHint=true; 'Re-activate' implies a write operation, while the annotation declares the tool as read-only. This is a serious inconsistency that misleads the agent about the tool's side effects. Other annotations (idempotentHint, openWorldHint) are not addressed in the description.
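
A contradiction of this kind can be caught mechanically before a tool is published. The sketch below is a rough heuristic, not part of any MCP SDK: it assumes the description starts with its main verb, and the verb list is a hypothetical starting set chosen for illustration.

```python
# Hypothetical list of leading verbs that imply a write operation.
MUTATING_VERBS = {
    "activate", "re-activate", "reactivate", "create",
    "delete", "remove", "send", "update",
}


def contradicts_read_only(description: str, annotations: dict) -> bool:
    """True when a tool is annotated read-only but its description leads with a write verb."""
    words = description.strip().split()
    first_word = words[0].lower().strip(".,:;") if words else ""
    return bool(annotations.get("readOnlyHint")) and first_word in MUTATING_VERBS


# The bounce_activate case discussed above.
print(contradicts_read_only(
    "Re-activate a bounced address.",
    {"readOnlyHint": True, "idempotentHint": True},
))  # True
```

A check like this only flags candidates for review; a description such as "Send-status lookup" would need a human to judge.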

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely short (four words) but under-specifies the tool's behavior and parameter semantics. It is not a case of efficient conciseness but rather insufficient specification, similar to a placeholder.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having only one parameter and an output schema, the tool description is incomplete due to the annotation contradiction and lack of usage guidance. The agent cannot correctly infer the tool's effect or safety profile, making the description inadequate for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is 0%, and the description does not explain the meaning or format of the 'id' parameter. There is no additional semantic context provided anywhere, so the agent must guess what identifier to supply.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Re-activate a bounced address' uses a specific verb ('Re-activate') and identifies the resource ('a bounced address'), making the tool's purpose clear. It also distinguishes itself from sibling tools like 'bounce' and 'bounces' by implying an activation action rather than a single-record fetch or a listing action.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. There is no mention of prerequisites, typical scenarios, or exclusions, leaving the agent without context for appropriate invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bounces (Bounces): A

Read-only, Idempotent

List bounced email records from the Postmark server with optional filters (type, inactive, emailFilter, tag, fromdate, todate); returns bounce type, recipient, and timestamp.

Parameters

Name	Required	Description	Default
No parameters

Output Schema

Parameters

Name	Required	Description
`Bounces`	No
`TotalCount`	No	Total number of bounces

Tool Definition Quality

A (4.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only, idempotent, and non-destructive behavior. The description adds the return fields (bounce type, recipient, timestamp) and filter names, which is helpful, but it does not reveal any other behavioral traits such as pagination, rate limits, or result limits. The added context is moderate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no filler, front-loading the verb 'List' and quickly covering the resource, scope, and optional filters. Every element adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low complexity, comprehensive annotations, and presence of an output schema (which covers return values), this description is complete. It covers the purpose, the available filters, and the key return fields, fully enabling an agent to understand and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has zero parameters, so the description carries the full burden. It lists six optional filters (type, inactive, emailFilter, tag, fromdate, todate), which adds semantic meaning beyond the schema. However, it does not specify value formats or valid enums for filters, so it is not a 5. Baseline for zero parameters is 4.
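
The gap between the six filters named in prose and the empty input schema can be surfaced with a small comparison. This is a sketch under stated assumptions: the regular expression only recognises the "optional filters (...)" phrasing used in this particular description, and `undeclared_filters` is a hypothetical helper, not an existing API.

```python
import re
from typing import List


def undeclared_filters(description: str, schema: dict) -> List[str]:
    """Names listed in 'optional filters (...)' that the schema does not declare."""
    match = re.search(r"optional filters \(([^)]*)\)", description)
    if not match:
        return []
    named = [name.strip() for name in match.group(1).split(",") if name.strip()]
    declared = set(schema.get("properties", {}))
    return [name for name in named if name not in declared]


description = (
    "List bounced email records from the Postmark server with optional filters "
    "(type, inactive, emailFilter, tag, fromdate, todate); "
    "returns bounce type, recipient, and timestamp."
)
empty_schema = {"type": "object", "properties": {}}

print(undeclared_filters(description, empty_schema))
# ['type', 'inactive', 'emailFilter', 'tag', 'fromdate', 'todate']
```

Every name the function returns is a parameter an agent can read about but cannot see in the schema, which is why value formats and enums end up unspecified.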

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses the specific verb 'List' with the resource 'bounced email records' and context 'from the Postmark server'. It clearly distinguishes this from siblings like 'bounce' and 'bounce_activate', which imply different operations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains what the tool does and lists optional filters, but it does not provide explicit guidance on when to use this tool instead of alternatives such as 'bounce' or 'bounce_activate'. Usage is implied by the action but not contrasted with siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

compare_entities (Compare Entities): A

Read-only, Idempotent

"Compare X and Y" / "X vs Y" / "X versus Y" / "which is bigger / better / larger / more profitable" / "rank these companies" / "head to head" — side-by-side comparison of 2–5 companies or drugs in ONE parallel call. ALWAYS PREFER over sequential single-pack lookups when comparing entities. type="company" pulls LATEST 10-K revenue + net income + cash + long-term debt from SEC EDGAR/XBRL (off-calendar fiscal years handled correctly — AAPL Sep, NVDA Jan, etc.). type="drug" pulls FAERS adverse-event counts, FDA approval counts, active trial counts. Results sorted by primary metric so "largest" / "most" / "biggest" reads off the top of the response. Returns paired data + pipeworx:// citation URIs per entity. Replaces 8–15 sequential lookups.

Parameters

Name	Required	Description	Default
`type`	Yes	Entity type: "company" or "drug".
`values`	Yes	For company: 2–5 tickers/CIKs (e.g., ["AAPL","MSFT"]). For drug: 2–5 names (e.g., ["ozempic","mounjaro"]).
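
The two parameters above map directly onto an MCP `tools/call` request. The sketch below builds such a request as a plain dictionary; the client-side checks mirror the documented constraints ("company" or "drug", 2 to 5 values), but whether the server enforces them the same way is an assumption, and `build_compare_call` is a hypothetical helper name.

```python
import json

ALLOWED_TYPES = {"company", "drug"}


def build_compare_call(entity_type: str, values: list, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 tools/call request for compare_entities."""
    if entity_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if not 2 <= len(values) <= 5:
        raise ValueError("values must contain between 2 and 5 entries")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "compare_entities",
            "arguments": {"type": entity_type, "values": list(values)},
        },
    }


request = build_compare_call("company", ["AAPL", "MSFT"])
print(json.dumps(request, indent=2))
```

Validating before sending keeps an obviously malformed comparison (one ticker, or six) from costing a round trip.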

Tool Definition Quality

A (5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only, open-world, idempotent, and non-destructive behavior. The description adds substantial context beyond that: data sources (SEC EDGAR/XBRL, FAERS, FDA), specific metrics pulled, off-calendar fiscal year handling, sorting by primary metric, and citation URI return behavior. This gives the agent a clear model of what happens during execution.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is information-dense but every sentence earns its place: trigger phrases, usage preference, type-specific behavior, sorting rule, and return payload. It is front-loaded with the most decision-critical information and remains scannable despite its length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description adequately explains what the tool returns ('paired data + pipeworx:// citation URIs per entity') and covers the key operational details. It is complete for an agent to select and invoke correctly, covering purpose, params, behavior, and context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds meaning well beyond the schema by explaining what each 'type' value retrieves (company vs. drug) and what the 'values' array should contain. It also clarifies that results are sorted by the primary metric, which aids correct agent interpretation of the output.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with trigger phrases and clearly states the tool performs side-by-side comparisons of 2–5 companies or drugs in a single parallel call. It distinguishes itself from sequential lookups and sibling tools like entity_profile by emphasizing the parallel aggregation and comparison-specific behavior.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to 'ALWAYS PREFER over sequential single-pack lookups when comparing entities', giving both a clear when-to-use and a direct comparison against the alternative. The examples of user phrasing ('which is bigger', 'rank these companies') further clarify the intended invocation context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deep_research (Deep Research): A

Read-only, Idempotent

ACCOUNT REQUIRED (free — sign in via GitHub at https://pipeworx.io/signup; depth:"thorough" needs a paid plan). If you are not signed in, use ask_pipeworx instead — it works on every tier. Grounded multi-source research across Pipeworx's 1393 STRUCTURED data sources (SEC filings, FRED/BLS economics, FDA, USPTO patents, markets, science, government records, etc.) in ONE call — this is NOT open-web search. Decomposes your question into focused facets, routes each to the right one of 5,344 tools IN PARALLEL, and returns a findings packet: verbatim evidence + confidence + source + fetched_at + a stable pipeworx:// citation per finding, with explicit gaps[] for facets the data couldn't answer (never invented). Best for broad/multi-part questions over structured data ("compare X and Y's regulatory + financial exposure", "research the filings + market picture for ACME"). For a single lookup use ask_pipeworx (one LLM call, not many). For BREAKING or colloquial CURRENT-NEWS / "what's the world saying about X" topics, prefer ask_pipeworx — it routes to live news APIs and the *-news-feeds packs; deep_research returns mostly empty gaps[] when the topic isn't in the structured catalog. Second-hop iteration: depth:"standard" re-angles unanswered gaps (gap recovery); depth:"thorough" additionally chases the best leads from the first pass — so multi-step questions resolve in one call. Every finding carries a hop field and a citation_uri (record-level pipeworx:// when the source emits one, else source-level). "standard" and "thorough" also return contradictions[] flagging findings that disagree. Large records are semantically excerpted to the passages relevant to each facet (not head-truncated), so answers deep in a long filing/series aren't missed. Expect 15-60s (thorough with its follow-up + contradiction pass: up to ~90s).

Parameters

Name	Required	Description	Default
`depth`	No	How many facets to research in parallel: quick=3 (single hop), standard=5 (default; adds a gap-recovery hop that re-angles unanswered facets + a contradictions[] scan across findings), thorough=8 (paid; adds a full iterative hop that chases leads + recovers gaps, plus the contradictions[] scan).
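
The depth-to-facet mapping in the table can be written down as a small lookup. The facet counts, the "standard" default, and the paid requirement for "thorough" come from the text above; the silent fallback to "standard" for callers without a paid plan is a hypothetical client policy, not documented server behaviour.

```python
from typing import Optional, Tuple

FACETS_BY_DEPTH = {"quick": 3, "standard": 5, "thorough": 8}
PAID_DEPTHS = {"thorough"}
DEFAULT_DEPTH = "standard"


def plan_depth(requested: Optional[str] = None, has_paid_plan: bool = False) -> Tuple[str, int]:
    """Return the depth to send and the number of facets it researches in parallel."""
    depth = requested or DEFAULT_DEPTH
    if depth not in FACETS_BY_DEPTH:
        raise ValueError(f"depth must be one of {sorted(FACETS_BY_DEPTH)}")
    if depth in PAID_DEPTHS and not has_paid_plan:
        # Assumed policy: downgrade rather than let the call fail.
        depth = DEFAULT_DEPTH
    return depth, FACETS_BY_DEPTH[depth]


print(plan_depth())                                # ('standard', 5)
print(plan_depth("thorough", has_paid_plan=True))  # ('thorough', 8)
```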
`question`	Yes	The research question, in natural language. Broad/multi-part is fine — decomposition is the point.

Tool Definition Quality

A (4.8/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, openWorldHint=true, idempotentHint=true, and destructiveHint=false, but the description greatly enriches this by disclosing account requirements, the paid tier needed for 'thorough', expected latency (15-60s, up to ~90s), the gaps[] behavior (never invented data), contradictions[] scans, hop fields, and semantic excerpting of large records. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense and information-rich, and while it is long, every sentence earns its place given the tool's complexity. It is front-loaded with the most critical requirement (account sign-in) and alternatives. Minor structural improvements could group related concepts, but the length is justified and not wasteful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and lack of output schema, the description fully covers what the agent needs: return format (findings packet with verbatim evidence, confidence, source, fetched_at, citation), gaps[] handling, contradictions, latency, account tiers, and when to prefer alternatives. It is complete for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3, but the description adds meaningful context: it explains how 'depth' maps to facet counts (quick=3, standard=5, thorough=8) and associated behaviors (gap recovery, contradiction scan, iterative hops), and clarifies that 'question' can be broad/multi-part since decomposition is the point. This exceeds the schema's descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs 'grounded multi-source research across Pipeworx's 1393 STRUCTURED data sources' in one call, with a specific verb ('research'), resource ('structured data sources'), and explicit contrast to open-web search. It distinguishes itself from siblings by naming ask_pipeworx as the alternative for single lookups and current news.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit when-to-use guidance: 'For a single lookup use ask_pipeworx', 'For BREAKING or colloquial CURRENT-NEWS... prefer ask_pipeworx', and even instructs to use ask_pipeworx if not signed in. It also explains which depth values to choose for different needs (gap recovery, contradictions, thoroughness). This is exemplary usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

delivery_stats (Delivery Stats): C

Read-only, Idempotent

Server-level delivery stats.

Parameters

Name	Required	Description	Default
No parameters

Output Schema

Parameters

Name	Required	Description
`Bounces`	No	Bounce statistics by type
`Rejects`	No	Reject statistics by reason
`Deliveries`	No	Delivery statistics
`InactiveMails`	No	Count of inactive mails
`SpamComplaints`	No	Spam complaint statistics

Tool Definition Quality

C (2.9/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the safety profile is covered. However, the description adds no behavioral context such as output scope, aggregation level, or any filtering behavior. It merely restates the tool's name in prose, providing no extra transparency beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence with no redundant words or structural issues. It is front-loaded and easy to parse, though minimal in content. The sentence earns its place, but its brevity leaves too little substance to merit a top score.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters and the presence of an output schema plus strong annotations, the description is minimally adequate. However, 'delivery stats' is ambiguous without elaboration on whether it covers success rates, failures, bounces, etc. The output schema likely fills in some details, but the description alone is somewhat thin for tool selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has zero parameters, so schema coverage is 100% by default. The description has no need to clarify parameter semantics, and the baseline of 4 applies since there are no parameters to document.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Server-level delivery stats' communicates a clear subject and scope, but it lacks a verb (e.g., 'retrieve' or 'list') and does not differentiate this tool from siblings like messages_outbound or bounces. It provides enough to guess the tool's function but is not explicit or distinctive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no guidance on when to use this tool versus alternatives. No context is given for scenarios, prerequisites, or exclusions, leaving the agent without direction beyond the bare description.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

discover_tools (Discover Tools): A

Read-only, Idempotent

Find tools by describing the data or task. Use when you need to browse, search, look up, or discover what tools exist for: SEC filings, financials, revenue, profit, FDA drugs, adverse events, FRED economic data, Census demographics, BLS jobs/unemployment/inflation, ATTOM real estate, ClinicalTrials, USPTO patents, weather, news, crypto, stocks. Returns the top-N most relevant tools with names, descriptions, and full input schemas (with curated examples) — each result is ready to call directly, no second schema lookup needed. Call this FIRST when you have many tools available and want to see the option set (not just one answer).

Parameters

Name	Required	Description
`q`	No	Alias for query.
`task`	No	Alias for query.
`limit`	No	Maximum number of tools to return (default 20, max 50)
`query`	Yes	Natural language description of what you want to do (e.g., "analyze housing market trends", "look up FDA drug approvals", "find trade data between countries"). Accepts task, q, description, search as aliases.
`search`	No	Alias for query.
`description`	No	Alias for query.
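
Because `query` has four aliases and `limit` has a stated default and ceiling, a caller may want to normalise its arguments before sending them. The sketch below shows one way to do that; the alias names, the default of 20, and the maximum of 50 come from the table, while the precedence order among aliases and the lower bound of 1 are assumptions.

```python
# Canonical name first, then the aliases the schema lists (order is assumed).
QUERY_ALIASES = ("query", "task", "q", "description", "search")
DEFAULT_LIMIT = 20
MAX_LIMIT = 50


def normalize_discover_args(args: dict) -> dict:
    """Collapse query aliases into 'query' and clamp 'limit' to the documented range."""
    query = next((args[key] for key in QUERY_ALIASES if args.get(key)), None)
    if query is None:
        raise ValueError("one of query, task, q, description, or search is required")
    raw_limit = args.get("limit")
    limit = DEFAULT_LIMIT if raw_limit is None else int(raw_limit)
    limit = max(1, min(limit, MAX_LIMIT))
    return {"query": query, "limit": limit}


print(normalize_discover_args({"q": "look up FDA drug approvals"}))
# {'query': 'look up FDA drug approvals', 'limit': 20}
```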

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint, so the description doesn't need to repeat safety. It adds valuable behavioral details about the return payload: top-N relevant tools with names, descriptions, full input schemas, and curated examples, plus the fact that results are directly callable without a second schema lookup. This goes beyond the annotations and helps set expectations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a few sentences long and front-loaded with the core purpose. The long list of covered domains is informative rather than filler, and each sentence adds value: purpose, when-to-use, output format, and a strong usage directive. It is slightly long but justified.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no output schema, the description effectively covers what the tool returns, why to use it, and when to use it first. It fully conveys the tool's role in a large toolset, addresses the discoverability need, and gives enough detail about the result contents to make the tool immediately useful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, and the schema thoroughly documents the 'query' parameter and its aliases (task, q, description, search) as well as 'limit.' The description only loosely references 'describing the data or task' and 'top-N,' but does not add new parameter-level meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with 'Find tools by describing the data or task,' clearly stating the tool's specific verb and resource. It distinguishes itself from sibling domain-specific tools by framing itself as a meta-tool for discovering other tools, listing many covered domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides explicit usage context: 'Use when you need to browse, search, look up, or discover what tools exist' and 'Call this FIRST when you have many tools available and want to see the option set (not just one answer).' It does not explicitly name alternative tools or give when-not-to-use conditions, but the guidance is clear and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

entity_profile (Entity Profile): A

Read-only, Idempotent

"Tell me about X" / "research Acme" / "brief me on Tesla" / "what does Apple do" / "company profile for Microsoft" / "give me the rundown on NVDA" / "everything you know about $TICKER" — full cross-source profile of a US public company in ONE parallel call. ALWAYS PREFER over chaining single-pack SEC/XBRL/news lookups when the user asks for a holistic view. Fans out across SEC EDGAR, XBRL, USPTO, news, GLEIF and returns: cik + company_name; recent_filings (up to 5 with pipeworx://edgar/company/{cik}/filings/{accession} URIs); fundamentals (LATEST 10-K Revenues + NetIncomeLoss + Cash, sorted period_end DESC); patents (USPTO PatentsView API sunset May 2025 — soft-fails until reactivated); recent news mentions via GDELT→GNews fallback; LEI via GLEIF. Pass ticker "AAPL" or zero-padded CIK "0000320193" — names not supported (use resolve_entity first if you only have a name).

Parameters

Name	Required	Description	Default
`type`	Yes	Entity type. Only "company" supported today; person/place coming soon.
`value`	Yes	Ticker (e.g., "AAPL") or zero-padded CIK (e.g., "0000320193"). Names not supported — use resolve_entity first if you only have a name.
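
Since `value` accepts a ticker or a zero-padded CIK but not a name, a caller can pre-classify its input and route names to resolve_entity first. The sketch below is a heuristic only: the ticker pattern (up to five letters with an optional one- or two-letter class suffix) is an assumption, and a short one-word company name is indistinguishable from a ticker under it.

```python
import re
from typing import Tuple


def classify_entity_value(value: str) -> Tuple[str, str]:
    """Classify input as ('cik', padded), ('ticker', upper-cased), or ('name', raw)."""
    text = value.strip()
    if re.fullmatch(r"\d{1,10}", text):
        # CIKs are written as 10 digits with leading zeros, e.g. "0000320193".
        return ("cik", text.zfill(10))
    if re.fullmatch(r"[A-Za-z]{1,5}(?:[.\-][A-Za-z]{1,2})?", text):
        return ("ticker", text.upper())
    # Anything else is treated as a name and should go to resolve_entity first.
    return ("name", text)


print(classify_entity_value("AAPL"))        # ('ticker', 'AAPL')
print(classify_entity_value("320193"))      # ('cik', '0000320193')
print(classify_entity_value("Apple Inc."))  # ('name', 'Apple Inc.')
```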

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare this as read-only, open-world, idempotent, and non-destructive. The description adds valuable behavioral context, including the fan-out across multiple sources, the soft-fail for patents due to API sunset, and the GDELT→GNews fallback. It stops short of mentioning potential errors or rate limits, but for a read-only tool this is a solid transparency level.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single dense paragraph, but it front-loads example queries and uses every sentence to convey meaningful detail. While it could be broken into bullets for scannability, it is not bloated or redundant. The length is justified by the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description effectively explains return values by listing the fields (cik, recent_filings, fundamentals, patents, news, LEI) and their specifics (e.g., 'up to 5 with pipeworx://edgar/...' and 'LATEST 10-K Revenues + NetIncomeLoss + Cash'). It covers input constraints and fallback behavior, making it nearly complete for a single-entity profile tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% and the schema already provides examples and constraints (ticker or zero-padded CIK, names not supported). The description largely repeats this information, adding little beyond what the schema already communicates. Since the schema carries the burden, a baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'full cross-source profile of a US public company in ONE parallel call.' It includes example queries and explicitly lists the data sources and return fields, distinguishing it from sibling tools like resolve_entity and compare_entities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance is provided: 'ALWAYS PREFER over chaining single-pack SEC/XBRL/news lookups when the user asks for a holistic view.' It also states input constraints ('Pass ticker AAPL or zero-padded CIK'), and names an alternative ('use resolve_entity first if you only have a name').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

forget (Forget): A

Destructive, Idempotent

Delete a previously stored memory by key. Use when context is stale, the task is done, or you want to clear sensitive data the agent saved earlier. Pair with remember and recall.

Parameters

Name	Required	Description	Default
`key`	Yes	Memory key to delete

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already disclose destructiveHint and idempotentHint, covering the safety profile. The description adds context about clearing sensitive data and reinforcing that the operation removes previously saved memory, which provides a bit of behavioral framing beyond the annotations. No contradictions.
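
What "destructive and idempotent" means for a delete-by-key tool can be shown with an in-memory stand-in. The class below is a hypothetical sketch, not the server's implementation; the method names borrow the tool names, and their signatures and return values are assumptions.

```python
class MemoryStore:
    """Toy key-value store illustrating an idempotent, destructive delete."""

    def __init__(self) -> None:
        self._items = {}

    def remember(self, key: str, value: str) -> None:
        self._items[key] = value

    def recall(self, key: str):
        return self._items.get(key)

    def forget(self, key: str) -> bool:
        """Delete the key if present; repeating the call leaves the same end state."""
        existed = key in self._items
        self._items.pop(key, None)
        return existed


store = MemoryStore()
store.remember("api_token_hint", "rotated on Friday")
print(store.forget("api_token_hint"))  # True: the entry was removed
print(store.forget("api_token_hint"))  # False: nothing left, and no error raised
print(store.recall("api_token_hint"))  # None
```

The first call destroys data; the second changes nothing and does not fail, which is the property the idempotent annotation promises.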

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short sentences, action-first, and contains no filler. It front-loads the core verb and resource, then adds usage context and the companion tools compactly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one parameter, rich annotations (destructive, idempotent, non-readonly), and no output schema, the description covers purpose, when to use, and complementary tools. It is complete enough for an agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema fully documents the only parameter 'key' as 'Memory key to delete,' achieving 100% coverage. The description does not add any parameter information beyond the schema, so the baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function with a specific verb and resource: 'Delete a previously stored memory by key.' It also distinguishes itself from siblings by explicitly pairing with remember and recall, establishing its role within a memory management workflow.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidance: 'Use when context is stale, the task is done, or you want to clear sensitive data.' It also suggests complementary tools (remember and recall), effectively indicating when this tool is appropriate relative to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

generate_llms_txtGenerate llms.txtA

Read-onlyIdempotent

Generate a production-ready llms.txt file for any URL so AI crawlers (ChatGPT, Claude, Perplexity) can index the site cleanly. Fetches the page, extracts title/description/key links, and emits the standard llms.txt markdown format. Output is a single text blob ready to drop at site-root/llms.txt. Useful for: getting a client's site indexed by AI, drafting llms.txt for your own project, or auditing how an AI crawler would see a competitor.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	Full URL of the site to summarize, e.g. "https://example.com" or a specific landing page.
`max_links`	No	Maximum number of link entries to include (default 25, max 50).
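
The fetch, extract, and emit steps in the description can be sketched roughly as follows. This is not the tool's implementation: the `PageSummary` and `build_llms_txt` names, the `## Links` section title, and the use of an inline HTML string in place of a network fetch are all assumptions made for illustration. The only details taken from the listing are the title/description/links extraction and the `max_links` default of 25 and cap of 50.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageSummary(HTMLParser):
    """Collects the title, meta description, and links from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.description = ""
        self.links = []  # list of (link text, absolute URL)
        self._in_title = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") == "description":
            self.description = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self._href = urljoin(self.base_url, attrs["href"])
            self._text = []

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            if text:
                self.links.append((text, self._href))
            self._href = None


def build_llms_txt(html, base_url, max_links=25):
    """Emit an llms.txt-style markdown blob: H1 title, blockquote summary, link list."""
    page = PageSummary(base_url)
    page.feed(html)
    limit = min(max_links, 50)  # the schema states a maximum of 50
    lines = [f"# {page.title.strip()}", "", f"> {page.description}", "", "## Links", ""]
    lines += [f"- [{text}]({url})" for text, url in page.links[:limit]]
    return "\n".join(lines) + "\n"


sample = (
    "<html><head><title>Example</title>"
    '<meta name="description" content="A demo site."></head>'
    '<body><a href="/docs">Docs</a><a href="/blog">Blog</a></body></html>'
)
llms_txt = build_llms_txt(sample, "https://example.com", max_links=1)
```

With `max_links=1` only the first link survives, which shows how the parameter trims the link list rather than the title or summary.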

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already cover safety (readOnly, openWorld, idempotent, non-destructive). The description adds process details: 'Fetches the page, extracts title/description/key links, and emits the standard llms.txt markdown format' and clarifies that the output is a single text blob ready to drop at site-root. This is useful context beyond the annotations. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The four sentences are concise and front-loaded: core purpose first, then method, then output, then use cases. No filler or repetition. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having no output schema, the description explains the output format ('single text blob ready to drop at site-root/llms.txt') and covers the process, use cases, and parameter intent. With two well-documented params and safety annotations, the description is sufficiently complete for correct tool selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so a baseline of 3 is appropriate. The description reinforces that the tool works for 'any URL' and mentions extraction of title/description/key links, which loosely maps to parameters, but it does not add new details about max_links beyond the schema. The schema alone already documents both parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states a specific verb ('Generate') and resource ('a production-ready llms.txt file'), and explains the output format and intended use cases. It stands out from siblings by being the only tool focused on generating an llms.txt file, directly named in the description.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases ('getting a client's site indexed by AI, drafting llms.txt for your own project, or auditing how an AI crawler would see a competitor'). It gives strong context but does not mention when not to use it or name alternative sibling tools (e.g., scan_competitor_ai_presence), so it misses the full exclusionary guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_subscriptionsList SubscriptionsA

Read-onlyIdempotent

List the caller's active subscriptions. Returns id, type, params, created_at, last_fired_at, fire_count for each. Use this to review what you're monitoring before adding more or to find an id to cancel.

ParametersJSON Schema

Name	Required	Description	Default
`include_inactive`	No	Include cancelled subscriptions in the response (default false).

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds value by specifying the return fields and scoping to the caller's subscriptions, which gives the agent a fuller picture of the behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loads the purpose, and each sentence adds distinct information: what it does, what it returns, and when to use it. No word wasted.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with one optional parameter and good annotations, the description provides the purpose, return field list, and usage context. This is sufficient for an agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for the sole parameter include_inactive, so the schema already conveys its meaning. The tool description does not need to add more, hence the baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List the caller's active subscriptions' with a specific verb and resource. It specifies the returned fields and distinguishes itself from mutation siblings like subscribe and unsubscribe.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides explicit usage guidance: 'Use this to review what you're monitoring before adding more or to find an id to cancel.' This gives clear when-to-use scenarios, though it doesn't explicitly name the alternative subscribe tool, so it's strong but not perfect.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

message_outbound_detailMessage Outbound DetailD

Read-onlyIdempotent

Single outbound detail.

ParametersJSON Schema

Name	Required	Description	Default
`messageId`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`To`	No	Recipient email address
`From`	No	Sender email address
`Opens`	No	Number of times opened
`Clicks`	No	Number of link clicks
`Status`	No	Message delivery status
`Subject`	No	Email subject
`MessageID`	No	Unique message identifier
`DeliveredAt`	No	ISO timestamp when delivered
`SubmittedAt`	No	ISO timestamp of submission

Tool Definition Quality

D1.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, openWorldHint=true, idempotentHint=true, and destructiveHint=false, so the safety profile is covered. However, the description adds zero behavioral context—no mention of return format, error behavior, or what an 'outbound detail' contains. It does not contradict annotations, but it also provides no additional transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is only three words, which is under-specification rather than concision. Nothing is front-loaded because there is almost nothing to front-load. It does not earn its place; it fails to provide any substantive content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the tool has only one parameter and an output schema exists, the description still fails to explain the core purpose or how the input relates to the output. The sibling tool names offer some context, but the description itself is grossly incomplete for an agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 0%, and the description does not compensate. The only parameter, messageId, is not explained beyond its type/required status. The description adds no meaning to the parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Single outbound detail' is a noun phrase that essentially restates the tool name. It lacks a verb to indicate the action (fetch/get/retrieve) and does not specify what 'detail' means. It is slightly more informative than a pure tautology because it hints at singular scope, but it is still vague.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines1/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus siblings like message_outbound_dump or messages_outbound. There is no mention of use cases, prerequisites, or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

message_outbound_dumpMessage Outbound DumpA

Read-onlyIdempotent

Retrieve the raw MIME source (full email dump) for a sent outbound message by messageId; useful for debugging encoding or header issues.

ParametersJSON Schema

Name	Required	Description	Default
`messageId`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`Body`	No	Raw MIME message content
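
As a rough illustration of the debugging use case, the raw MIME returned in `Body` could be inspected with Python's standard `email` package. The message text below is a made-up sample, not real tool output, and the variable names are hypothetical.

```python
from email import message_from_string
from email.policy import default

# Hypothetical value of the tool's `Body` output field (raw MIME source).
raw_mime = "\n".join([
    "From: sender@example.com",
    "To: recipient@example.com",
    "Subject: =?utf-8?q?Caf=C3=A9_menu?=",
    "MIME-Version: 1.0",
    'Content-Type: text/plain; charset="utf-8"',
    "Content-Transfer-Encoding: quoted-printable",
    "",
    "Caf=C3=A9 is open.",
    "",
])

# policy=default decodes RFC 2047 encoded headers when they are read.
msg = message_from_string(raw_mime, policy=default)

subject = str(msg["Subject"])                     # decoded header value
encoding = str(msg["Content-Transfer-Encoding"])  # how the body was encoded
charset = msg.get_content_charset()               # declared character set
body = msg.get_content()                          # body decoded per the two values above
```

Comparing the decoded `subject` and `body` against what the recipient saw is one way to tell whether a garbled message was mis-encoded at send time or mis-rendered by the mail client.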

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description does not need to cover safety traits. It adds value by specifying that the output is the raw MIME source/full email dump, a key behavioral detail not evident from annotations. It does not mention potential response size or error conditions, but for a read-only getter this is sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with the key action and intent front-loaded. Every phrase contributes value: 'raw MIME source', 'full email dump', 'debugging encoding or header issues'. No filler or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple read-only tool with robust annotations and an output schema, the description is complete. It conveys the tool's purpose, the identifying parameter, and the primary use case. There are no significant gaps given the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage, the description compensates by explicitly referencing 'by messageId' and clarifying that it applies to sent outbound messages. The single parameter is self-explanatory, and the schema example shows a GUID format. A bit more detail on ID format or type constraints would push this to 5.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states a specific verb ('Retrieve'), resource ('raw MIME source'), and scope ('sent outbound message by messageId'). It clearly distinguishes this from sibling tools like message_outbound_detail by emphasizing the raw MIME dump and debugging use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear use context: 'useful for debugging encoding or header issues.' However, it does not explicitly mention alternatives or when not to use this tool, so it falls slightly short of a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

messages_outboundMessages OutboundA

Read-onlyIdempotent

List outbound messages sent through the Postmark server with optional filters (recipient, from, tag, status, fromdate, todate); returns message summaries with status and timestamps.

ParametersJSON Schema

Name	Required	Description	Default
No parameters

Output Schema

ParametersJSON Schema

Name	Required	Description
`Messages`	No
`TotalCount`	No	Total number of messages

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, and idempotentHint=true. Description adds that it returns summaries with status and timestamps, but doesn't disclose pagination, rate limits, or filter value formats. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single concise sentence, front-loaded with verb and resource, then filters and return info. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given strong annotations, output schema, and read-only nature, the description covers core functionality. However, it doesn't distinguish from message_outbound_detail/dump or explain filter formats, so slightly less complete than ideal.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has zero properties, but description lists six optional filters (recipient, from, tag, status, fromdate, todate), giving names the schema lacks. Baseline for 0 params is 4; description adds value but doesn't explain value formats (e.g., date format).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb ('List') and resource ('outbound messages sent through the Postmark server'), with specific optional filters. It distinguishes from sibling tools like message_outbound_detail (which implies a single message's detail) and message_outbound_dump (raw data).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context: for listing summaries with filters. Does not explicitly name alternatives or exclusions, but the scope is apparent from sibling names and the phrase 'returns message summaries' implies it's for summary-level viewing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pipeworx_feedbackSend Pipeworx FeedbackA

Tell the Pipeworx team something is broken, missing, or needs to exist. Use when a tool returns wrong/stale data (bug), when a tool you wish existed isn't in the catalog (feature/data_gap), or when something worked surprisingly well (praise). Describe the issue in terms of Pipeworx tools/packs — don't paste the end-user's prompt. The team reads digests daily and signal directly affects roadmap. Rate-limited to 5 per identifier per day. Free; doesn't count against your tool-call quota.

ParametersJSON Schema

Name	Required	Description
`type`	Yes	bug = something broke or returned wrong data. feature = a new tool or capability you wish existed. data_gap = data Pipeworx does not currently expose. praise = positive note. other = anything else.
`context`	No	Optional structured context: which tool, pack, or vertical this relates to.
`message`	Yes	Your feedback in plain text. Be specific (which tool, what error, what data was missing). 1-2 sentences typical, 2000 chars max.
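
The parameter constraints in the table above can be checked on the client side before the tool is called. The sketch below is an assumption-laden illustration: `build_feedback_args` is a hypothetical helper, and the shape of `context` (a dict with a `tool` key) is guessed, since the schema only says it is optional structured context.

```python
ALLOWED_TYPES = {"bug", "feature", "data_gap", "praise", "other"}
MAX_MESSAGE_CHARS = 2000  # limit stated in the `message` parameter description


def build_feedback_args(type_, message, context=None):
    """Validate inputs and return the arguments dict for pipeworx_feedback."""
    if type_ not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {sorted(ALLOWED_TYPES)}")
    if not message:
        raise ValueError("message is required")
    if len(message) > MAX_MESSAGE_CHARS:
        raise ValueError(f"message exceeds {MAX_MESSAGE_CHARS} characters")
    args = {"type": type_, "message": message}
    if context is not None:
        args["context"] = context  # structure is an assumption of this sketch
    return args


args = build_feedback_args(
    "bug",
    "messages_outbound returned stale data for the last 24h.",
    context={"tool": "messages_outbound"},
)
```

Validating locally avoids spending one of the five daily submissions on a request the server would reject.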

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses behavioral traits beyond the annotations, including rate limiting ('Rate-limited to 5 per identifier per day') and quota behavior ('Free; doesn't count against your tool-call quota'). It also explains the impact ('signal directly affects roadmap') and the team's reading cadence. These are not evident from annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Each sentence earns its place: purpose, use cases, content guidance, impact, rate limit, and quota. The most critical information is front-loaded, and the description is compact yet comprehensive for an action-oriented tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and rich schema/annotations, the description covers all necessary context: when to use, what to include, rate limits, quota impact, and team process. The absence of an output schema is acceptable because the tool's return value (acknowledgement) is not critical for invocation decisions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are already well documented. The description adds meaningful usage guidance for the message/context fields ('Describe the issue in terms of Pipeworx tools/packs'), going beyond the schema. This enhances the agent's understanding of how to fill the parameters without repeating descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a specific verb+resource: 'Tell the Pipeworx team something is broken, missing, or needs to exist.' It clearly identifies the tool's role as a feedback channel and differentiates it from siblings like 'ask_pipeworx' by emphasizing it's for reporting bugs/features/praise to the team, not for asking questions or acting on data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage conditions: 'Use when a tool returns wrong/stale data (bug), when a tool you wish existed isn't in the catalog (feature/data_gap), or when something worked surprisingly well (praise).' It also says what not to do: 'don't paste the end-user's prompt.' This clearly guides the agent on when to invoke this tool versus alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pipeworx_trendingPipeworx TrendingA

Read-onlyIdempotent

What other AI agents are calling on Pipeworx right now. Returns the top tools, top packs, and total call volume over a recent window (24h, 7d, or 30d). Useful for: (1) discovering what data sources are hot for current events, (2) confirming a popular tool is the canonical choice before asking your own question, (3) seeing whether your use case aligns with what most agents need. Self-aggregating signal — derived from CF analytics-engine, no PII, just (pack, tool, count). Cached 5min-1h depending on window.

ParametersJSON Schema

Name	Required	Description	Default
`window`	No	24h (default) | 7d | 30d. Shorter windows surface what's hot right now; longer windows show steady-state demand.

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (readOnly, openWorld, idempotent), the description adds useful behavioral details: data source ('derived from CF analytics-engine'), privacy ('no PII'), and caching behavior ('Cached 5min-1h depending on window'). This enriches the agent's understanding of the tool's operational traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately sized but well-structured: a lead sentence stating the core function, followed by a numbered list of use cases and then technical details. Every sentence contributes value, though the use-case list could be seen as slightly verbose for a simple tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with one optional parameter and no output schema, the description fully explains what is returned (top tools, packs, call volume) and the underlying data format ('just (pack, tool, count)'). It also covers caching and data source, making it complete for an agent to invoke confidently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description names the three windows (24h, 7d, or 30d) and ties caching to the window chosen, while the schema's parameter text explains the effect of each choice: 'Shorter windows surface what's hot right now; longer windows show steady-state demand.' Together these help the agent select the appropriate enum value beyond mere labels.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Returns the top tools, top packs, and total call volume over a recent window'. It uses the specific verb 'Returns' and names the resource (trending Pipeworx calls), distinguishing it from siblings like 'discover_tools' by focusing on aggregated call volume rather than a general tool search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides three explicit use cases: discovering hot data sources, confirming canonical tool choices, and checking use-case alignment. These give clear context for when to use the tool, though it does not explicitly mention alternatives or when not to use it, so it falls short of a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

polymarket_arbitragePolymarket ArbitrageA

Read-onlyIdempotent

Find arbitrage opportunities on Polymarket via monotonicity violations + partition-sum checks. Call with NO args for a trending_scan of the top ~200 markets by weekly volume; pass event for the strongest per-event partition_check, or topic for a themed cross-event scan. event (recommended for a specific market): pass a Polymarket event slug like "fed-decision-may-2026" or "when-will-bitcoin-hit-150k"; walks child markets, checks date-axis / threshold-axis ordering AND computes the partition_check (sum of YES prices across mutually-exclusive legs — should ≈1; deviations >3pp emit a BUY/SELL EVERY LEG signal). topic (for cross-event scanning): pass a seed question like "Strait of Hormuz traffic returns to normal" or "Fed rate decision"; searches related events across the platform, flattens markets, runs the comparator on the union. Cross-event mode catches "...by May 31" vs "...by Jun 30" patterns that single-event misses. SEMANTIC ANCHOR: cross-event pairs require ≥0.30 Jaccard similarity on question tokens (prevents Powell-Fed-Pause being paired with Powell-DOJ-probe); skipped_low_similarity surfaces the rejected pair count. PARTITION FILTER: drops will-person-X / will-manager-Y / will-someone-else- placeholder slugs; partitions with >20% placeholder fraction return null arb signal. Response: opportunities[] (gap_pp, suggested_trade, reasoning, monotonicity violation context), and in event mode partition_check{sum_yes_prices, gap_from_1, placeholders_filtered, suggested_trade}. FILL CHECK: when the partition signal fires, arbitrage.fill_check prices it against live CLOB depth (theoretical_edge_pp_at_book vs realizable_edge_pp at 1000 shares/leg, thin_legs[]) — realizable_edge_pp ≤ 0 means the overround exists only at last-trade, not in the book; do not trade it. For custom sizing use polymarket_fill_risk.
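
The partition-sum check described above can be sketched in a few lines. This is an illustration of the arithmetic only, not the tool's code: the function name is hypothetical, the output keys are borrowed from the response fields named in the description, and the mapping of direction to trade (sum above 1 means sell, below 1 means buy) and the sign of `gap_from_1` are assumptions.

```python
def partition_check(yes_prices, threshold_pp=3.0):
    """Sum YES prices over mutually exclusive legs and flag gaps from 1."""
    total = sum(yes_prices)
    gap_pp = (total - 1.0) * 100  # distance from 1 in percentage points
    if gap_pp > threshold_pp:
        trade = "SELL EVERY LEG"  # legs are collectively overpriced
    elif gap_pp < -threshold_pp:
        trade = "BUY EVERY LEG"   # legs are collectively underpriced
    else:
        trade = None              # within tolerance, no signal
    return {
        "sum_yes_prices": total,
        "gap_from_1": total - 1.0,
        "suggested_trade": trade,
    }


overpriced = partition_check([0.40, 0.35, 0.30])   # sums to about 1.05
underpriced = partition_check([0.50, 0.30, 0.15])  # sums to about 0.95
balanced = partition_check([0.50, 0.30, 0.21])     # sums to about 1.01
```

A signal from this arithmetic is only theoretical; as the description notes, the fill check against live order-book depth decides whether the edge is realizable.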

ParametersJSON Schema

Name	Required	Description	Default
`event`	No	Single-event mode (use this if you know the specific Polymarket event): event slug like "fed-decision-may-2026" or "when-will-bitcoin-hit-150k". Full Polymarket URLs also accepted.
`topic`	No	Cross-event mode (use this if you want to scan related events across the platform): a topic or seed question like "Fed rate decision" or "Strait of Hormuz traffic returns to normal". Tool searches Polymarket for related events and checks monotonicity across them.
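
The semantic anchor used in cross-event mode is a Jaccard similarity threshold on question tokens. A minimal sketch follows, assuming lowercase alphanumeric tokenization; the tool's actual tokenizer is not documented here, and the function names and sample questions are hypothetical.

```python
import re


def tokens(question):
    """Lowercase alphanumeric word set (tokenization is an assumption)."""
    return set(re.findall(r"[a-z0-9]+", question.lower()))


def jaccard(a, b):
    """Size of the token intersection divided by size of the token union."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def comparable(a, b, threshold=0.30):
    """True when two questions are similar enough to be paired."""
    return jaccard(a, b) >= threshold


same_theme = jaccard(
    "Will the Fed cut rates by May 31?",
    "Will the Fed cut rates by Jun 30?",
)  # 6 shared tokens out of 10 distinct ones

unrelated = comparable(
    "Will Powell pause Fed rate hikes?",
    "Will the DOJ probe Powell?",
)  # only "will" and "powell" are shared, 2 of 9
```

The first pair differs only in its date, which is the kind of pattern cross-event mode is meant to catch; the second pair shares a name but not a subject, so it falls below the 0.30 threshold and would be counted as skipped.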

Tool Definition Quality

A5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations, description discloses detailed behavioral traits: semantic anchor Jaccard threshold, partition placeholder filter, fill check pricing logic (realizable_edge_pp ≤ 0 means do not trade), and response structure. This is far more informative than the annotations alone.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is long but appropriately sized for a complex tool. Each sentence serves a purpose, and it is front-loaded with the primary function and mode selection. No unnecessary repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description fully explains return values (opportunities[], partition_check fields) and edge cases (thin legs, realizable_edge_pp ≤ 0). It covers behaviors, parameters, examples, and limitations, making it complete for an agent to use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Even though schema descriptions are detailed, the tool description adds substantial meaning: examples of valid slugs and seed questions, differences between event and topic modes, and what each mode does internally. This significantly enhances semantic understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool finds arbitrage opportunities on Polymarket via monotonicity violations and partition-sum checks. It specifies the resource and method, and distinguishes it from sibling tools like polymarket_edges and polymarket_fill_risk.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to call with no args, with `event`, or with `topic`, and gives clear recommendations for specific use cases. Also points to polymarket_fill_risk as an alternative for custom sizing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

polymarket_edges Polymarket Edges A

Read-only, Idempotent

Scan top Polymarket markets and return opportunities where Pipeworx data disagrees with market price. Built for "what should I bet on today" — agents discover opportunities without paging hundreds of markets. FIVE MODEL FAMILIES grouped into three response segments under by_segment: (1) MODEL_DRIVEN — crypto_price (lognormal barrier from 90d FRED log-returns) and news_momentum (GDELT 7d/21d article-volume ratio, soft signal w/ halved Kelly). (2) STRUCTURAL_ARBITRAGE — partition_overround on mutually-exclusive events; per-leg favorite-longshot bias correction with per-sport α (tennis 1.02, soccer 1.10, MMA 1.15, default 1.0); placeholder-slug filter drops will-person-X / will-team-Y / will-manager-Z / will-someone-else- backstops; partitions with >20% placeholder fraction skipped entirely. (3) CONCENTRATED_LONGSHOT — basket trade when one leg ≥75% AND ≥2 longshots ≤8% AND portfolio return ≥25:1; rare-by-design (gates relaxed Run 8 from prior 85%/5%/50:1). EVERY OPPORTUNITY carries edge_pp_net (after slippage), kelly_fraction + kelly_fraction_half (capped at 0.25), market.liquidity, market.spread_pp, market.volume, plus a 24h-move warning ("Market moved X.Xpp in 24h") when the recent move alone exceeds the edge — your edge may already be in the price. TRADEABLE-EDGE KNOBS: min_liquidity / max_spread_pp drop opportunities where edge isn't realizable; min_partition_leg_kelly filters partitions by best per-leg Kelly. RESPONSE TOP-LEVEL: by_segment{model_driven,structural_arbitrage,concentrated_longshot}, fed_candidates/fed_note (Fed bets surface here, excluded from ranking — 1m-T vs EFFR signal is unreliable at meeting-month horizons without paid OIS/SOFR-futures data), and _diagnostics{concentrated_longshot:{...funnel counters},category_counts,filter_skips} so callers can see WHY a segment is empty (top-N stale, all candidates failed gates, knob dropped them). Cached 1h at the KV level keyed on all knobs.

Parameters

Name	Required	Description
`limit`	No	Top N edges to return after ranking. Default 10, max 25.
`window`	No	Polymarket volume window to filter markets. Default 1wk.
`min_kelly`	No	Minimum half-Kelly fraction (as decimal, e.g. 0.005 = 0.5% of bankroll) to include single-leg opportunities. Default 0 (no filter). Skips opportunities that are too small to bet sensibly even if the edge is large.
`min_edge_pp`	No	Minimum |edge| in percentage points to include (default 0.5). Edge is evaluated NET of slippage.
`slippage_pp`	No	Assumed execution slippage in percentage points per leg (default 0.3). Subtracted from raw |edge| before ranking and Kelly sizing. Polymarket has zero trading fees as of 2024 but bid/ask + thin depth typically eats 20-50bp per trade. Bump for very thin partitions; drop to 0 if you have a smarter fill model.
`max_spread_pp`	No	Tradeable-edge filter. Maximum bid/ask spread in percentage points on the representative market. Default null (no filter). Set to 2 to require tight books — anything wider eats most plausible edges.
`min_liquidity`	No	Tradeable-edge filter. Minimum $ liquidity on the representative market (or for partition_overround, on at least one top_leg). Default 0 (no filter). Set to 5000 to drop thin-book opportunities where executing the edge would walk the book past breakeven.
`category_filter`	No	Comma-separated list to restrict the output: "model_driven" (crypto_price + news_momentum), "structural_arbitrage" (partition_overround), "concentrated_longshot". Combine like "model_driven,structural_arbitrage". Default: all.
`min_partition_leg_kelly`	No	Minimum BEST per-leg half-Kelly fraction across a partition_overround opportunity's top_legs (or longshot_basket legs). Default 0 (no filter). Partition arbs always return kelly_fraction_half=0 at the parent level by design (basket trades don't compose to single-leg Kelly), so min_kelly never filters them — this knob applies to the per-leg Kelly inside top_legs instead. Use to suppress thin partitions whose individual leg edges aren't worth the per-leg slippage cost.
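
The interaction between slippage_pp, edge_pp_net, and the Kelly fields can be illustrated with the textbook Kelly fraction for a binary contract. This is a sketch under stated assumptions, not the tool's sizing code: it assumes slippage is charged against the entry price, that SELL YES is modeled as buying NO at 1 − price, and that the 0.25 cap applies to the full Kelly fraction before halving. The helper name is hypothetical.

```python
def net_edge_and_kelly(model_prob: float, market_price: float,
                       slippage_pp: float = 0.3, kelly_cap: float = 0.25) -> dict:
    """Net edge (pp) and Kelly sizing for one binary market, under the assumptions above."""
    raw_edge_pp = (model_prob - market_price) * 100.0
    buy_yes = raw_edge_pp >= 0
    # Net edge shrinks toward zero by the slippage assumption and keeps its sign
    # (negative = SELL YES, matching the polymarket_edge_tracker description).
    net_abs_pp = max(abs(raw_edge_pp) - slippage_pp, 0.0)
    edge_pp_net = net_abs_pp if buy_yes else -net_abs_pp
    # Effective cost of the share we buy (YES or NO), including slippage.
    if buy_yes:
        win_prob, cost = model_prob, market_price + slippage_pp / 100.0
    else:
        win_prob, cost = 1.0 - model_prob, (1.0 - market_price) + slippage_pp / 100.0
    if cost >= 1.0 or win_prob <= cost:
        kelly = 0.0  # no positive expectation after slippage
    else:
        # Binary Kelly: a share bought at `cost` pays 1 with probability win_prob.
        kelly = min((win_prob - cost) / (1.0 - cost), kelly_cap)
    return {
        "direction": "BUY YES" if buy_yes else "SELL YES",
        "edge_pp_net": edge_pp_net,
        "kelly_fraction": kelly,
        "kelly_fraction_half": kelly / 2.0,
    }


print(net_edge_and_kelly(0.60, 0.50))
```

With a model probability of 0.60 against a 0.50 price, the 10pp raw edge nets to roughly 9.7pp after the default 0.3pp slippage; a filter such as min_kelly would then compare against the half-Kelly value.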

Tool Definition Quality

A 4.5/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (readOnly, openWorld, idempotent, non-destructive), the description discloses caching behavior ('Cached 1h at the KV level'), the 24h-move warning, exclusion of Fed bets with rationale, the diagnostics funnel for empty segments, and details like per-sport α values and gate relaxation from prior runs. This far exceeds the annotation coverage and provides critical behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely dense and long, with many run-on sentences and all-caps emphasis. While it packs a lot of unique information, it is not concise and lacks structural formatting (e.g., bullet points). It earns a middle score because the information is relevant but the presentation is a wall of text that could be better organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description thoroughly documents the response structure (by_segment, fed_candidates/fed_note, _diagnostics) and the fields on each opportunity (edge_pp_net, kelly_fraction, liquidity, spread_pp, volume). It even explains why segments may be empty. This is complete for a complex tool with five model families, three response segments, and many knobs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 9 parameters have schema descriptions (100% coverage), but the description adds semantic context: it explains that min_liquidity/max_spread_pp are tradeable-edge filters, that min_kelly doesn't apply to partition arbs, and that min_partition_leg_kelly applies to per-leg Kelly. It also explains how slippage interacts with edge and Kelly sizing. This goes beyond the bare schema, though the schema already covers syntax well.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a specific verb+resource+scope: 'Scan top Polymarket markets and return opportunities where Pipeworx data disagrees with market price.' This clearly states what the tool does and differentiates it from sibling tools like polymarket_arbitrage or polymarket_edge_tracker by focusing on Pipeworx-vs-market disagreements and the 'what should I bet on today' use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides clear usage context ('Built for "what should I bet on today"') and explains the three model segments and the tradeable-edge knobs. It does not explicitly name alternatives or exclusions, but the context is clear enough for an agent to select it over siblings. The description implies when to use it without explicit when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

polymarket_edge_tracker Polymarket Edge Tracker A

Read-only, Idempotent

Edge persistence and decay telemetry built from daily polymarket_edges snapshots. Answers "how long has this edge existed and is it shrinking?" — a fresh wide edge and a 3-week-old wide edge are different trades (the latter is wide for a reason nobody is willing to take). Args: days (lookback, default 14, max 30), window (snapshot family, default "1wk"). RESPONSE: tracked[] = every opportunity in the LATEST snapshot with its full edge_pp_net time-series across prior snapshots, first_seen, trend (new | widening | stable | decaying) and decay_pp_per_day (both computed on |edge_pp_net| — the value itself is signed by trade direction, negative = SELL YES); expired[] = opportunities that appeared in earlier snapshots but are GONE from the latest (closed, resolved, or arbed away) with their lifespan_days — the median lifespan is your competition clock; snapshot_dates[] = which days actually have data (snapshots are written when polymarket_edges runs on a cache-miss, so gaps mean nobody scanned that day). LIMITS: history depth is bounded by the 60-day snapshot TTL and starts from when snapshotting was enabled; decay numbers come from daily closes of edge_pp_net (net of default slippage), not intraday.

Parameters

Name	Required	Description
`days`	No	Lookback in days (default 14, clamp 2-30).
`window`	No	Which polymarket_edges window family to read snapshots for: 24hr | 1wk | 1mo (default 1wk).
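
As a rough illustration of how a trend label and decay_pp_per_day could be derived from a series of daily edge_pp_net values, consider the sketch below. Everything beyond the field names is an assumption: the endpoint-to-endpoint slope, the sign convention (positive decay means the edge is shrinking), and the 0.1 pp/day tolerance for "stable" are hypothetical choices, since the description does not state how the tool computes them. It does follow the description in working on |edge_pp_net| rather than the signed value.

```python
from datetime import date


def classify_trend(series: list[tuple[date, float]],
                   tolerance_pp_per_day: float = 0.1) -> dict:
    """series: (snapshot_date, edge_pp_net) pairs sorted oldest first."""
    if len(series) < 2:
        return {"trend": "new", "decay_pp_per_day": None}
    (first_day, first_edge), (last_day, last_edge) = series[0], series[-1]
    days = max((last_day - first_day).days, 1)  # guard against same-day snapshots
    # Work on magnitudes: the sign only encodes trade direction.
    change_per_day = (abs(last_edge) - abs(first_edge)) / days
    if change_per_day > tolerance_pp_per_day:
        trend = "widening"
    elif change_per_day < -tolerance_pp_per_day:
        trend = "decaying"
    else:
        trend = "stable"
    # Assumed convention: positive decay = edge shrinking.
    return {"trend": trend, "decay_pp_per_day": -change_per_day}


print(classify_trend([(date(2026, 1, 1), -6.0), (date(2026, 1, 5), -4.0)]))
```

In this example a SELL YES edge that moves from -6.0pp to -4.0pp over four days is shrinking in magnitude by 0.5pp per day, so it is labeled decaying even though the signed value increased.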

Tool Definition Quality

A 4.7/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Even with readOnlyHint and idempotentHint annotations, the description goes far beyond them by detailing response structure (tracked[], expired[], snapshot_dates[]), the meaning of each field, and important limitations (60-day TTL, daily closes, cache-miss gaps). It also clarifies that decay numbers come from daily closes not intraday, adding significant behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but densely informative, structured into purpose, args, response, and limits sections. Every sentence earns its place, and the key answering phrase is front-loaded. It avoids fluff and is appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description thoroughly explains the return values (tracked, expired, snapshot_dates) and their semantics. It also covers limits (TTL, snapshot gaps, data source). This gives an agent everything needed to correctly invoke and interpret the tool without additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by interpreting 'days' as a lookback (with a max of 30) and 'window' as a 'snapshot family' (mapping to 24hr | 1wk | 1mo). This reinforces the schema and adds practical meaning beyond the basic parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a specific noun phrase ('Edge persistence and decay telemetry') and names the exact resource ('daily polymarket_edges snapshots'). It clearly states the tool's purpose, answering 'how long has this edge existed and is it shrinking?', and distinguishes itself from sibling tools like polymarket_edges by focusing on temporal analysis rather than current edge values.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives clear context for when to use the tool (to assess whether an edge is fresh or old and whether it is decaying) and explains why this matters ('a fresh wide edge and a 3-week-old wide edge are different trades'). It does not explicitly name alternatives or when not to use it, but the context strongly implies this is for edge persistence/decay analysis.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

polymarket_fill_risk Polymarket Fill Risk A

Read-only, Idempotent

Realizable-vs-theoretical edge check against live CLOB order-book depth. REQUIRES one of market (single-market mode) or event (basket/partition mode). SINGLE-MARKET: pass a market slug/URL + side (buy_yes|sell_yes|buy_no|sell_no, default buy_yes) + size_usd (default 1000 — max spend on buys, target proceeds on sells); walks the ladder and returns top_of_book, vwap_fill_price, slippage_pp, shares_filled, max_fillable_usd, and a verdict (clean|degraded|cannot_fill). BASKET: pass an event slug/URL + side (sell_yes = capture overround by selling every leg, buy_yes = capture underround; default auto from partition sum) + size_usd interpreted as settlement notional S (shares per leg; each share pays $1); returns theoretical_sum vs realizable_sum (top-of-book vs VWAP across all legs), capture_ratio, profit_usd at executed size, per-leg fill detail, thin_legs[], max_clean_notional_usd, and forced_directional_risk naming the legs most likely to strand you unhedged. USE THIS before acting on any polymarket_arbitrage SELL/BUY-EVERY-LEG signal or any polymarket_edges trade above ~$500 — theoretical overround on thin books is not capturable, and partial basket fills convert an arb into an unhedged directional position (the dominant loss mode in real arb-bot P&L).

Parameters

Name	Required	Description
`side`	No	Single-market: buy_yes | sell_yes | buy_no | sell_no (default buy_yes). Basket: sell_yes | buy_yes (default auto: sell if partition sum > 1, buy if < 1).
`event`	No	Basket mode: event slug or full polymarket.com URL — checks every leg of the partition.
`market`	No	Single-market mode: market slug or full polymarket.com URL.
`size_usd`	No	Single-market: USD to spend (buys) or target proceeds (sells). Basket: settlement notional — shares per leg, each paying $1 at resolution. Default 1000, clamp 10–1,000,000.
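
"Walking the ladder" in single-market buy mode amounts to consuming ask levels from best to worst until the budget is spent, then comparing the volume-weighted fill price with the top of book. The sketch below shows that calculation. The book format, the helper name `walk_asks`, and the verdict rule (clean if fully filled with at most 1pp slippage, degraded otherwise, cannot_fill if nothing fills) are assumptions; the description names the output fields but not the verdict thresholds.

```python
def walk_asks(asks: list[tuple[float, float]], size_usd: float,
              max_clean_slippage_pp: float = 1.0) -> dict:
    """asks: (price, shares) levels sorted best (lowest price) first."""
    spent = 0.0
    shares = 0.0
    for price, available in asks:
        budget_left = size_usd - spent
        if budget_left <= 0:
            break
        if price <= 0:
            continue  # ignore malformed levels
        take = min(available, budget_left / price)  # shares bought at this level
        spent += take * price
        shares += take
    max_fillable_usd = sum(price * available for price, available in asks)
    if shares == 0:
        return {"verdict": "cannot_fill", "shares_filled": 0.0,
                "max_fillable_usd": max_fillable_usd}
    top_of_book = asks[0][0]
    vwap = spent / shares
    slippage_pp = (vwap - top_of_book) * 100.0
    fully_filled = spent >= size_usd - 1e-9
    # Hypothetical verdict rule, see the note above.
    clean = fully_filled and slippage_pp <= max_clean_slippage_pp
    return {
        "top_of_book": top_of_book,
        "vwap_fill_price": vwap,
        "slippage_pp": slippage_pp,
        "shares_filled": shares,
        "max_fillable_usd": max_fillable_usd,
        "verdict": "clean" if clean else "degraded",
    }


print(walk_asks([(0.50, 1000), (0.52, 1000)], size_usd=760))
```

Here $760 takes all 1,000 shares at 0.50 and 500 more at 0.52, for a fill price of about 0.507 against a 0.50 top of book; asking for $2,000 from the same book could only spend $1,020 and would come back degraded.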

Tool Definition Quality

A 4.9/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, and the description adds rich behavioral detail: 'walks the ladder', returns specific metrics (top_of_book, vwap_fill_price, slippage_pp, shares_filled, max_fillable_usd, verdict), and discloses risk like 'partial basket fills convert an arb into an unhedged directional position'. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but densely packed and well-structured, starting with a one-line summary, then separating single-market and basket modes with clear output lists, and ending with actionable guidance. Almost every sentence adds operational value; the length is justified by the tool's complexity, though a few phrases are slightly repetitive.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description fully compensates by enumerating all return fields for both modes, including verdicts, slippage metrics, basket-specific capture_ratio and thin_legs, and forced_directional_risk. It also covers prerequisites, defaults, size semantics, constraints, and failure modes, making the tool self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although schema coverage is 100%, the description adds substantial semantic meaning beyond the schema. It explains mode-dependent interpretations of size_usd ('max spend on buys, target proceeds on sells' vs 'settlement notional S'), describes side defaults in both modes, and clarifies the meaning of market vs event parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with 'Realizable-vs-theoretical edge check against live CLOB order-book depth', which clearly specifies the verb (check), resource (edge), and scope (live order-book depth). It also distinguishes itself from sibling tools by explicitly naming polymarket_arbitrage and polymarket_edges and positioning itself as the pre-trade validation step.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage instructions: 'USE THIS before acting on any polymarket_arbitrage SELL/BUY-EVERY-LEG signal or any polymarket_edges trade above ~$500'. It also explains mode selection via 'REQUIRES one of market or event' and clarifies why theoretical edge isn't capturable, giving both when-to-use and rationale.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

polymarket_kalshi_spread Polymarket–Kalshi Spread A

Read-only, Idempotent

Cross-venue spread between Kalshi and Polymarket for the same resolving question. The two venues sometimes price the same outcome 2-25pp apart because their participant pools differ — when the bet shapes are equivalent that delta is a real signal, when they aren't the tool says so. TWO MODES: (1) topic — 10 pre-mapped macro shortcuts ("fed", "btc", "cpi", "gdp", "sp500", "recession", "next_pope", "next_uk_pm", "next_israel_pm", "2028_president") auto-fetch the matching event on each venue. (2) explicit kalshi_event_ticker + polymarket_event_slug for custom pairings. RESPONSE: each venue's leg-by-leg prices (raw probability 0-1) plus matched spread[].top_spreads_pp (Kalshi − Polymarket) where the same outcome shows up on both sides. SAFETY FIELDS: compatibility_warning fires in two cases — (a) matched_pairs:0 with skipped_cross_type>0 means the venues frame the topic with non-equivalent bet shapes (e.g. Kalshi range_bucket point-in-time vs Polymarket cumulative_threshold touch-anywhere — no arb exists), (b) matched_pairs:0 with skipped_cross_type:0 and both venues >5 legs means the token-overlap matcher found nothing in common — events likely semantically unrelated despite the topic keyword. temporal_alignment{polymarket_month,kalshi_month,aligned} tells you whether the two events resolve in the same calendar period; aligned:false means spreads are mathematically meaningless across the temporal gap. skipped_cross_type / skipped_cross_subtype counters expose how many leg-pair comparisons were dropped (cross-type = metric_type mismatch like MoM vs YoY; cross-subtype = inequality mismatch like cum_ge vs cum_le). Real cross-venue spreads are rarer than the macro-shortcut list suggests — most pre-mapped topics return compatibility_warning today; pre-mapped ≠ tradeable.

Parameters

Name	Required	Description
`topic`	No	Pre-mapped: fed | btc | cpi | gdp | sp500 | recession | next_pope | next_uk_pm | next_israel_pm | 2028_president
`kalshi_event_ticker`	No	Explicit Kalshi event ticker, e.g. "KXFED-26OCT". Overrides the topic-mapped Kalshi side.
`polymarket_event_slug`	No	Explicit Polymarket event slug, e.g. "fed-decision-in-june-825". Overrides the topic-mapped Polymarket side.
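
The spread itself is simple arithmetic once legs are matched: Kalshi probability minus Polymarket probability, expressed in percentage points. The sketch below assumes legs are already matched by an identical outcome label, which sidesteps the tool's token-overlap matcher and its cross-type checks; the helper name and the example labels are hypothetical.

```python
def matched_spreads(kalshi: dict[str, float], polymarket: dict[str, float]) -> list[dict]:
    """Spread in pp (Kalshi minus Polymarket) for outcomes present on both venues."""
    rows = []
    for outcome in kalshi.keys() & polymarket.keys():
        spread_pp = (kalshi[outcome] - polymarket[outcome]) * 100.0
        rows.append({"outcome": outcome, "spread_pp": round(spread_pp, 2)})
    # Largest absolute spread first.
    rows.sort(key=lambda row: abs(row["spread_pp"]), reverse=True)
    return rows


kalshi_legs = {"cut_25bp": 0.62, "hold": 0.30, "hike": 0.08}
polymarket_legs = {"cut_25bp": 0.55, "hold": 0.38}
print(matched_spreads(kalshi_legs, polymarket_legs))
```

A positive value means Kalshi prices the outcome higher than Polymarket. As the description warns, such a number is only meaningful when the bet shapes are equivalent and temporal_alignment.aligned is true.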

Tool Definition Quality

A 4.6/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly/idempotent, and the description goes far beyond by explaining compatibility_warning conditions, temporal_alignment implications, and skipped counter meanings. This provides deep insight into edge cases and behavior without contradicting the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long but well-structured into purpose, modes, response, and safety fields. Every sentence contributes meaningful information, though the ending reinforcement 'pre-mapped ≠ tradeable' is somewhat redundant with earlier warnings. Slightly verbose but appropriate for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having no output schema, the description thoroughly explains the response structure, including leg prices, spread calculations, compatibility warnings, temporal alignment, and skip counters. It covers all critical aspects for effective use, making it complete for this complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides 100% coverage with descriptions for all three parameters. The description adds value by explaining the override semantics (e.g., kalshi_event_ticker overrides the topic-mapped side) and clarifying the two-mode operation, which is not fully captured in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool as computing the cross-venue spread between Kalshi and Polymarket for matching events, with a specific verb and resource. It also differentiates from siblings like polymarket_arbitrage by focusing on cross-venue comparisons and details two operation modes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states when to use each mode (topic shortcuts vs explicit tickers) and includes cautionary guidance that most pre-mapped topics return compatibility_warning today, implying they may not be tradeable. It does not explicitly name alternatives but provides sufficient context for appropriate use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recall Recall A

Read-only, Idempotent

Retrieve a value previously saved via remember, or list all saved keys (omit the key argument). Use to look up context the agent stored earlier — the user's target ticker, an address, prior research notes — without re-deriving it from scratch. Scoped to your identifier (anonymous IP, BYO key hash, or account ID). Pair with remember to save, forget to delete.

Parameters

Name	Required	Description
`key`	No	Memory key to retrieve (omit to list all keys)

Tool Definition Quality

A 4.3/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already establish this is a read-only, idempotent, non-destructive operation. The description adds meaningful behavioral context: 'Scoped to your identifier (anonymous IP, BYO key hash, or account ID)' and the listing behavior when the key is omitted. This goes beyond what annotations provide, earning a 4.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact and front-loaded; it states the core function in the first sentence and packs purpose, usage examples, scoping, and companion tools into just a few sentences. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one optional param, no output schema) and the richness of the annotations and description, this is fully complete. It covers what the tool does, when to use it, how it is scoped, and how it relates to remember/forget. No critical gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already fully documents the single 'key' parameter with 100% coverage, including the 'omit to list all' behavior. The description reiterates this without adding new semantic depth. Per the baseline for high schema coverage, a 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a clear verb and resource: 'Retrieve a value previously saved via remember, or list all saved keys.' This explicitly differentiates the tool from its sibling tools (remember, forget) and explains its function without ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides strong usage context: 'Use to look up context the agent stored earlier' and names concrete examples (target ticker, address, research notes). It also references companion tools ('Pair with remember to save, forget to delete'). However, it stops short of explicitly stating when not to use this tool compared to other lookup methods, so it merits a 4 rather than a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recent_alerts Recent Alerts A

Read-only, Idempotent

Pull fired events from your subscription feed. Returns the most recent alerts the evaluator has written to your persisted feed — each carries source, citation_uri (pipeworx:// when available), and the raw event payload. Filter by type (e.g. "sec_8k") and/or since (ISO timestamp). Set mark_read:true to flag returned events read so the next call only shows newer ones. Polls work fine; the same feed is also at GET registry.pipeworx.io/alerts.json for scripts and dashboards.

Parameters

Name	Required	Description
`type`	No	Optional — filter to one subscription type.
`limit`	No	Max events to return (1-200, default 50).
`since`	No	Optional ISO timestamp — return events fired_at >= this time.
`mark_read`	No	Flag the returned events read in the same call (default false).
`unread_only`	No	Return only events where read_at is null (default false).
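
For reference, a call to this tool over MCP is a JSON-RPC 2.0 request with method `tools/call`. The sketch below only builds the request body; it does not open a session or send anything, and the timestamp and request id are placeholder values. The argument names come from the parameter table above.

```python
import json


def build_tools_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build an MCP tools/call request body (JSON-RPC 2.0)."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


payload = build_tools_call(
    "recent_alerts",
    {
        "type": "sec_8k",                  # example type taken from the description
        "since": "2026-08-01T00:00:00Z",   # placeholder ISO timestamp
        "unread_only": True,
        "mark_read": True,                 # flags the returned events as read
        "limit": 50,
    },
)
print(json.dumps(payload, indent=2))
```

Because mark_read changes what later calls return, a client that treats read-only tools as safe to retry may want to leave it at the default of false and mark events read only once they have been processed.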

Tool Definition Quality

A 3.8/5.0

Behavior 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses a significant behavioral trait: setting mark_read:true flags events as read, affecting future calls. This directly contradicts the readOnlyHint=true annotation, as it involves a state change. The contradiction is serious, so the score is 1 despite the disclosure otherwise being useful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise yet packs useful information: return payload, filter options, mark_read behavior, and an alternative access method. Every sentence contributes value, and it's front-loaded with the primary purpose, making it well-structured and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the return payload ('source, citation_uri, raw event payload'), filtering, and mark_read semantics, making it fairly complete without an output schema. It could mention the default limit or interaction with unread_only, but the essentials are there. The direct URL alternative also adds operational context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already documents all 5 parameters with descriptions (100% coverage), so the baseline is 3. The description adds extra meaning by providing an example type ('sec_8k') and explaining that mark_read affects future calls, which goes beyond the schema's basic parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Pull fired events from your subscription feed.' It specifies the resource (subscription feed alerts), the action (pull), and provides details about the returned data. It effectively distinguishes itself from sibling tools like list_subscriptions by focusing on fired events rather than subscriptions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool: for polling alerts, with filtering options, and mentions an alternative for scripts/dashboards (the direct GET URL). It doesn't explicitly contrast with sibling tools, but the context of subscription feed and alternative endpoint provides clear guidance on when this tool is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recent_changes: Recent Changes (grade A)

Read-only, Idempotent

"What's new with X" / "latest on Y" / "what happened to Z this week / month / quarter" / "updates on Acme" / "news on Tesla recently" / "what's happening with Apple" — change feed for a company in the last N days/weeks/months in ONE parallel call. Fans out to SEC EDGAR (filings since since), GDELT→GNews fallback (news mentions in window — GDELT preferred, GNews when rate-limited or 5xx), USPTO (patents granted; PatentsView API sunset May 2025 so this soft-fails until reactivated). since accepts ISO date ("2026-04-01") or relative shorthand ("7d", "30d", "3m", "1y"). Returns structured changes[] grouped by source + total_changes count + pipeworx:// citation URIs. Use entity_profile instead when you want the static profile (filings + fundamentals + LEI + patents) regardless of window.

ParametersJSON Schema

Name	Required	Description
`type`	Yes	Entity type. Only "company" supported today.
`since`	Yes	Window start — ISO date ("2026-04-01") or relative ("7d", "30d", "3m", "1y"). Use "30d" or "1m" for typical monitoring.
`value`	Yes	Ticker (e.g., "AAPL") or zero-padded CIK (e.g., "0000320193").
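
The two accepted `since` formats can be illustrated with a small parser. This is a sketch under stated assumptions: the helper name `window_start` is hypothetical, and treating a month as 30 days and a year as 365 days is a guess, since the listing does not say how the server resolves "3m" or "1y".

```python
import re
from datetime import date, timedelta

_RELATIVE = re.compile(r"^(\d+)([dmy])$")
# Assumption: approximate month and year lengths; the server may differ.
_DAYS_PER_UNIT = {"d": 1, "m": 30, "y": 365}


def window_start(since, today):
    """Turn an ISO date or relative shorthand into a window start date."""
    match = _RELATIVE.match(since)
    if match:
        count, unit = int(match.group(1)), match.group(2)
        return today - timedelta(days=count * _DAYS_PER_UNIT[unit])
    # Anything else must be an ISO date such as "2026-04-01";
    # date.fromisoformat raises ValueError on malformed input.
    return date.fromisoformat(since)
```

For example, `window_start("7d", date(2026, 4, 8))` yields 1 April 2026, the same start date as passing "2026-04-01" directly.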

Tool Definition Quality

A5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (read-only, idempotent), the description discloses fallback behavior (GDELT preferred, GNews on rate-limit), USPTO's soft-fail due to API sunset, and that it returns structured changes with citations. This adds meaningful operational context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense but every sentence provides actionable detail—examples, fallback chains, API sunset caveat, return shape, and alternative tool. It's appropriately structured with the purpose front-loaded in the opening phrase.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex multi-source tool with no output schema, the description covers the return format (changes[], total_changes, pipeworx:// URIs), the behavior of each source, and pointers to alternatives, making it self-sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although the input schema has 100% coverage, the description enriches parameter meaning by explaining that `since` accepts ISO dates or relative shorthand with examples, suggests typical usage ('30d' or '1m'), and clarifies `value` can be a ticker or zero-padded CIK.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool as a change feed for a company over a time window, with specific verbs like 'Fans out' and lists the data sources. It distinguishes itself from the sibling entity_profile by contrasting dynamic changes vs static profile.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly gives use-case examples ('What's new with X') and states an alternative: 'Use entity_profile instead when you want the static profile...' It also clarifies when the fallback happens (GDELT rate-limited or 5xx).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

remember: Remember (grade A)

Idempotent

Save data the agent will need to reuse later — across this conversation or across sessions. Use when you discover something worth carrying forward (a resolved ticker, a target address, a user preference, a research subject) so you don't have to look it up again. Stored as a key-value pair scoped by your identifier. Authenticated users get persistent memory; anonymous sessions retain memory for 24 hours. Pair with recall to retrieve later, forget to delete.

ParametersJSON Schema

Name	Required	Description	Default
`key`	Yes	Memory key (e.g., "subject_property", "target_ticker", "user_preference")
`value`	Yes	Value to store (any text — findings, addresses, preferences, notes)
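
The storage semantics described above (key-value pairs scoped by identifier, 24-hour retention for anonymous sessions, persistence for authenticated users) might look like the following sketch. The `MemoryStore` class, its method signatures, and the overwrite-on-repeat behavior are assumptions made for illustration; the listing does not document how the server stores memory.

```python
import time

ANON_TTL_SECONDS = 24 * 60 * 60  # from the description: anonymous memory lasts 24 hours


class MemoryStore:
    """Toy in-memory model of remember / recall / forget."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._items = {}  # (identifier, key) -> (value, expires_at or None)

    def remember(self, identifier, key, value, authenticated):
        # Assumption: saving the same key again overwrites the old value.
        expires = None if authenticated else self._clock() + ANON_TTL_SECONDS
        self._items[(identifier, key)] = (value, expires)

    def recall(self, identifier, key):
        entry = self._items.get((identifier, key))
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._items[(identifier, key)]  # expired anonymous entry
            return None
        return value

    def forget(self, identifier, key):
        return self._items.pop((identifier, key), None) is not None
```

Injecting the clock keeps the expiry rule easy to exercise without waiting a day.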

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

While annotations already indicate a non-readonly, idempotent write operation, the description adds meaningful behavioral context: persistence varies by authentication (24-hour retention for anonymous, persistent for authenticated), scoping by identifier, and key-value storage format. It does not explicitly mention overwrite behavior, but the idempotentHint covers that implicitly. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five sentences, front-loaded with the core purpose, and every sentence adds value: what it does, when to use, storage semantics, persistence, and pairing with related tools. There is no redundant or filler content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple two-parameter tool with no output schema, the description covers purpose, usage context, persistence, scoping, and relationship to siblings. It is fully sufficient for an agent to decide when and how to use it, and the annotations fill in the safety profile.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and both parameters are well-described with examples in the schema. The description only reinforces the key-value pair concept without adding new semantic meaning. This meets the baseline for fully-schema-documented parameters but does not exceed it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function with a specific verb ('Save data') and resource ('data the agent will need to reuse later'), and it differentiates from sibling tools by explicitly naming recall and forget as companions. The purpose is unambiguous and distinct from any other tool in the list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It provides explicit when-to-use guidance ('Use when you discover something worth carrying forward...') and references alternatives for retrieval and deletion ('Pair with recall to retrieve later, forget to delete'). This gives the agent clear context for when to invoke this tool versus related operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resolve_entity: Resolve Entity (grade A)

Read-only, Idempotent

"What's the ticker for…" / "find the CIK for…" / "what's the RxCUI for…" / "look up the ID for…" / "what is X's official identifier" — resolve a user-spoken NAME to the canonical/official identifier other tools require as input. Use FIRST whenever you have a name but need an ID. SUPPORTED TYPES: "company" (returns ticker + 10-digit CIK + company_name from SEC EDGAR + pipeworx://edgar/company/{cik} citation URI; accepts ticker, CIK, or company name as input — auto-disambiguated), "drug" (returns RxCUI + ingredient + brand from RxNorm + pipeworx://rxnorm/{rxcui} citation; accepts brand or generic name). Each call cascades through several lookup endpoints internally — using resolve_entity replaces 2-3 manual lookups.

ParametersJSON Schema

Name	Required	Description	Default
`type`	Yes	Entity type: "company" or "drug".
`value`	Yes	For company: ticker (AAPL), CIK (0000320193), or name. For drug: brand or generic name (e.g., "ozempic", "metformin").
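
As a rough illustration of the identifier formats mentioned above, the sketch below pads a CIK to 10 digits and fills the `pipeworx://edgar/company/{cik}` template. The helper names are hypothetical, and whether the server puts the zero-padded form into the citation URI is an assumption.

```python
def normalize_cik(cik):
    """Return a CIK as a zero-padded 10-digit string, e.g. 320193 -> 0000320193."""
    digits = str(cik).strip()
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 10:
        raise ValueError(f"not a valid CIK: {cik!r}")
    return digits.zfill(10)


def edgar_citation_uri(cik):
    # Assumption: the URI template takes the padded CIK.
    return f"pipeworx://edgar/company/{normalize_cik(cik)}"
```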

Tool Definition Quality

A4.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds valuable behavior beyond that: it explains internal cascading through multiple endpoints, returns citation URIs, and mentions auto-disambiguation. This gives the agent a stronger mental model of what happens. It doesn't cover failure modes or rate limits, but the annotation coverage reduces that burden.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with natural language queries that signal when to use the tool. Every sentence serves a purpose: usage directive, supported types, return details for each type, and efficiency benefit. It is longer than average but each clause adds necessary context, with a clear structure for type-specific behavior.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema exists, the description thoroughly explains return values for both types, including citation URIs and fields like ticker, CIK, RxCUI. It also covers input formats and disambiguation. The tool's complexity (cascading lookups) is well-documented, and annotations fill safety context. It is sufficient for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description enhances this by providing concrete examples for each parameter type (e.g., "AAPL", "0000320193", "ozempic") and explaining that company input is auto-disambiguated. It also clarifies what each type returns, adding meaning beyond the raw schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states a specific verb+resource: "resolve a user-spoken NAME to the canonical/official identifier." It clearly distinguishes itself from sibling tools by framing it as a prerequisite for other tools and specifying supported entity types (company, drug) with exact return values. This is not a vague purpose; it precisely defines the tool's role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says "Use FIRST whenever you have a name but need an ID." This is an explicit when-to-use directive. It also clarifies that it replaces 2-3 manual lookups, reinforcing when it is most efficient. While no specific alternative tools are named, the directive to use it first is effectively an exclusion of doing manual lookups or using other tools prematurely.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

scan_competitor_ai_presence: Scan Competitor AI Presence (grade A)

Read-only, Idempotent

Compare AI visibility across multiple entities side-by-side. Probes each entity (your brand + N competitors) with ai_visibility_check, ranks by score, surfaces which is most/least recognized. Useful for competitive AI-marketing audits: "does Claude know about us as well as our competitors?". Returns ranked list with score, confidence, signal density per entity.

ParametersJSON Schema

Name	Required	Description
`models`	No	Which models to probe. Supported: "workers-ai" (free default), "anthropic" (requires _apiKey). Omit for just workers-ai.
`_apiKey`	No	Optional Anthropic API key — only if "anthropic" is in models. Passed to api.anthropic.com per probe.
`context`	No	Optional shared context applied to every probe (e.g. "B2B SaaS", "Boston restaurant"). Disambiguates common names.
`entities`	Yes	Array of 2-8 entities to compare (brand/business/product names). First entry treated as the "subject" for narrative; rest are competitors.
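
The parameter constraints in the table above can be checked client-side before calling the tool. This is a sketch only: `build_scan_arguments` is a hypothetical helper, and the server may validate differently or return its own errors.

```python
SUPPORTED_MODELS = {"workers-ai", "anthropic"}


def build_scan_arguments(entities, models=None, api_key=None, context=None):
    """Validate inputs and build the arguments object for the tool call."""
    if not 2 <= len(entities) <= 8:
        raise ValueError("entities must contain 2-8 names")
    # The first entry is the subject; the rest are treated as competitors.
    args = {"entities": list(entities)}
    if models is not None:
        unknown = set(models) - SUPPORTED_MODELS
        if unknown:
            raise ValueError(f"unsupported models: {sorted(unknown)}")
        if "anthropic" in models:
            if not api_key:
                raise ValueError('"anthropic" requires _apiKey')
            args["_apiKey"] = api_key
        args["models"] = list(models)
    if context:
        args["context"] = context
    return args
```

Omitting `models` leaves the default ("workers-ai") in effect, so no key is sent.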

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already establish read-only, idempotent, non-destructive behavior. The description adds meaningful context: it probes each entity by calling ai_visibility_check, ranks by score, and returns a structured result (score, confidence, signal density). It doesn't disclose potential latency or API key requirement, but that's partially covered by schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: statement of purpose, mechanism, use case, and output summary. No fluff, front-loaded with the core verb, and every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 params with full schema descriptions, strong annotations, and no output schema, the description compensates by explaining the return shape (ranked list with score, confidence, signal density). It also mentions the underlying sub-call, making the tool's behavior fully covered for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with each parameter well-described (e.g., entities explains the first-entry-as-subject semantics). The description adds no additional parameter detail beyond what the schema provides, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a clear verb+object: 'Compare AI visibility across multiple entities side-by-side.' It further specifies the mechanism (uses ai_visibility_check), ranking output, and a concrete use case (competitive AI-marketing audits). This differentiates it from sibling tools like ai_visibility_check (single entity) and compare_entities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It states when it's useful ('competitive AI-marketing audits') and gives an example question. However, it doesn't explicitly contrast with alternative tools like compare_entities or mention exclusions, though the context strongly implies multi-entity side-by-side comparison.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

scan_dependency: Scan Dependency (grade A)

Read-only, Idempotent

Composite "should I add this npm package to my project" check in ONE call — fans out across deps.dev (license + advisories + version history) and bundlephobia (gzipped/minified bundle size, dependency count, ESM/tree-shake support). Use whenever an agent asks "is X safe / popular / small" or "what does adding lodash cost me". Returns a summary block (is_latest, license, published_at, advisory_count, bundle_kb_min, bundle_kb_gz, dependency_count, has_esm, tree_shakeable), per-advisory detail, links, and a list of recent alternative versions. NPM ecosystem only in v1; PyPI / Maven / Cargo / Go fall under deps.dev:version directly. Partial failures degrade gracefully — bundlephobia's first measurement on a new version can take 5-30s; sources_failed will list it if it times out, the rest still returns.

ParametersJSON Schema

Name	Required	Description	Default
`package`	Yes	npm package name. Scoped packages (e.g. "@types/node") are accepted.
`version`	No	Specific version to check (e.g., "18.3.1"). Defaults to the latest published version when omitted.
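
One way a caller might handle the graceful degradation described above is sketched here. The field names come from the description, but nesting them under a `summary` key and the literal value "bundlephobia" inside `sources_failed` are assumptions, as is the helper name `summarize_scan`.

```python
def summarize_scan(result):
    """Turn a scan result into short human-readable notes (illustrative)."""
    summary = result.get("summary", {})
    failed = set(result.get("sources_failed", []))
    notes = []
    if summary.get("advisory_count", 0) > 0:
        notes.append(f"{summary['advisory_count']} known advisories")
    if "bundlephobia" in failed:
        # Partial failure: the other fields are still usable.
        notes.append("bundle size unavailable; retry later")
    elif summary.get("bundle_kb_gz") is not None:
        notes.append(f"{summary['bundle_kb_gz']} kB gzipped")
    if not summary.get("is_latest", True):
        notes.append("not the latest version")
    return notes
```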

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark it read-only, open-world, idempotent, non-destructive. The description adds key behavioral traits beyond those: it fans out to two external services, returns a structured summary, degrades gracefully on partial failures, and notes that bundlephobia's first measurement can take 5-30s with a sources_failed field. This is rich context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one dense paragraph but every sentence carries unique information: purpose, usage, return fields, ecosystem scope, and failure behavior. It is front-loaded and not verbose, though it could benefit from more structured formatting given the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description fully specifies the return structure (summary block fields, advisory detail, links, alternative versions) and important caveats (latency, partial failure, ecosystem scope). This is complete for an agent to decide when to call and how to interpret results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already documents both parameters with 100% coverage, including scoped package support and default version behavior. The description does not add new parameter semantics beyond restating the composite nature; with high schema coverage, baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a composite 'should I add this npm package' check that fans out across deps.dev and bundlephobia, with a specific scope (license, advisories, bundle size, tree-shaking). It distinguishes from siblings by naming the exact question it answers and the data sources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use whenever an agent asks "is X safe / popular / small" or "what does adding lodash cost me"' and provides a clear exclusion: 'NPM ecosystem only in v1; PyPI / Maven / Cargo / Go fall under deps.dev:version directly.' This gives direct when-to-use and when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_within: Search Within a Source (grade A)

Read-only, Idempotent

Semantic search INSIDE a fetched record. Pass the text you already pulled (e.g. a SEC 10-K body, an article, a long tool result) plus a natural-language query; get back the top-N passages with character offsets and similarity scores. Use when the record is too big to cram into the prompt — search_within saves context, returns only the passages that matter, and every passage carries an offset so the agent can verify a verbatim quote. Pairs with ask_pipeworx_grounded: fetch with the gateway, ground over the relevant passages instead of the whole document. BGE-base-en embeddings + cosine over 500-char overlapping windows; cap is 200K chars (longer inputs are truncated and flagged).

ParametersJSON Schema

Name	Required	Description
`text`	Yes	The document text to search inside (max ~200K chars).
`limit`	No	Max passages to return (1-20, default 5).
`query`	Yes	Natural-language query — what passages do you want? E.g. "supply-chain risk", "fiscal year 2024 revenue", "drug interactions with warfarin".
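
The windowing and ranking approach can be sketched without an embedding model. In the example below a bag-of-words count vector stands in for BGE-base-en embeddings, and the 250-character step between windows is an assumption (the description says the windows overlap but not by how much). The function mirrors the tool's name purely for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search_within(text, query, limit=5, window=500, step=250, cap=200_000):
    """Rank overlapping character windows of `text` against `query`."""
    truncated = len(text) > cap  # longer inputs are cut and flagged
    text = text[:cap]
    query_vec = embed(query)
    passages = []
    for start in range(0, len(text), step):
        chunk = text[start:start + window]
        passages.append({"offset": start, "text": chunk,
                         "score": cosine(query_vec, embed(chunk))})
    passages.sort(key=lambda p: p["score"], reverse=True)
    return {"truncated": truncated, "passages": passages[:limit]}
```

Because each passage keeps its character offset, a caller can slice the original text at that offset to confirm a quote verbatim.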

Tool Definition Quality

A4.8/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, idempotentHint=true, openWorldHint=true, and destructiveHint=false. The description supplements this with valuable details: truncation at 200K chars with a flag, embedding model (BGE-base-en), cosine similarity, and 500-char overlapping windows. It also mentions the offset verification benefit, which annotations do not convey. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a bit lengthy at five sentences, but each sentence adds essential context: usage trigger, pairing, truncation, and technical details. It is front-loaded with purpose and avoids fluff. A slight deduction for the technical embedding sentence being arguably optional, but it remains informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description explains the return format: passages with character offsets and similarity scores. It also discloses the character cap and truncation behavior. Given the tool's moderate complexity and well-covered parameters, this description is complete enough for an agent to select and invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds beyond the schema by providing example use cases for the 'text' parameter (SEC 10-K, article) and example queries for 'query', which enriches understanding. It does not repeat the limit parameter details, but the schema already covers them.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with 'Semantic search INSIDE a fetched record', a specific verb+resource combination that clearly distinguishes it from sibling tools like ask_pipeworx. It further clarifies the input ('text you already pulled') and the output (top-N passages with offsets and similarity scores), making the purpose unmistakable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'Use when the record is too big to cram into the prompt'. It also names a companion tool: 'Pairs with ask_pipeworx_grounded: fetch with the gateway, ground over the relevant passages instead of the whole document', giving clear context on how it fits in a workflow.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

send: Send (grade C)

Read-only, Idempotent

Send a single email.

ParametersJSON Schema

Name	Required	Description	Default
`cc`	No
`to`	Yes
`bcc`	No
`tag`	No
`from`	Yes
`headers`	No
`replyto`	No
`subject`	No
`htmlbody`	No
`textbody`	No
`trackLinks`	No
`trackopens`	No
`messagestream`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`To`	No	Recipient email address
`Message`	No	Status or error message
`ErrorCode`	No	Error code if submission failed
`MessageID`	No	Unique message identifier
`SubmittedAt`	No	ISO timestamp when email was submitted
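
Since the listing gives no parameter descriptions, the following is only a guess at a minimal invocation. It wraps the two required fields in a standard MCP `tools/call` JSON-RPC request; the helper name, the example addresses, and the assumption that `to` takes a single address string are all hypothetical.

```python
import json


def build_send_request(request_id, sender, recipient,
                       subject=None, textbody=None, htmlbody=None):
    """Build a JSON-RPC 2.0 tools/call request for the `send` tool."""
    if not sender or not recipient:
        raise ValueError("'from' and 'to' are required")
    arguments = {"from": sender, "to": recipient}
    optional = (("subject", subject), ("textbody", textbody), ("htmlbody", htmlbody))
    for name, value in optional:
        if value is not None:
            arguments[name] = value
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "send", "arguments": arguments},
    }


request = build_send_request(1, "sender@example.com", "user@example.com",
                             subject="Hello", textbody="Plain-text body")
payload = json.dumps(request)  # body to POST over the server's HTTP transport
```

Despite the read-only badge, a successful call delivers an email, so callers should treat it as a write.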

Tool Definition Quality

C2.9/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations declare readOnlyHint=true, yet sending an email produces a side effect and is not a read, so the annotation directly contradicts the tool's actual behavior. The description neither flags this nor discloses any other behavior (e.g., delivery, failure modes).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded sentence with no wasted words. However, given the tool's complexity (13 parameters), it is almost too terse, bordering on under-specification rather than useful conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Beyond stating that the tool sends an email, the description gives an agent nothing it needs to invoke the tool correctly. With 13 parameters, an output schema, and contradictory annotations, it omits critical context such as the required recipient format and how the optional fields change behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 13 parameters and 0% schema description coverage, the description adds no parameter meaning. It does not explain required fields like 'from' and 'to', nor any optional fields like 'cc', 'subject', or tracking options. The burden falls entirely on the schema examples.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
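
The "schema description coverage" figure quoted in these reviews could be computed along the following lines. This is an assumed definition (the share of top-level properties with a non-empty `description`), not the scorer's documented formula, and `description_coverage` is a hypothetical name.

```python
def description_coverage(input_schema):
    """Fraction of top-level properties that carry a non-empty description."""
    properties = input_schema.get("properties", {})
    if not properties:
        return 1.0  # nothing to document
    described = sum(
        1 for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties)
```

Under this definition a schema like the one for `send`, where no parameter has a description, scores 0.0, while a fully documented two-parameter schema scores 1.0.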

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Send a single email' clearly states the verb (send), resource (email), and scope (single), which also distinguishes it from the sibling tool 'send_batch'. This is specific and immediately actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The phrase 'single email' provides clear context for when to use this tool versus a batch send, but it does not explicitly name alternatives or exclusion criteria. It stops short of explicit 'when not to use' guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

send_batch: Send Batch (A)

Idempotent

Send up to 500 emails.

Parameters

Name	Required	Description	Default
`emails`	Yes

Output Schema

Parameters

Name	Required	Description
`count`	Yes	Number of items returned.
`items`	Yes
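
The 500-email ceiling in the description implies that a longer list has to be split before calling send_batch. A minimal Python sketch of that split is shown here. `chunk_emails` and `MAX_BATCH` are hypothetical names, and the per-email fields are placeholders, since the listing does not document the email object's schema.

```python
from typing import Iterator

MAX_BATCH = 500  # limit stated in the send_batch description


def chunk_emails(emails: list[dict], size: int = MAX_BATCH) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most `size` emails, preserving order."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(emails), size):
        yield emails[start:start + size]


# Placeholder payloads: the field names below are assumptions, not the tool's schema.
emails = [{"to": f"user{i}@example.com", "subject": "Hi"} for i in range(1200)]
batches = list(chunk_emails(emails))
print([len(b) for b in batches])  # [500, 500, 200]
```

Each yielded list could then be passed as the `emails` argument of one send_batch call. How the server reacts to a list longer than 500 is not stated in the listing, so splitting on the client side is the conservative choice.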

Tool Definition Quality

A (3.7/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds the key behavioral constraint of a 500-email limit. Annotations already declare safety hints (readOnly=false, destructive=false, idempotentHint=true), so the description does not need to repeat those. It does not mention error handling or what happens if the limit is exceeded, but the added limit is useful context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, focused sentence with no wasted words. It front-loads the action and the key constraint, making it extremely concise and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool sends batch emails, the description is minimal but covers the core purpose and limit. However, it lacks detail on email structure, authentication, return format, or error conditions. Since there is an output schema and annotations, some gaps are mitigated, but for a tool with this scope, a bit more context would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has one 'emails' array parameter with no description and 0% coverage. The description adds the limit of up to 500 emails, but does not describe the required fields per email (e.g., 'to', 'from', 'subject', 'body'). The schema example provides structure, but the description itself offers minimal semantic support.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Send') and the resource ('up to 500 emails'), which distinguishes it from the sibling 'send' tool. The mention of a batch size limit makes its purpose specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage: send when you have multiple emails to send, up to 500. However, there is no explicit guidance on when to prefer this over the sibling 'send' tool or any alternatives, so it is implied rather than stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

server: Server (A)

Read-only, Idempotent

Fetch configuration and metadata for the current Postmark server, including name, color, delivery settings, bounce/spam threshold, and message stream settings.

Parameters

Name	Required	Description	Default
No parameters

Output Schema

Parameters

Name	Required	Description
`ID`	No	Server identifier
`Name`	No	Server name
`Color`	No	Server color in dashboard
`ApiTokens`	No	List of API tokens for this server
`TrackLinks`	No	Link tracking setting (None, HtmlAndText, HtmlOnly, TextOnly)
`TrackOpens`	No	Whether open tracking is enabled
`OpenHookUrl`	No	Open tracking webhook URL if configured
`SpamHookUrl`	No	Spam complaint webhook URL if configured
`ClickHookUrl`	No	Click tracking webhook URL if configured
`BounceHookUrl`	No	Bounce webhook URL if configured
`InboundDomain`	No	Inbound domain for receiving emails
`InboundHookUrl`	No	Inbound webhook URL if configured
`TrackingDomain`	No	Custom tracking domain
`DeliveryHookUrl`	No	Delivery webhook URL if configured
`RawEmailEnabled`	No	Whether raw email API is enabled
`SmtpApiActivated`	No	Whether SMTP API is activated
`PostFirstOpenOnly`	No	Post only first open

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description adds value by enumerating the specific attributes returned (name, color, delivery settings, bounce/spam threshold, message stream settings). This gives the agent useful context beyond the annotation flags, though it doesn't discuss return format or potential API subtleties.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The entire description is a single, well-structured sentence that front-loads the verb and resource, then lists specific examples. No wasted words or redundant restatements of the name.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple parameterless tool with an output schema, this description is complete. It tells the agent what the tool fetches and includes enough specifics about the contained data. The output schema handles return-value details, so no further description is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The tool has zero parameters, so there is no parameter schema to document. The description's enumeration of returned data compensates for any need to explain what the tool operates on. Baseline for no-parameter tools is 4, and no extra parameter details are required.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Fetch') and clearly identifies the resource ('configuration and metadata for the current Postmark server'), listing concrete contents like delivery settings and bounce/spam threshold. This distinguishes it from sibling tools focused on bounces, messages, or delivery stats.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The context is clear: use this tool to retrieve the current Postmark server's configuration and metadata. It doesn't explicitly mention alternatives or when not to use it, but for a simple read-only config fetch, the implied usage is sufficient and no exclusions are needed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subscribe: Subscribe to Alerts (A)

Idempotent

Create a proactive monitoring subscription to a live-data event stream. Returns the new subscription id. Requires a Pipeworx OAuth account (anonymous + BYO cannot persist subscriptions). Supported types: "sec_8k" (8-K filings matching ticker + item codes — e.g. items:["5.02"] = officer change), "polymarket_edge" (Polymarket↔Kalshi cross-venue mispricings — params:{topic:"fed"}), "fred_series" (new FRED observations — params:{series_id:"UNRATE"}). Delivery channels: feed (always on — pull via recent_alerts or GET registry.pipeworx.io/alerts.json), and optionally email (set delivery:{email:"you@x.com"}) or sms (delivery:{sms:"+15551234567"} — phone must be verified at /account first; 10/day cap).

Parameters

Name	Required	Description
`type`	Yes	Subscription type.
`params`	Yes	Type-specific filter. sec_8k: {ticker:"AAPL", items?:["5.02","1.01"]}. polymarket_edge: {topic:"fed", min_spread_bps?:500}. fred_series: {series_id:"UNRATE"}. patent_grant: {applicant:"Apple Inc."}. clinical_trial: {sponsor?:"Pfizer", condition?:"lung cancer", phase?:"PHASE3"} (sponsor or condition required).
`delivery`	No	Optional delivery channels in addition to the always-on persistent feed. {email:"you@x.com"} sends a templated alert per fired event. {sms:"+15551234567"} sends an SMS per event — must match the verified phone on the caller's account (verify at https://pipeworx.io/account first; 10/day cap). {webhook:"https://..."} POSTs each event JSON to your endpoint, HMAC-signed — the response includes delivery.webhook_secret (whsec_…) ONCE; verify X-Pipeworx-Signature = sha256 HMAC of "<X-Pipeworx-Timestamp>.<raw body>". Auto-disabled after 10 consecutive failing runs.
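
The webhook channel's signature check, a sha256 HMAC over "<X-Pipeworx-Timestamp>.<raw body>", can be sketched on the receiving side as follows. This is an illustration under stated assumptions rather than Pipeworx's reference code: the listing does not say how the signature is encoded or how the `whsec_…` secret is turned into key bytes, so the sketch assumes a hex digest and the secret string used verbatim as UTF-8. The function names and sample values are hypothetical.

```python
import hashlib
import hmac


def expected_signature(secret: str, timestamp: str, raw_body: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<raw body>"; hex output is an assumption."""
    signed_payload = timestamp.encode("utf-8") + b"." + raw_body
    return hmac.new(secret.encode("utf-8"), signed_payload, hashlib.sha256).hexdigest()


def verify_webhook(secret: str, timestamp: str, raw_body: bytes, signature: str) -> bool:
    """Compare the received signature with the expected one in constant time."""
    expected = expected_signature(secret, timestamp, raw_body)
    return hmac.compare_digest(expected.encode("utf-8"), signature.encode("utf-8"))


# Made-up values for illustration only; the event body shape is not documented here.
secret = "whsec_example"
timestamp = "1700000000"
body = b'{"type":"fred_series","series_id":"UNRATE"}'
good = expected_signature(secret, timestamp, body)
print(verify_webhook(secret, timestamp, body, good))         # True
print(verify_webhook(secret, timestamp, body + b" ", good))  # False
```

The check has to run on the raw request bytes, before any JSON parsing or re-serialization, because even a whitespace change alters the digest. Since the secret is returned only once, it needs to be stored at subscription time.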

Tool Definition Quality

A (4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses auth requirements, the always-on feed behavior, phone verification for SMS, and the 10/day cap. Annotations already cover read-only/destructive/idempotent hints; the description adds operational context beyond those. No contradiction exists with the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose, but it quickly becomes a dense single paragraph with semicolon-separated details and inline examples. While every sentence is informative, the lack of bullet points or sub-sections hurts readability for a tool with this many subscription types and options.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the three main subscription types in detail and all delivery channels except webhook (which is thoroughly documented in the schema). It states the return value (subscription id) and prerequisites. Given the schema's richness, the overall context is complete for a create operation, with minor omissions in the main description filled by the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides 100% parameter descriptions, including type-specific params and delivery channel details. The description adds narrative examples (e.g., sec_8k items) and clarifies the delivery semantics, but it largely overlaps with the schema's rich descriptions, so it earns the baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a specific verb-resource pair: 'Create a proactive monitoring subscription to a live-data event stream.' It enumerates supported subscription types and distinguishes itself from sibling tools like list_subscriptions and unsubscribe via the action 'Create.' This is a textbook clear purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It states a key prerequisite (requires Pipeworx OAuth account, anonymous/BYO cannot persist), explains delivery channel behavior (feed always on, optional email/sms), and even points to alternative pull mechanisms (recent_alerts or URL). It does not explicitly say 'use this instead of X', but the usage context is strongly implied for proactive monitoring.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

suggest_questions: What Can I Ask Pipeworx? (A)

Read-only, Idempotent

What can I ask Pipeworx? / what is Pipeworx good for? / what can you do? / give me ideas / show me examples / getting started / what data do you have? — the onboarding entry point for an agent that just connected and wants to know what is worth asking. Returns category-bucketed example questions (company financials, drugs & clinical trials, economics, real estate, prediction markets, weather, government & patents, science & academia, news) — each with the exact tool + argument shape that answers it, drawn from the live catalog of thousands of tools. Call with no arguments for the full spread, or pass topic (e.g. "finance", "pharma", "betting") to focus. Use this FIRST when you do not yet know what Pipeworx can do for you, or to learn how to call the meta-tools (ask_pipeworx, entity_profile, compare_entities, etc.).

Parameters

Name	Required	Description	Default
`topic`	No	Optional focus area: finance \| pharma \| economics \| real-estate \| betting \| weather \| government \| science \| news. Omit for a cross-category spread.

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, openWorldHint=true, idempotentHint=true, and destructiveHint=false, so safety is known. The description adds value by detailing the return content (category-bucketed example questions with tool+argument shape) and noting the live catalog, which aligns with openWorldHint. No contradictions; added behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single dense paragraph but front-loaded with the core purpose, then usage guidance, then parameter options. Every sentence contributes—there is no filler. It could be improved with bullet points for scannability, but it remains efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description must explain return values, and it does: 'Returns category-bucketed example questions ... each with the exact tool + argument shape.' It also covers invocation patterns (no args or topic) and positions the tool within the meta-tool ecosystem. For a simple 1-parameter tool, this is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers the single optional parameter `topic` with a full description of allowed values and default behavior (100% coverage). The description reinforces this with examples ('finance', 'pharma', 'betting') but does not add new semantic details. Baseline 3 is appropriate because the schema already provides the necessary meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with sample user queries and immediately identifies the tool as 'the onboarding entry point for an agent that just connected.' It clearly states that the tool returns category-bucketed example questions with the exact tool and argument shape, making it distinct from siblings like ask_pipeworx (which likely answers questions) and discover_tools (which lists tools).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this FIRST when you do not yet know what Pipeworx can do for you' and explains both call modes (no arguments for full spread, or pass `topic`). It also points to meta-tools to learn about. However, it does not explicitly specify when not to use it (e.g., 'use ask_pipeworx for actual queries'), relying on the implication that it is a starting point rather than a replacement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

unsubscribe: Unsubscribe from Alerts (A)

Idempotent

Cancel a subscription by id. Ownership is enforced — you can only cancel your own subscriptions. The row is deactivated (not deleted) so its historical events stay available via recent_alerts.

Parameters

Name	Required	Description	Default
`id`	Yes	Subscription id (uuid) returned by subscribe.

Tool Definition Quality

A (4.5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses behaviors beyond annotations: ownership enforcement and the soft-delete (deactivate) behavior, ensuring historical events remain via recent_alerts. This complements the annotations (destructiveHint=false, idempotentHint=true) without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences, front-loaded with the core action, and every clause adds meaningful context without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple cancel tool with rich annotations and 100% schema coverage, the description fully covers the action, ownership rule, side effects, and data retention. No output schema is present, but return values are not needed for this operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single 'id' parameter, and the schema description already explains it as a UUID from subscribe. The description doesn't add parameter-specific details, so the baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states a specific action ('Cancel a subscription by id') with a clear resource and scope. It distinguishes itself from siblings like subscribe and list_subscriptions by being the dedicated cancellation tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context: ownership enforcement and the idempotent/non-destructive nature. It doesn't explicitly name alternatives or when-not-to-use, but the context is sufficient for correct selection among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_claim: Validate Claim (A)

Read-only, Idempotent

"Is it true that…" / "fact check" / "verify the claim that…" / "did X really…" / "was Y actually…" / "confirm or refute" / "true or false" — natural-language claim verification against authoritative sources. Use whenever the agent needs to check whether something a user said is factually correct. Company-financial claims (revenue, net income, cash for public US companies) verify via the structured SEC EDGAR + XBRL fast path with exact percent-delta math; ANY OTHER factual claim (macro statistics, rates, prices, drug data, records) automatically falls through to the grounded pipeline — routed to the right live source, answered with verbatim evidence, then judged. Returns a verdict (confirmed / approximately_correct / refuted / inconclusive / unsupported), the grounded or structured actual value with pipeworx:// citation, and reasoning. Replaces 4–6 sequential calls (NL parsing → entity resolution → data lookup → comparison).

Parameters

Name	Required	Description	Default
`claim`	Yes	Natural-language factual claim, e.g., "Apple's FY2024 revenue was $400 billion" or "Microsoft made about $100B in profit last year".
`tolerance_pct`	No	Max percent deviation still graded approximately_correct (0.5–50). Overrides the tolerance implied by the claim wording — set 1–2 for hallucination detection where any material error must be refuted. Default: implied by wording, capped at 5.
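
To make the role of `tolerance_pct` concrete, the percent-delta comparison can be sketched as below. This is not the tool's implementation: the function names are hypothetical, the cutoff for "confirmed" (`exact_pct`) is an assumption that the listing does not specify, and the revenue figures are illustrative placeholders rather than looked-up values. Only the 0.5–50 range and the default cap of 5 come from the schema description.

```python
def percent_delta(claimed: float, actual: float) -> float:
    """Absolute deviation of the claimed value from the actual value, in percent."""
    if actual == 0:
        raise ValueError("actual value must be non-zero")
    return abs(claimed - actual) / abs(actual) * 100.0


def grade(claimed: float, actual: float,
          tolerance_pct: float = 5.0, exact_pct: float = 0.1) -> str:
    """Map a deviation to a verdict; exact_pct is an assumed cutoff for 'confirmed'."""
    if not 0.5 <= tolerance_pct <= 50:
        raise ValueError("tolerance_pct must be within 0.5-50")
    delta = percent_delta(claimed, actual)
    if delta <= exact_pct:
        return "confirmed"
    if delta <= tolerance_pct:
        return "approximately_correct"
    return "refuted"


# Illustrative figures in billions of dollars (placeholders, not sourced data).
print(f"{percent_delta(400, 391):.2f}")    # 2.30
print(grade(400, 391))                     # approximately_correct
print(grade(400, 391, tolerance_pct=2))    # refuted
```

The last two calls show why the schema suggests 1–2 for hallucination detection: a deviation of about 2.3% passes under the default cap of 5 but is refuted once the tolerance is tightened to 2.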

Tool Definition Quality

A (4.3/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint, and destructiveHint=false, so safety is covered. The description adds meaningful behavioral context: the two-path processing (SEC EDGAR for public company financials vs grounded pipeline for everything else), the verdict enum, citation format, and the fact that it consolidates multiple calls.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long but information-dense; the opening trigger phrases are front-loaded and the two-path explanation earns its place. It could be slightly tightened by removing the parenthetical paraphrases, but overall it is well-structured and every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description compensates by detailing the exact return values (verdict types, citation, reasoning) and the distinction between financial and non-financial claims. It also notes the tool's efficiency advantage, but omits potential limitations like non-US companies or ambiguous claims, though the verdict enum covers inconclusive/unsupported cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions for both claim and tolerance_pct are 100% complete, so the description does not need to add parameter semantics. The tolerance_pct schema already explains the override and hallucination-detection use. The main description provides example claims but adds no new parameter-level detail beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with natural-language trigger phrases and states 'natural-language claim verification against authoritative sources.' It clearly identifies the verb (verify) and resource (claims) and distinguishes this from sibling tools by noting it replaces 4–6 sequential calls and handles both structured financial and grounded non-financial claims.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use whenever the agent needs to check whether something a user said is factually correct,' establishing a clear use case. It also gives guidance on how different claim types are routed and mentions tolerance_pct for hallucination detection, but does not explicitly name alternative tools or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
