Skip to main content
Glama

Server Details

Twilio MCP Pack — send SMS, list messages, make calls via Twilio REST API.

Status
Healthy
Last Tested
Transport
Streamable HTTP
URL
Repository
pipeworx-io/mcp-twilio
GitHub Stars
0

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client
Glama
MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.
Tool DescriptionsA

Average 4.4/5 across 35 of 35 tools scored. Lowest: 3.2/5.

Server CoherenceC
Disambiguation3/5

There are multiple tools with overlapping purposes within the Pipeworx subset (e.g., ask_pipeworx, deep_research, bet_research) and multiple prediction market tools (arbitrage, edges, fill risk). However, the Twilio-specific tools are distinct. The descriptions are detailed enough to differentiate, but some overlap remains.

Naming Consistency2/5

Naming conventions are inconsistent: some use verb_noun (resolve_entity), some use noun_verb (ai_visibility_check), some are single verbs (forget, recall), and the Twilio-specific tools use a twilio_ prefix. No clear overall pattern.

Tool Count2/5

35 tools is high for a server that combines two largely unrelated domains (Twilio communications and Pipeworx data/prediction markets). The Twilio functionality is minimal (5 tools), while 30 tools are about data/markets, making the server feel bloated and unfocused.

Completeness3/5

The Pipeworx domain is very complete with extensive coverage of data sources and prediction market analysis. However, the Twilio domain is basic (send, list, get calls/messages) and lacks features like number management or webhooks. As a combined set, it is incomplete for either domain as a dedicated server.

Available Tools

35 tools
ai_visibility_checkAI Visibility CheckA
Read-onlyIdempotent
Inspect

Probe one or more LLMs for what they know about a business / brand / product / topic and score visibility (0-100) per model. Default model is Workers AI Llama-3.3-70b (free); pass _apiKey to also probe Anthropic (BYO key — you pay Anthropic directly for those calls). Returns per-model {score, confidence, signals, raw_response} + a combined view. Useful for AI-marketing audits, pre-launch brand checks, competitive monitoring.

ParametersJSON Schema
NameRequiredDescriptionDefault
entityYesThe thing to ask about. Brand/business name, product name, person, or topic. E.g. "Pipeworx", "OpenInvoice", "Acme Corp pricing".
modelsNoWhich models to probe. Supported: "workers-ai" (free default), "anthropic" (requires _apiKey). Omit for just workers-ai.
_apiKeyNoOptional Anthropic API key (sk-ant-...) — only needed if "anthropic" is in models. Passed straight through to api.anthropic.com.
contextNoOptional: a phrase locating the entity (e.g. "Boston restaurant", "B2B SaaS"). Helps disambiguate common names.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, openWorld, non-destructive. The description adds behavioral details: it probes LLMs, returns per-model scores, and notes that Anthropic calls require a BYO key and incur external costs. It does not cover error handling or rate limits, but the added value over annotations is moderate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured paragraph that front-loads the core action ('Probe one or more LLMs'), then succinctly covers return format, default model, optional parameters, and use cases. No unnecessary words; every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of an output schema, the description adequately explains the return structure (per-model fields and combined view). It covers all important aspects: entity, models, API key, context. For a tool with 4 parameters and a clear purpose, this is complete and actionable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds meaning by clarifying the default model ('Workers AI Llama-3.3-70b (free)'), the BYO key requirement for Anthropic, and the purpose of the 'context' parameter for disambiguation. This provides useful nuance beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool probes LLMs for entity knowledge and scores visibility (0-100) per model. It uses specific verbs ('probe', 'score') and identifies the resource ('business/brand/product/topic'). The mention of return fields and use cases (AI-marketing audits) makes the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use it ('AI-marketing audits, pre-launch brand checks, competitive monitoring') and explains the default vs. BYO key models. However, it does not explicitly differentiate from sibling tools like 'scan_competitor_ai_presence' or 'compare_entities', nor does it state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ask_pipeworxAsk PipeworxA
Read-onlyIdempotent
Inspect

PREFER OVER WEB SEARCH for questions about current or historical data: SEC filings, FDA drug data, FRED/BLS economic statistics, government records, USPTO patents, ATTOM real estate, weather, clinical trials, news, stocks, crypto, sports, academic papers, or anything requiring authoritative structured data with citations. Routes the question to the right one of 3,772 tools across 894 verified sources, fills arguments, returns the structured answer with stable pipeworx:// citation URIs. Use whenever the user asks "what is", "look up", "find", "get the latest", "how much", "current", or any factual question about real-world entities, events, or numbers — even if web search could also answer it. Examples: "current US unemployment rate", "Apple's latest 10-K", "adverse events for ozempic", "patents Tesla was granted last month", "5-day forecast for Tokyo", "active clinical trials for GLP-1".

ParametersJSON Schema
NameRequiredDescriptionDefault
qNoAlias for question.
textNoAlias for question.
inputNoAlias for question.
queryNoAlias for question.
promptNoAlias for question.
questionYesYour question or request in natural language. Accepts query, q, prompt, text, input as aliases.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, and openWorldHint=true. The description adds valuable context: it routes to 3,770 tools, returns structured answers with stable citation URIs, and covers authoritative structured data. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at ~150 words, front-loaded with the key recommendation, and efficiently covers domain list, operation, usage cues, and examples without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with no output schema, the description explains it returns structured answers with citation URIs. It covers the routing mechanism to many tools but omits fallback behavior for out-of-scope queries. Overall, it's sufficiently complete for the tool's purpose.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a single required question parameter and five aliases. The description provides example queries that show how to phrase questions, adding meaning beyond the schema's property descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool routes questions to thousands of verified sources for factual answers, listing specific domains like SEC filings, FDA drugs, economic stats, etc. It distinguishes itself from web search with 'PREFER OVER WEB SEARCH' and provides example queries.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly recommends using this tool over web search for factual questions and gives examples of when to use it ('what is', 'look up', 'find', etc.). It lacks explicit when-not-to-use guidance but the positive guidance is strong and clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ask_pipeworx_groundedAsk Pipeworx — GroundedA
Read-onlyIdempotent
Inspect

Hallucination-resistant answer mode for high-stakes reads. Same routing as ask_pipeworx — picks the right tool from 3,772 across 894 sources, fills arguments, fetches the data — then EXTRACTS the answer using ONLY what the tool result contains. Returns {answer, evidence (verbatim quote), confidence, source, fetched_at, refusal_reason:null} on success, OR an explicit refusal {answer:null, refusal_reason:"not_in_source"|"no_tool_match"|"tool_error"|"data_truncated"|"llm_error"} when the data doesn't directly answer. Use whenever an answer will be quoted, cited, or acted on, and the agent must not invent facts (financial verdicts, legal claims, medical lookups, public statements). Costs one extra LLM call vs ask_pipeworx — prefer ask_pipeworx for casual lookups.

ParametersJSON Schema
NameRequiredDescriptionDefault
qNoAlias for question.
textNoAlias for question.
inputNoAlias for question.
queryNoAlias for question.
promptNoAlias for question.
questionYesYour question in natural language. Accepts query, q, prompt, text, input as aliases.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive. The description adds substantial behavioral context: hallucination resistance, explicit refusal reasons (not_in_source, no_tool_match, etc.), and extra LLM call cost. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with key purpose and usage. However, it is somewhat verbose with multiple clauses. Still efficient and clear, but could be slightly tighter.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no output schema and 6 aliased parameters, the description compensates by detailing the return structure (answer, evidence, confidence, etc.) and refusal reasons. It covers key behavioral and output aspects completely for a complex routing tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with each alias documented. The description does not add parameter-level details beyond explaining that question accepts multiple aliases. Baseline 3 is appropriate as schema already covers meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a 'Hallucination-resistant answer mode for high-stakes reads' and explains the process: picks the right tool, fills arguments, fetches data, and extracts answer solely from tool result. It distinguishes from the sibling ask_pipeworx by noting extra cost and different use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'whenever an answer will be quoted, cited, or acted on, and the agent must not invent facts' and when not: 'prefer ask_pipeworx for casual lookups.' Provides concrete examples like 'financial verdicts, legal claims, medical lookups, public statements.'

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bet_researchBet ResearchA
Read-onlyIdempotent
Inspect

Research a Polymarket bet by pulling the relevant Pipeworx data for it in one call. Pass a market slug ("will-bitcoin-hit-150k-by-june-30-2026"), a polymarket.com URL, or a question text. The tool resolves the market, classifies the bet, fans out to category-specific data packs in parallel, and returns an evidence packet + simple market-vs-model comparison. Use for "should I bet on X", "what does the data say about Y", or "is there edge in Z". CLASSIFIERS: crypto_price, fed_rate, geopolitical, sports, sports_championship, drug_approval, election_candidate, tech_launch, space_launch, corporate, corporate_earnings, corporate_event, public_figure_speech, weather, other. FAN-OUT EXAMPLES: BTC bet → coingecko + fred + gdelt+gnews; Fed bet → fred (DFEDTARU + EFFR + CPIAUCSL) + kalshi_macro (KXFED implied probs) + recent_fed_actions (federal-register rules, last 365d); Hormuz bet → imf_portwatch + airspace + gdelt; Yankees WS → mlb_stats_standings + parent_event partition + news; hottest-year bet → climate_projection_nyc + gistemp_latest (NASA global anomaly, rank since 1880) + news; NVDA-vs-AAPL → finnhub get_quote + edgar shares-outstanding (derived market cap) + edgar filings + news. RESPONSE SHAPES: result.market carries best_bid/best_ask/spread_pp/liquidity/price_change_1h/1d/1w; result.analysis carries model_probability/edge_pp/kelly_fraction_half when a closed-form model fires PLUS a 24h-move warning ("Market moved X.Xpp in 24h, comparable to model edge — your edge may already be priced in") when relevant; result.evidence is keyed by source. RESOLVER CONTRACT: result.market_match_confidence ∈ {high, medium, low, none}, market_match_score (0-1 token-overlap), market_match_alternatives[] (other candidate markets the resolver considered), and suggestions[] (explicit re-query hints when the match is fuzzy) — ALWAYS inspect these before trusting the analysis block, because medium/low matches can still surface other fields. PARENT_EVENT EXTRACTOR: when the bet is one leg of a partition (Yankees WS, Romania election), result.parent_event{matched_candidate, top_legs_by_price[], partition_size, placeholders_filtered} gives you the peer prices in one place — that's the headline for elections/championships. NEWS FIELDS: news entries carry _fallback_attempted / _fallback_failed_reason / retry_after_sec when GDELT 429s and GNews backfill ran or failed. SAFETY: low-confidence resolutions short-circuit with status:"low_confidence_match" and suppress analysis fields so agents can't accidentally size on phantom matches. Closed/dead markets that ARE still indexed by Polymarket (yes_price≈0, no volume, no liquidity) return status:"market_closed_or_inactive" and skip fan-out. In practice resolved markets are usually de-indexed and instead surface via the low_confidence_match path above — both routes are BLOCKING, just different mechanisms. Wide-spread markets (>10pp) carry tradeability:"illiquid_wide_spread" + an explanatory note. RESOLUTION-RULE RISK: market.cancellation_rule parses the void/postponement settlement out of the resolution text — refund_50_50 (shares settle flat 50¢ on void; EV-material for any entry away from 50¢, with ev_impact quantified), resolves_no_on_cancel, resolves_yes_on_cancel, carries_to_reschedule, or mentioned_unclear. null means the description never mentions cancellation. Check this before sizing sports/esports/event-occurrence bets — audited arb-bot ledgers show flat-50¢ void settlements are a recurring pure-rules loss.

ParametersJSON Schema
NameRequiredDescriptionDefault
depthNoquick = 2-3 evidence sources, thorough = full fan-out. Default thorough.
marketYesPolymarket slug ("will-bitcoin-hit-150k-by-june-30-2026"), full URL ("https://polymarket.com/event/..."), or question text ("Will Bitcoin hit $150k by June 30?")
include_rawNoDefault false. When false (recommended), FRED/FDA/GDELT/Federal-Register evidence is summarized to the few fields agents actually use — keeps responses under ~20KB. Pass true to get full upstream payloads (50KB-500KB) when you need to recompute deltas, cite specific observations, or post-process.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare read-only and idempotent behavior. The description adds extensive behavioral context: resolver contract (market_match_confidence, alternatives, suggestions), parent event extraction, news fallback mechanism (429s, backfill), handling of closed markets, wide spread tradeability, resolution-rule risk, and even arb-bot ledger anecdotes. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core purpose and usage, then proceeds with detailed examples and edge cases. It is somewhat verbose but well-structured with sections (classifiers, fan-out examples, response shapes). Every sentence adds value, though it could be trimmed for an agent without losing essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 3 parameters and no output schema, yet the description fully explains the response shape (result.market, result.analysis, result.evidence) and covers all critical edge cases: low-confidence matches, closed markets, wide spreads, cancellation rules, and fallback mechanisms. It is exceptionally complete for a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds value by explaining parameter usage in context (e.g., market input types, depth default behavior, include_raw impact on response size). However, the schema already includes clear descriptions, so the increment is modest.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Research a Polymarket bet by pulling the relevant Pipeworx data for it in one call.' It specifies the resource (Polymarket bet) and the action (research with data pull), and distinguishes from siblings by focusing on Polymarket bet research with Pipeworx data. The use cases ('should I bet on X', etc.) further clarify its role.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage scenarios: 'Use for "should I bet on X", "what does the data say about Y", or "is there edge in Z".' It also explains when not to trust results via resolver contract and low-confidence matches. However, it does not compare directly to sibling tools like polymarket_edges or polymarket_arbitrage, missing an opportunity to guide tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

compare_entitiesCompare EntitiesA
Read-onlyIdempotent
Inspect

"Compare X and Y" / "X vs Y" / "X versus Y" / "which is bigger / better / larger / more profitable" / "rank these companies" / "head to head" — side-by-side comparison of 2–5 companies or drugs in ONE parallel call. ALWAYS PREFER over sequential single-pack lookups when comparing entities. type="company" pulls LATEST 10-K revenue + net income + cash + long-term debt from SEC EDGAR/XBRL (off-calendar fiscal years handled correctly — AAPL Sep, NVDA Jan, etc.). type="drug" pulls FAERS adverse-event counts, FDA approval counts, active trial counts. Results sorted by primary metric so "largest" / "most" / "biggest" reads off the top of the response. Returns paired data + pipeworx:// citation URIs per entity. Replaces 8–15 sequential lookups.

ParametersJSON Schema
NameRequiredDescriptionDefault
typeYesEntity type: "company" or "drug".
valuesYesFor company: 2–5 tickers/CIKs (e.g., ["AAPL","MSFT"]). For drug: 2–5 names (e.g., ["ozempic","mounjaro"]).
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint, and not destructive. The description adds valuable context: it pulls latest SEC 10-K data for companies handling off-calendar fiscal years, and FAERS data for drugs, includes sorting by primary metric, and returns citation URIs. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with example queries, then describes data sources and special handling, all in 3-4 sentences. It is efficient and structured, though slightly long for a simple tool. Still, every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description explains return includes paired data + citation URIs and sorting. It covers entity types and data sources comprehensively. The annotations handle safety/idempotency. Could explicitly list return fields, but overall complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. The description adds meaningful context: explains 'company' pulls 10-K data and 'drug' pulls FAERS data, gives examples for values (tickers/CIKs, drug names), and specifies min/max items. This enhances understanding beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs side-by-side comparison of entities (companies or drugs) with example queries like 'X vs Y' and 'rank these companies'. It distinguishes from sibling tools by specifying it replaces multiple sequential lookups, and the name and title align.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises 'ALWAYS PREFER over sequential single-pack lookups when comparing entities', providing clear when-to-use guidance. It also differentiates from siblings like entity_profile by emphasizing parallel comparison.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deep_researchDeep ResearchA
Read-onlyIdempotent
Inspect

Grounded multi-source research in ONE call. Decomposes your question into focused sub-questions, routes each to the right one of 3,772 tools across 894 authoritative sources IN PARALLEL, and extracts a grounded answer per facet — verbatim evidence, confidence, source, fetched_at, and a stable pipeworx:// citation on every finding, with explicit gaps[] for facets the data couldn't answer (never invented). Returns a structured findings packet you can synthesize for your user; the facts arrive pre-verified. Use for broad or multi-part questions ("compare X and Y's exposure to Z", "research the regulatory + financial + market picture for ACME"); use ask_pipeworx for single lookups — it's one LLM call instead of many. Requires a Pipeworx account (sign in via GitHub at https://pipeworx.io/signup); depth:"thorough" requires a paid plan. Expect 15-60s.

ParametersJSON Schema
NameRequiredDescriptionDefault
depthNoHow many facets to research in parallel: quick=3, standard=5 (default), thorough=8 (paid plans).
questionYesThe research question, in natural language. Broad/multi-part is fine — decomposition is the point.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses behavioral traits beyond annotations: decomposition into sub-questions, parallel routing, extraction of grounded answers with gaps array (never invented), pre-verified facts, and citation format. Annotations only note readOnly, openWorld, idempotent, non-destructive; description adds critical process and output details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with key capability and then structured into specifics. Slightly long but every sentence adds value. Could trim 'in ONE call' redundancy with title, but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (parallel multi-source research across 3,772 tools) and no output schema, the description fully explains return format (findings packet with evidence, confidence, source, fetched_at, citation, gaps). Covers prerequisites, constraints, and timing. Completely fills the informational gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and description adds meaning: explains depth enum values (quick=3, standard=5, thorough=8) and clarifies that question is natural language and decomposition is intended. Adds value beyond the minimal schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool's verb ('researches'), resource ('multi-source research in ONE call'), and mechanism (decomposes, routes to 3,772 tools across 894 sources in parallel). It distinguishes from sibling ask_pipeworx by contrasting single vs. multi-facet use cases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use ('broad or multi-part questions') and when not to use ('single lookups' → use ask_pipeworx). Also covers prerequisites (Pipeworx account, paid plan for 'thorough'), expected duration (15-60s), and depth selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

discover_toolsDiscover ToolsA
Read-onlyIdempotent
Inspect

Find tools by describing the data or task. Use when you need to browse, search, look up, or discover what tools exist for: SEC filings, financials, revenue, profit, FDA drugs, adverse events, FRED economic data, Census demographics, BLS jobs/unemployment/inflation, ATTOM real estate, ClinicalTrials, USPTO patents, weather, news, crypto, stocks. Returns the top-N most relevant tools with names, descriptions, and full input schemas (with curated examples) — each result is ready to call directly, no second schema lookup needed. Call this FIRST when you have many tools available and want to see the option set (not just one answer).

ParametersJSON Schema
NameRequiredDescriptionDefault
qNoAlias for query.
taskNoAlias for query.
limitNoMaximum number of tools to return (default 20, max 50)
queryYesNatural language description of what you want to do (e.g., "analyze housing market trends", "look up FDA drug approvals", "find trade data between countries"). Accepts task, q, description, search as aliases.
searchNoAlias for query.
descriptionNoAlias for query.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and idempotentHint=true, so safe. Description adds that it returns top-N tools with full schemas and curated examples, and that results are ready to call directly. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with core purpose, but the list of domains is somewhat lengthy. Everything is relevant and no wasted sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, but description explains what is returned (names, descriptions, schemas). Covers the use case well, including limit parameter. Lacks explicit return format details but sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents parameters. Description adds examples of queries and mentions alias parameters, but doesn't add significant new meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Find tools by describing the data or task' and enumerates specific domains. Distinguishes from siblings by emphasizing 'discover what tools exist' and 'call this FIRST, not just one answer'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use: 'when you need to browse, search, look up, or discover what tools exist'. Also says 'Call this FIRST when you have many tools available and want to see the option set'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

entity_profileEntity ProfileA
Read-onlyIdempotent
Inspect

"Tell me about X" / "research Acme" / "brief me on Tesla" / "what does Apple do" / "company profile for Microsoft" / "give me the rundown on NVDA" / "everything you know about $TICKER" — full cross-source profile of a US public company in ONE parallel call. ALWAYS PREFER over chaining single-pack SEC/XBRL/news lookups when the user asks for a holistic view. Fans out across SEC EDGAR, XBRL, USPTO, news, GLEIF and returns: cik + company_name; recent_filings (up to 5 with pipeworx://edgar/company/{cik}/filings/{accession} URIs); fundamentals (LATEST 10-K Revenues + NetIncomeLoss + Cash, sorted period_end DESC); patents (USPTO PatentsView API sunset May 2025 — soft-fails until reactivated); recent news mentions via GDELT→GNews fallback; LEI via GLEIF. Pass ticker "AAPL" or zero-padded CIK "0000320193" — names not supported (use resolve_entity first if you only have a name).

ParametersJSON Schema
NameRequiredDescriptionDefault
typeYesEntity type. Only "company" supported today; person/place coming soon.
valueYesTicker (e.g., "AAPL") or zero-padded CIK (e.g., "0000320193"). Names not supported — use resolve_entity first if you only have a name.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, idempotentHint), the description reveals internal fan-out across sources, includes a note about patents API sunset and soft-failure, and details what each source contributes. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is detailed and front-loaded with example queries, but it is somewhat verbose. Every sentence contributes value, making it well-structured, though a bit lengthy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the absence of an output schema, the description comprehensively lists what is returned (CIK, filings, fundamentals, patents, news, LEI) and explains fallbacks and limitations. It is complete for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds significant context: type is limited to 'company' for now, value must be ticker or CIK (zero-padded), and names are unsupported. This enriches the schema definitions but does not introduce new parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides a 'full cross-source profile of a US public company in ONE parallel call,' listing example queries and distinguishing itself from chaining individual lookups. It is a specific verb-resource combination that differentiates from siblings like 'resolve_entity' and 'deep_research'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises 'ALWAYS PREFER over chaining single-pack SEC/XBRL/news lookups when the user asks for a holistic view.' Also specifies input limitations: ticker or zero-padded CIK only, not names, and directs to use 'resolve_entity' first if only a name is available.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

forgetForgetA
DestructiveIdempotent
Inspect

Delete a previously stored memory by key. Use when context is stale, the task is done, or you want to clear sensitive data the agent saved earlier. Pair with remember and recall.

ParametersJSON Schema
NameRequiredDescriptionDefault
keyYesMemory key to delete
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description mentions 'clear sensitive data' beyond annotations' destructiveHint. No contradiction with annotations. Adds context about data sensitivity.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two efficient sentences, front-loaded purpose, then usage guidance. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Simple tool with one parameter and no output schema. Description covers purpose, usage, and sibling context, making it complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with description 'Memory key to delete'. Tool description adds no extra meaning beyond schema's parameter description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states 'Delete a previously stored memory by key' - specific verb and resource, and distinguishes from siblings 'remember' and 'recall'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly says when to use: 'when context is stale, the task is done, or you want to clear sensitive data'. It pairs with remember and recall, but doesn't explicitly state when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

generate_llms_txtGenerate llms.txtA
Read-onlyIdempotent
Inspect

Generate a production-ready llms.txt file for any URL so AI crawlers (ChatGPT, Claude, Perplexity) can index the site cleanly. Fetches the page, extracts title/description/key links, and emits the standard llms.txt markdown format. Output is a single text blob ready to drop at site-root/llms.txt. Useful for: getting a client's site indexed by AI, drafting llms.txt for your own project, or auditing how an AI crawler would see a competitor.

ParametersJSON Schema
NameRequiredDescriptionDefault
urlYesFull URL of the site to summarize, e.g. "https://example.com" or a specific landing page.
max_linksNoMaximum number of link entries to include (default 25, max 50).
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool as read-only, open-world, idempotent, and non-destructive. The description adds behavioral detail: fetches the page, extracts title/description/key links, and emits markdown format. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences plus bullet list), front-loaded with action, and every sentence serves a purpose. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple tool with 2 parameters, no output schema, and rich annotations, the description covers purpose, process, and use cases. Lacks error handling or output format details, but adequate overall.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds minimal value: an example for 'url' but nothing for 'max_links'. It does not elaborate on parameter constraints beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the verb 'Generate' and the resource 'llms.txt file', explaining its purpose for AI crawler indexing. It distinguishes from siblings like 'ai_visibility_check' and 'scan_competitor_ai_presence' by focusing on file generation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases: getting a client's site indexed, drafting for own project, or auditing competitor. It clearly indicates when to use the tool, though it lacks explicit when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_subscriptionsList SubscriptionsA
Read-onlyIdempotent
Inspect

List the caller's active subscriptions. Returns id, type, params, created_at, last_fired_at, fire_count for each. Use this to review what you're monitoring before adding more or to find an id to cancel.

ParametersJSON Schema
NameRequiredDescriptionDefault
include_inactiveNoInclude cancelled subscriptions in the response (default false).
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false, openWorldHint=true, idempotentHint=true, so the description adds minimal behavioral context beyond listing return fields. This is adequate given the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first states purpose and return fields, second gives usage guidance. No wasted words, well front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With one optional boolean parameter, full schema coverage, and annotations, the description is largely complete. It lists return fields and usage context. Could mention ordering or pagination, but not essential for this simple tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the parameter 'include_inactive' is fully described in the schema. The description does not add extra meaning beyond what the schema provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'List the caller's active subscriptions' and lists the returned fields, distinguishing it from sibling tools like subscribe and unsubscribe. It also provides a specific use case: reviewing monitoring before adding more or finding an ID to cancel.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use the tool: 'Use this to review what you're monitoring before adding more or to find an id to cancel.' It implies not to use for subscribing or unsubscribing, but lacks an explicit 'do not use when X' statement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pipeworx_feedbackSend Pipeworx FeedbackAInspect

Tell the Pipeworx team something is broken, missing, or needs to exist. Use when a tool returns wrong/stale data (bug), when a tool you wish existed isn't in the catalog (feature/data_gap), or when something worked surprisingly well (praise). Describe the issue in terms of Pipeworx tools/packs — don't paste the end-user's prompt. The team reads digests daily and signal directly affects roadmap. Rate-limited to 5 per identifier per day. Free; doesn't count against your tool-call quota.

ParametersJSON Schema
NameRequiredDescriptionDefault
typeYesbug = something broke or returned wrong data. feature = a new tool or capability you wish existed. data_gap = data Pipeworx does not currently expose. praise = positive note. other = anything else.
contextNoOptional structured context: which tool, pack, or vertical this relates to.
messageYesYour feedback in plain text. Be specific (which tool, what error, what data was missing). 1-2 sentences typical, 2000 chars max.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations: rate-limited to 5 per identifier per day, doesn't count against quota, and signals directly affect roadmap. Annotations only indicate readOnlyHint=false and destructiveHint=false, so the description significantly enhances the understanding of the tool's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the action and purpose, then provides usage guidelines and notes. It uses concise sentences with no redundancy. Every sentence serves a purpose, making it both informative and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and lack of output schema, the description completes the picture: it explains what happens after submission (team reads digests, signals affect roadmap). No missing information for an agent to understand the tool's value and behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the description still adds value by explaining enum values (bug, feature, data_gap, praise, other) in plain language, specifying message length limit (2000 chars), and describing optional context fields (pack, tool, vertical). This goes beyond the schema's brief descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to send feedback to the Pipeworx team about bugs, missing features, data gaps, or praise. It uses a specific verb ('Tell') and resource ('Pipeworx team'), and the context distinguishes it from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists when to use the tool (bug, feature/data_gap, praise) and provides usage notes like rate limit (5 per day) and that it's free. It gives clear criteria for each feedback type, making it easy for an agent to decide when to invoke this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

polymarket_arbitragePolymarket ArbitrageA
Read-onlyIdempotent
Inspect

REQUIRES one of event (single-event mode) OR topic (cross-event mode) — call with no args fails. Find arbitrage opportunities on Polymarket via monotonicity violations + partition-sum checks. event (recommended for a specific market): pass a Polymarket event slug like "fed-decision-may-2026" or "when-will-bitcoin-hit-150k"; walks child markets, checks date-axis / threshold-axis ordering AND computes the partition_check (sum of YES prices across mutually-exclusive legs — should ≈1; deviations >3pp emit a BUY/SELL EVERY LEG signal). topic (for cross-event scanning): pass a seed question like "Strait of Hormuz traffic returns to normal" or "Fed rate decision"; searches related events across the platform, flattens markets, runs the comparator on the union. Cross-event mode catches "...by May 31" vs "...by Jun 30" patterns that single-event misses. SEMANTIC ANCHOR: cross-event pairs require ≥0.30 Jaccard similarity on question tokens (prevents Powell-Fed-Pause being paired with Powell-DOJ-probe); skipped_low_similarity surfaces the rejected pair count. PARTITION FILTER: drops will-person-X / will-manager-Y / will-someone-else- placeholder slugs; partitions with >20% placeholder fraction return null arb signal. Response: opportunities[] (gap_pp, suggested_trade, reasoning, monotonicity violation context), and in event mode partition_check{sum_yes_prices, gap_from_1, placeholders_filtered, suggested_trade}. FILL CHECK: when the partition signal fires, arbitrage.fill_check prices it against live CLOB depth (theoretical_edge_pp_at_book vs realizable_edge_pp at 1000 shares/leg, thin_legs[]) — realizable_edge_pp ≤ 0 means the overround exists only at last-trade, not in the book; do not trade it. For custom sizing use polymarket_fill_risk.

ParametersJSON Schema
NameRequiredDescriptionDefault
eventNoSingle-event mode (use this if you know the specific Polymarket event): event slug like "fed-decision-may-2026" or "when-will-bitcoin-hit-150k". Full Polymarket URLs also accepted.
topicNoCross-event mode (use this if you want to scan related events across the platform): a topic or seed question like "Fed rate decision" or "Strait of Hormuz traffic returns to normal". Tool searches Polymarket for related events and checks monotonicity across them.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only, open-world, idempotent, non-destructive. Description adds substantial behavioral context including algorithmic details (monotonicity, partition check), fill check mechanics, and similarity thresholds—all consistent with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is lengthy but well-structured with clear sections (SEMANTIC ANCHOR, PARTITION FILTER, FILL CHECK). Every sentence contributes value; slight verbosity is justified by tool complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, absence of output schema, and rich annotations, the description is remarkably complete. It covers input requirements, algorithmic behavior, edge cases, and links to related tools. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already describes both parameters. Description adds significant extra meaning: explains the difference between event and topic modes, provides example inputs, and describes the logic applied to each. Valuable beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool finds arbitrage opportunities on Polymarket via monotonicity violations and partition-sum checks. It distinguishes two modes (event and topic) and includes specific examples of slugs and topics, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states that exactly one of 'event' or 'topic' must be provided, with clear guidance on when to use each mode. Also advises on fill check and custom sizing via sibling tool, leaving no ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

polymarket_edgesPolymarket EdgesA
Read-onlyIdempotent
Inspect

Scan top Polymarket markets and return opportunities where Pipeworx data disagrees with market price. Built for "what should I bet on today" — agents discover opportunities without paging hundreds of markets. FIVE MODEL FAMILIES grouped into three response segments under by_segment: (1) MODEL_DRIVEN — crypto_price (lognormal barrier from 90d FRED log-returns) and news_momentum (GDELT 7d/21d article-volume ratio, soft signal w/ halved Kelly). (2) STRUCTURAL_ARBITRAGE — partition_overround on mutually-exclusive events; per-leg favorite-longshot bias correction with per-sport α (tennis 1.02, soccer 1.10, MMA 1.15, default 1.0); placeholder-slug filter drops will-person-X / will-team-Y / will-manager-Z / will-someone-else- backstops; partitions with >20% placeholder fraction skipped entirely. (3) CONCENTRATED_LONGSHOT — basket trade when one leg ≥75% AND ≥2 longshots ≤8% AND portfolio return ≥25:1; rare-by-design (gates relaxed Run 8 from prior 85%/5%/50:1). EVERY OPPORTUNITY carries edge_pp_net (after slippage), kelly_fraction + kelly_fraction_half (capped at 0.25), market.liquidity, market.spread_pp, market.volume, plus a 24h-move warning ("Market moved X.Xpp in 24h") when the recent move alone exceeds the edge — your edge may already be in the price. TRADEABLE-EDGE KNOBS: min_liquidity / max_spread_pp drop opportunities where edge isn't realizable; min_partition_leg_kelly filters partitions by best per-leg Kelly. RESPONSE TOP-LEVEL: by_segment{model_driven,structural_arbitrage,concentrated_longshot}, fed_candidates/fed_note (Fed bets surface here, excluded from ranking — 1m-T vs EFFR signal is unreliable at meeting-month horizons without paid OIS/SOFR-futures data), and _diagnostics{concentrated_longshot:{...funnel counters},category_counts,filter_skips} so callers can see WHY a segment is empty (top-N stale, all candidates failed gates, knob dropped them). Cached 1h at the KV level keyed on all knobs.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNoTop N edges to return after ranking. Default 10, max 25.
windowNoPolymarket volume window to filter markets. Default 1wk.
min_kellyNoMinimum half-Kelly fraction (as decimal, e.g. 0.005 = 0.5% of bankroll) to include single-leg opportunities. Default 0 (no filter). Skips opportunities that are too small to bet sensibly even if the edge is large.
min_edge_ppNoMinimum |edge| in percentage points to include (default 0.5). Edge is evaluated NET of slippage.
slippage_ppNoAssumed execution slippage in percentage points per leg (default 0.3). Subtracted from raw |edge| before ranking and Kelly sizing. Polymarket has zero trading fees as of 2024 but bid/ask + thin depth typically eats 20-50bp per trade. Bump for very thin partitions; drop to 0 if you have a smarter fill model.
max_spread_ppNoTradeable-edge filter. Maximum bid/ask spread in percentage points on the representative market. Default null (no filter). Set to 2 to require tight books — anything wider eats most plausible edges.
min_liquidityNoTradeable-edge filter. Minimum $ liquidity on the representative market (or for partition_overround, on at least one top_leg). Default 0 (no filter). Set to 5000 to drop thin-book opportunities where executing the edge would walk the book past breakeven.
category_filterNoComma-separated list to restrict the output: "model_driven" (crypto_price + news_momentum), "structural_arbitrage" (partition_overround), "concentrated_longshot". Combine like "model_driven,structural_arbitrage". Default: all.
min_partition_leg_kellyNoMinimum BEST per-leg half-Kelly fraction across a partition_overround opportunity's top_legs (or longshot_basket legs). Default 0 (no filter). Partition arbs always return kelly_fraction_half=0 at the parent level by design (basket trades don't compose to single-leg Kelly), so min_kelly never filters them — this knob applies to the per-leg Kelly inside top_legs instead. Use to suppress thin partitions whose individual leg edges aren't worth the per-leg slippage cost.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, openWorld, idempotent, non-destructive. Description adds immense detail: model families, Kelly sizing, slippage, caching, diagnostics, and tradeable-edge filters, far exceeding annotation info.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with bold terms and parentheses, but very long and dense. Every sentence adds value, but the length may overwhelm some agents. Could be tighter.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Explains response structure (by_segment, diagnostics) and edge cases (Fed bets, empty segments) even without an output schema. Covers caching and diagnostics so callers understand why results might be empty.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3, but description adds significant context for parameters like min_kelly (half-Kelly fraction) and slippage_pp (Polymarket fees), making them more actionable.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it scans Polymarket markets and returns opportunities where Pipeworx data disagrees. It specifies three model families and is built for 'what should I bet on today', distinguishing it from siblings like polymarket_arbitrage or polymarket_edge_tracker.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Clear context on when to use (daily betting decisions) and what knobs to adjust. No explicit exclusions or comparisons with alternatives, but the description provides enough detail for an agent to choose appropriately.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

polymarket_edge_trackerPolymarket Edge TrackerA
Read-onlyIdempotent
Inspect

Edge persistence and decay telemetry built from daily polymarket_edges snapshots. Answers "how long has this edge existed and is it shrinking?" — a fresh wide edge and a 3-week-old wide edge are different trades (the latter is wide for a reason nobody is willing to take). Args: days (lookback, default 14, max 30), window (snapshot family, default "1wk"). RESPONSE: tracked[] = every opportunity in the LATEST snapshot with its full edge_pp_net time-series across prior snapshots, first_seen, trend (new | widening | stable | decaying) and decay_pp_per_day (both computed on |edge_pp_net| — the value itself is signed by trade direction, negative = SELL YES); expired[] = opportunities that appeared in earlier snapshots but are GONE from the latest (closed, resolved, or arbed away) with their lifespan_days — the median lifespan is your competition clock; snapshot_dates[] = which days actually have data (snapshots are written when polymarket_edges runs on a cache-miss, so gaps mean nobody scanned that day). LIMITS: history depth is bounded by the 60-day snapshot TTL and starts from when snapshotting was enabled; decay numbers come from daily closes of edge_pp_net (net of default slippage), not intraday.

ParametersJSON Schema
NameRequiredDescriptionDefault
daysNoLookback in days (default 14, clamp 2-30).
windowNoWhich polymarket_edges window family to read snapshots for: 24hr | 1wk | 1mo (default 1wk).
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnly, openWorld, idempotent, non-destructive. The description adds significant behavioral context: snapshot write-on-cache-miss, gaps meaning no scan, 60-day TTL, decay from daily closes. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is detailed and well-structured, starting with the core purpose and then elaborating on response fields and limits. While every sentence adds value, it is slightly verbose, earning a 4 rather than a 5.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description comprehensively explains the response structure (tracked, expired, snapshot_dates) and limitations (TTL, start date). It covers all necessary context for an agent to understand what the tool returns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already fully describes both parameters (days, window) with defaults and constraints. The description merely restates them without adding extra meaning, so it meets the baseline for high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: providing edge persistence and decay telemetry from daily snapshots. It distinguishes between fresh and old wide edges, and the detailed response structure sets it apart from sibling tools like polymarket_edges.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (to understand edge persistence and decay), but does not explicitly mention when not to use it or name specific alternatives, though the sibling list provides context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

polymarket_fill_riskPolymarket Fill RiskA
Read-onlyIdempotent
Inspect

Realizable-vs-theoretical edge check against live CLOB order-book depth. REQUIRES one of market (single-market mode) or event (basket/partition mode). SINGLE-MARKET: pass a market slug/URL + side (buy_yes|sell_yes|buy_no|sell_no, default buy_yes) + size_usd (default 1000 — max spend on buys, target proceeds on sells); walks the ladder and returns top_of_book, vwap_fill_price, slippage_pp, shares_filled, max_fillable_usd, and a verdict (clean|degraded|cannot_fill). BASKET: pass an event slug/URL + side (sell_yes = capture overround by selling every leg, buy_yes = capture underround; default auto from partition sum) + size_usd interpreted as settlement notional S (shares per leg; each share pays $1); returns theoretical_sum vs realizable_sum (top-of-book vs VWAP across all legs), capture_ratio, profit_usd at executed size, per-leg fill detail, thin_legs[], max_clean_notional_usd, and forced_directional_risk naming the legs most likely to strand you unhedged. USE THIS before acting on any polymarket_arbitrage SELL/BUY-EVERY-LEG signal or any polymarket_edges trade above ~$500 — theoretical overround on thin books is not capturable, and partial basket fills convert an arb into an unhedged directional position (the dominant loss mode in real arb-bot P&L).

ParametersJSON Schema
NameRequiredDescriptionDefault
sideNoSingle-market: buy_yes | sell_yes | buy_no | sell_no (default buy_yes). Basket: sell_yes | buy_yes (default auto — sell if partition sum > 1, buy if < 1).
eventNoBasket mode: event slug or full polymarket.com URL — checks every leg of the partition.
marketNoSingle-market mode: market slug or full polymarket.com URL.
size_usdNoSingle-market: USD to spend (buys) or target proceeds (sells). Basket: settlement notional — shares per leg, each paying $1 at resolution. Default 1000, clamp 10–1,000,000.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=true, idempotentHint=true) indicate a read-only check. Description adds behavioral context: it walks the order book ladder, returns fill details and a verdict, and warns about partial basket fills converting arb into unhedged positions. No contradiction with annotations. However, it could explicitly state it does not execute trades.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with purpose, then details requirements and modes, ending with use-case guidance. Every sentence adds necessary information. Slightly long (~200 words) but justified by tool complexity. Could be more concise without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite lacking an output schema, the description thoroughly explains return fields (top_of_book, vwap_fill_price, slippage_pp, verdict, etc.) and basket-specific returns (theoretical_sum, capture_ratio, profit_usd, thin_legs, max_clean_notional_usd, forced_directional_risk). It addresses the key risk of partial fills, making it complete for a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with detailed parameter descriptions. The tool description adds value by clarifying parameter interdependencies (e.g., REQUIRES one of market or event), different interpretations of size_usd between modes, and default side logic for baskets. This goes beyond the schema's baseline information.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs a 'Realizable-vs-theoretical edge check against live CLOB order-book depth'. It defines two modes (single-market and basket) and explicitly distinguishes from siblings like polymarket_arbitrage and polymarket_edges by specifying when to use this tool before acting on their signals.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage guidance: 'USE THIS before acting on any polymarket_arbitrage SELL/BUY-EVERY-LEG signal or any polymarket_edges trade above ~$500'. It explains why (theoretical overround not capturable, partial fills lead to directional risk), giving clear when-to-use and when-not-to-use context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

polymarket_kalshi_spreadPolymarket–Kalshi SpreadA
Read-onlyIdempotent
Inspect

Cross-venue spread between Kalshi and Polymarket for the same resolving question. The two venues sometimes price the same outcome 2-25pp apart because their participant pools differ — when the bet shapes are equivalent that delta is a real signal, when they aren't the tool says so. TWO MODES: (1) topic — 10 pre-mapped macro shortcuts ("fed", "btc", "cpi", "gdp", "sp500", "recession", "next_pope", "next_uk_pm", "next_israel_pm", "2028_president") auto-fetch the matching event on each venue. (2) explicit kalshi_event_ticker + polymarket_event_slug for custom pairings. RESPONSE: each venue's leg-by-leg prices (raw probability 0-1) plus matched spread[].top_spreads_pp (Kalshi − Polymarket) where the same outcome shows up on both sides. SAFETY FIELDS: compatibility_warning fires in two cases — (a) matched_pairs:0 with skipped_cross_type>0 means the venues frame the topic with non-equivalent bet shapes (e.g. Kalshi range_bucket point-in-time vs Polymarket cumulative_threshold touch-anywhere — no arb exists), (b) matched_pairs:0 with skipped_cross_type:0 and both venues >5 legs means the token-overlap matcher found nothing in common — events likely semantically unrelated despite the topic keyword. temporal_alignment{polymarket_month,kalshi_month,aligned} tells you whether the two events resolve in the same calendar period; aligned:false means spreads are mathematically meaningless across the temporal gap. skipped_cross_type / skipped_cross_subtype counters expose how many leg-pair comparisons were dropped (cross-type = metric_type mismatch like MoM vs YoY; cross-subtype = inequality mismatch like cum_ge vs cum_le). Real cross-venue spreads are rarer than the macro-shortcut list suggests — most pre-mapped topics return compatibility_warning today; pre-mapped ≠ tradeable.

ParametersJSON Schema
NameRequiredDescriptionDefault
topicNoPre-mapped: fed | btc | cpi | gdp | sp500 | recession | next_pope | next_uk_pm | next_israel_pm | 2028_president
kalshi_event_tickerNoExplicit Kalshi event ticker, e.g. "KXFED-26OCT". Overrides the topic-mapped Kalshi side.
polymarket_event_slugNoExplicit Polymarket event slug, e.g. "fed-decision-in-june-825". Overrides the topic-mapped Polymarket side.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint, openWorldHint, idempotentHint, destructiveHint false. The description adds extensive behavioral context: two modes, compatibility_warning conditions, temporal_alignment field, skipped_cross_type/counters, and note that pre-mapped topics often return warnings. This goes far beyond what annotations provide, fully disclosing behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively long but well-organized with sections for modes, response, and safety fields. It is front-loaded with the core purpose. While every sentence adds value, some redundancy could be trimmed, but overall structure is effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (cross-venue comparison with edge cases) and no output schema, the description thoroughly explains the return fields: leg-by-leg prices, matched spread, compatibility_warning, temporal_alignment, and skipped counters. It addresses all necessary behavioral aspects for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all three parameters. The description adds meaning by explaining the topic list, the relationship between topic and explicit parameters, and provides examples. This enriches the schema beyond mere field names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it computes a cross-venue spread between Kalshi and Polymarket for the same resolving question. It distinguishes two modes (topic shortcuts and explicit pairing) and explains the output. This is a specific verb+resource description that differentiates from sibling tools like polymarket_arbitrage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit when-to-use context: 'when the bet shapes are equivalent that delta is a real signal, when they aren't the tool says so.' It warns that most pre-mapped topics return compatibility warnings and explains temporal alignment. While it does not explicitly name sibling alternatives, the detailed context enables an agent to decide appropriateness.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recallRecallA
Read-onlyIdempotent
Inspect

Retrieve a value previously saved via remember, or list all saved keys (omit the key argument). Use to look up context the agent stored earlier — the user's target ticker, an address, prior research notes — without re-deriving it from scratch. Scoped to your identifier (anonymous IP, BYO key hash, or account ID). Pair with remember to save, forget to delete.

ParametersJSON Schema
NameRequiredDescriptionDefault
keyNoMemory key to retrieve (omit to list all keys)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already ensure read safety; description adds valuable context about scoping and pairing with remember/forget, no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose, no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all necessary aspects for a simple tool: retrieval, listing, scoping, pairing, and usage scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description essentially restates the parameter schema, which already has 100% coverage, so minimal added value beyond baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the verb (retrieve/list), resource (saved values/keys), and distinguishes from siblings (remember, forget) with concrete usage examples.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use context (lookup saved data) and scoping (per identifier), but does not explicitly state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recent_alertsRecent AlertsA
Read-onlyIdempotent
Inspect

Pull fired events from your subscription feed. Returns the most recent alerts the evaluator has written to your persisted feed — each carries source, citation_uri (pipeworx:// when available), and the raw event payload. Filter by type (e.g. "sec_8k") and/or since (ISO timestamp). Set mark_read:true to flag returned events read so the next call only shows newer ones. Polls work fine; the same feed is also at GET registry.pipeworx.io/alerts.json for scripts and dashboards.

ParametersJSON Schema
NameRequiredDescriptionDefault
typeNoOptional — filter to one subscription type.
limitNoMax events to return (1-200, default 50).
sinceNoOptional ISO timestamp — return events fired_at >= this time.
mark_readNoFlag the returned events read in the same call (default false).
unread_onlyNoReturn only events where read_at is null (default false).
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, etc. The description adds meaningful behavioral context like mark_read side effects and alternative feed endpoint, without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single paragraph is concise and front-loaded with key action. Every sentence adds value, though a more structured format (e.g., bullet points) could improve readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers return fields and key behaviors (mark_read, filtering). Lacks pagination details beyond limit, but sufficient for a list-retrieval tool without output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds real value by explaining parameters in context (e.g., filter by type, mark_read for read tracking). Complements schema effectively.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool pulls fired events from the subscription feed and returns recent alerts with specific fields. It distinguishes itself from sibling tools, none of which target alert retrieval.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear usage context: mentions polling is fine, offers alternative access via URL, and explains filtering options. However, it does not explicitly compare to siblings or state when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recent_changesRecent ChangesA
Read-onlyIdempotent
Inspect

"What's new with X" / "latest on Y" / "what happened to Z this week / month / quarter" / "updates on Acme" / "news on Tesla recently" / "what's happening with Apple" — change feed for a company in the last N days/weeks/months in ONE parallel call. Fans out to SEC EDGAR (filings since since), GDELT→GNews fallback (news mentions in window — GDELT preferred, GNews when rate-limited or 5xx), USPTO (patents granted; PatentsView API sunset May 2025 so this soft-fails until reactivated). since accepts ISO date ("2026-04-01") or relative shorthand ("7d", "30d", "3m", "1y"). Returns structured changes[] grouped by source + total_changes count + pipeworx:// citation URIs. Use entity_profile instead when you want the static profile (filings + fundamentals + LEI + patents) regardless of window.

ParametersJSON Schema
NameRequiredDescriptionDefault
typeYesEntity type. Only "company" supported today.
sinceYesWindow start — ISO date ("2026-04-01") or relative ("7d", "30d", "3m", "1y"). Use "30d" or "1m" for typical monitoring.
valueYesTicker (e.g., "AAPL") or zero-padded CIK (e.g., "0000320193").
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, openWorldHint, and non-destructive behavior. The description adds substantial behavioral context: it details the multiple data sources (SEC EDGAR, GDELT→GNews fallback, USPTO with soft-failure), explains the fallback logic, and mentions the return format (changes[] grouped by source, total_changes, pipeworx:// URIs). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose but well-structured: it starts with example queries (front-loaded), then explains sources, parameter details, and alternatives. Every sentence adds value, though a slight trim could improve conciseness without losing information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (multiple sources, fallback, soft-failure) and absence of an output schema, the description adequately explains what the tool returns (grouped changes, total count, citation URIs). It could be more complete by specifying the shape of individual change objects or mentioning limits, but it provides sufficient context for an agent to decide when to use the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is high. The description adds significant meaning beyond the schema: it explains the 'since' parameter with examples (ISO date or relative shorthand like '7d', '30d', '3m', '1y') and provides typical usage advice. It clarifies that 'value' can be a ticker or CIK. This guidance is valuable for correct invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides a change feed for a company, with example queries like 'What's new with X' and 'latest on Y'. It distinguishes itself from the sibling tool entity_profile, which is for static profiles. The purpose is specific, concrete, and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit example queries, explains when to use the tool (for change feed queries), and explicitly tells when to use the alternative sibling tool entity_profile ('Use entity_profile instead when you want the static profile...'). It also gives recommended values for the 'since' parameter ('Use "30d" or "1m" for typical monitoring').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rememberRememberA
Idempotent
Inspect

Save data the agent will need to reuse later — across this conversation or across sessions. Use when you discover something worth carrying forward (a resolved ticker, a target address, a user preference, a research subject) so you don't have to look it up again. Stored as a key-value pair scoped by your identifier. Authenticated users get persistent memory; anonymous sessions retain memory for 24 hours. Pair with recall to retrieve later, forget to delete.

ParametersJSON Schema
NameRequiredDescriptionDefault
keyYesMemory key (e.g., "subject_property", "target_ticker", "user_preference")
valueYesValue to store (any text — findings, addresses, preferences, notes)
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=false, idempotentHint=true, destructiveHint=false. Description adds context about scoping by identifier, session persistence limits, and pairing with recall/forget, which are not in annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: purpose, usage guidance, and behavioral detail. Each sentence adds value with no redundancy. Front-loaded with key message.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple key-value store with two parameters and no output schema, the description fully explains what, when, and how the tool behaves across sessions and authentication states.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters with descriptions (100% coverage). Description adds example key patterns and clarifies value type, but does not introduce new semantics beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Save data the agent will need to reuse later' with specific verb (save) and resource (data). Distinguishes from sibling tools like recall and forget by naming them and explaining the pairing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use when you discover something worth carrying forward' and provides concrete examples (resolved ticker, target address, user preference). Also notes the difference between authenticated (persistent) and anonymous (24-hour) sessions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

resolve_entityResolve EntityA
Read-onlyIdempotent
Inspect

"What's the ticker for…" / "find the CIK for…" / "what's the RxCUI for…" / "look up the ID for…" / "what is X's official identifier" — resolve a user-spoken NAME to the canonical/official identifier other tools require as input. Use FIRST whenever you have a name but need an ID. SUPPORTED TYPES: "company" (returns ticker + 10-digit CIK + company_name from SEC EDGAR + pipeworx://edgar/company/{cik} citation URI; accepts ticker, CIK, or company name as input — auto-disambiguated), "drug" (returns RxCUI + ingredient + brand from RxNorm + pipeworx://rxnorm/{rxcui} citation; accepts brand or generic name). Each call cascades through several lookup endpoints internally — using resolve_entity replaces 2-3 manual lookups.

ParametersJSON Schema
NameRequiredDescriptionDefault
typeYesEntity type: "company" or "drug".
valueYesFor company: ticker (AAPL), CIK (0000320193), or name. For drug: brand or generic name (e.g., "ozempic", "metformin").
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare read-only, idempotent, open-world. Description adds that it cascades through several endpoints, replaces 2-3 manual lookups, and auto-disambiguates. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is slightly long but front-loaded with examples and well-organized. Every sentence adds value; could be slightly more concise but not wasteful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description fully explains what each type returns (ticker+CIK+citation for company; RxCUI+ingredient+brand+citation for drug). Covers internal cascading and disambiguation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description goes beyond by providing concrete examples (e.g., ticker, CIK, name for company; brand/generic for drug) and explaining return fields per type.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool resolves a user-spoken name to a canonical identifier, with specific examples and supported types (company, drug). It distinguishes itself from siblings by positioning itself as the first step when an ID is needed.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use FIRST whenever you have a name but need an ID.' Provides detailed supported types and what they return. Lacks explicit 'when not to use' but is generally clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

scan_competitor_ai_presenceScan Competitor AI PresenceA
Read-onlyIdempotent
Inspect

Compare AI visibility across multiple entities side-by-side. Probes each entity (your brand + N competitors) with ai_visibility_check, ranks by score, surfaces which is most/least recognized. Useful for competitive AI-marketing audits: "does Claude know about us as well as our competitors?". Returns ranked list with score, confidence, signal density per entity.

ParametersJSON Schema
NameRequiredDescriptionDefault
modelsNoWhich models to probe. Supported: "workers-ai" (free default), "anthropic" (requires _apiKey). Omit for just workers-ai.
_apiKeyNoOptional Anthropic API key — only if "anthropic" is in models. Passed to api.anthropic.com per probe.
contextNoOptional shared context applied to every probe (e.g. "B2B SaaS", "Boston restaurant"). Disambiguates common names.
entitiesYesArray of 2-8 entities to compare (brand/business/product names). First entry treated as the "subject" for narrative; rest are competitors.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, open-world. Description adds that it ranks entities by score, surfaces most/least recognized, and returns score/confidence/signal density. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose. Every sentence adds value: purpose, mechanism, example use case. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 params, orchestration of another tool), the description covers purpose, usage expectations, and output format. No output schema needed as description lists return fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema descriptions cover 100% of parameters. Description adds semantics: first entity is treated as 'subject' for narrative. This clarifies parameter usage beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool compares AI visibility across multiple entities side-by-side, with specific verb-verb 'Compare AI visibility' and resource 'multiple entities'. It distinguishes from sibling 'ai_visibility_check' by describing orchestration of multiple probes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear usage context ('competitive AI-marketing audits') with an example question. Implicitly differentiates from sibling 'ai_visibility_check' by stating it probes each entity with that tool. Lacks explicit exclusions or alternative tools, but the context is strong.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

scan_dependencyScan DependencyA
Read-onlyIdempotent
Inspect

Composite "should I add this npm package to my project" check in ONE call — fans out across deps.dev (license + advisories + version history) and bundlephobia (gzipped/minified bundle size, dependency count, ESM/tree-shake support). Use whenever an agent asks "is X safe / popular / small" or "what does adding lodash cost me". Returns a summary block (is_latest, license, published_at, advisory_count, bundle_kb_min, bundle_kb_gz, dependency_count, has_esm, tree_shakeable), per-advisory detail, links, and a list of recent alternative versions. NPM ecosystem only in v1; PyPI / Maven / Cargo / Go fall under deps.dev:version directly. Partial failures degrade gracefully — bundlephobia's first measurement on a new version can take 5-30s; sources_failed will list it if it times out, the rest still returns.

ParametersJSON Schema
NameRequiredDescriptionDefault
packageYesnpm package name. Scoped packages (e.g. "@types/node") are accepted.
versionNoSpecific version to check (e.g., "18.3.1"). Defaults to the latest published version when omitted.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and non-destructive behavior. The description adds critical behavioral details: it is a composite call that fans out to two services, may have partial failures (bundlephobia can take 5-30s), and gracefully degrades by listing sources_failed. This surpasses annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but efficiently packs information without redundancy. It uses front-loaded structure (composite, use case) and every sentence contributes. A slightly more structured format (e.g., bullet points) could improve readability, but current version is adequate.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description fully explains the return format (summary block with fields, advisory detail, links, alternative versions) and edge cases (NPM only, bundlephobia timing). It provides enough detail for an agent to understand the tool's complete behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by clarifying that scoped packages are accepted (e.g., '@types/node') and that the version parameter defaults to latest when omitted. This provides practical context beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool performs a composite check for evaluating npm packages, using specific verbs ('scan') and resource ('dependency'). It distinguishes itself from sibling tools by focusing on npm ecosystem and combining deps.dev and bundlephobia data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly instructs when to use the tool ('when an agent asks is X safe / popular / small') and provides limitations (NPM only in v1) with alternative suggestions for other ecosystems (PyPI, Maven, etc.). This helps the agent decide correctly.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search_withinSearch Within a SourceA
Read-onlyIdempotent
Inspect

Semantic search INSIDE a fetched record. Pass the text you already pulled (e.g. a SEC 10-K body, an article, a long tool result) plus a natural-language query; get back the top-N passages with character offsets and similarity scores. Use when the record is too big to cram into the prompt — search_within saves context, returns only the passages that matter, and every passage carries an offset so the agent can verify a verbatim quote. Pairs with ask_pipeworx_grounded: fetch with the gateway, ground over the relevant passages instead of the whole document. BGE-base-en embeddings + cosine over 500-char overlapping windows; cap is 200K chars (longer inputs are truncated and flagged).

ParametersJSON Schema
NameRequiredDescriptionDefault
textYesThe document text to search inside (max ~200K chars).
limitNoMax passages to return (1-20, default 5).
queryYesNatural-language query — what passages do you want? E.g. "supply-chain risk", "fiscal year 2024 revenue", "drug interactions with warfarin".
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral details beyond annotations: it specifies the embedding model (BGE-base-en), similarity method (cosine), windowing (500-char overlapping), and character cap (200K chars with truncation flag). Annotations already indicate read-only, idempotent, and open-world hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured: opening sentence states core function, followed by usage guidance, companion tool mention, and technical details. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and lack of output schema, the description fully covers input behavior, internal mechanics, limits, and integration context. It implicitly describes return values (passages with offsets and scores) without needing an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description still adds value by giving examples for the query parameter, clarifying the text parameter's role, and explaining the limit parameter's range. This exceeds what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs semantic search inside a fetched record. It distinguishes itself from siblings by explicitly naming ask_pipeworx_grounded as a companion and focusing on searching within already-fetched text.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance is provided: 'Use when the record is too big to cram into the prompt.' It explains benefits (saves context, returns relevant passages with offsets) and mentions pairing with another tool for grounding.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

subscribeSubscribe to AlertsA
Idempotent
Inspect

Create a proactive monitoring subscription to a live-data event stream. Returns the new subscription id. Requires a Pipeworx OAuth account (anonymous + BYO cannot persist subscriptions). Supported types: "sec_8k" (8-K filings matching ticker + item codes — e.g. items:["5.02"] = officer change), "polymarket_edge" (Polymarket↔Kalshi cross-venue mispricings — params:{topic:"fed"}), "fred_series" (new FRED observations — params:{series_id:"UNRATE"}). Delivery channels: feed (always on — pull via recent_alerts or GET registry.pipeworx.io/alerts.json), and optionally email (set delivery:{email:"you@x.com"}) or sms (delivery:{sms:"+15551234567"} — phone must be verified at /account first; 10/day cap).

ParametersJSON Schema
NameRequiredDescriptionDefault
typeYesSubscription type.
paramsYesType-specific filter. sec_8k: {ticker:"AAPL", items?:["5.02","1.01"]}. polymarket_edge: {topic:"fed", min_spread_bps?:500}. fred_series: {series_id:"UNRATE"}. patent_grant: {applicant:"Apple Inc."}. clinical_trial: {sponsor?:"Pfizer", condition?:"lung cancer", phase?:"PHASE3"} (sponsor or condition required).
deliveryNoOptional delivery channels in addition to the always-on persistent feed. {email:"you@x.com"} sends a templated alert per fired event. {sms:"+15551234567"} sends an SMS per event — must match the verified phone on the caller's account (verify at https://pipeworx.io/account first; 10/day cap). {webhook:"https://..."} POSTs each event JSON to your endpoint, HMAC-signed — the response includes delivery.webhook_secret (whsec_…) ONCE; verify X-Pipeworx-Signature = sha256 HMAC of "<X-Pipeworx-Timestamp>.<raw body>". Auto-disabled after 10 consecutive failing runs.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses requirements (OAuth account, phone verification) and limitations (10/day SMS cap). The annotations include idempotentHint=true, but the description does not clarify whether creating the same subscription twice returns the existing ID or creates a new one. However, the openWorldHint and readOnlyHint are consistent. Overall, the description adds behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is detailed but well-structured, starting with the core purpose, then listing types with examples, and finishing with delivery options. It is slightly dense but front-loads key information without unnecessary repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple types, optional delivery channels, nested params), the description covers all critical aspects: requirements, type-specific filters, delivery constraints, and the return value (subscription id). No output schema is provided, but the description adequately explains the return.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description adds rich examples for each subscription type (e.g., 'sec_8k: {ticker:"AAPL", items?:["5.02"]}') and elaborates on delivery channels with specific constraints (webhook signing, phone verification, caps). This adds significant meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Create a proactive monitoring subscription to a live-data event stream.' It specifies the verb (Create), resource (subscription), and distinguishes from siblings like list_subscriptions and unsubscribe by detailing subscription types and delivery channels.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use: it requires a Pipeworx OAuth account, lists supported types with examples, and notes optional delivery channels with constraints (phone verification, 10/day cap). It implicitly guides usage by mentioning the always-on feed and referencing recent_alerts for pulling. However, it does not explicitly state when not to use or compare to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

suggest_questionsWhat Can I Ask Pipeworx?A
Read-onlyIdempotent
Inspect

What can I ask Pipeworx? / what is Pipeworx good for? / what can you do? / give me ideas / show me examples / getting started / what data do you have? — the onboarding entry point for an agent that just connected and wants to know what is worth asking. Returns category-bucketed example questions (company financials, drugs & clinical trials, economics, real estate, prediction markets, weather, government & patents, science & academia, news) — each with the exact tool + argument shape that answers it, drawn from the live catalog of thousands of tools. Call with no arguments for the full spread, or pass topic (e.g. "finance", "pharma", "betting") to focus. Use this FIRST when you do not yet know what Pipeworx can do for you, or to learn how to call the meta-tools (ask_pipeworx, entity_profile, compare_entities, etc.).

ParametersJSON Schema
NameRequiredDescriptionDefault
topicNoOptional focus area: finance | pharma | economics | real-estate | betting | weather | government | science | news. Omit for a cross-category spread.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, idempotentHint, openWorldHint), description reveals it returns category-bucketed example questions with exact tool+argument shapes from a live catalog, and that omitting topic gives full spread. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is somewhat long but front-loaded with aliases and purpose. Uses clear structure with examples and bullet points. A bit verbose for the simple tool, but every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, description fully explains return structure (category-bucketed example questions with tool+argument shapes). Covers the tool's purpose, usage, and scope completely for a simple, read-only, optional-param tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema describes 'topic' with examples, but description adds value by explaining default behavior (full spread) and listing valid values (finance, pharma, etc.). With 100% schema coverage, the description still enhances understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description defines the tool as 'the onboarding entry point' with aliases and explicit purpose: helping an agent discover what Pipeworx can do and how to call meta-tools. It clearly distinguishes from sibling tools by stating 'use this FIRST' and listing example use cases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use guidance: 'Use this FIRST when you do not yet know what Pipeworx can do for you'. Also mentions optional topic to narrow focus. Lacks explicit when-not-to-use, but the context strongly implies it's for initial exploration, not for direct queries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

twilio_get_messageTwilio Get MessageA
Read-onlyIdempotent
Inspect

Get details of a specific Twilio message by its SID.

ParametersJSON Schema
NameRequiredDescriptionDefault
sidYesMessage SID (starts with SM...)
_authTokenYesTwilio Auth Token
_accountSidYesTwilio Account SID

Output Schema

ParametersJSON Schema
NameRequiredDescription
toNoDestination phone number
sidNoMessage SID
uriNoAPI URI for this message
bodyNoMessage body text
fromNoSender phone number
priceNoMessage cost
statusNoMessage status
date_sentNoISO 8601 sent timestamp
error_codeNoError code if failed
price_unitNoCurrency unit
account_sidNoAccount SID
date_createdNoISO 8601 creation timestamp
date_updatedNoISO 8601 update timestamp
error_messageNoError message if failed
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and destructiveHint, indicating safe and idempotent behavior. The description adds the specific action of retrieving details by SID, which is consistent and adds minimal context. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single concise sentence that front-loads the action and resource. Every word is essential and there is no wasted text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low complexity (three simple string parameters), the presence of annotations covering safety/idempotency, and an output schema documenting return values, the description is complete enough for an agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents the three parameters. The description does not add any additional meaning or guidance beyond what is in the schema, meeting the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Get details' and the resource 'specific Twilio message by its SID'. It distinguishes from sibling tools like twilio_list_messages which lists multiple messages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The context is clear: use this tool when you have a specific message SID. No explicit exclusion of alternatives, but the sibling tools are available and the description implies a one-by-one retrieval.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

twilio_list_callsTwilio List CallsA
Read-onlyIdempotent
Inspect

List recent phone calls from your Twilio account. Returns call SID, status, duration, and direction.

ParametersJSON Schema
NameRequiredDescriptionDefault
limitNoMax calls to return (default 20, max 100)
_authTokenYesTwilio Auth Token
_accountSidYesTwilio Account SID

Output Schema

ParametersJSON Schema
NameRequiredDescription
endNoEnd index
uriNoAPI URI for this resource
pageNoCurrent page number
callsNoArray of call objects
startNoStart index
totalNoTotal number of calls
page_sizeNoPage size
next_page_uriNoURI for next page
first_page_uriNoURI for first page
previous_page_uriNoURI for previous page
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, destructiveHint. Description adds return fields but no additional behavioral traits beyond annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose and return info, no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate for a simple list tool with good annotations and output schema. Missing explicit mention of date range or ordering, but 'recent' implies recency. Mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are well-documented in schema. Description does not add meaning beyond what schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'List recent phone calls from your Twilio account' with a specific verb and resource, and lists return fields (SID, status, duration, direction), distinguishing it from siblings like twilio_make_call and twilio_list_messages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implied usage from title and description, but no explicit guidance on when to use this tool versus alternatives like twilio_list_messages or twilio_make_call. No when-not-to-use or exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

twilio_list_messagesTwilio List MessagesA
Read-onlyIdempotent
Inspect

List recent SMS/MMS messages from your Twilio account. Supports filtering by to/from number and pagination.

ParametersJSON Schema
NameRequiredDescriptionDefault
toNoFilter by destination phone number
fromNoFilter by sender phone number
limitNoMax messages to return (default 20, max 100)
_authTokenYesTwilio Auth Token
_accountSidYesTwilio Account SID

Output Schema

ParametersJSON Schema
NameRequiredDescription
endNoEnd index
uriNoAPI URI for this resource
pageNoCurrent page number
startNoStart index
totalNoTotal number of messages
messagesNoArray of message objects
page_sizeNoPage size
next_page_uriNoURI for next page
first_page_uriNoURI for first page
previous_page_uriNoURI for previous page
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the description doesn't need to emphasize safety. However, it adds no behavioral context beyond 'recent' (e.g., how recency is determined, pagination details, rate limits). It does not contradict annotations, but it also doesn't enrich the behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences that are front-loaded with the core function. Every word is necessary and no extraneous information is present. The structure allows an agent to quickly understand the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple list tool with comprehensive annotations and an output schema (as per context signals), the description is nearly complete. It covers the main purpose and filtering capabilities. However, it could mention default pagination limits (though present in schema) or error scenarios, but this is not critical given the tool's simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with all parameters already described in the schema. The description mentions filtering and pagination, but this is a restatement of what the schema already provides. No additional semantic context is added, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list), the resource (SMS/MMS messages from a Twilio account), and notes filtering and pagination. This distinguishes it from sibling tools like twilio_get_message (single message) and twilio_send_sms (send), providing a clear purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for listing recent messages but does not explicitly state when to use this tool versus alternatives, nor does it mention any prerequisites or cases where it should not be used. The sibling names provide some contextual guidance, but explicit when-to-use and when-not-to-use are missing.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

twilio_make_callTwilio Make CallC
Read-onlyIdempotent
Inspect

Initiate a phone call via Twilio. Requires a TwiML URL or application SID to control call behavior.

ParametersJSON Schema
NameRequiredDescriptionDefault
toYesDestination phone number in E.164 format
urlYesTwiML URL that controls what happens when the call connects
fromYesTwilio phone number to call from in E.164 format
_authTokenYesTwilio Auth Token
_accountSidYesTwilio Account SID

Output Schema

ParametersJSON Schema
NameRequiredDescription
toNoDestination phone number
sidNoCall SID
uriNoAPI URI for this call
fromNoCaller phone number
priceNoCall cost
statusNoCall status (queued, ringing, in-progress, completed, etc.)
durationNoCall duration in seconds
end_timeNoISO 8601 end timestamp
directionNoCall direction (outbound)
price_unitNoCurrency unit
account_sidNoAccount SID
date_createdNoISO 8601 creation timestamp
date_updatedNoISO 8601 update timestamp
Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description contradicts the annotations: readOnlyHint is true but making a call is a mutation, and idempotentHint is true but calls are not idempotent. The description does not clarify side effects like costs or call state changes.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (two sentences) and front-loaded with the action. It could be slightly more informative without losing conciseness, but it is not verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (external API, cost implications, authentication required), the description omits crucial context such as the need for valid Twilio credentials, potential failure modes, and billing impact. The presence of an output schema reduces the need to explain return values, but the overall completeness is lacking.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds no new parameter meaning beyond the schema, only reiterating the TwiML URL requirement that is already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it initiates a phone call via Twilio, which distinguishes it from sibling tools like twilio_send_sms or twilio_list_calls. However, it mentions requiring a TwiML URL or application SID, but the schema only includes a URL parameter, causing slight confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. It does not specify prerequisites, such as needing a valid Twilio account, or scenarios where another tool (e.g., twilio_send_sms) would be more appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

twilio_send_smsTwilio Send SmsA
Idempotent
Inspect

Send an SMS message via Twilio. Returns the message SID and status.

ParametersJSON Schema
NameRequiredDescriptionDefault
toYesDestination phone number in E.164 format (e.g., +15551234567)
bodyYesMessage body text (max 1600 characters)
fromYesTwilio phone number to send from in E.164 format
_authTokenYesTwilio Auth Token
_accountSidYesTwilio Account SID (starts with AC...)

Output Schema

ParametersJSON Schema
NameRequiredDescription
toNoDestination phone number
sidNoMessage SID (unique identifier)
bodyNoMessage body text
fromNoSender phone number
priceNoCost of the message
statusNoMessage status (queued, sending, sent, failed, etc.)
date_sentNoISO 8601 timestamp when sent
error_codeNoError code if failed
price_unitNoCurrency unit (USD)
account_sidNoAccount SID
date_createdNoISO 8601 timestamp of creation
date_updatedNoISO 8601 timestamp of last update
error_messageNoError message if failed
Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already indicate readOnlyHint=false and destructiveHint=false. The description adds the return value (SID and status) but does not elaborate on potential non-idempotency (each request sends a new message), authentication requirements (though parameters include auth fields), or error conditions. The idempotentHint=true annotation may be misleading, but the description does not contradict it.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single sentence with no wasted words. It is front-loaded and efficiently conveys the core purpose and output.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (context says 'Has output schema: true'), the description need not explain return values. However, it lacks context about authentication, error handling, and prerequisites. It is adequate but not thorough for a tool with five required parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description does not add meaning beyond the schema; it only mentions the return values, not parameter details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action ('Send an SMS message'), the service ('via Twilio'), and what it returns ('Returns the message SID and status'). It distinguishes itself from sibling tools such as twilio_make_call or twilio_list_messages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites, when not to use it, or compare to other Twilio tools like twilio_get_message or twilio_make_call.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

unsubscribeUnsubscribe from AlertsA
Idempotent
Inspect

Cancel a subscription by id. Ownership is enforced — you can only cancel your own subscriptions. The row is deactivated (not deleted) so its historical events stay available via recent_alerts.

ParametersJSON Schema
NameRequiredDescriptionDefault
idYesSubscription id (uuid) returned by subscribe.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate non-destructive and idempotent. The description adds that it's a soft deactivation (not deletion) and enforces ownership, which is beyond what annotations provide. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the purpose, and every sentence adds important information. There is no redundancy or unnecessary detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the single required parameter and simple behavior, the description is complete. It covers what happens, ownership constraints, and impact on historical data, leaving no ambiguity for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and already describes the 'id' parameter. The description adds value by linking the id to the 'subscribe' tool and explaining ownership enforcement, enhancing understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action 'Cancel a subscription by id' and distinguishes it from sibling tools like 'subscribe' and 'list_subscriptions'. It also explains the effect (deactivation, not deletion), making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states ownership enforcement ('you can only cancel your own subscriptions') and clarifies that the row is deactivated, not deleted, so historical events remain available. This provides clear when-to-use and implications.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_claimValidate ClaimA
Read-onlyIdempotent
Inspect

"Is it true that…" / "fact check" / "verify the claim that…" / "did X really…" / "was Y actually…" / "confirm or refute" / "true or false" — natural-language claim verification against authoritative sources. Use whenever the agent needs to check whether something a user said is factually correct. v1 supports company-financial claims (revenue, net income, cash position for public US companies) via SEC EDGAR + XBRL. Returns a verdict (confirmed / approximately_correct / refuted / inconclusive / unsupported), extracted structured form, actual value with pipeworx:// citation, and percent delta. Replaces 4–6 sequential calls (NL parsing → entity resolution → data lookup → numeric comparison).

ParametersJSON Schema
NameRequiredDescriptionDefault
claimYesNatural-language factual claim, e.g., "Apple's FY2024 revenue was $400 billion" or "Microsoft made about $100B in profit last year".
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint, and destructiveHint, which convey safety and idempotence. The description adds value by detailing the verdict types (confirmed, approximately_correct, etc.), source (SEC EDGAR + XBRL), and output format (structured form, actual value with citation, percent delta). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (about 5 sentences) and front-loaded with user-facing phrases like 'Is it true that...'. Every sentence serves a purpose: defining use cases, specifying scope, describing output, and explaining efficiency benefits. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has only one required parameter and no output schema, the description sufficiently explains the input, output, use context, and source. It covers the return format, supported claim types, and why this tool is beneficial (replaces multiple calls). No gaps left for an agent to infer.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The single parameter 'claim' has 100% schema description coverage with examples. The tool description adds meaning by clarifying it accepts natural language claims, providing sample phrasing, and specifying the supported claim type (company-financial). This enhances the semantic understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies natural-language factual claims against authoritative sources, with specific examples like 'Is it true that...' and 'fact check'. It distinguishes from sibling tools by highlighting it replaces 4-6 sequential calls, indicating a specialized function. The verb 'validate' and resource 'claim' are directly aligned.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use whenever the agent needs to check whether something a user said is factually correct' and specifies scope: 'v1 supports company-financial claims'. This provides clear when-to-use guidance, though it does not explicitly mention alternatives for non-financial claims. The statement about replacing 4-6 calls implicitly suggests it is the preferred option for these use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Discussions

No comments yet. Be the first to start the discussion!

Try in Browser

Your Connectors

Sign in to create a connector for this server.