gapup-mcp

Server Details

100+ agent-payable C-suite expertises with x402 micro-payments — competitive intel, SEC filings, sanctions, KYC, clinical evidence, real estate, ESG. 183 tools, free tier 100 calls/month.

Status: Healthy
Last Tested: 2026-07-25 04:53
Transport: Streamable HTTP
URL

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client

Glama

MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

C2.5/5.0

Tool DescriptionsC

Average 3.6/5 across 271 of 271 tools scored. Lowest: 1.8/5.

Server CoherenceD

Disambiguation2/5

With 271 tools, many have overlapping purposes (e.g., multiple competitor intel tools, multiple financial modelers, multiple ESG auditors). Detailed descriptions help slightly, but the sheer volume creates confusion. Agents would struggle to select the right tool among many similar options.

Naming Consistency1/5

Tool names are wildly inconsistent: mix of English and French, snake_case and short phrases, some very generic (process, run, execute equivalents). No discernible naming convention (e.g., abm_architect vs. boundary_control vs. bp_narratif). This makes it hard to predict tool names.

Tool Count1/5

271 tools is far beyond typical well-scoped servers (3-15). This indicates an unfocused, over-bloated tool surface. Even for a general business intelligence server, this number is excessive and violates the principle of each tool earning its place.

Completeness2/5

Despite the large count, coverage feels scattered. Some domains (e.g., content, competitive intel) have many tools, while others (e.g., supply chain, HR) have gaps. The set lacks a coherent scope; it seems like a dump of many separate tool collections rather than a complete, curated surface.

Available Tools

271 tools

abm_architectB

Read-only

Inspect

Architecte ABM — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — ABM 20 comptes nommés · Budget €120k · Tier 1×5 + Tier 2×15 · Playbooks 3 niveaux. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`product`	Yes
`salesTeam`	No
`icpCriteria`	Yes
`abmBudgetEur`	No
`targetAccounts`	Yes
`currentChannels`	No

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, indicating safety. The description adds that inputs are validated server-side, consistent with read-only behavior. No side effects or contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, consisting of a few sentences. It front-loads the tool name and purpose. However, the reference case could be seen as jargon-heavy, though not overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity with 8 parameters, nested objects, and no output schema, the description lacks detail about the deliverable's structure, prerequisites, or result format. It is insufficient for an agent to fully understand the tool's capabilities.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 13%, the description does not explain parameters beyond vague phrasing 'send the documented case fields'. It fails to add meaning to the 8 nested parameters, which the schema only partially documents.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description identifies the tool as 'Architecte ABM' targeting C-suite expertise for ABM strategy. It mentions a reference case and states it returns a structured deliverable, clearly distinguishing it from siblings like 'abm_lookalike_account_finder' which focuses on finding lookalike accounts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for creating an ABM architecture but does not specify when not to use it or alternatives. The reference case provides a context, but no explicit guidance on selection vs. siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

abm_lookalike_account_finderA

Read-onlyIdempotent

Inspect

As a CMO, discover 50 B2B accounts that closely match your top 10 customers' tech stacks and firmographics. This tool analyzes public web data including robots.txt and OpenGraph metadata to identify lookalike accounts for targeted ABM campaigns. Input your top customer domains and desired firmographic filters to receive a ranked list of potential targets with matching technologies and company attributes.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`tech_stack_keywords`	No	Specific technologies to match in lookalike accounts
`firmographic_filters`	No
`top_customer_domains`	Yes	List of top 10 customer domains to use as seed accounts

Output Schema

ParametersJSON Schema

Name	Required	Description
`stats`	No
`status`	Yes
`sources`	Yes
`warnings`	Yes
`lookalike_accounts`	Yes
`matched_technologies`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, openWorldHint=true, and idempotentHint=true. The description adds valuable context: it reveals that the tool analyzes public web data (robots.txt and OpenGraph metadata) and returns a ranked list. This goes beyond annotations by clarifying the data sources and result structure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the core purpose and result. Every sentence adds value: the first defines the output and target user, the second explains the methodology and data sources. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of annotations (readOnly, openWorld, idempotent) and an output schema, the description covers the main aspects: purpose, inputs, data sources, and output type. It could be improved by mentioning the async polling option and that results are ranked, but it is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 75% (3 of 4 top-level parameters have descriptions). The tool description reinforces the role of firmographic filters and tech stacks but does not add new semantic details beyond what the schema provides. The description does not compensate for missing schema description on the firmographic_filters parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'discover 50 B2B accounts that closely match your top 10 customers' tech stacks and firmographics.' It specifies the input (customer domains and firmographic filters) and output (ranked list), and uses a specific verb-resource combination that distinguishes it from sibling ABM tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for targeted ABM campaigns by a CMO, but does not explicitly state when to use this tool over alternatives or provide any exclusion criteria. Without mentioning sibling tools or specific use cases, the guidance is only implied.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

account_expansion_mapperC

Read-only

Inspect

Mapping d'expansion comptes — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Notion B2B Enterprise — top 30 strategic accounts · expansion plays NRR 130%+ target · Snowflake/Shopify/Vercel/Stripe analyzed. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`accounts`	Yes
`ownership`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns an audited deliverable and that inputs are validated server-side. This provides some additional context but does not delve into behavioral traits like rate limits or destructive actions beyond what annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very short and to the point, but the inclusion of French and the jargon 'Gapup agent-payable C-suite expertise (CRO)' reduces clarity. It front-loads the main point but could be more universally understandable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has complex nested parameters, no output schema, and no output description, the description is incomplete. It does not explain what the deliverable contains or how to interpret the result, relying solely on a reference case that may not generalize.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' has a description). The description does not explain any parameters beyond 'send the documented case fields.' This lacks semantic support for the nested objects (company, accounts, ownership) which have many fields without descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates the tool returns a structured, audited deliverable for account expansion mapping, with a reference case. However, it could be more explicit about the verb (e.g., 'Generate' or 'Map') and the phrasing 'Gapup agent-payable C-suite expertise (CRO)' is confusing. It distinguishes itself from siblings due to its unique focus on expansion mapping.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. The reference case hints at B2B enterprise expansion with strategic accounts, but no when-not or alternative tools are mentioned. Sibling tools are numerous but none directly compete.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

action_plan_esgC

Read-only

Inspect

Plan d'action ESG — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: TechCorp SAS — Plan ESG 36 mois (500 FTE, €60M CA, score 54→76/100). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`horizon`	Yes		36 mois
`ambitions`	Yes
`targetLabels`	No
`currentScores`	No
`availableResources`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and returns a structured deliverable, but does not disclose potential side effects or async behavior beyond what the async parameter suggests. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes some jargon ('Gapup agent-payable C-suite expertise') that may not be helpful. It could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 8 parameters, no output schema), the description lacks detail on output format, async handling, and parameter constraints. It is insufficient for fully accurate invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (13%). The description does not explain any parameters beyond general mention of 'documented case fields,' failing to compensate for the missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a structured ESG action plan deliverable and includes a reference case. However, it does not differentiate from sibling tools like esg_audit_multi or sustainability_report, so a slight deduction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions inputs are validated server-side but provides no guidance on when to use this tool versus alternatives. There is no explicit when-to-use or when-not-to-use context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

adversarial_input_stress_testerA

Read-onlyIdempotent

Inspect

An asynchronous risk assessment tool that evaluates AI model resilience against adversarial inputs following NIST AI Risk Management Framework (RMF) red-teaming protocols. Designed for security and compliance personas, it accepts model outputs or decision boundaries and returns structured risk scores, failure modes, and adversarial examples. Requires async:true to avoid timeout errors. Outputs include status, warnings, and source references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`maxTests`	No	Maximum number of adversarial tests to run
`modelOutput`	Yes	The AI model's output or decision to be stress-tested
`adversarialDataset`	No	Optional custom adversarial inputs to test
`sensitivityThreshold`	No	Threshold for flagging high-risk adversarial examples

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No	Normalized risk score from adversarial testing
`failureModes`	No
`adversarialExamples`	No

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint. The description adds value by disclosing async behavior, required async:true, and output structure (risk scores, failure modes, adversarial examples, status, warnings, source references). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, no extraneous information. Every sentence serves a purpose: purpose/framework, async requirement/target audience, output summary. Highly concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 params, output schema exists, annotations rich), the description covers key aspects: purpose, framework, async requirement, personas, and output types. It does not mention the custom adversarial dataset or sensitivity threshold, but schema descriptions already handle those, making the description complete enough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description mentions 'model outputs or decision boundaries' which aligns with the modelOutput parameter, but does not add further semantic enrichment beyond what the schema provides for individual parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates AI model resilience against adversarial inputs following NIST AI RMF protocols, specifying verbs and resources. However, it does not explicitly differentiate from sibling tools like jailbreak_attempt_detector or model_behavior_drift_monitor, lacking direct sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage guidance by noting that async:true is required to avoid timeout errors, and identifies target personas (security/compliance). It does not explicitly state when not to use the tool or mention alternatives, leaving the usage context implied rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

affiliate_fraud_clickstream_detectorA

Read-onlyIdempotent

Inspect

Analyzes affiliate clickstream data from Common Crawl to flag potential fraud patterns (duplicate IPs, rapid clicks, device spoofing). Designed for CMOs to validate affiliate traffic quality and prevent budget waste. Inputs: affiliate network name and date range. Outputs: fraud probability score, suspicious IP list, and pattern analysis. Keywords: affiliate fraud detection, clickstream analysis, marketing attribution, traffic validation.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`threshold`	No	Fraud probability threshold (0.1-0.99)
`date_range`	Yes
`affiliate_network`	Yes	Name of the affiliate network to analyze (e.g., 'CJ Affiliate', 'Rakuten')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`suspicious_ips`	No
`fraud_probability`	No	Overall fraud probability score (0-1)
`patterns_detected`	No
`total_clicks_analyzed`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and openWorldHint=true. The description adds valuable context: data source (Common Crawl), analyzed patterns (duplicate IPs, rapid clicks, device spoofing), and output types (fraud probability score, suspicious IP list, pattern analysis). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficient: three sentences covering function, user, inputs, and outputs, plus a keywords list for searchability. It is front-loaded with the core action. The keywords section is slightly redundant but not detrimental.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 params, nested object, output schema exists), the description covers key aspects: purpose, data source, target user, and output types. It lacks details on error handling or data freshness, but the output schema supplements return value documentation. Overall, adequate for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 75%, leaving threshold and async described only in the schema. The tool description mentions affiliate_network and date_range but not threshold (with default and range) or async. Since baseline is 3 with high schema coverage, the description adds minimal additional meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: analyzes affiliate clickstream data to flag fraud patterns such as duplicate IPs, rapid clicks, and device spoofing. It also specifies the target user (CMOs) and the problem it solves (preventing budget waste). This distinguishes it from other fraud detectors in the sibling list, e.g., 'fraud_detector' or 'x402_payment_fraud_detector', which target different domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for affiliate traffic validation but does not explicitly state when to use this tool versus alternatives. It provides no exclusions or comparisons to other fraud detection tools. The phrase 'Designed for CMOs' offers some context, but lacks guidance on when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_barrier_breakerA

Read-onlyIdempotent

Inspect

As a COO, analyze non-tariff trade barriers (NTBs) across African trade corridors using WITS and UNCTAD STAT data. Input origin/destination countries and product HS codes to receive barrier mapping with severity scores and actionable mitigation strategies. Returns structured risk assessment, regulatory compliance gaps, and supply chain optimization recommendations. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hs_code`	No	6-digit Harmonized System product code
`origin_country`	Yes	ISO 3-letter country code for export origin
`destination_country`	Yes	ISO 3-letter country code for import destination
`include_regulatory_details`	No	Whether to include detailed regulatory text in output

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`barrier_summary`	Yes
`trade_flow_impact`	No
`regulatory_details`	No
`mitigation_strategies`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and idempotentHint. The description adds value by naming data sources, output structure, and async behavior advice.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: purpose and data sources, inputs/outputs, async advice. No redundant information, well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, inputs, outputs, data sources, and async usage. Output schema exists, so return format details are covered. Complete for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds async usage context and mentions origin/destination and HS codes, but does not significantly enhance parameter understanding beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes non-tariff trade barriers using specific data sources (WITS and UNCTAD STAT) and distinguishes from siblings like africa_trade_finance_esg_rater by focusing on NTBs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for use as a COO and mentions async usage to avoid timeout, but does not explicitly exclude alternatives or specify when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_finance_esg_raterA

Read-onlyIdempotent

Inspect

As a COO, evaluate ESG compliance of African trade finance providers using World Bank WITS trade statistics and CDP climate disclosure data. Input the financial institution's name or identifier, and receive an ESG rating with breakdown across environmental, social, and governance dimensions. Ideal for due diligence on trade partners or portfolio risk assessment. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`year`	No	Assessment year (2018-2023)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`countryCode`	No	ISO 2-letter country code (e.g., 'ZA' for South Africa)
`institutionName`	Yes	Full name of the trade finance provider (e.g., 'Standard Bank Group')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`esgRating`	Yes
`socialScore`	No
`tradeVolume`	No	Annual trade finance volume (USD)
`carbonIntensity`	No	CO2 emissions per million USD financed (tons)
`governanceScore`	No
`environmentalScore`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds contextual details: use of specific data sources (World Bank WITS, CDP), the async parameter to avoid timeout, and the breakdown dimensions. No contradictions, and it enhances transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences: first states purpose and data sources, second describes input/output, third provides use case and async tip. Every sentence is valuable, front-loaded, and no wasted words. Excellent conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema (not shown), the description does not need to detail return values. It covers key aspects: data sources, input, output dimensions, use case, and async behavior. Could be more explicit about error handling for unknown institutions, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description mentions 'Input the financial institution's name or identifier', which aligns with the 'institutionName' parameter description in the schema. It does not add significant new meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates ESG compliance of African trade finance providers using specific data sources (World Bank WITS, CDP) and returns a rating with breakdown across E, S, G dimensions. It uniquely targets African trade finance providers, distinguishing it from broader ESG tools like 'supplier_esg_audit' or 'esg_audit_multi'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the tool is 'Ideal for due diligence on trade partners or portfolio risk assessment', which provides usage context. However, it does not explicitly state when not to use it or suggest alternative tools, leaving some ambiguity for the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_preference_arbitrageA

Read-onlyIdempotent

Inspect

Analyzes AGOA (African Growth and Opportunity Act) and EBA (Everything But Arms) trade preference arbitrage opportunities for COOs evaluating export strategies. Compares tariff rates, trade volumes, and preference utilization across eligible African countries using WITS and OECD trade data. Returns structured analysis of potential duty savings, market access advantages, and compliance requirements. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description
`year`	No	Reference year for trade data
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hs_code`	Yes	6-10 digit Harmonized System product code
`exporting_country`	Yes	ISO 2-letter country code of African exporter
`importing_country`	No	ISO 2-letter country code of target market (US/EU)
`preference_scheme`	No	Trade preference scheme to analyze

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`duty_savings_pct`	No	Estimated duty savings percentage under preference scheme
`trade_volume_usd`	No	Annual trade volume in USD for given HS code
`market_access_score`	No	Composite score of market access advantage (0-100)
`compliance_requirements`	No	List of compliance requirements for preference eligibility
`preference_utilization_rate`	No	Percentage of eligible exports utilizing preference

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (readOnlyHint, openWorldHint, idempotentHint), the description adds critical behavioral detail: the async parameter is required to avoid x402 timeout, which is a significant operational constraint. It also mentions data sources (WITS and OECD), enhancing transparency. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (4 sentences) and front-loaded with purpose. The first sentence states the main function, the second details data and outputs, and the third provides a crucial usage note. No unnecessary words, each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, async requirement, output schema present), the description covers the domain, data sources, and target audience. It mentions key outputs (duty savings, market access, compliance). It is nearly complete but could benefit from a brief example or reference to the output schema structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the baseline is 3. The description adds context by linking parameters to the tool's goal (e.g., preference_scheme tied to AGOA/EBA, exporting_country to African countries), but does not significantly deepen understanding of individual parameters beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool's purpose: analyzing AGOA and EBA trade preference arbitrage opportunities for COOs. It uses specific verbs and resources (compares tariff rates, trade volumes, preference utilization) and distinguishes from siblings like africa_trade_preference_optimizer and agoa_eba_intelligence by focusing on arbitrage opportunities and comparative analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: it is for COOs evaluating export strategies using specific trade data sources. However, it does not explicitly state when not to use this tool or mention alternatives among siblings like africa_trade_barrier_breaker. The implied usage is strong but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_preference_optimizerA

Read-onlyIdempotent

Inspect

As a COO, analyze AGOA/EBA duty savings opportunities with HS code-level trade route optimization. Input origin country, destination country, and HS code to receive duty savings estimates, optimal trade routes, and preference utilization recommendations. Uses UN Comtrade trade flow data, WCO tariff schedules, and African Union trade agreement rules. Ideal for export market evaluation, supply chain optimization, and trade agreement compliance analysis. Keywords: AGOA, EBA, duty savings, trade optimization, HS code, African trade, export strategy.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hsCode`	Yes	6-10 digit Harmonized System code (e.g., '010121' for live horses)
`quantity`	No	Estimated annual export quantity in units
`valueUsd`	No	Estimated annual export value in USD
`originCountry`	Yes	ISO 3166-1 alpha-3 country code of export origin (e.g., 'KEN' for Kenya)
`destinationCountry`	Yes	ISO 3166-1 alpha-3 country code of import destination (e.g., 'USA' for United States)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`dutySavings`	No	Estimated annual duty savings in USD under optimal preference program
`optimalRoute`	No
`alternativeRoutes`	No
`complianceWarnings`	No	Potential compliance risks or documentation requirements

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and openWorldHint=true. The description adds behavioral context by stating it uses specific data sources (UN Comtrade, WCO tariff schedules, AU rules) and mentions the async parameter in the schema, which affects execution behavior. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two paragraphs: first covers the primary purpose and outputs, second provides keywords and use cases. It is front-loaded with the main function and avoids unnecessary verbosity. Could be slightly tighter but overall concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters (3 required) and an output schema, the description covers the main inputs, outputs, data sources, and use cases. It does not explain the async behavior explicitly (though schema does) but is otherwise sufficient for an agent to understand when to use it. Fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters described. The description reiterates the main three required parameters but adds no additional meaning beyond the schema. For instance, the async parameter is fully described in the schema, and the description does not explain how quantity and value are used in the analysis. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes AGOA/EBA duty savings opportunities with HS code-level trade route optimization. It specifies inputs (origin country, destination country, HS code) and outputs (duty savings estimates, optimal trade routes, recommendations). While it does not explicitly differentiate from all sibling tools, the focus on duty savings and trade routes provides a clear purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions use cases ('Ideal for export market evaluation, supply chain optimization, and trade agreement compliance analysis') but does not provide guidance on when not to use this tool or suggest alternatives among the many sibling tools. The context is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agoa_eba_intelligenceB

Read-only

Inspect

Intelligence préférentielle AGOA (US→Africa) et EBA/GSP (EU→Africa). Vérifie l'éligibilité d'un pays africain aux programmes tarifaires préférentiels, l'éligibilité d'un produit par code HS, identifie les meilleures opportunités d'export Afrique→US/EU, et fournit les règles de conformité (rules of origin, valeur ajoutée, docs). Différenciateur Africa diaspora : 39 pays AGOA + 47 LDCs EBA encodés. Sources : AGOA.info · EU EBA · EU GSP+ · WTO Tariff · UN Comtrade.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Mode d'analyse : 'country_eligibility' (statut AGOA/EBA/GSP d'un pays africain) \| 'product_eligibility' (éligibilité d'un produit par code HS) \| 'trade_opportunity' (top opportunités export Afrique→US/EU) \| 'compliance_check' (rules of origin, seuils valeur ajoutée, documentation)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hs_code`	No	Code HS (Harmonized System) 6+ chiffres (requis pour product_eligibility). Exemple : '620342' = pantalons coton homme, '090111' = café arabica non torréfié, '060310' = fleurs fraîches.
`country_iso`	No	Code ISO 2-lettres du pays africain (requis pour country_eligibility). Exemples : KE=Kenya, NG=Nigeria, ZA=Afrique du Sud, ET=Éthiopie, LS=Lesotho, GH=Ghana.
`destination`	No	Marché de destination pour trade_opportunity : 'US', 'EU', ou 'both' (défaut). Ignoré pour les autres modes.

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare 'readOnlyHint: true' and 'openWorldHint: true'. The description adds context about data sources (AGOA.info, EU EBA) and country coverage, but does not disclose additional behavioral traits like response time, caching, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded paragraph that efficiently conveys the tool's purpose and capabilities. It could benefit from slightly more structure (e.g., bullet points), but it is not verbose and each sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple modes, 5 parameters, no output schema), the description explains modes and differentiators but does not describe the return format or result structure. This gap in completeness is notable for an intelligence tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Parameter descriptions in the schema cover all 5 parameters (100% coverage). The description echoes the mode options and provides some examples but does not add significant new semantic information beyond what the schema already provides. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool checks AGOA/EBA/GSP eligibility for African countries, HS code product eligibility, identifies export opportunities, and provides compliance rules. It mentions specific country counts and sources, but does not explicitly differentiate from sibling tools like 'africa_trade_preference_arbitrage' or 'africa_trade_preference_optimizer'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage scenarios (checking eligibility, opportunities, compliance) but does not provide explicit guidance on when to use this tool versus alternatives, nor does it mention prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_act_incident_responseC

Read-onlyIdempotent

Inspect

Generates EU AI Act incident response playbooks with regulator notification templates for risk management teams. Inputs include incident severity, AI system type, and affected stakeholders. Outputs structured playbook steps, regulator notification drafts, and compliance checklists. Essential for high-risk AI system breaches requiring formal EU notification — pass async:true REQUIRED to avoid x402 timeout. Keywords: AI Act compliance, incident response, regulator notification, risk management, ISO 27035, NIST SP 800-61.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`severity`	Yes
`incident_type`	Yes
`ai_system_type`	No
`incident_description`	No
`affected_stakeholders`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`next_steps`	No
`playbook_steps`	No
`compliance_checklist`	No
`regulator_notification`	No

Tool Definition Quality

C2.8/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations declare readOnlyHint: true, but the tool generates playbooks and notification drafts, which are write operations. The description does not clarify this contradiction and only mentions async behavior. The description contradicts annotations, making behavioral transparency poor.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that front-loads the purpose, then inputs/outputs, a critical usage note, and keywords. It is reasonably concise with no redundancy, though it could be slightly shorter. Structure is effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, key inputs/outputs, and async requirement. With an output schema present, return values need not be detailed. However, given the tool's complexity (generating playbooks with multiple parameters), the description omits detailed parameter behavior and does not address the contradiction with readOnlyHint. It is adequate but has gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, yet the description only lists three of six parameters (incident severity, AI system type, affected stakeholders) and mentions async in the usage note. It does not describe incident_type, incident_description, or fully explain async. With low coverage, the description adds some value but is insufficient to compensate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates EU AI Act incident response playbooks with regulator notification templates, specifying inputs and outputs. It is specific to high-risk AI system breaches and includes relevant keywords. However, it does not explicitly differentiate from sibling tools like ai_act_training_data_audit or ai_governance_full_report_async, though the purpose is distinct enough.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates the tool is 'Essential for high-risk AI system breaches requiring formal EU notification' and advises passing async:true to avoid timeout. It provides context for when to use but does not give explicit when-not-to-use or mention alternatives. Usage guidance is present but lacks exclusions or comparisons to siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_act_sandbox_regulatory_sandboxA

Read-onlyIdempotent

Inspect

A legal-focused tool for simulating EU AI Act regulatory sandbox submissions. Provides structured feedback on compliance, risk levels, and required documentation based on EUR-Lex and OECD AI Policy Observatory sources. Accepts AI system descriptions, intended use cases, and technical specifications as input. Returns detailed assessment with warnings, citations, and actionable recommendations for legal teams and AI developers.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`sector`	No	Primary sector of application
`riskLevel`	Yes	Self-assessed risk level of the AI system
`intendedUse`	Yes	Primary and secondary use cases of the AI system
`documentation`	No	List of provided documentation types (e.g., 'technical', 'ethical', 'data')
`systemDescription`	Yes	Detailed description of the AI system including purpose, architecture, and data sources

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`assessment`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds context about accepting system descriptions and returning structured feedback with warnings and citations, which complements the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at four sentences, front-loading the purpose. While efficient, the first sentence is somewhat lengthy, slightly reducing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, inputs, outputs, and even cites sources (EUR-Lex, OECD). Given the presence of an output schema, it adequately explains what the tool returns without needing to detail return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers all 6 parameters with descriptions (100% coverage). The description provides a general overview of input types but does not add specific parameter details beyond what the schema offers, warranting a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool simulates EU AI Act regulatory sandbox submissions, specifying verb 'simulating' and resource 'regulatory sandbox submissions'. It differentiates from siblings like ai_act_incident_response by focusing on regulatory sandbox submissions, making its purpose distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly targets legal teams and AI developers, implying use for compliance testing. However, it does not specify when not to use this tool or mention alternative tools, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_act_training_data_auditA

Read-onlyIdempotent

Inspect

As a CTO, audit AI training datasets for EU AI Act compliance with bias detection and regulatory risk assessment. Inputs: dataset identifier (Hugging Face ID or URL) and optional risk thresholds. Outputs: compliance score, bias metrics, regulatory warnings, and source references. Ideal for pre-deployment risk evaluation. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`dataset_id`	Yes	Hugging Face dataset identifier or direct URL to dataset
`risk_threshold`	No
`include_bias_metrics`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`bias_metrics`	No
`compliance_score`	No
`dataset_metadata`	No
`regulatory_warnings`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, idempotentHint. The description adds value by noting the async option and timeout avoidance, and lists outputs. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first explains purpose and inputs/outputs, second advises on async usage. Front-loaded, no fluff, every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present, the description covers purpose, inputs, outputs, and usage context. The async tip adds practical guidance. Complete for a compliance audit tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50%. The description reiterates dataset identifier and risk thresholds but adds no new meaning beyond the schema. Does not compensate for the missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool audits AI training datasets for EU AI Act compliance, with bias detection and regulatory risk assessment. It specifies inputs, outputs, and distinguishes from siblings like ai_act_incident_response.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Ideal for pre-deployment risk evaluation', giving clear context, but does not explicitly state when not to use or compare with alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_governance_full_report_asyncB

Read-only

Inspect

Audit EU AI Act complet (Règlement UE 2024/1689) — implémentation native audit-grade. Classifie le système IA selon les 4 tiers de risque (unacceptable/high_risk/limited_risk/minimal_risk/gpai) sur la base de l'Annexe III et de l'Article 5. Produit : (1) classification tier + justification + articles applicables, (2) checklist conformité Articles 9-15 + 50 + 53-55, (3) gaps documentation Annexe IV, (4) mapping ISO 42001, (5) deadlines EU AI Act 2025-2029, (6) estimation coût et effort, (7) top 10 recommandations P0/P1/P2. Retourne immédiatement (<300ms) un job_id. Poller avec ai_governance_full_report_result(job_id) après eta_seconds (~90s). Cache 7 jours pour inputs identiques. Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter. DISCLAIMER : non substitutif à un avis juridique professionnel.

ParametersJSON Schema

Name	Required	Description
`company_size`	No	Taille entreprise : startup (≤50), smb (51-250), mid (251-1000), large (1001-5000), enterprise (>5000)
`data_sources`	No	Sources de données utilisées par le système IA
`affected_persons`	No	Catégories de personnes affectées par les décisions du système (ex: candidats, employés, clients)
`geographic_scope`	No	Zones géographiques de déploiement (ex: 'EU', 'France', 'Global')
`intended_purpose`	Yes	Finalité prévue du système IA : à quoi sert-il concrètement
`deployment_context`	No	Contexte de déploiement : interne (usage employés), public, B2B, B2C
`ai_system_description`	Yes	Description détaillée du système IA : ce qu'il fait, comment il fonctionne, quelles décisions il prend

Output Schema

ParametersJSON Schema

Name	Required	Description
`job_id`	Yes	Identifiant unique du job — passer à ai_governance_full_report_result
`status`	Yes
`eta_seconds`	Yes	Durée estimée avant disponibilité du résultat
`submitted_at`	Yes	Timestamp ISO-8601 de soumission

Tool Definition Quality

B3.4/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotation readOnlyHint=true contradicts the description's implication that submitting a job creates a new async task, which is a mutation. The description does not resolve this inconsistency and fails to disclose the side effect of job creation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose, listing 7 output items in a dense block. While informative, it could be more structured and concise. The disclaimer adds length but is necessary.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema existence and full parameter descriptions, the description provides a comprehensive overview of the async workflow, caching, and recommendations. Missing idempotency details but otherwise complete for a complex async tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers all parameters with descriptions (100% coverage), so the description adds minimal semantic value. It mentions required fields but does not explain their purpose beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs a full EU AI Act audit, classifying risk tiers and producing 7 specific outputs. It distinguishes from sibling tools by mentioning async submission and webhook usage, and references the ai_governance_full_report_result poller.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on when to use this tool: for a comprehensive EU AI Act audit. It contrasts with polling (ai_governance_full_report_result) and recommends webhooks for faster interaction. However, it does not explicitly state when not to use it or alternatives like ai_governance_pilot.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_governance_full_report_resultA

Read-onlyIdempotent

Inspect

Poll the result of an ai_governance_full_report_async job. Returns status=pending while running, status=completed with the full EU AI Act governance audit report once done (risk_tier, compliance checklist Articles 9-15/50/53-55, Annex IV documentation gaps, ISO 42001 alignment, deadlines 2025-2029, cost estimate, top-10 recommendations P0/P1/P2, compliance_score), status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by ai_governance_full_report_async (~90s).

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by ai_governance_full_report_async (prefix: aigfr_)

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds significant behavioral context beyond annotations: enumerates all possible statuses (pending, completed with report details, failed, not_found), the TTL, and the timing hint. Annotations already declare readOnly and idempotent, so the description complements them well without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose, succinctly covering return values and timing. Every sentence adds value, no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a polling tool, the description fully informs the agent: when to call (after eta_seconds), what statuses to expect, what the completed report includes (detailed list), and expiration (TTL 24h). The presence of an output schema (though not shown) complements the description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers the sole parameter with a clear description (job_id returned by async tool, prefix aigfr_). The tool description adds no further meaning; schema coverage is 100%, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Poll the result of an ai_governance_full_report_async job,' using a specific verb and resource. It distinguishes itself from the sibling async initiation tool by naming it explicitly and describing the polling nature.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context: call after the eta_seconds hint (~90s), and notes the TTL (24h). Does not explicitly exclude other uses, but the polling intent is unambiguous and the sibling async tool makes usage boundaries obvious.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_governance_pilotB

Read-only

Inspect

Pilotage de gouvernance IA — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: TalentScope SAS — scoring IA candidats RH (EU AI Act Annex III §4, high-risk). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`aiUseCases`	Yes
`targetFrameworks`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns a structured deliverable, which is useful but does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3 sentences) and front-loaded with the French-English title. The reference case adds context but could be omitted for brevity. Slightly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complexity (5 params, nested objects, no output schema), the description does not explain the deliverable contents, how to use async, or handle results. Incomplete for proper usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (20%), and the description does not elaborate on parameter semantics beyond 'send the documented case fields'. The async parameter is described in schema but not in description. The nested objects lack guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for AI governance pilot, returning a structured deliverable. It references a case. However, it does not explicitly distinguish from sibling tools like ai_governance_full_report_async.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies it's a pilot for governance, but does not provide criteria for when to use this vs. alternatives, nor when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

anti_demissions_hrC

Read-only

Inspect

Bouclier anti-démissions — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Buffer Inc — détection des at-risk parmi 80 FTEs (Q1 2026). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`signals`	Yes
`employees`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns a structured audited deliverable and that inputs are validated server-side, but does not disclose any additional behavioral traits (e.g., async, rate limits, or what 'audited' entails). Minimal value added over annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (2 sentences) and front-loaded with the name and reference case. However, the second sentence is vague and lacks specificity. While concise, it sacrifices informativeness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters with nested objects, no output schema), the description is insufficient. It does not explain the deliverable's structure, required fields, or how to use the output. The reference case provides some context but is incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (async parameter described). The description does not explain the key parameters (company, signals, employees) beyond telling users to 'send the documented case fields.' This adds no semantic meaning and fails to compensate for the schema's lack of descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description provides a clear purpose: detect employees at risk of leaving (anti-démissions), returning a structured audited deliverable. The reference case with Buffer Inc illustrates the functionality. However, it could be more explicit about the exact verb and resource (e.g., 'detect at-risk employees').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description offers no guidance on when to use this tool versus alternatives. Sibling tools include churn_defender, talent_poaching_risk, etc., but the description does not differentiate or state appropriate contexts. The note about server-side input validation is technical but not usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arbitration_awards_lookupA

Read-onlyIdempotent

Inspect

Commercial arbitration intelligence for litigation lawyers, M&A due diligence teams, sovereign wealth funds and trade finance compliance. Covers 8 major institutions: ICC, AAA, LCIA, HKIAC, SIAC, CIETAC, DIAC, ICDR.

Three modes: • party_lookup — find awards by party name (searches 20 landmark public awards + JusMundi best-effort) • institution_index — browse awards and caseload stats per institution with date range filter • clause_check — audit an arbitration clause for missing elements (institution, seat, language, arbitrator count, governing law, binding nature)

Note: Most arbitration awards are confidential. This tool surfaces public awards (Yukos, Crystallex, Achmea, etc.) plus redacted statistics from institutional annual reports. Private awards are not accessible.

Cache: 24h (arbitration data is very stable). No API key required.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	party_lookup: search by party name or keyword. institution_index: browse awards by institution + stats. clause_check: audit an arbitration clause for issues.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	For party_lookup: party name or keyword (e.g. "Yukos", "Russia"). For institution_index: institution name or keyword. For clause_check: full text of the arbitration clause to audit.
`date_to`	No	ISO date filter to (YYYY-MM-DD). Applied to award_date.
`date_from`	No	ISO date filter from (YYYY-MM-DD). Applied to award_date.
`institution`	No	Filter by institution. Default 'all'.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`awards`	No
`status`	Yes
`sources`	Yes
`clause_check`	No
`quality_score`	Yes
`institution_stats`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, idempotentHint), the description adds valuable behavioral context: 24h cache, no API key required, data stability, and the limitation to public awards only. This fully equips the agent to understand the tool's behavior and constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bullet-pointed modes, front-loading the purpose. It is slightly verbose but every sentence adds value. Could be tightened by removing redundant phrases like 'Note: Most arbitration awards are confidential' which is already implied.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the description covers the three modes well, it does not explain how date parameters (date_from, date_to) apply to all modes, only mentioning them for institution_index. The 'async' parameter is present in the schema but omitted from the description. Given the output schema exists, return values are not needed, but the missing parameter details reduce completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds meaning beyond the schema by explaining how the 'query' parameter behaves differently per mode (e.g., 'searches 20 landmark public awards + JusMundi best-effort' for party_lookup) and clarifying the 'institution' parameter limitations. However, the 'async' parameter is not mentioned at all.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: commercial arbitration intelligence with three specific modes (party_lookup, institution_index, clause_check). It distinguishes itself from sibling tools by focusing on arbitration awards and clause auditing, and no sibling tool duplicates this function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on when to use each mode and notes that most awards are confidential, only public ones are available. It also mentions cache duration and no API key needed. However, it does not explicitly state when not to use the tool or suggest alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

attack_surface_monitorB

Read-only

Inspect

Surveillance surface d'attaque — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: Which Internet-facing assets of combine a critical CVE, an exposed service, and no WAF — top findings to fix in 14 days? · What is the attack surface of : subdomains, open ports, SSL/TLS grades, and associated CVEs? · Give me a CISO-ready ASM report with blast radius estimate and SLA-driven remediation plan for . · What is the email phishing risk for ? Assess SPF/DMARC posture and recommend improvements. · During M&A due diligence, what are the top cyber exposures on 's Internet-facing infrastructure? Reference case: Velora Payments — 8 assets exposés · 2 critiques (CVE-2023-44487 HTTP/2 RapidReset, Admin panel ouvert) · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`domain`	Yes
`exclusions`	No
`scope_cidrs`	No
`include_email_surface`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate safe read (readOnlyHint: true). The description adds that it returns a structured, audited deliverable and that inputs are validated server-side, providing some behavioral context, but does not go beyond annotations meaningfully.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat verbose and includes mixed language (French/English) and a reference case. It front-loads the purpose but could be more concise and structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters and no output schema, the description provides examples but does not fully explain the return format or all inputs. Partially complete but could be more comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, and the description does not elaborate on most parameters (focus, exclusions, scope_cidrs, etc.). Only domain and include_email_surface are implied through examples, but not clearly explained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool monitors attack surface and returns a structured deliverable, with example queries. However, it does not explicitly distinguish itself from sibling tools like cyber_risk_auditor or cve_security_lookup, missing some specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example use cases (CISO-ready report, M&A due diligence, email phishing risk) but lacks explicit guidance on when not to use or alternatives. No exclusions or comparisons to siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

audit_pre_flightC

Read-only

Inspect

Pré-audit comptable — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Spendesk — Pré-audit commissaire · Readiness 74/100 · 4 findings critiques · Checklist 18 docs. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`audit`	Yes
`company`	Yes
`systems`	Yes
`financials`	Yes
`knownIssues`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint, so the description adds little behavioral context beyond stating that inputs are validated and a deliverable is returned. It does not contradict annotations but adds only marginal value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, somewhat messy paragraph that mixes French and English. The reference case is specific and not generic, making it less concise. It could be better structured with clear sections or bullet points.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, multiple required fields, no output schema), the description lacks necessary context. It does not explain the deliverable's structure, how the 'async' parameter works, or what the return value looks like, leaving significant gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description should explain parameter meaning, but it does not provide any details about the six parameters (company, audit, financials, etc.). The schema itself has some descriptions, but the description fails to compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs a pre-audit for accounting, targeting CFO-level users, and returns a structured deliverable. The reference case provides a concrete example. However, the mix of French and English may cause ambiguity, and it doesn't explicitly differentiate from similar sibling tools like 'qa_pre_flight'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for CFO-level expertise and mentions that inputs are validated server-side, but it does not provide explicit guidance on when to use this tool versus alternatives, nor does it specify any prerequisites or context for use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

banking_fee_negotiatorA

Read-onlyIdempotent

Inspect

As a CFO-focused tool, banking_fee_negotiator analyzes your bank's fee structures (account maintenance, wire transfers, credit lines) and provides data-driven negotiation recommendations. Input your current fees and bank details to receive benchmark comparisons from World Bank and ECB SDW, along with specific levers to reduce costs. Ideal for optimizing treasury operations and improving financial efficiency. Keywords: bank fees, cost optimization, treasury management, financial benchmarking, negotiation strategy.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	No	Industry classification (e.g., 'manufacturing', 'retail')
`bank_country`	Yes	ISO 2-letter country code of the bank
`credit_line_fee`	No	Current annual credit line fee percentage
`wire_transfer_fee`	No	Current domestic wire transfer fee in USD
`international_wire_fee`	No	Current international wire transfer fee in USD
`account_maintenance_fee`	Yes	Current monthly account maintenance fee in USD

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`negotiation_levers`	No
`credit_line_benchmark`	No	Industry benchmark for credit line fees percentage
`wire_transfer_benchmark`	No	Regional benchmark for domestic wire transfer fees in USD
`international_wire_benchmark`	No	Regional benchmark for international wire transfer fees in USD
`account_maintenance_benchmark`	No	Regional benchmark for account maintenance fees in USD

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds value beyond annotations by revealing data sources (World Bank, ECB SDW), the analytical process ('data-driven negotiation recommendations'), and output specifics (benchmarks, levers). This aligns with the readOnlyHint and idempotentHint annotations, confirming non-destructive analysis. No contradictions found.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences plus a keyword list, efficiently conveying purpose and context. It is front-loaded with the core function and target audience. The keyword section could be omitted or integrated, but overall it is concise and readable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the rich annotations (readOnly, idempotent) and existence of output schema, the description provides sufficient context. It explains what the tool does, its inputs, and outputs (benchmarks, recommendations). The async parameter is covered by schema, and the description implies the tool is analytical, not action-oriented.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers 100% of 7 parameters with descriptions, so the baseline is 3. The description mentions 'current fees' and 'bank details' but does not elaborate on parameter syntax or defaults. It adds marginal context by grouping fee types but does not significantly enhance schema content.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a CFO-focused bank fee negotiator that analyzes fee structures and provides negotiation recommendations. It specifies the analysis areas (account maintenance, wire transfers, credit lines) and data sources (World Bank, ECB SDW), differentiating it from broader financial tools. However, it does not explicitly distinguish from sibling tools like 'treasury_optimizer' or 'working_capital'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when the user has current fee data and wants to optimize costs, stating 'Input your current fees and bank details to receive benchmark comparisons...' It uses keywords suggesting cost optimization and treasury management. However, it lacks explicit guidance on when not to use this tool or mention of alternatives, which would be helpful given the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

battle_cards_liveC

Read-only

Inspect

Fiche de combat live — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub vs McKinsey Lilli — Deal SaaS B2B €500k · Win rate +11 pts · 6 objections clés armées. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`ourOffer`	Yes
`competitor`	Yes
`dealContext`	Yes
`knownWeaknesses`	No

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint, but description does not mention the async parameter present in schema, which is a significant behavioral trait. No disclosure of polling mechanism.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is relatively concise but includes a lengthy reference case that may not be essential and uses French terminology without translation, potentially confusing some users.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of 5 parameters including nested objects and no output schema, the description is incomplete: no explanation of return format, no mention of async behavior, and insufficient parameter guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 20%, the description should add meaning for parameters, but it only says 'send the documented case fields' without explaining any parameter details, such as the structure of competitor or dealContext objects.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states it returns a structured, audited deliverable and mentions a reference case, but the concept of 'battle card' is not clearly defined, and it does not distinguish from sibling competitor tools like competitive_deep_dive or competitor_profiles.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives. Does not mention any conditions or contexts that would make it preferable to sibling tools like competitor_intel.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

battle_planC

Read-only

Inspect

Plan de bataille marketing — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Q3 2026 · Budget €120k · Pipeline €800k · 5 chantiers prioritaires. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`quarter`	Yes
`teamSize`	Yes
`arrTarget`	Yes
`budgetEur`	Yes
`arrCurrent`	Yes
`companyName`	Yes
`topChannels`	Yes
`icpDescription`	Yes
`currentBlockers`	Yes
`primaryObjective`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the read-only nature is clear. The description adds that inputs are validated server-side, a minor behavioral detail. It does not disclose async behavior (despite the 'async' parameter), error handling, or any rate limits. Overall, the description adds little beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (two sentences) and to the point, conveying the core purpose and a reference case. It avoids unnecessary words. However, it could be slightly more structured to include key aspects like output summary or parameter hints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 11 inputs and no output schema, the description is insufficient. It does not explain what the 'structured, audited deliverable' contains, how to interpret the result, or any dependencies between inputs. The reference case provides a concrete example but does not generalize. The agent likely needs more context to use this tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 9%, with only the 'async' parameter having a description. The other 10 required parameters have no descriptions in the schema. The tool description does not explain any of these parameters, relying solely on their names (e.g., companyName, arrCurrent). While parameter names are somewhat informative, the agent lacks guidance on formats, constraints (e.g., quarter format), or how to fill them correctly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The name and description clearly indicate it generates a marketing battle plan ('Plan de bataille marketing'). The phrase 'Returns a structured, audited deliverable' reinforces the purpose. However, it lacks explicit differentiation from sibling marketing tools like 'brand_builder' or 'positioning_strategist', and the deliverable's exact nature is vague.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance is given on when to use this tool vs alternatives. The reference case (Gapup Hub) provides a concrete scenario but does not explain under what circumstances this tool is appropriate or preferred over similar tools. 'Inputs are validated server-side' implies no manual validation needed, but no when-not-to-use information.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bias_amplification_trackerA

Read-onlyIdempotent

Inspect

Tracks bias amplification in LLM outputs by analyzing fairness metrics from HuggingFace's model leaderboard. Designed for risk assessment personas to detect and quantify demographic, gender, or racial bias amplification in generated text. Accepts model identifiers or output samples, returns structured bias metrics and amplification trends.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`modelId`	No	HuggingFace model identifier (e.g., 'facebook/opt-1.3b')
`outputSamples`	No	Array of LLM output strings to analyze for bias amplification
`demographicGroups`	No	Specific demographic groups to monitor (e.g., ['gender', 'race'])

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`biasMetrics`	No

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, openWorldHint. The description adds that it returns 'structured bias metrics and amplification trends,' which is consistent but does not elaborate on behavior like error handling, result update cadence, or output schema details. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences front-load key purpose and user context. No fluff, but could be slightly more structured (e.g., list inputs/outputs). Still efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's 4 optional params, output schema existence, and rich annotations, the description is mostly complete. It covers purpose, inputs, and output type. Missing are prerequisites (e.g., need for HuggingFace access) and explicit guidance on when to prefer this over similar bias/drift tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 4 parameters have schema descriptions (100% coverage). The description mentions 'model identifiers or output samples' and 'demographic groups to monitor' but adds no new meaning beyond the schema, e.g., no parameter relationships, default values, or format hints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action ('Tracks bias amplification') and resource ('LLM outputs') and specifies it analyzes fairness metrics from HuggingFace. It identifies the target user ('risk assessment personas') and differentiates from siblings like adversarial_input_stress_tester or model_safety_certification_checker by focusing on demographic/gender/racial bias amplification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context about when to use (for risk assessment, detecting bias in generated text) and inputs (model identifiers or output samples), but does not explicitly state when not to use or mention alternative tools for related concerns like overall model safety (sibling model_safety_certification_checker).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bond_covenant_esg_compliance_checkerA

Read-onlyIdempotent

Inspect

As a CFO, quickly assess whether your bond covenants meet ESG compliance standards set by BIS and ECB. This tool analyzes covenant text against regulatory benchmarks, identifying potential ESG-related risks in carbon emissions, governance practices, and social impact clauses. Input bond covenant details and receive structured compliance insights with source references. Ideal for pre-issuance due diligence or ongoing monitoring of existing bond portfolios.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`couponType`	No	Type of bond coupon
`covenantText`	Yes	Full text of the bond covenant to analyze
`issuerSector`	No	Industry sector of the bond issuer (e.g., energy, finance)
`jurisdiction`	No	Legal jurisdiction governing the bond (e.g., EU, US)
`maturityDate`	No	Maturity date of the bond

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`riskAreas`	No
`complianceScore`	No
`recommendations`	No

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds minimal behavioral context (e.g., 'quickly assess', 'source references') beyond what annotations convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences with a clear structure: purpose, method, output, use cases. Every sentence adds value, though it could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists, the description adequately covers the tool's purpose and output ('structured compliance insights with source references'). It does not address edge cases like async behavior, but annotations and schema cover basic behavioral traits.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description implies the main input (covenantText) but does not elaborate on other parameters like couponType, issuerSector, or maturityDate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('assess') and resource ('bond covenants') with a clear ESG compliance focus. It distinguishes from siblings like 'bond_covenant_monitor' by specifying regulatory benchmarks from BIS and ECB.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states ideal usage contexts ('pre-issuance due diligence or ongoing monitoring') but does not explicitly exclude other contexts or mention alternatives like 'bond_covenant_monitor' for non-ESG covenant monitoring.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bond_covenant_monitorA

Read-onlyIdempotent

Inspect

As a CFO, monitor bond covenant compliance by analyzing leverage ratios (debt-to-equity, debt-to-EBITDA) and interest coverage ratios using real-time financial data. Input a company's ticker symbol and optional covenant thresholds to receive compliance status, key financial metrics, and SEC filing references. Ideal for proactive debt management and regulatory compliance tracking. Keywords: bond covenants, leverage ratio, interest coverage, debt compliance, SEC filings, financial health.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`ticker`	Yes	Company ticker symbol (e.g., 'AAPL')
`covenantThresholds`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`debtToEquity`	No
`leverageRatio`	No
`lastFilingDate`	No
`complianceStatus`	Yes
`interestCoverage`	No
`nextFilingDeadline`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, open-world, and idempotent behavior. The description adds that it analyzes real-time data and returns compliance status, metrics, and SEC filing references, consistent with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences plus keywords; all are informative and front-loaded with the core action. Could be slightly shorter but no unnecessary fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with nested parameters and output schema, the description adequately covers inputs, outputs, and use-case context. The presence of an output schema reduces the need to detail return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is high (67%), and the description adds meaning by explaining the role of covenant thresholds and the output (compliance status, metrics, SEC references), beyond the schema's parameter names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool monitors bond covenant compliance using specific ratios (debt-to-equity, debt-to-EBITDA, interest coverage). It distinguishes itself from sibling tools by focusing on covenant compliance rather than general financial analysis or SEC filing lookup.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly targets CFOs for proactive debt management and regulatory compliance tracking. While it doesn't list when not to use or alternative tools, the context among siblings implies it's the go-to for covenant monitoring.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bp_narratifC

Read-only

Inspect

Business Plan narratif — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Stripe Series A 2012. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`raise`	Yes
`company`	Yes
`keyMetrics`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds context about server-side validation and the audited nature of the deliverable, but does not elaborate on handling of inputs or potential side effects beyond what annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but not optimally front-loaded. The first sentence includes branding ('Gapup agent-payable C-suite expertise (CFO)') which is less actionable than the return characteristic. The reference case and validation note are useful but could be more tightly integrated.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with nested parameters and no output schema, the description is insufficient. It does not describe the return format beyond 'structured, audited deliverable,' nor does it compensate for low schema coverage by explaining key parameter purposes or providing examples.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 25%, yet the description adds no meaning to parameters beyond a vague reference to 'documented case fields.' It does not explain the required fields (company, raise, keyMetrics) or their structure, leaving the agent reliant on the incomplete schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns a structured, audited business plan narrative, with a reference to Stripe Series A 2012. It identifies the tool as a CFO-level expertise. However, it does not explicitly distinguish from sibling tools like ftg_business_plan, which may also generate business plans.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for CFO-level business planning but provides no explicit guidance on when to use this tool versus alternatives. There are no examples of when not to use it or mention of alternative tools, leaving the agent to infer the scope.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

brand_builderC

Read-only

Inspect

Architecte de marque — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Pennylane — brand identity SaaS fintech B2B FR/EU (2023). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`target`	Yes
`founder`	Yes
`existingAssets`	No

Tool Definition Quality

C2.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, so the description's claim of returning a deliverable aligns. The description adds minimal behavioral context beyond 'inputs validated server-side' and does not address what happens during processing (e.g., external data access despite openWorldHint). More detail on side effects (none) and execution characteristics would help.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (3 sentences) and uses a mix of French and English. While concise, it sacrifices clarity by not explaining the core function upfront. The structure is front-loaded with the French title but lacks a clear English summary of the tool's purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters, nested objects, no output schema, and high complexity, the description fails to provide sufficient context for an AI agent to use it correctly. No information is given about the deliverable's structure, typical use cases, or how to format the input fields beyond what the schema shows.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, yet the description omits any explanation of parameters like 'founder', 'brand', 'target', or 'existingAssets'. It merely says 'send the documented case fields' without elaborating on how to structure these complex nested objects. This forces the agent to rely solely on the sparse schema, risking incorrect invocations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description identifies the tool as 'Architecte de marque' and states it returns a structured, audited deliverable, with a reference case (Pennylane). However, it does not explicitly define what the tool does—e.g., designing a brand identity—leaving the purpose somewhat ambiguous. A more direct statement of the output (e.g., brand identity framework, recommendations) would improve clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a reference case but no guidance on when to use this tool versus siblings like 'brand_equity_voice_share_calculator' or 'positioning_strategist'. No context is given for prerequisites, alternatives, or scenarios where this tool is inappropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

brand_equity_voice_share_calculatorA

Read-onlyIdempotent

Inspect

Calculates brand equity voice share for CMOs by analyzing mentions across 500K+ news articles and forums from Common Crawl and Wayback Machine. Inputs include brand name, competitors, and time range. Outputs voice share percentage, sentiment distribution, and top sources. Ideal for competitive benchmarking and brand visibility tracking. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`time_range`	Yes
`competitors`	No
`include_forums`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`top_sources`	No
`total_mentions`	No
`brand_voice_share`	No
`sentiment_distribution`	No
`competitor_voice_shares`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context about data sources and async behavior. Annotations already declare readOnlyHint, openWorldHint, and idempotentHint, which the description does not contradict. The added behavioral info (async usage, data sources) is valuable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with the core purpose and key inputs. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present, the description mentions key outputs (voice share percentage, sentiment distribution, top sources) and async behavior. It could be more complete about the time_range format but is adequate for the given context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20%, so the description should compensate. It lists brand, competitors, and time range but does not detail their schema (e.g., time_range object with start/end). The description adds some meaning but misses nested details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates brand equity voice share by analyzing mentions across news articles and forums from specific sources (Common Crawl and Wayback Machine). It distinguishes itself from siblings with a specific verb and resource.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It mentions it's ideal for competitive benchmarking and brand visibility tracking, and hints at using async to avoid timeout. However, it does not explicitly exclude alternative tools or provide when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

budget_variance_aiC

Read-only

Inspect

Analyse d'écart budgétaire — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Explain the key drivers of the budget vs actual variance for in — what are the top 10 narrative explanations? · Which cost categories drove the budget overrun for in , and what corrective actions should management take? · Revise the Q4 forecast based on observed Q3 variances for — give me 3 scenarios (base, optimistic, conservative). · Prepare a board-ready budget variance memo for — , budget €M vs actual €M, with management actions. · What are the quick wins to reduce budget overspend for by end of quarter without impacting growth targets? Reference case: Doctolib Q3 2026 — budget €38.5M vs actual €41.2M (+7.0%) — cloud + headcount + deals timing. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`entity`	Yes
`budgetContext`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true, so the description does not need to reiterate safety. The description adds that inputs are validated server-side and mentions the async parameter behavior. However, it lacks details on potential limitations (e.g., rate limits, data freshness) or what exactly the 'audited deliverable' contains.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description front-loads the purpose effectively but becomes lengthy with a list of example questions. While the examples are helpful, many could be removed or condensed to improve conciseness. The structure is clear but not as tight as it could be.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of nested objects and low schema coverage, the description leaves gaps. It does not explain what the output deliverable looks like, how to interpret results, or whether results are cached. The mention of a reference case ('Doctolib Q3 2026') provides some context but is insufficient for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 25% schema description coverage, the description does not compensate. It does not explain the parameters 'entity' or 'budgetContext' beyond what is in the example questions. The example questions imply fields like company name and period, but the structure of 'budgetContext' with categories, amounts, and variance percentages is not clarified.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool analyzes budget variance and returns a structured, audited deliverable aimed at CFO-level expertise. It lists specific example questions that illustrate its purpose, making it distinct from sibling tools like financial model builders or cash flow analyzers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example questions but does not explicitly state when to use this tool over alternatives, nor does it mention when not to use it. There is no guidance on prerequisites or scenarios where the tool might be inappropriate, such as needing real-time data or non-financial inputs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

candidate_screening_rankingA

Read-onlyIdempotent

Inspect

AI-powered candidate screening and ranking for recruiters, hiring managers, ATS providers and recruitment AI agents. Ingests a job description and 1-50 candidate resumes, returning a ranked shortlist with score breakdowns across five weighted criteria: skills_match (tech stack and soft skills extracted from JD vs resume), experience_match (years vs seniority level inferred from JD), education_match (degree level + top-school detection), role_progression (Junior to Senior to Lead patterns), culture_fit_estimate (remote/hybrid, startup vs enterprise). Per candidate: overall_score 0-100, matched/missing skills, red_flags (job hopping, employment gaps, seniority mismatch), green_flags (long tenure, promotions), 3-5 interview questions, fit_summary. Diversity signals are first-name proxies ONLY with mandatory ethical WARNING. All processing is local -- no external API calls, instant response, privacy-preserving.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`candidates`	Yes	Array of candidate objects. Maximum 50.
`role_country`	No	Optional ISO 2-letter country code for regional context (informational).
`job_description`	Yes	Full text or summary of the job description and role requirements.
`criteria_weights`	No	Optional weighting per criterion. Default: skills=0.4, experience=0.2, education=0.1, progression=0.15, culture=0.15.

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`nice_to_have`	Yes
`quality_score`	Yes
`required_skills`	Yes
`candidates_ranked`	Yes
`diversity_signals`	No
`shortlist_recommended`	Yes
`job_description_summary`	Yes

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes far beyond annotations (readOnlyHint=true, idempotentHint=true, destructiveHint=false) by detailing local processing, no external API calls, instant response, criteria weights, diversity signals with ethical warning, red/green flags, and interview questions. It is fully transparent with no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense but well-structured, front-loading purpose, then listing criteria, output details, and warnings. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, output schema exists), the description covers all aspects: input requirements, processing behavior, criteria explanation, output fields (including red flags, green flags, interview questions), diversity signal warning, and privacy guarantees. It is fully complete for an agent's decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 5 parameters clearly. The description adds context about criteria weights and output fields but does not significantly enhance parameter understanding beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for AI-powered candidate screening and ranking, specifying input (JD + 1-50 resumes) and output (ranked shortlist with score breakdowns across five weighted criteria). It distinguishes from sibling tools like talent_intelligence or recruiting_architect by detailing the exact output and criteria.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It identifies the target users (recruiters, hiring managers, ATS providers, recruitment AI agents) and mentions local processing and instant response, which suggests when to use it (privacy-sensitive, no external calls). However, it does not explicitly state when not to use or name alternatives, leaving some inference to the agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

capacity_planningC

Read-only

Inspect

Planification capacitaire — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — 22→48 FTE en 12m · ARR €480k→€1.7M · Plan d'embauches par département. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`benchmarks`	No
`financials`	Yes
`constraints`	No
`currentTeam`	Yes
`hiringBudgetEur`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and returns an audited deliverable, but does not contradict annotations. However, it does not significantly extend behavioral understanding beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but includes a reference case that adds context, though it could be more concise. The structure is reasonable but the French phrase may disrupt clarity. It could be trimmed without losing essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, no output schema), the description lacks information about the deliverable's structure, how to use the async parameter, and what response to expect. It is insufficient for an agent to fully understand the tool's capabilities and output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14% (only 'async' parameter documented in schema). The description adds no explanations for the 7 parameters, including required ones like company, financials, and currentTeam. With low coverage, the description should compensate but fails to do so.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states this is a capacity planning tool for CHRO-level expertise, returning a structured deliverable. The reference case provides concrete context. However, it uses French ('Planification capacitaire') which may not be universally understood, and the domain (Gapup agent-payable) is niche, slightly reducing clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is given on when to use this tool versus the many siblings. There is no mention of prerequisites, exclusions, or alternatives. The description only says 'send the documented case fields' without differentiating this tool from similar planning or HR tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

capital_strategyC

Read-only

Inspect

Stratégie de financement — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Alan assurance santé SaaS — séquence Seed→A→B→C (2016-2022). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`growthPlan`	Yes
`financialPosition`	Yes
`founderConstraints`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns a deliverable, but does not detail behavioral aspects like processing time or side effects. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes jargon and French text. It is front-loaded with a title but contains a reference case that may not be essential. Could be more concise for its length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, no output schema), the description is incomplete. It does not explain what the deliverable contains, how to interpret results, or any prerequisites for use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (one parameter described). The description says 'send the documented case fields' without explaining individual parameters or their meanings. It adds minimal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Stratégie de financement' (financing strategy) and mentions returning a structured, audited deliverable, which provides a clear purpose. However, it does not differentiate from sibling tools like cap_table_strategist or funding_hunter, lacking explicit distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, nor any conditions for usage. It only gives a reference case but no explicit context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cap_table_strategistC

Read-only

Inspect

Stratège du cap table — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Aleph AI Series B — modèle dilution multi-rounds + simulations secondaires + hygiène equity · 5 scenarios. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`plannedRounds`	Yes
`currentCapTable`	Yes
`founderObjectives`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true. The description adds that it returns a structured deliverable, but does not clarify async behavior (despite an async parameter) or any other behavioral traits. The mention of server-side validation is useful but insufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a specific reference case that may not be universally helpful. It could be more concise and front-load the core action. The mix of French and English reduces clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 6 parameters) and lack of output schema, the description fails to provide complete context. It does not explain what the deliverable contains, how results are retrieved (async vs sync), or how to interpret outputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, yet the description provides no additional meaning for the parameters. It merely says 'send the documented case fields', which does not explain the role of each field. The nested objects are left entirely undocumented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a cap table strategist for fundraising, and mentions it returns a structured, audited deliverable. The reference case adds concrete context, but the purpose is somewhat implied rather than explicitly stated as a verb-action.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description mentions fundraising context but does not specify when not to use or recommend other tools. Among many sibling tools, no differentiation is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_footprint_calculatorA

Read-onlyIdempotent

Inspect

Calculate a company's greenhouse-gas footprint under the GHG Protocol (Scope 1 + 2 + 3, in tCO2eq, tier-2 accuracy ±20%). Returns the emissions breakdown, hotspot identification, 5-8 reduction levers each with capex and payback, an SBTi-aligned reduction trajectory over 5-25 years, the 15 Scope-3 categories in detail, and CSRD/ESRS reporting readiness. When to use this tool: the user needs a carbon assessment for CSRD compliance pre-audit, green-finance access, or supplier ESG scorecards. Inputs: the company profile and its activity data. Delivered by Émilie, the AI Sustainability lead of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`perimeter`	Yes
`scope1Sources`	No
`scope2Sources`	Yes
`reductionTargets`	No
`scope3Activities`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline ESG KPI bubbles
`hotspots`	Yes	Top emission sources ranked by contribution
`breakdown`	Yes	Emissions breakdown by scope
`csrdReadiness`	Yes	CSRD/ESRS reporting readiness assessment
`sbtiTrajectory`	No	SBTi-aligned annual reduction trajectory
`reductionLevers`	Yes	5-8 actionable reduction levers with financial analysis
`executiveSummary`	Yes	Board-ready GHG assessment prose
`scope3Categories`	No	GHG Protocol 15 Scope-3 categories detail
`totalEmissionsTco2eq`	Yes	Total GHG footprint in tCO2eq (Scope 1+2+3 combined, ±20% tier-2 accuracy)

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false, making the tool non-mutating and safe. The description adds behavioral context: accuracy ±20%, tier-2, and returns breakdown, hotspot, levers, trajectory, and CSRD readiness. It also mentions that if async=true, it returns a job_id. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively long but well-structured, front-loading the core function and deliverables, then providing usage guidance and inputs. Every sentence adds value without redundancy. However, it could be slightly more concise by removing the 'Delivered by Émilie' line, which is not essential for tool selection.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, output schema), the description is remarkably complete. It details all major outputs (breakdown, hotspot, levers, trajectory, scope-3 categories, CSRD readiness) and explains when to use it. The existence of an output schema reduces the burden, but the description still adds value. Only minor gap: no mention of the specific parameters beyond vague 'activity data'.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema coverage is only 13%, meaning most parameters lack descriptions in the schema. The description adds only 'Inputs: the company profile and its activity data', which is too vague to compensate for the low coverage. While some nested fields have schema descriptions (e.g., async, category), the overall parameter meaning is not sufficiently elaborated.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it calculates a company's GHG footprint under the GHG Protocol, specifying Scope 1+2+3 with tier-2 accuracy. It lists specific outputs such as emissions breakdown, hotspot identification, reduction levers, trajectory, and CSRD readiness. This distinguishes it from sibling tools like carbon_roadmap or sustainability_report by its focus on detailed footprint calculation for compliance and financing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'When to use this tool: the user needs a carbon assessment for CSRD compliance pre-audit, green-finance access, or supplier ESG scorecards.' It also mentions inputs are company profile and activity data, providing clear when-to-use and context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_roadmapC

Read-only

Inspect

Roadmap carbone — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: Cas démo — Roadmap carbone. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`perimeter`	Yes
`scope1Sources`	No
`scope2Sources`	Yes
`reductionTargets`	No
`scope3Activities`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and openWorldHint, covering safety and external data use. Description adds that inputs are validated server-side and returns a deliverable, but no detail on computation time, cost, or side effects beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short with three sentences, but mixes French and English which may reduce clarity. Front-loaded with the core purpose. Could be more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complexity (8 params, nested objects, no output schema), the description is insufficient. It fails to describe the deliverable's format, how to interpret results, or any usage constraints. No guidance on required fields beyond implicit required list.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is very low (13%), and the description fails to compensate. It only says 'send the documented case fields' without explaining any of the 8 parameters, their structure, or relationships. No parameter descriptions are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states it returns a structured, audited carbon roadmap deliverable for C-suite. Verb and resource are clear, but it doesn't explicitly distinguish from similar sibling tools like 'carbon_footprint_calculator' or 'sustainability_report'. Some differentiation implied by mention of C-suite expertise and audited deliverable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

References a demo case and implies use by C-suite, but no explicit when-to-use, when-not-to-use, or alternatives. No mention of when to prefer this over sibling tools like 'carbon_footprint_calculator'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

champion_mappingC

Read-only

Inspect

Cartographie du champion — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk × Decathlon (deal €120k/an) — Champion identifié : CFO Group · Plan 6 semaines multi-touch. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`knownContacts`	Yes
`sellerContext`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. Description adds that inputs are validated server-side and returns a structured deliverable, which is consistent. However, no details on timing, output content, or side effects (none expected). Adds modest value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with no wasted words. It includes a concrete reference case. However, it could be better structured (e.g., bullet points) and front-loads the purpose well.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, audited deliverable) and no output schema, the description is incomplete. It does not explain what the deliverable contains, how to interpret it, or any behavior for edge cases. The reference case is illustrative but insufficient for general use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only 'async' has description). The tool description does not explain the parameters beyond a vague 'send the documented case fields'. It fails to add meaning for the complex nested parameters, leaving agents to infer from names and constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable for champion mapping, but the verb is unclear (implies analysis/generation). The reference case helps but doesn't clearly specify the tool's primary action. It distinguishes from siblings by being about champions, but not precisely.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description mentions sending documented case fields but gives no context on prerequisites or when not to use. Sibling tools include other mapping tools, but no differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

change_failure_root_cause_classifierA

Read-onlyIdempotent

Inspect

Classifies root causes of change failures for CTO-level incident analysis. Uses GitHub PR metadata and Snyk vulnerability data to identify patterns like dependency vulnerabilities, configuration drift, or deployment process gaps. Inputs include GitHub PR URL or incident ID, and outputs structured root cause categories with confidence scores. Ideal for post-mortem analysis and change risk assessment.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`pr_url`	Yes
`incident_id`	No
`snyk_org_id`	No
`time_range_days`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`root_causes`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds behavioral traits by specifying data sources (GitHub PR metadata, Snyk vulnerability data) and output format (structured root cause categories with confidence scores). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3 sentences), front-loaded with the main purpose, and structured logically. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters (1 required) and an output schema, the description covers the core use case but omits details on optional parameters (async, snyk_org_id, time_range_days). This is a gap for comprehensive understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, so the description must compensate. It explains the primary inputs 'pr_url' and 'incident_id' but does not clarify 'async', 'snyk_org_id', or 'time_range_days'. Partial coverage, hence a 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's verb ('classifies'), resource ('root causes of change failures'), and scope ('CTO-level incident analysis'). It mentions specific data sources and outputs, distinguishing it from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage context: 'Ideal for post-mortem analysis and change risk assessment'. It also mentions inputs, which helps the agent decide when to use it. However, no alternatives or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

china_ecommerce_intelA

Read-only

Inspect

Chinese e-commerce intelligence for the ZH diaspora (50M+), import-export teams, brand IP enforcement, MENA/Africa entrepreneurs sourcing from China, and brand monitoring. Covers Taobao, Tmall, JD.com, Pinduoduo, 1688.com (B2B) and AliExpress (cross-border).

Five modes: • product_search — search products by keyword across CN platforms. Returns title ZH/EN, price CNY + USD estimate, sales 30d, rating, seller info, product URL. • seller_profile — full seller/supplier dossier: factory vs reseller detection, certifications (ISO, BSCI, CE), rating, years in business, main categories. • price_history — 12-month price trend for a product (live current price + seasonal model for CN shopping festivals: 11.11, 6.18, CNY). • brand_monitoring — detect counterfeits and grey market listings: price anomaly detection (>50% below MSRP = suspicious), counterfeit keyword scan, risk score 0-100. • market_intel — category overview: top 5 sellers by market share, avg/median price, volume estimate, price range.

Data quality note: LIVE data from Taobao/Tmall/JD/Pinduoduo REQUIRES AICI_RESEARCH_PROXY_URL with CN residential routing (Bright Data -country-cn). Without proxy: AliExpress (cross-border) + curated category fallback available.

Input formats for seller_profile: 'platform:id' e.g. 'aliexpress:123456', '1688:87654321', 'tmall:apple-store-official'. Input formats for price_history: AliExpress product URL or numeric product ID.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode. product_search=find products, seller_profile=supplier dossier, price_history=price trend, brand_monitoring=counterfeit detection, market_intel=category overview.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keyword, product name, product_id, seller_id (platform:id), brand name, or category. Accepts Chinese characters (ZH) or English.
`region`	No	Market region. CN-domestic=full platform coverage, cross-border=AliExpress+1688 focus. Default: CN-domestic.
`platform`	No	Target platform. Default: all. Note: taobao/tmall/jd/pinduoduo require CN proxy.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`products`	No
`market_intel`	No
`platform_used`	Yes
`price_history`	No
`quality_score`	Yes
`seller_profile`	No
`brand_monitoring`	No

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds behavioral details: live data requirement, proxy dependency, async mode (job_id), and data quality notes. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with purpose and well-structured into overview, mode list, and notes. While somewhat lengthy, every section adds necessary context. Could be slightly more concise, but appropriate given complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (five modes, multiple platforms, proxy requirements, output schema exists), the description covers all essential aspects: mode descriptions, input formats, data quality notes, and async behavior. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but the description adds valuable context beyond schema: explains the five modes in detail, provides input format examples for seller_profile and price_history, and clarifies the async flag. This enriches the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides Chinese e-commerce intelligence and enumerates five distinct modes (product_search, seller_profile, price_history, brand_monitoring, market_intel), which sets it apart from siblings like 'china_market_data'. The verb+resource is specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies target users (ZH diaspora, import-export teams, brand IP enforcement, etc.) and specifies when to use each mode. It also notes proxy requirements for some platforms. However, it does not explicitly compare to sibling tools or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

china_market_dataA

Read-only

Inspect

Chinese capital market intelligence for the ZH diaspora (50M+) and institutional investors. Covers A-Shares (SSE/SZSE), H-Shares (HKEX), and ADRs across four modes:

• company — full company profile: name ZH/EN, USCC (18-digit social credit code), exchange, industry (CSRC classification), chairperson, registered capital, SOE flag • market_quote — real-time quote: price (CNY or HKD), change%, volume, market cap, P/E ratio, dividend yield, last update timestamp • sector_overview — sector snapshot: top 5 companies by market cap, avg P/E, 30-day sector index change. Supported sectors: semiconductor, ev, battery, technology, finance, energy, realestate, consumer, pharma, telecom • regulatory_filing — recent regulatory disclosures (HKEX filings: annual, quarterly, announcements, mergers, IPOs) with title, date, document URL

Input formats accepted: • 6-digit A-Share ticker (e.g. '600519' for Moutai SSE) • HKEX ticker (e.g. '0700.HK' or '700' for Tencent) • Company name in EN or ZH (e.g. '腾讯', 'Kweichow Moutai') • Sector keyword (e.g. 'semiconductor', '半导体')

Data sources: Yahoo Finance (primary, always accessible), Eastmoney push2 + CompanySurvey (via Bright Data proxy when AICI_RESEARCH_PROXY_URL is set), HKEX filing API. Note: Eastmoney/CSRC/SSE are blocked from datacenter IPs without proxy — set AICI_RESEARCH_PROXY_URL to unlock full coverage.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode. company=full profile, market_quote=price data, sector_overview=top 5 by sector, regulatory_filing=recent filings.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Ticker (6-digit A-share, 4-digit HK, Yahoo format), company name (ZH or EN), or sector keyword.
`exchange`	No	Exchange filter. Default: all. Affects sector_overview ticker selection.
`period_days`	No	Lookback period in days for regulatory filings. Default: 30.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`company`	No
`sources`	Yes
`market_quote`	No
`quality_score`	Yes
`sector_overview`	No
`regulatory_filings`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, openWorldHint), the description details data source dependencies (Yahoo Finance, Eastmoney via proxy) and async behavior (returns job_id when async=true). This adds valuable behavioral context not captured in annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections and bullet points, front-loading the core purpose. It is somewhat lengthy but efficient given the tool's complexity. Every sentence adds value; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (four modes, multiple input formats, async option, proxy dependency), the description covers all essential aspects. An output schema exists, so return values need not be explained. Slight gaps in usage guidelines limit completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description adds substantial meaning beyond the schema: it explains each mode's output, acceptable input formats (ticker patterns, company names in ZH/EN, sector keywords), and default behavior (e.g., period_days default 30). Schema coverage is 100%, but the description enriches understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides Chinese capital market intelligence, specifies four modes (company, market_quote, sector_overview, regulatory_filing), and distinguishes it from siblings like india_market_data by geographic focus. The verb 'covers' and resource detail are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use different modes and input formats, but does not explicitly guide when not to use this tool versus alternatives (e.g., india_market_data, sec_filing_decoder). No exclusion criteria or direct comparisons to other tools are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

churn_defenderC

Read-only

Inspect

Bouclier anti-churn — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk — portefeuille 400 clients PME/ETI, détection churn Q2 2025 (€8M ARR). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`accounts`	Yes
`csrContext`	No
`analysisWindowDays`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the tool is read-only. The description adds context about server-side validation and returning a structured deliverable, but does not disclose additional behaviors like authentication requirements or side effects. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences and a reference case. It is front-loaded with the tool's purpose. However, it could be more structured to separate purpose from usage notes.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the input schema (nested objects, many fields) and no output schema, the description is incomplete. It does not explain what the deliverable contains or how the tool processes the input, leaving gaps for agent selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (20%) and the description does not explain parameter meanings beyond referencing 'documented case fields'. It adds minimal value over the schema, which has many fields without descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Bouclier anti-churn' and 'Returns a structured, audited deliverable', clearly indicating it is an anti-churn analysis tool. The name 'churn_defender' reinforces this. However, it does not differentiate from sibling tools like 'renewal_optimizer' or 'save_plays' which may also deal with churn.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide guidance on when to use this tool versus alternatives. It mentions 'C-suite expertise (CRO)' but no explicit when/when-not conditions or comparisons with sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

climate_scenario_rcpA

Read-only

Inspect

Projections climatiques long terme par scénario IPCC (RCP AR5 + SSP AR6) pour toute localisation. Scénarios : RCP_4_5, RCP_8_5 (AR5), SSP1_2_6, SSP2_4_5, SSP3_7_0, SSP5_8_5 (AR6), ou 'all' (compare tous). Horizons : 2030–2100. Métriques : température (delta vs baseline 1990-2010, jours >35°C, nuits chaudes), précipitations (delta%, événements extrêmes, sécheresses), hausse du niveau de la mer (cm vs 2000), événements extrêmes (ouragans, inondations P100, sécheresses), indice incendie. Sorties : comparaison multi-scénarios, probabilité IPCC, signaux d'impact business par secteur. Sources : Open-Meteo CMIP6 (keyless), IPCC AR6 Atlas lookup, NOAA SLR projections. Usages : TCFD/CSRD physical risk, due diligence actifs long terme, assurance catastrophe, planification infrastructure. Cache 7j. SLA ≤20s.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`metrics`	No	Métriques à inclure. Défaut : toutes.
`location`	Yes	Localisation : {city, country?} ou {lat, lon}
`scenario`	Yes	Scénario IPCC. 'all' génère une comparaison multi-scénarios.
`horizon_year`	Yes	Année horizon de la projection (2030–2100)
`compare_baseline`	No	Comparer vs baseline 1990-2010 (défaut true)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`location`	Yes
`scenario`	Yes
`projections`	Yes
`horizon_year`	Yes
`quality_score`	Yes
`baseline_period`	No
`ipcc_likelihood_label`	Yes
`business_impact_signals`	Yes
`multi_scenario_comparison`	No

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: async capability, cache duration (7j), SLA (≤20s), and details on multi-scenario comparison, probability, and business impact signals. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single dense paragraph covering all key aspects without redundancy. It could be slightly more structured with bullet points, but it is efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple scenarios, metrics, a nested location object), the description is very complete. It explains output types, sources, and use cases, and notes that an output schema exists. The description fully compensates for the lack of separate output schema text.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds valuable context for parameters like 'async' (returns job_id) and 'metrics' (explains default), going beyond the schema enumerations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides long-term climate projections by IPCC scenario for any location, listing scenarios, horizons, metrics, and outputs. It distinguishes itself from siblings like 'weather_climate_intel' by its specific scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists use cases (TCFD/CSRD physical risk, due diligence, etc.), giving clear context. However, it does not explicitly state when not to use or mention alternative tools, but the specificity makes usage clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

clinical_evidence_brieferB

Read-only

Inspect

Brief évidence clinique (GRADE) — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: Review the clinical evidence for <drug/intervention> in — GRADE rating, key trials, safety signals. · Scan safety signals for in — adverse events, severity, frequency from FAERS and trial data. · Assess comparative effectiveness of versus for — what does the evidence show? · Is there evidence supporting drug repurposing of for — existing trials and GRADE quality? · What are the evidence gaps for in before formulary adoption? Reference case: Semaglutide 2.4mg · Chronic weight management in non-diabetic adults · GRADE high efficacy · studies found · nausea/GI signals · FDA approved · PubMed+ClinicalTrials+OpenFDA. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`topic`	Yes
`max_studies`	Yes
`intervention`	No
`evidence_focus`	Yes		all
`target_disease`	No
`date_range_years`	Yes
`intervention_type`	No

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint. The description mentions it returns a structured, audited deliverable and that inputs are validated server-side, but omits the async behavior parameter documented in the input schema. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a solid paragraph with a bulleted list of examples, but contains jargon ('Gapup agent-payable C-suite expertise (RISK)') and could be more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 8 parameters, no output schema, and missing async behavior explanation, the description is not complete enough for an agent to reliably use the tool. It does not detail the return format beyond 'structured, audited deliverable'.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, and the description only partially compensates by mentioning parameters like topic, intervention, and target_disease in examples, but does not explicitly describe all parameters, leaving many (e.g., max_studies, date_range_years) unexplained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it briefs clinical evidence with GRADE ratings, key trials, and safety signals, and provides example queries. However, it does not explicitly differentiate from siblings like clinical_pharma_intel.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example questions and a reference case, giving context for use. But it lacks explicit guidance on when not to use or alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

clinical_pharma_intelA

Read-only

Inspect

Clinical and pharmaceutical intelligence for biotech analysts, healthcare fund managers, pharma BD teams, catalyst-driven hedge funds and health journalists. Aggregates live data across five modes: • trials — active/completed clinical trials (ClinicalTrials.gov v2 + EU CTR in parallel, 450k+ records) • pipeline — full pipeline by sponsor: trial count by phase + top indications • approvals — FDA drug label approvals + mechanism of action (OpenFDA) • recalls — FDA enforcement recalls classified by severity (Class I/II/III) • adverse_events — FAERS aggregated reactions: top 10 reactions + serious%

Signal detection (P0/P1/P2): P0 if Class I recall OR trial terminated for safety reason P1 if serious adverse events >30% OR ≥3 recalls in 12 months P2 otherwise (standard monitoring)

All sources are public and keyless. Optional env OPENFDA_API_KEY raises daily quota from 1,000 to 120,000 requests. SLA: ≤16s p95 (parallel fetch, 8s budget per source). Cache: 6h trials, 24h approvals, 12h recalls, 6h adverse events.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Analysis mode. Default "trials". trials=clinical trials, pipeline=sponsor overview, approvals=FDA approvals, recalls=enforcement, adverse_events=FAERS
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`phase`	No	Filter trials by phase (1/2/3/4/NA). Only applies to modes trials and pipeline.
`query`	Yes	Drug name, indication, sponsor or molecule (e.g. "atezolizumab", "metastatic NSCLC", "Roche", "semaglutide")
`country`	No	ISO 2-letter country code to filter trial sites (e.g. US, FR, DE).
`max_results`	No	Maximum number of results to return. Default 20.
`status_filter`	No	Filter trials by status. Only applies to modes trials and pipeline.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`trials`	No
`recalls`	No
`signals`	Yes
`sources`	Yes
`pipeline`	No
`approvals`	No
`quality_score`	Yes
`adverse_events`	No

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, destructiveHint=false, and openWorldHint=true. The description adds significant value by disclosing source public access, optional API key quota increase, SLA of ≤16s p95, and cache durations per mode. It also explains signal detection logic (P0/P1/P2), exceeding annotation requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bullet points and sections (modes, signal detection, SLA, caching). It is concise despite its length, with every sentence providing meaningful information. No redundant statements.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, multiple modes, signal detection, caching, SLA), the description covers all critical aspects. An output schema exists, so the description does not need to detail return values. It is fully complete for an AI agent to understand and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already provides 100% coverage with detailed descriptions for all 7 parameters. The description adds extra context beyond the schema, such as the signal detection logic and what each mode returns, but does not add new parameter-specific details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool aggregates clinical and pharmaceutical intelligence across five specific modes (trials, pipeline, approvals, recalls, adverse_events). It uses specific verbs like 'aggregates' and identifies target users (biotech analysts, fund managers, etc.). While there is a sibling 'clinical_evidence_briefer', this description uniquely differentiates its scope and capabilities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists target users and explains each mode's purpose, providing context for when to use each. However, it does not explicitly state when not to use this tool or compare it to alternatives like clinical_evidence_briefer, leaving some ambiguity for agents.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cloud_cost_ri_optimizerA

Read-onlyIdempotent

Inspect

Analyzes AWS and Azure cloud pricing data alongside RIPE regional demand trends to generate Reserved Instance purchase recommendations for CTOs. Inputs include target cloud provider, instance family, region, and desired commitment term. Outputs include cost savings percentage, optimal RI quantity, and regional demand insights. Ideal for reducing cloud spend with data-driven decisions. Keywords: cloud cost optimization, reserved instances, AWS pricing, Azure pricing, RIPE demand trends.

ParametersJSON Schema

Name	Required	Description
`term`	No
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes
`utilization`	No
`cloud_provider`	Yes
`instance_family`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`ri_cost`	No
`sources`	No
`warnings`	No
`on_demand_cost`	No
`break_even_months`	No
`regional_demand_score`	No
`cost_savings_percentage`	No
`recommended_ri_quantity`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds context about using RIPE trends and outputs like cost savings and demand insights, enhancing transparency without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with four sentences, front-loading the main purpose. Every sentence adds value, and there is no unnecessary repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of annotations and an output schema, the description adequately covers tool behavior and key outputs. It mentions output fields, though it misses details on optional parameters like async and utilization.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description lists key parameters (cloud provider, instance family, region, term) but adds little detail beyond enumeration. It does not fully compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes AWS/Azure pricing and RIPE trends to generate Reserved Instance purchase recommendations for CTOs. It uses specific verbs and resources, and distinguishes itself from sibling tools by focusing on a unique niche.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for reducing cloud spend but does not explicitly state when to use versus alternatives or when not to use. No exclusions or alternative tools are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

code_review_depth_optimizerA

Read-onlyIdempotent

Inspect

As a CTO, this tool analyzes your team's historical DORA metrics (deployment frequency, lead time, MTTR, change failure rate) and GitHub pull request data to recommend an optimal code review depth. Input your repository identifier and time range, and receive a structured recommendation on review rigor (light, standard, thorough) with supporting metrics and risk-adjusted rationale.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`teamSize`	No	Number of active developers in the team
`repository`	Yes	GitHub repository identifier in format owner/repo
`riskTolerance`	No	Organization's risk tolerance level
`timeRangeDays`	Yes	Number of days of historical data to analyze

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`recommendation`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and openWorldHint. The description adds behavioral context by mentioning the output structure (review rigor with rationale) and use of historical DORA metrics. It does not describe latency or async behavior beyond schema, but annotations reduce the burden.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first sets context and action, second details inputs and outputs. Every sentence is informative with no redundancy, front-loading the key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 5 parameters, existing output schema, and annotations, the description covers the high-level purpose, inputs, and output type. It mentions DORA metrics and pull request data. A minor gap: does not mention the async parameter's behavior beyond the schema, but overall sufficient for tool selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds limited extra meaning beyond the schema: hints at riskTolerance influencing 'risk-adjusted rationale' but does not elaborate on teamSize or riskTolerance effects. Parameters are clearly listed, but no new semantic depth.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes DORA metrics and GitHub PR data to recommend optimal code review depth (light, standard, thorough), distinguishing it from siblings like dora_metrics_deep_dive or tool_recommend. The verb 'analyzes' and resource 'code review depth' are specific and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description frames the tool for CTOs analyzing team metrics and specifies required inputs (repository, time range). However, it does not explicitly state when not to use it or mention alternatives like dora_metrics_deep_dive for raw metric access.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

comp_benchmark_geo_deltaA

Read-onlyIdempotent

Inspect

Compares local compensation benchmarks against HQ standards for CHROs, adjusting for cost-of-living and tax differentials. Inputs include job role, local and HQ locations, and salary range. Outputs include adjusted benchmark delta, cost-of-living multiplier, and tax impact. Keywords: compensation benchmark, geographic pay equity, cost-of-living adjustment, tax differential analysis.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`jobRole`	Yes	Standardized job role (e.g., 'Software Engineer III')
`currency`	No	ISO 4217 currency code (e.g., 'USD')
`baseSalary`	No	Current base salary in local currency
`hqLocation`	Yes	HQ location (ISO 3166-2 code or city, country)
`localLocation`	Yes	Local work location (ISO 3166-2 code or city, country)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`taxImpact`	No	Estimated tax differential percentage
`adjustedSalary`	No	Salary adjusted for cost-of-living and taxes
`benchmarkDelta`	No	Percentage difference between local and HQ benchmark
`confidenceScore`	No	0-1 confidence in data quality
`costOfLivingMultiplier`	No	Local cost-of-living index relative to HQ

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool as read-only, idempotent, and open-world. The description adds specific behavioral details: it adjusts for cost-of-living and tax differentials, and outputs a delta, multiplier, and tax impact. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences plus keywords, with the core purpose in the first sentence. It is concise, avoids redundancy, and every sentence contributes useful information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, the description does not need to explain return values. It adequately covers the purpose, input categories, and expected outputs. Minor gap: no mention of potential limitations or prerequisites, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so all parameters are documented in the schema. The tool description only reiterates the categories of inputs (job role, locations, salary range) without adding new semantic meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Compares local compensation benchmarks against HQ standards for CHROs, adjusting for cost-of-living and tax differentials.' It uses a specific verb and resource, and includes inputs and outputs, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the target audience (CHROs) and context (geographic compensation comparison) but does not explicitly state when to use this tool versus alternatives like comp_plan_architect. No exclusions or when-not-to-use guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitive_deep_diveA

Read-only

Inspect

Gold-standard competitive deep dive — STRUCTURED multi-source data (no LLM narrative). Pair tool: competitor_intel for LLM-narrated board briefing + slide script. Aggregates Wikipedia, Yahoo Finance, SEC EDGAR, Wayback Machine, DuckDuckGo, HackerNews, domain scraping — all keyless. Returns agent-shaped JSON: KPIs (funding, employees, revenue, market cap), P0/P1/P2 competitive signals, pricing radar, competitor comparison matrix, Wayback timeline, positioning (sector/industry/icp_hypothesis/moat_signals), quality score. Every field is sourced or marked unavailable — no hallucinated figures. SLA: p50 ~25s, p95 ~30s · score 80+ on listed targets (US/EU/foreign) · score ~40 on private companies (no EDGAR/Yahoo data). Use sync for batch agents (≤30s tolerance). Use competitive_deep_dive_async + competitive_deep_dive_result(job_id) for conversational agents. Inputs: company name or domain (required), optional competitor list (≤5), optional depth (easy/medium/hard).

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Research depth: 'easy' = Wikipedia + DDG (fast, ~15s); 'medium' = + Yahoo Finance + EDGAR + Wayback (default, ~45s); 'hard' = + HackerNews + domain surfaces + competitor deep dive (~120s)
`company`	Yes	Name or domain of the target company (e.g. 'Salesforce', 'notion.so', 'HubSpot CRM')
`competitors`	No	Optional list of competitor names or domains to include in the comparison matrix (max 5)

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	Yes	Key Performance Indicators sourced from public data
`company`	Yes
`quality`	Yes
`signals`	Yes	Competitive intelligence signals, severity-ranked P0 (critical) to P2 (informational)
`sources`	Yes
`comparison`	Yes	Feature/dimension comparison between target and each competitor
`depth_used`	Yes
`positioning`	Yes	Positioning analysis derived from public data
`generated_at`	Yes
`pricing_radar`	Yes	Pricing tiers extracted from public sources
`domain_resolved`	Yes
`wayback_timeline`	Yes	Historical snapshots of the company website from Wayback Machine

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (readOnlyHint, openWorldHint, destructiveHint), the description adds significant behavioral context: the tool aggregates from multiple keyless sources, returns agent-shaped JSON with sourced fields (no hallucination), specifies SLA times and quality scores for different target types, and describes the depth parameter's impact on sources and time. This fully discloses behavior and constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense and front-loaded with key information in the first sentence. However, it forms a single continuous paragraph that might benefit from slight structural separation (e.g., between usage guidance and output details). The length is justified by the amount of necessary context, but a touch more formatting could improve scannability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity—multiple data sources, optional parameters, synchronous vs asynchronous usage, and a rich output schema—the description covers all essential aspects: purpose, input details, output format (agent-shaped JSON with specific fields), SLA, quality expectations, and pair tools. No gaps are evident; it is fully complete for an agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with every parameter documented in the schema. The description adds further meaning by explaining the implications of the depth parameter (e.g., 'easy' = Wikipedia + DDG, fast ~15s; 'medium' = + Yahoo Finance + EDGAR + Wayback, default ~45s), the required company field (name or domain), and the optional competitor list (max 5). This enriches the schema's raw descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a 'gold-standard competitive deep dive' that aggregates multi-source data and returns structured JSON with KPIs, signals, and a comparison matrix. It explicitly distinguishes itself from the sibling tool `competitor_intel` by noting the latter provides an 'LLM-narrated board briefing + slide script', thus differentiating their purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool versus alternatives: 'Use sync for batch agents (≤30s tolerance). Use `competitive_deep_dive_async` + `competitive_deep_dive_result(job_id)` for conversational agents.' It also identifies the pair tool `competitor_intel` for narrative versions and details SLA times and quality scores, fully addressing usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitive_deep_dive_asyncA

Read-only

Inspect

Async variant of competitive_deep_dive. Returns immediately (<200ms) with a job_id. The research runs in the background (p50≈25s, p95≈30s for depth=medium). Poll the result with competitive_deep_dive_result(job_id) after the eta_seconds hint. Use this instead of competitive_deep_dive when the agent cannot wait >15s for a response. Inputs: same as competitive_deep_dive — company (required), competitors (optional list, max 5), depth (easy/medium/hard, default medium). Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter.

ParametersJSON Schema

Name	Required	Description
`depth`	No	Research depth: 'easy'≈15s, 'medium'≈30s (default), 'hard'≈60s
`company`	Yes	Name or domain of the target company (e.g. 'Salesforce', 'notion.so')
`competitors`	No	Optional list of competitor names or domains to include in the comparison matrix (max 5)

Output Schema

ParametersJSON Schema

Name	Required	Description
`job_id`	Yes	Unique job identifier — pass to competitive_deep_dive_result
`status`	Yes	Always 'queued' on submission
`eta_seconds`	Yes	Estimated seconds until result is ready
`submitted_at`	Yes	ISO-8601 submission timestamp

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses return timing (<200ms), background execution times (p50≈25s, p95≈30s for medium depth), and how to retrieve results via polling or webhooks. Annotations include readOnlyHint:true which is not contradicted as the tool initiates computation without modifying existing data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is tightly written with no wasted words. Each sentence adds value: purpose, timing, usage, parameter summary, and webhook option. Front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description covers all necessary aspects: return type, retrieval mechanism, timing, and alternative callback method. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description does not need to add much. It restates parameters but provides minimal extra context beyond the schema, such as default depth and max competitors. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is the async variant of competitive_deep_dive, returns a job_id, and runs research in background. It differentiates from the sync sibling and mentions the result retrieval method, making its purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this instead of competitive_deep_dive when the agent cannot wait >15s for a response.' Also provides webhook registration as an alternative, giving clear when-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitive_deep_dive_resultA

Read-onlyIdempotent

Inspect

Poll the result of a competitive_deep_dive_async job. Returns status=pending while running, status=completed with the full report once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by competitive_deep_dive_async.

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by competitive_deep_dive_async

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, destructiveHint; description adds status details and TTL expiration without contradicting any annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded purpose, every sentence adds value: first states action, second details outcomes and usage guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present, description fully explains polling behavior, status meanings, and TTL; no gaps for a simple poll tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

One parameter (job_id) has full schema coverage with description; description adds no new meaning beyond schema, meeting baseline for 100% coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it polls the result of a competitive_deep_dive_async job, listing specific statuses (pending, completed, failed, not_found) and TTL. It distinguishes from the async submitter sibling tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Call this after the eta_seconds hint returned by competitive_deep_dive_async', providing clear usage timing and context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_intelA

Read-onlyIdempotent

Inspect

LLM-narrated competitive-intelligence BRIEFING — for human consumption (board meeting, pitch prep). Pair tool: competitive_deep_dive for raw structured multi-source data (agent-shaped JSON). Returns: recent competitor moves with severity (critical/high/medium/low), prioritised signals, pricing-radar comparison, 3-6 quantified recommendations (impact in € or %, 7/30/90/180-day horizons), and an 8-12 slide presenter script. Use when the buyer wants a narrative briefing or a deck. Inputs: your company (name + one-paragraph pitch) + 1-10 competitors. Delivered by Manue, AI CMO of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No	Optional — what the buyer wants to track first (e.g. pricing moves, hiring patterns)
`competitors`	Yes	1-10 competitors to analyze
`selfCompany`	Yes	Your company info

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles
`sources`	No	Cited sources
`pricingRadar`	No	Pricing comparison across competitors
`competitorMoves`	Yes	Recent moves per competitor with severity rating
`presenterScript`	Yes	8-12 slide board presenter script
`recommendations`	Yes	3-6 actionable strategic recommendations
`executiveSummary`	Yes	Board-ready prose summary (120-400 chars)

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and non-destructive. The description adds behavioral details: it returns a narrative briefing with specific outputs (competitor moves, severity, recommendations, presenter script) and mentions async parameter behavior (returns job_id immediately if async=true). This adds value beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at about 4 sentences, front-loaded with core purpose and target audience, then lists outputs and inputs efficiently. Every sentence provides essential information without fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters (2 required), nested objects, and an output schema (though not shown here), the description covers inputs and outputs adequately. It lists key output elements (moves, severity, recommendations, script). Could mention pricing-radar explicitly but overall complete for the complexity level.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description adds meaning beyond the schema: explains purpose of 'async' (polling via job_result), clarifies 'focus' is optional for tracking specific moves, and emphasizes that 'selfCompany' requires a one-paragraph pitch. It also introduces the delivery persona 'Manue'. Adds moderate value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's an LLM-narrated competitive-intelligence briefing for human consumption, distinguishing it from the sibling tool 'competitive_deep_dive' which provides raw structured data. Specific verb (briefing) and resource (competitive intelligence) with concrete use cases: board meeting, pitch prep.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use when the buyer wants a narrative briefing or a deck' and contrasts with the paired tool 'competitive_deep_dive' for raw data. It provides clear context on when to use this tool, though it doesn't explicitly state when not to use it beyond the alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_movesB

Read-only

Inspect

Mouvements concurrents — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What have my named competitors done recently — releases, pricing changes, hires, funding? · Which competitor signals are the most urgent right now and what should I do about them? Reference case: Notion — moves de ClickUp, Asana, Coda. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

B3.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so safety is clear. The description adds that the deliverable is 'audited' and involves 'Gapup agent-payable C-suite expertise', providing context beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (around 80 words) and front-loaded with purpose. It mixes French and English slightly but is structured with bullet points. Every sentence adds value, though it could be slightly cleaner.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, no output schema, many sibling tools), the description is insufficient. It does not explain the output format beyond 'structured deliverable', nor does it describe what inputs are expected beyond the schema. Missing guidance on expected competitor naming or pitch content.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (25% with only async described). The description does not add meaningful details about the parameters (selfCompany, competitors, focus). It only says 'send the documented case fields', which is vague and does not compensate for the coverage gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a structured deliverable about recent competitor moves (releases, pricing, hires, funding) and urgent signals. It uses specific verbs like 'returns' and answers, but does not explicitly differentiate from sibling competitor tools like competitor_intel or competitive_deep_dive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies using when needing recent competitor moves and urgent signals, but provides no guidance on when not to use this tool versus alternatives. No when-to-use or when-not-to-use exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_pricing_radarC

Read-only

Inspect

Radar pricing concurrents — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: How do my competitors' pricing plans and monthly prices compare to mine? · Which competitor plan undercuts or out-features my equivalent tier? Reference case: Notion — pricing vs ClickUp, Asana, Coda. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and open-world behavior. The description adds that it returns a structured, audited deliverable and validates inputs server-side, providing some context beyond annotations but not extensive behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes some cryptic phrasing ('Gapup agent-payable C-suite expertise (CMO)') and mixed language, reducing clarity. It is average in conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the lack of output schema and many sibling tools, the description provides a basic understanding but does not fully explain the output format or how to interpret results, leaving gaps for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (25%). The description does not add meaningful parameter details beyond the schema's own descriptions, only vaguely mentioning 'send the documented case fields', which does not compensate for the coverage gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool compares competitor pricing plans and monthly prices, answering specific questions. However, it does not explicitly differentiate from similar sibling tools like 'competitor_pricing_scrape' or 'competitor_intel'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a reference case and the types of questions answered, implying usage for competitive pricing analysis, but it lacks explicit guidance on when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_pricing_scrapeA

Read-only

Inspect

Scrape and parse a competitor pricing page from a URL or domain. Fetches via proxy-aware timedFetch (tries /pricing, /plans, homepage fallback), then extracts: plan names, prices, billing cadence (monthly/annual/usage-based/one-time), key features, free tier presence, enterprise tier, estimated price range. Returns structured pricing tiers. If unfetchable or no pricing found (anti-bot, SPA, auth wall): returns a clear degraded result with warnings and signals — never fake success. ICP: founders, product managers, pricing strategists, competitive intel teams. Proxy-aware (AICI_RESEARCH_PROXY_URL). Cache TTL 6h.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	Competitor URL or domain (e.g. 'https://notion.so/pricing', 'notion.so', 'https://www.example.com'). For best results, provide the direct pricing page URL.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.

Output Schema

ParametersJSON Schema

Name	Required	Description
`tiers`	Yes
`domain`	Yes
`status`	Yes
`warnings`	Yes
`url_fetched`	Yes
`has_free_tier`	Yes
`pricing_found`	Yes
`quality_score`	Yes
`raw_price_signals`	Yes
`has_enterprise_tier`	Yes
`plan_names_detected`	Yes
`billing_model_signals`	Yes
`estimated_price_range`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly and non-destructive nature. The description adds value by detailing proxy-awareness, cache TTL, and failure modes (degraded result with warnings). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured, front-loading the main action and then covering details. While comprehensive, it remains fairly concise, though some minor redundancy (e.g., 'proxy-aware' mentioned twice) could be trimmed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (scraping, multiple fallbacks, degraded results), the description covers all key aspects: fetching strategy, extraction details, failure handling, proxy, caching, ICP. Output schema exists, so return values are implicitly covered. Complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both 'url' and 'async'. The description expands on the 'url' parameter (accepts domain or full URL) and 'async' (returns job_id immediately), providing more meaning than the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb ('Scrape and parse') and resource ('competitor pricing page'), specifies the extracted data (plan names, prices, etc.), and distinguishes it from sibling tools like 'competitive_deep_dive' by focusing on pricing extraction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: proxy-aware fetching, fallback behavior, and caching. It mentions ICP and best practices for URL input, but does not explicitly contrast with alternative tools for when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_profilesC

Read-only

Inspect

Profils concurrents — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What are the strengths, weaknesses and positioning of each of my competitors? · Give me a SWOT-style profile of a named competitor. Reference case: Notion — profils de ClickUp, Asana, Coda. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that it returns an 'audited deliverable' and that inputs are validated server-side, and implicitly supports async via the schema. This adds some context beyond annotations but does not detail behavioral traits like data sources, latency, or error states. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise (3 sentences) but contains unclear English/French mix ('Gapup agent-payable C-suite expertise (CMO)') and could be more straightforward. The structure is front-loaded with the primary purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the nested parameter objects, low schema coverage, and absence of output schema, the description should provide more context. It lacks details on the deliverable format, how focus affects output, error handling, or the significance of the audience (CMO). The async behavior is only in schema, not described.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (25%, only async documented). The description does not explain key parameters like selfCompany.pitch, focus, or competitors structure beyond mentioning 'documented case fields'. The example is helpful but insufficient to compensate for the lack of explicit parameter meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns structured competitor profiles addressing strengths, weaknesses, and positioning. The example (Notion profiles of ClickUp, Asana, Coda) reinforces the purpose. However, it does not explicitly differentiate from similar siblings like competitive_deep_dive or competitor_intel, and includes unclear phrasing ('Gapup agent-payable C-suite expertise').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example queries and a reference case, implying use cases for competitive analysis. However, it offers no guidance on when to choose this tool over the many sibling competitor tools, no exclusion criteria, and no mention of prerequisites or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_recommendationsA

Read-only

Inspect

Recommandations concurrentielles — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: Given my competitors, what strategic actions should I take and in what order? · What should my 7/30/90/180-day competitive response plan look like? Reference case: Notion — actions face à ClickUp, Asana, Coda. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint. Description adds agent-payable context, audit delivery, and server-side validation, but does not elaborate on safety or side effects beyond the hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with key questions but includes a French title and some redundant phrasing. It could be more concise without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema exists, and description only vaguely mentions 'structured, audited deliverable' without return field details. Input parameter details are left to the schema, which is partially described. A more complete description would specify output format and parameter constraints.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (async field described). Description does not explain other parameters like focus, competitors, or selfCompany beyond the schema's basic field definitions, missing opportunity to clarify usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states tool returns strategic competitive recommendations and a structured plan, distinguishing it from siblings like 'competitive_deep_dive' or 'competitor_intel' by focusing on actionable steps and timelines.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description gives explicit context for when to use (strategic actions based on competitors) and includes example questions, but does not contrast with sibling tools or state when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

comp_plan_architectB

Read-only

Inspect

Architecture plan de commissionnement — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Comp Plan 8 rôles commerciaux · OTE €65-280k · Budget comp €2.1M · Quota coverage 3.2×. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`targets`	Yes
`geography`	No
`salesTeam`	Yes
`currentChallenges`	Yes
`preferredStructure`	No

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds that it returns a 'structured, audited deliverable' and notes server-side validation, but does not elaborate on what the deliverable contains or any other behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (two sentences) and front-loaded with the main purpose, but the reference case adds length. Overall, it is efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with nested objects, 7 parameters, and no output schema, the description is incomplete. It does not describe the output structure, return format, or how the deliverable is 'audited', leaving the agent with significant ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14%, and the description does not add meaning for the 7 parameters. It merely says 'send the documented case fields' without explaining each parameter or its constraints, leaving the agent to rely on the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates an 'Architecture plan de commissionnement' for a specific role (CRO), includes a reference case with concrete numbers, and distinguishes it from sibling tools like comp_benchmark_geo_delta.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives or when not to use it. The description only instructs to 'send the documented case fields' without providing context for selection among the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_audience_profileA

Read-only

Inspect

Return the audience targeting profile of a content entity — its enrichment tags reframed as audience facets with confidence, corroboration and full provenance (verifiable, sourced). The response also carries an entity-level provenance block (average confidence, data freshness). When to use this tool: an ad-tech or marketing agent needs a machine-readable, verifiable audience descriptor for a franchise or work. Input: an entity_id and its type.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_id`	Yes	Entity id from content_catalog
`entity_type`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`entity_id`	Yes
`provenance`	Yes	Entity-level trust & freshness summary.
`entity_type`	Yes
`audience_facets`	Yes	Map facet → array of { label, confidence, corroboration, source_count, sources }

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint, so the tool is safe and may return variable data. The description adds useful behavioral context by mentioning full provenance, verifiability, and entity-level provenance block with average confidence and data freshness, which goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences plus a usage line, front-loaded with the main purpose. Every sentence adds value with no fluff. Very concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description adequately explains the response content (audience facets, provenance block). It covers purpose and a use case. Lacks mention of async behavior handling but schema covers that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 67%, and the description mentions the two key parameters (entity_id and its type). However, it adds little beyond what the schema already provides (e.g., entity_id is 'from content_catalog'). No additional semantic detail for async parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns the audience targeting profile of a content entity, describing its contents (enrichment tags reframed as audience facets with confidence, corroboration, provenance). It specifies the verb 'return' and the resource 'audience targeting profile', distinguishing it from sibling tools like content_enrichment or content_catalog.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes an explicit 'When to use this tool:' section, stating it's for ad-tech or marketing agents needing a machine-readable, verifiable audience descriptor. It mentions input requirements but does not state when not to use it or list alternatives, though context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_catalogA

Read-only

Inspect

Browse the Gapup gold-standard content catalogue — video games, films, TV series and music. Returns franchises with their works (title, release year). When to use this tool: an agent needs structured, audited metadata for a cultural franchise, wants to resolve a title to a canonical entity, or browses a domain's catalogue before requesting enrichment. Inputs: a content domain and an optional case-insensitive name filter. Each franchise id can be passed to content_enrichment for its fine-grained tag profile.

ParametersJSON Schema

Name	Required	Description
`name`	No	Optional case-insensitive substring filter on franchise name
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum number of franchises to return (default 20)
`domain`	Yes	Content domain to browse

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`domain`	Yes
`franchises`	Yes

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns franchise data with title and release year, which is consistent but doesn't add critical new behavioral details beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with purpose, then usage, then input details. Every sentence adds value, no redundancy. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists, the description is complete. It names the output fields (franchise id, title, release year) and links to a related tool, satisfying context needs for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds little beyond the schema. It mentions 'optional case-insensitive name filter' and domain enum, but these are in the schema. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: browse a content catalog, returning franchises with works. It distinguishes from sibling tools by name-dropping content_enrichment as a downstream use for franchise ids.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases: needing structured metadata, resolving a title, or browsing before enrichment. No explicit when-not scenarios, but clear enough for agents.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_compareA

Read-only

Inspect

Compare the tag profiles of two content entities (franchises or works) and measure how similar they are. Returns a Jaccard similarity score, the list of shared tags, the tags unique to each entity, and a breakdown of shared tags by facet. When to use this tool: an agent needs to compare two franchises or works (e.g. 'how similar are Dark Souls and Elden Ring?', 'what do Street Fighter and Mortal Kombat have in common?', 'on which axes do these two games differ?'), find positioning overlap, identify cross-sell opportunities, or answer 'if you liked X you might like Y' questions backed by data. Works for any domain (video-games, music, film, tv).

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_a`	Yes	Id of the first entity from content_catalog (e.g. 'game-dark-souls', 'music-daft-punk').
`entity_b`	Yes	Id of the second entity from content_catalog (e.g. 'game-elden-ring', 'music-justice').
`entity_type`	No	Whether both ids are franchises or works (applies to both). Defaults to 'franchise'.

Output Schema

ParametersJSON Schema

Name	Required	Description
`entity_a`	Yes
`entity_b`	Yes
`similarity`	Yes	Jaccard index = \|shared\| / \|union\|, rounded to 2 decimal places. 0 = no overlap, 1 = identical profiles.
`a_tag_count`	Yes
`b_tag_count`	Yes
`entity_type`	Yes
`shared_tags`	Yes	Tags present in both entities (up to 40).
`unique_to_a`	Yes	Tags present only in entity_a (up to 40).
`unique_to_b`	Yes	Tags present only in entity_b (up to 40).
`shared_count`	Yes
`shared_by_facet`	Yes	Count of shared tags per facet (e.g. { genre: 3, theme: 5 }). Shows which dimensions drive the similarity.

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, which the description supports by describing a read operation. Description adds context on domain-agnostic behavior and output details. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, no fluff. Front-loads purpose and outputs, then usage scenarios. Every sentence serves a purpose. Well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present, description doesn't need to detail return values. Covers purpose, usage, and parameters sufficiently for a tool with 4 parameters and 1 enum. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and description reinforces parameter meanings with examples (e.g., 'game-dark-souls') and clarifies that entity_type applies to both. Adds value beyond schema by providing usage context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it compares tag profiles of two content entities and measures similarity. Lists specific outputs (Jaccard score, shared tags, unique tags, facet breakdown). Verb 'Compare' is specific and resource is defined. Differentiates from siblings like content_similar by naming the metric and facet detail.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly provides 'When to use this tool:' with concrete examples such as comparing franchises, finding overlap, cross-sell, or recommendation. Could mention not to use for non-content entities, but overall gives clear context for appropriate use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_discoveryA

Read-only

Inspect

Discover content franchises within a domain. Two modes: pass tag for a precise taxonomy match (every game tagged 'co-op'), or pass query for free-text SEMANTIC search powered by pgvector embeddings — finding franchises by meaning ('dark atmospheric games about isolation') even when no literal tag matches. Results are verifiable: tag mode carries tag confidence/corroboration, semantic mode carries a similarity score; both carry entity freshness. When to use: an agent wants a domain-scoped shortlist by tag or by intent. Inputs: a domain plus either a tag or a free-text query.

ParametersJSON Schema

Name	Required	Description
`tag`	No	Tag label to match precisely (e.g. 'thriller', 'co-op'). Mutually exclusive with `query`.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum franchises to return (default 25)
`query`	No	Free-text intent for semantic search (e.g. 'melancholic synth-pop about heartbreak'). Mutually exclusive with `tag`.
`domain`	Yes	Content domain to search within

Output Schema

ParametersJSON Schema

Name	Required	Description
`tag`	No
`count`	Yes
`query`	No
`domain`	Yes
`method`	Yes
`franchises`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds that results are verifiable with confidence/similarity scores and freshness, and explains the semantic search mechanism (pgvector embeddings), providing useful behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, efficient and front-loaded with the purpose, followed by mode details and usage guidance. No unnecessary words, every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with two modes, 5 parameters, and an output schema (not shown), the description covers the main use cases, result types, and inputs adequately. The async parameter is not mentioned, but its description in the schema is clear, and the description focuses on the core functionality. Minor gap but overall complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 5 parameters. The description adds value by explaining the two modes (tag vs query) with examples, their mutual exclusivity, and the meaning of results (confidence vs similarity), which goes beyond the schema's parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool discovers content franchises within a domain, explains two modes (tag and semantic query) with concrete examples, and implicitly distinguishes itself from a large set of sibling tools by being domain-scoped and franchise-focused.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'When to use: an agent wants a domain-scoped shortlist by tag or by intent' and details the mutual exclusivity of tag and query. It lacks negative guidance on when not to use it, but the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_engineC

Read-only

Inspect

Moteur de contenu — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Notion — content engine 2026 (productivity B2B). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`months`	Yes
`cluster`	Yes
`maxArticlesPerMonth`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating a safe read operation. The description adds that inputs are validated server-side and that an audited deliverable is returned, which provides some behavioral context, but it lacks details on rate limits, authentication, or error behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (three sentences) and reasonably concise, but it lacks structure and clarity. It mixes French and English and does not front-load the most critical information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has complex nested inputs, no output schema, and open-world semantics. The description fails to explain the deliverable's format, pagination, or how to use the async parameter. Important context is missing for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, yet the description does not compensate by explaining the parameters. It only says 'send the documented case fields' without clarifying what fields or how they relate to the result. The nested objects are not explained beyond schema structure.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a 'structured, audited deliverable' and mentions a reference case (Notion), but the verb and resource are vague. It does not clearly specify what action the tool performs or what type of content engine it is, making it somewhat ambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives among the many sibling tools. It mentions 'Gapup agent-payable C-suite expertise (CMO)' but does not explain context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_enrichmentA

Read-only

Inspect

Return the enriched tag profile of a content entity — the Gapup moat. Each tag carries a facet (genre, theme, play-mode, perspective…), a confidence score, a corroboration score and its full provenance (which sources corroborated it, when). The response also carries an entity-level provenance block (average confidence, data freshness). When to use this tool: an agent has a franchise or work id (from content_catalog) and needs a fine-grained, machine-readable, verifiable characterisation for matching, recommendation, contextual targeting or analysis. Inputs: an entity id and its type.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_id`	Yes	Entity id from content_catalog (e.g. 'music-daft-punk', 'film-the-dark-knight-collection:the-dark-knight')
`entity_type`	No	Whether the id is a franchise or a work (default franchise)

Output Schema

ParametersJSON Schema

Name	Required	Description
`tags`	Yes
`entity_id`	Yes
`tag_count`	Yes
`provenance`	Yes	Entity-level trust & freshness summary.
`entity_type`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds value by detailing the response structure (tag facets, confidence, provenance) and entity-level provenance block, providing context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two well-organized paragraphs: first explaining functionality and response structure, second detailing usage and inputs. Every sentence adds value, no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description is complete: it covers purpose, usage context, input parameters, and high-level response contents. It addresses the complexity of the tool's output without overexplaining.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with clear parameter descriptions. The tool description briefly mentions inputs but adds minimal new information beyond what the schema already provides. The baseline of 3 is appropriate as the schema carries the full burden.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns an enriched tag profile of a content entity, detailing components like facet, confidence, and provenance. It distinguishes itself by specifying the use case for matching, recommendation, etc., which sets it apart from sibling tools like content_catalog or content_similar.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool: when an agent has a franchise or work id and needs fine-grained characterization. It provides context and purpose but does not explicitly state when not to use or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_evergreen_score_analyzerA

Read-onlyIdempotent

Inspect

Evaluates content evergreen potential for CMOs by analyzing historical traffic patterns and backlink authority. Takes a content URL and optional time range, returns an evergreen score (0-100), traffic trend analysis, and backlink profile. Ideal for content strategy planning, SEO optimization, and identifying high-value evergreen assets. Uses Wayback Machine and Common Crawl public APIs.

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Content URL to analyze
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`toDate`	No	End date for historical analysis (YYYY-MM-DD)
`fromDate`	No	Start date for historical analysis (YYYY-MM-DD)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`lastSeen`	No
`warnings`	Yes
`firstSeen`	No
`trafficTrend`	Yes
`backlinkCount`	No
`evergreenScore`	Yes
`backlinkDomains`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and openWorldHint=true. The description adds value by explaining the data sources (Wayback Machine and Common Crawl public APIs) and the nature of the analysis. No contradictions with annotations. Minor gap: no mention of potential rate limits or errors, but acceptable given annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of three sentences, concise and front-loaded. It includes some slightly extraneous phrasing ('Ideal for...'), but overall efficient. Could be tightened slightly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown but noted as present), the description need not explain return values. It covers purpose, inputs, use cases, and data sources. Does not mention prerequisites or error handling, but for a read-only, idempotent tool, it is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with clear descriptions for each parameter. The description reiterates the key inputs ('Takes a content URL and optional time range') but adds no new details beyond the schema. Baseline of 3 is appropriate as schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates content evergreen potential for CMOs by analyzing historical traffic patterns and backlink authority. It specifies the output (evergreen score 0-100, traffic trend analysis, backlink profile) and distinguishes from sibling content tools by focusing on evergreen analysis and audience (CMOs).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states ideal use cases: 'content strategy planning, SEO optimization, and identifying high-value evergreen assets.' It does not explicitly mention when not to use or list alternative tools, but the context of sibling tools and the phrase 'Ideal for' provides clear guidance on appropriate scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_provenanceA

Read-only

Inspect

Audit the full data provenance of a content entity — all its enrichment tags with their extraction source, corroboration score, source list and last verification date, plus an entity-level freshness summary. Use this tool before citing or relying on enriched content data in a high-stakes context (ad targeting, editorial, analysis). Inputs: entity_id (required) and entity_type (franchise or work).

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_id`	Yes	Entity id from content_catalog (e.g. 'video-game-elden-ring')
`entity_type`	No	Whether the id is a franchise or a work (default: franchise)

Output Schema

ParametersJSON Schema

Name	Required	Description
`lineage`	Yes	Full tag lineage from v_data_lineage — one entry per tag.
`entity_id`	Yes
`entity_type`	Yes
`freshness_summary`	Yes	Entity-level freshness & trust summary from v_entity_freshness.

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds useful behavioral detail by describing what the tool returns (tags, scores, freshness summary), complementing the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences: the first explains the tool's output, the second provides usage guidance and inputs. It is front-loaded with key information and contains no extraneous words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description need not detail return values. It adequately covers the tool's purpose, usage context, and inputs, making it sufficiently complete for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description reiterates the required parameters (entity_id, entity_type) but adds no new semantic information beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Audit the full data provenance of a content entity' and lists specific outputs (enrichment tags, extraction source, corroboration score, etc.). It distinguishes itself from sibling content tools by specifying a high-stakes context, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage context: 'Use this tool before citing or relying on enriched content data in a high-stakes context (ad targeting, editorial, analysis).' However, it does not name alternative tools or specify when not to use it, which is a minor gap given the large set of siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_rankingA

Read-only

Inspect

Return the TOP-ranked content entities in a category, by a chosen criterion — the direct answer to superlative / decision queries: 'best video games', 'top RPGs', 'cheapest games', 'best value RPGs', 'best FPS playable right now', 'most popular music artists'. Criteria: critic_score, popularity, price, value (critic score per unit price). direction flips it (asc = cheapest/lowest first). available_only restricts to entities currently buyable. Sliceable by genre and release-year window; every result carries its score, price and source. When to use: an agent must produce a ranked shortlist to support a recommendation, a purchase or a 'what is the best X' decision.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`genre`	No	Optional genre filter, e.g. 'RPG', 'FPS', 'thriller'
`limit`	No	Number of ranked results (default 20)
`domain`	Yes	Content domain to rank within
`year_to`	No	Optional latest release year
`criterion`	No	critic_score (0-100, default) · popularity · price · value (critic score per unit price)
`direction`	No	desc = best/highest first (default); asc = cheapest/lowest/least first. Defaults to asc for price.
`year_from`	No	Optional earliest release year
`available_only`	No	If true, restrict to entities currently available to buy/play (default false)

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`genre`	No
`domain`	Yes
`ranking`	Yes
`year_to`	No
`criterion`	Yes
`direction`	No
`year_from`	No
`available_only`	No

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Consistent with readOnlyHint=true; describes read operation. Adds behavioral details: returns scores, price, source. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Front-loaded with purpose, then details criteria, direction, available_only. Efficiently structured, no wasted words, though slightly long but justified by content richness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all essential aspects: purpose, parameter details, usage context, and output description (scores, price, source). With output schema present, description is fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions. Description adds meaning beyond schema: explains criteria choices (critic_score, popularity, price, value), direction defaults, and available_only behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns top-ranked content entities by a chosen criterion, with specific examples like 'best video games', 'top RPGs'. It distinguishes from sibling tools by focusing on superlative/decision queries.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'produce a ranked shortlist to support a recommendation, a purchase or a 'what is the best X' decision.' Provides context for criteria, direction, and available_only. However, does not mention when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_similarA

Read-only

Inspect

Find content entities similar to a given one. For embedded franchises this uses SEMANTIC vector similarity (pgvector) over the enrichment profile — surfacing entities that feel alike even when their tags differ literally. Falls back to shared enrichment-tag overlap for works or non-embedded entities. Each result carries a similarity score and its entity-level freshness/confidence (verifiable, sourced). When to use this tool: an agent wants recommendations or lookalikes for a franchise or work. Input: an entity_id and its type.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`entity_id`	Yes	Entity id from content_catalog
`entity_type`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`method`	Yes	How similarity was computed.
`similar`	Yes
`entity_id`	Yes
`source_provenance`	Yes	Provenance of the source entity used to compute similarity.

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds significant behavior: semantic vs tag-based similarity, fallback logic, result content (score, freshness, confidence), and async support. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences that front-load purpose, explain mechanics, and state when to use. Every sentence adds value; no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists (not shown) and annotations declare read-only, the description covers algorithm choice, result fields, fallback, and async option. Sufficient for an AI to decide and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 50% (async and entity_id have descriptions). The description adds meaning for entity_type by linking to algorithm behavior and for entity_id as required input. It doesn't explain limit or async further, but these are standard and schema covers async.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Find content entities similar to a given one' and explains different approaches for franchises vs works. It distinguishes from sibling tools like content_catalog, content_compare, and content_discovery by focusing on similarity/recommendations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'When to use this tool: an agent wants recommendations or lookalikes for a franchise or work.' It provides context but does not mention when not to use it or alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_taxonomyA

Read-only

Inspect

Return the enrichment taxonomy of a content domain — every tag grouped by facet (genre, theme, mood, play-mode…). When to use this tool: an agent needs the controlled vocabulary to filter, classify or query content. Input: a domain.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domain`	Yes	Content domain

Output Schema

ParametersJSON Schema

Name	Required	Description
`domain`	Yes
`taxonomy`	Yes	Map facet → array of tag labels
`tag_count`	Yes
`facet_count`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint: true and openWorldHint: true. The description adds no additional behavioral context beyond confirming a read-only query. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose and usage. The final sentence ('Input: a domain') is somewhat redundant given the schema, but overall concise and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, annotations cover safety, and the description explains purpose and usage, the description is complete. It provides enough context for an agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents both parameters. The description only mentions 'Input: a domain' without adding new semantics or usage details beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Return') and resource ('enrichment taxonomy') and clarifies the output structure ('every tag grouped by facet'). It clearly distinguishes this tool from siblings like content_catalog or content_ranking by focusing on the controlled vocabulary.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('an agent needs the controlled vocabulary to filter, classify or query content'), providing clear context. However, it does not mention exclusions or alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_risk_scannerB

Read-only

Inspect

Scanner de risques contractuels — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Salesforce MSA — revue d'un client SaaS B2B EMEA. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`contractText`	Yes
`contractContext`	Yes

Tool Definition Quality

B3.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true, and the description adds that it returns a deliverable and performs server-side validation. No contradiction. The description provides context about the output being structured and audited.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and front-loaded with purpose, but includes cryptic jargon ('Gapup agent-payable C-suite expertise (RISK)') that may confuse some agents. It earns its place but could be clearer.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested parameters, risk scanning) and lack of output schema, the description provides a reference case but falls short of fully explaining input details or output format. Adequate but not complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 25% (async parameter has a description). The description does not explain 'contractText' or 'contractContext' beyond implying they are inputs, and does not clarify 'focus' or 'async.' The phrase 'send the documented case fields' is vague.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a contractual risk scanner that returns a structured, audited deliverable, and provides a concrete reference case (Salesforce MSA). However, it does not differentiate from sibling tools like legal_clause_extractor or ip_contract_clause_extractor.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for contract risk scanning but gives no explicit guidance on when to use this tool over alternatives, no exclusions, and no prerequisites beyond 'send the documented case fields.'

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

corporate_registry_lookupA

Read-onlyIdempotent

Inspect

Resolve legal information about a company from its national corporate registry. Returns a normalised, sourced company profile: legal status, registration number, directors, shareholders, recent filings, registered address, share capital, and a quality score (0–100). Coverage: France (INPI, keyless — full SIREN/SIRET with directors), 3M+ entities worldwide via GLEIF LEI (keyless, large companies), UK (Companies House, optional key), Netherlands (KvK, optional key), and OpenCorporates (token required since 2026). Sources are tried in cascade; quality_score increases with each source that succeeds. When to use: due-diligence, KYC screening, supplier verification, M&A research, or any workflow needing verified company identity and legal status. Optional env vars: COMPANIES_HOUSE_API_KEY (UK), KVK_API_KEY (NL), OPENCORPORATES_API_TOKEN (OpenCorporates token).

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	No	ISO 3166-1 alpha-2 country code (e.g. 'FR', 'GB', 'NL', 'DE', 'SG', 'AU', 'US'). If omitted, inferred from legal suffix in company name, then falls back to global search.
`identifier`	No	Optional registry identifier for a fast direct lookup: SIREN (FR, 9 digits), Companies House number (GB, 8 chars), KvK number (NL, 8 digits), etc.
`company_name`	Yes	Company name or trading name to look up (e.g. 'Sanofi', 'Tesco PLC', 'Notion Labs Inc')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`registry`	Yes
`directors`	Yes
`freshness`	Yes	ISO timestamp
`identifier`	Yes
`legal_form`	No
`legal_name`	No
`company_name`	Yes
`jurisdiction`	Yes
`shareholders`	Yes
`quality_score`	Yes	0-100 confidence score
`share_capital`	No
`filings_recent`	Yes
`incorporation_date`	No
`registered_address`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses source cascade, quality score increment, optional API keys, and coverage limitations, adding significant value beyond annotations which already indicate read-only and idempotent behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single well-organized paragraph front-loads purpose and covers key aspects, though could be slightly more structured (e.g., bullet points) for readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complexity, it fully covers functionality, coverage, cascade, optional keys, and usage context. Output schema exists, so return values are not required in description.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds minimal extra parameter meaning beyond what schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool resolves legal information from national corporate registries, lists returned fields, and specifies coverage by country. It is specific and distinct from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (due-diligence, KYC, etc.) and mentions cascade logic, but does not provide when-not-to-use or alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

court_filings_multiA

Read-only

Inspect

Aggregate court filings, judgments and litigation records for a company or individual across five major legal jurisdictions: US (CourtListener / PACER), UK (National Archives — EWHC/EWCA/UKSC/UKUT), EU (ECHR HUDOC — European Court of Human Rights), France (Légifrance / Cour de cassation) and Germany (BGH / BVerfG). Returns structured case records with type classification (civil/criminal/antitrust/bankruptcy/administrative/unknown), status (filed/pending/decided/appealed/unknown), parties extracted from case titles, opinion URLs and verbatim snippets. Cross-case pattern recognition produces severity-ranked signals (P0–P2) for criminal, antitrust, bankruptcy, regulatory, data-breach and IP categories. Use when: due diligence on a counterparty, vendor risk assessment, competitive intelligence (litigation history), regulatory exposure mapping. All sources are public and keyless. Optional env var COURTLISTENER_API_KEY raises US rate limits beyond the default 5 req/s anonymous tier. SLA: ≤25s p95 (all jurisdictions fetched in parallel, 8s budget per source). Quality score: 20 pts per jurisdiction with ≥1 case retrieved, +10 if signals detected, +5–10 if ≥2–3 distinct sources contributed.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_to`	No	ISO date YYYY-MM-DD — latest filing or decision date to include
`date_from`	No	ISO date YYYY-MM-DD — earliest filing or decision date to include
`party_name`	Yes	Name of the company or individual to search (e.g. "Apple Inc", "TotalEnergies", "Volkswagen AG")
`jurisdiction`	No	Jurisdictions to search. Defaults to all ["US","UK","EU","FR","DE"].

Output Schema

ParametersJSON Schema

Name	Required	Description
`cases`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`party_name`	Yes
`quality_score`	Yes
`by_jurisdiction`	Yes
`jurisdictions_searched`	Yes

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description extensively discloses behavior beyond annotations: it returns structured case records with classification, status, parties, URLs, and snippets; cross-case pattern recognition produces severity-ranked signals; it indicates parallel fetching of jurisdictions with a 25s SLA; explains async mode and quality scoring. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the core action, then covers output structure, use cases, technical details, and scoring. While lengthy, every sentence contributes value. A minor reduction could improve conciseness without losing information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-jurisdiction, structured output, async mode, scoring), the description covers purpose, parameters, output, use cases, limitations, SLA, and quality metrics. It is complete and leaves no major gaps for an agent invoking this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents parameters adequately. The description adds minimal extra semantics (e.g., that jurisdiction defaults to all five). Baseline 3 is appropriate; no substantial enhancement beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as aggregating court filings across five jurisdictions, specifying the verb 'aggregate' and the resource ('court filings, judgments and litigation records'). It explicitly lists the jurisdictions and output components, making its unique purpose unmistakable among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases (due diligence, risk assessment, competitive intelligence, regulatory mapping) and critical context (public sources, keyless access, optional API key for rate limits, SLA). It does not mention when to avoid this tool or name alternatives, but the given guidance is sufficient for informed selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

crm_connectorAInspect

Push, update, search and log activities in HubSpot, Salesforce or Pipedrive. 4 modes: push_lead (create contact/lead), update_opportunity (update deal stage/amount), search_contact (lookup by email), log_activity (call/email/meeting/note). Returns resource_id, direct CRM URL, signals and quality_score. If credentials are absent, returns a mock result with a warning signal. Auth: HubSpot via Bearer access_token; Salesforce via access_token + base_url; Pipedrive via api_key.

ParametersJSON Schema

Name	Required	Description
`data`	Yes	Payload depending on mode. push_lead: {email,first_name,last_name,company,phone,job_title}. update_opportunity: {deal_id/opportunity_id,stage,amount,close_date}. search_contact: {email}. log_activity: {type,body,contact_id/person_id,subject}.
`mode`	Yes	Action to perform in the CRM
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`provider`	Yes	CRM provider to target
`credentials`	No	Auth credentials. HubSpot: access_token. Salesforce: access_token + base_url. Pipedrive: api_key.

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	No
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`success`	Yes
`provider`	Yes
`data_synced`	No
`resource_id`	No
`quality_score`	Yes

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are minimal (readOnlyHint=false, openWorldHint=true). The description adds useful behavioral context: mock result when credentials are absent, auth details per provider, and return fields (resource_id, URL, signals, quality_score). However, it does not mention async behavior (though the schema includes an 'async' parameter) or error conditions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (5 sentences), front-loaded with core purpose, and each sentence adds value. No fluff; structured logically from purpose to modes to return behavior to auth.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, multiple modes, multiple CRMs), the description covers purpose, modes, fallback behavior, and auth. It is missing guidance on async behavior (present in schema but not in description) and error handling/rate limits. Overall fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the description adds limited new parameter info beyond what the schema already provides. It summarizes payload structures and auth requirements, which are already described in the schema, but does not add significant new meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: push, update, search, and log activities in HubSpot, Salesforce, or Pipedrive. It lists the four modes with specific actions, distinguishing it from sibling tools that are more specialized.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains what the tool does but provides no explicit guidance on when to use it versus other CRM-related tools in the sibling list. Usage is implied through the description of modes, but alternatives are not mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cross_sell_recoB

Read-only

Inspect

Recommandations cross-sell — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Alan × Gapup Hub — 3 produits recommandés · Fit 'perfect' × 2 · ARR potentiel +€18k. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`account`	Yes
`company`	Yes
`portfolio`	Yes

Tool Definition Quality

B3.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states the tool returns a structured, audited deliverable, which aligns with the annotation readOnlyHint=true (safe read operation). It adds context about server-side validation and the nature of the output (audited). The openWorldHint annotation is supported by the reference to a real-world case. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a long reference case that may not be essential for tool selection. The structure front-loads the purpose but is somewhat cluttered with jargon. It could be more concise and clearer.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, nested objects) and lack of output schema, the description provides only a rough idea of the output ('structured, audited deliverable') and an example. It does not specify the response format or fields, leaving the agent with incomplete information to handle the output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is low (25%), and the description does not explain the meaning or usage of the parameters (company, account, portfolio) beyond saying 'send the documented case fields.' The reference case does not map to the schema fields. Therefore, the description adds little semantic value to the parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as providing cross-sell recommendations ('Recommandations cross-sell') and mentions it returns a structured, audited deliverable. The reference case illustrates the output with concrete numbers, making the purpose concrete. However, it does not differentiate from the sibling tool 'upsell_hunter' or other recommendation tools, and the jargon 'Gapup agent-payable C-suite expertise (CRO)' may be overly specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives a usage hint ('Inputs are validated server-side — send the documented case fields') and provides a reference case, implying applicability to similar scenarios. However, it lacks explicit guidance on when to use this tool versus alternatives like 'upsell_hunter' or 'tool_recommend', and no exclusion criteria are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

crypto_wallet_intelA

Read-only

Inspect

Multi-chain on-chain analytics for crypto trading agents, on-chain analysts, AML/compliance teams and DeFi BD. Covers Ethereum, Base, Polygon, BSC, Arbitrum, Optimism — EVM-compatible addresses only.

5 modes: • wallet_profile — full wallet summary: type (EOA/contract/CEX/protocol), inferred persona (whale/MEV-bot/DeFi-user/hodler…), age, tx count, native balance, ERC-20 count, NFT collections, OFAC sanctions flag • token_flows — ERC-20 inflows/outflows per token on the selected period, priced in USD via CoinGecko • pnl_estimate — FIFO realized + unrealized P&L on the period with confidence rating (high/medium/low) • counterparties — top 20 counterparties ranked by USD volume with CEX/DEX/protocol labels • defi_positions — active DeFi positions detected via Etherscan interaction history (Aave/Compound/Uniswap/Curve/Lido/Balancer/SushiSwap)

Signal detection (P0/P1/P2): P0 if OFAC SDN match OR direct Tornado Cash / sanctioned-protocol interaction P1 if >$1M volume on wallet <30 days old OR MEV-bot pattern OR >80% volume on single counterparty P2 informational (CEX wallet, new wallet, no anomaly)

Sources: Etherscan family (keyless free-tier, optional API key per chain), DefiLlama (keyless), public EVM RPC (keyless), CoinGecko free tier (keyless). Cache TTL: 5 min (wallet activity evolves fast). Budget: 8s per source.

Env vars (all optional, raise Etherscan rate-limit from 1 req/5s to 5 req/s): ETHERSCAN_API_KEY · BASESCAN_API_KEY · POLYGONSCAN_API_KEY BSCSCAN_API_KEY · ARBISCAN_API_KEY · OPTIMISM_API_KEY

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode. wallet_profile=full wallet summary + persona + sanctions flag. token_flows=ERC-20 inflows/outflows per token priced in USD. pnl_estimate=FIFO realized+unrealized P&L with confidence. counterparties=top 20 counterparties by volume. defi_positions=active positions on Aave/Compound/Uniswap/Curve/Lido/etc.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`chain`	No	Chain to analyze. Default "ethereum". Use "all" to scan all 6 chains (slower, ~30s).
`address`	Yes	EVM-compatible wallet address (0x... 40 hex chars). Works on all supported chains.
`period_days`	No	Lookback window in days for token_flows, pnl_estimate, counterparties, defi_positions. Default 30.
`min_value_usd`	No	Minimum USD value filter for token_flows and counterparties. Default $100.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`address`	Yes
`signals`	Yes
`sources`	Yes
`token_flows`	No
`pnl_estimate`	No
`quality_score`	Yes
`counterparties`	No
`defi_positions`	No
`wallet_profile`	No
`chains_analyzed`	Yes

Tool Definition Quality

A4.3/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description extensively details behavioral traits beyond annotations: data sources (Etherscan, DefiLlama, etc.), caching (5 min TTL), budget (8s per source), signal detection levels (P0/P1/P2), and optional API keys. Annotations only provide readOnlyHint and openWorldHint, so the description carries the full burden and does so thoroughly.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections but is quite lengthy. While it front-loads the purpose, the detail on modes, signal detection, and env vars could be tightened. It is functional but not optimally concise for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, 5 modes, multiple chains, signal detection), the description is remarkably complete. It covers all relevant aspects: modes, signal levels, data sources, caching, budget, and environment variables. The presence of an output schema reduces the need to explain return values, and the description fills all other gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema coverage is 100%, so baseline is 3. The description adds context about modes and data sources but does not significantly enhance parameter meaning beyond what the schema's param descriptions already provide.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as 'Multi-chain on-chain analytics' and enumerates 5 specific modes (wallet_profile, token_flows, etc.), with explicit chain support (Ethereum, Base, etc.) and EVM-only restriction, making it distinct from sibling tools which are mostly unrelated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use the tool (e.g., for EVM-address analytics, specific modes for different analyses) and includes a negative guideline (EVM-compatible addresses only). However, it does not explicitly compare to alternatives or state when not to use it beyond the EVM restriction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

customer_marketingC

Read-only

Inspect

Marketing clients & ambassadeurs — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — 12 clients analysés · 4 ambassadeurs identifiés · Programme + 6 case studies + référral. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`goals`	Yes
`company`	Yes
`product`	Yes
`customers`	Yes
`targetUseCases`	No
`contentBudgetEur`	No

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. Description adds that it returns an audited deliverable and validates inputs server-side, which is consistent with a read operation. No contradictions, but it does not disclose additional behavioral traits beyond the deliverable nature.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fragmented, mixing French and English, and includes unnecessary jargon. It could be more concise and structured, with sentences that are not well-organized for quick understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 parameters, nested objects, no output schema), the description lacks important context about the deliverable's contents, expected results, or how to interpret outputs. The reference case is incomplete guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 7 parameters with nested objects, but description coverage is only 14% (likely just async). The description does not explain any of the required fields like company, goals, customers, or their semantics. It merely says 'send the documented case fields' without elaboration.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states it returns a structured deliverable related to marketing clients and ambassadors, but the purpose is vague and mixed with jargon (e.g., 'Gapup agent-payable C-suite expertise'). It does not clearly specify the core action beyond 'Marketing clients & ambassadeurs'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus sibling tools like event_marketing or marketing_roi_dashboard. The description provides a reference case but no when-not or alternative selection criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

customer_voice_synthC

Read-only

Inspect

Synthèse voix client — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Alan (assurance santé) — 3 personas · Top 5 douleurs · Repositionnement messagerie recommandé. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`dataSources`	Yes
`targetSegments`	Yes
`repositioningFocus`	No

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, indicating no state change and acceptance of unknown fields. The description adds no additional behavioral context beyond stating it returns a deliverable and inputs are validated. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences but lacks clarity: the first sentence is a tagline ('Gapup agent-payable C-suite expertise'), the second provides a reference case, and the third briefly notes validation. It is somewhat fragmented and could be more structured and concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the tool's complexity (nested objects, 5 parameters, no output schema), the description fails to explain the deliverable's structure, how to use parameters effectively, or what the output contains. The reference case provides some context but is insufficient for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20% (only async parameter has a description). The description mentions 'send the documented case fields' but does not explain the meaning or usage of parameters like company, dataSources, or targetSegments. The reference case provides implicit hints, but this is insufficient for low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it performs 'Synthèse voix client' (customer voice synthesis) and returns a 'structured, audited deliverable', referencing personas, pain points, and messaging repositioning. This clearly identifies the tool's function, though the output format could be more explicit. It distinguishes from sibling tools by its focus on customer voice synthesis for C-suite/CMO.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus the many sibling tools (e.g., brand_builder, positioning_strategist). The description only mentions that inputs are validated server-side, but does not specify prerequisites, use cases, or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cve_security_lookupA

Read-only

Inspect

Look up CVE vulnerability data for enterprise security teams, DevSecOps and SOC analysts. Supports two modes: exact CVE ID lookup (e.g. 'CVE-2024-3094') or keyword search by product/vendor (e.g. 'openssl', 'Apache Tomcat'). Cross-references four authoritative keyless sources: NVD NIST (official CVE database, CVSS v3 scores, affected CPEs), CISA KEV (Known Exploited Vulnerabilities catalog — exploit_in_wild flag), EPSS FIRST (exploit probability 0-1), GitHub Security Advisories (ecosystem-specific: npm/pypi/maven). Returns structured vulnerability records with CVSS v3 scores, affected product version ranges, CWE weakness classification, references and exploitation status. Signals engine produces P0/P1/P2 alerts: P0=CVSS>=9 + active exploitation, P1=CVSS>=7 or EPSS>=70%, P2=CWE pattern clusters. Relevant for EU NIS2 and DORA supply chain risk obligations. Optional env: NVD_API_KEY (raises NVD rate-limit 5→50 req/30s), GITHUB_TOKEN (raises GHSA GraphQL rate-limit). Cache TTL 6h. SLA <=25s p95.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Override auto-detection: "lookup" for exact CVE ID, "search" for product/vendor keyword.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	CVE ID (e.g. "CVE-2024-3094") or product/vendor keyword (e.g. "openssl", "Apache Tomcat"). Mode is auto-detected from the CVE-YYYY-XXXXX pattern.
`max_results`	No	Maximum number of vulnerabilities to return (default 20, max 50).
`severity_min`	No	Minimum CVSS v3 severity to include in results (default: no filter).
`published_after`	No	ISO date YYYY-MM-DD — only include CVEs published after this date. Defaults to 365 days ago for search mode.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`quality_score`	Yes
`vulnerabilities`	Yes

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, openWorldHint=true, and destructiveHint=false. The description adds significant behavioral context: cache TTL of 6 hours, SLA <=25s p95, rate-limit details for optional API keys, P0/P1/P2 alert signals from the engine, and cross-referencing behavior. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is comprehensive but lengthy (multiple paragraphs with details about regulatory compliance, alert levels, and API keys). While it front-loads the core purpose, some information could be condensed without losing critical guidance. It earns its keep but could be more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, output schema exists), the description covers all essential aspects: query modes, data sources, optional env variables for rate limits, caching, SLA, severity filtering, and alert classification. It is complete for an agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with all six parameters having descriptions. The tool description adds value by explaining the two modes in more detail, the default for published_after (365 days for search), and the meaning of severity_min values. However, the schema already does a good job, so the description adds marginal but useful context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it looks up CVE vulnerability data, supports exact ID lookup and keyword search, and names specific authoritative sources (NVD NIST, CISA KEV, EPSS FIRST, GitHub Security Advisories). It distinguishes itself from sibling tools by focusing on security vulnerability intelligence with multiple data sources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly explains when to use exact CVE ID lookup vs. product/vendor keyword search, and notes that mode is auto-detected from the query pattern. It provides usage context like cache TTL and SLA, but does not explicitly state when not to use this tool or how it differs from alternatives like a general web search for CVEs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cyber_risk_auditorC

Read-only

Inspect

Auditeur de risque cyber — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Qonto — Audit cyber risque B2B FinTech · Score 58/100 → roadmap 90j · 8 findings critiques/high · économie prime -28%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`techStack`	Yes
`currentPosture`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true (safe read). The description adds that it returns a structured deliverable and inputs are validated server-side, but no further behavioral details (e.g., no mention of external API calls or data retention). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph mixing French and English, which may confuse some agents. It includes a reference case that adds context but is not front-loaded with the core purpose. Could be more structured and concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite complex nested parameters and no output schema, the description only vaguely mentions a 'structured deliverable'. It lacks details on the output format, risk scoring, or what the findings entail. The reference case gives some context but is insufficient for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%; only the 'async' parameter has a description. The description does not add meaning for the complex nested objects (company, techStack, currentPosture). It merely says 'send the documented case fields', leaving the agent to rely solely on the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a cyber risk auditor that returns a structured deliverable, with a concrete reference case example. However, it does not differentiate from numerous sibling audit tools like privacy_compliance_audit or vendor_risk_assessor.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. The description mentions 'Gapup agent-payable C-suite expertise' but does not specify prerequisites, when not to use, or list alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deal_coachC

Read-only

Inspect

Coach de deal MEDDIC — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Datadog Enterprise deal Société Générale €1.2M ARR — coaching MEDDIC + escalation plays + 14 next actions. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`knownContext`	Yes
`buyingCommittee`	Yes

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and mentions a reference case, but doesn't disclose additional behavioral traits beyond annotations, so it adds minimal value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph with some unnecessary detail (reference case, jargon) but is not overly long. It could be more concise and front-loaded with key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 5 parameters, no output schema), the description is inadequate. It doesn't specify what the structured deliverable contains, how to interpret results, or provide sufficient guidance for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20% (only the 'async' parameter is described). The description does not explain the meaning of the required parameters (deal, buyingCommittee, knownContext) or their fields, failing to compensate for low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it's a 'Coach de deal MEDDIC' and returns a structured deliverable, but uses jargon without a clear verb+resource. It doesn't distinguish from sibling tools like 'meddic_scoring', making purpose moderately clear but not specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. It mentions inputs are validated server-side but provides no when-to-use or when-not-to context, leaving the agent without decision criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deal_structurerC

Read-only

Inspect

Structuration de deal — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Agicap × Kyriba — Partenariat API Banking · 5 structures comparées · Term sheet 7 clauses · Score 83/100 JV. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description confirms it returns a deliverable (consistent with read-only) and references a case (consistent with open world). It adds no further behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately concise but includes extraneous details (reference case, jargon like 'Gapup agent-payable C-suite expertise'). It could be streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complex nested parameters and no output schema, the description is incomplete. It does not clarify the return format, meaning of scores/structures, or how to interpret the output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 33% (only async described). The description does not explain the nested parameters (company, deal fields). It merely states inputs are validated server-side, providing no meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for 'deal structuration' and returns a structured, audited deliverable. It provides a specific reference case illustrating the output format. However, it does not explicitly differentiate from sibling deal-related tools like deal_coach or term_sheet_negotiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description only mentions input validation; it does not provide when-to-use or when-not-to-use context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dependency_vulnerability_scanA

Read-only

Inspect

SCA (Software Composition Analysis) — scans a project dependency manifest and returns known vulnerabilities for each dependency. Supports: package.json (npm), requirements.txt (Python), go.mod (Go), Cargo.toml (Rust), composer.json (PHP), Gemfile.lock (Ruby), CycloneDX SBOM JSON. PRIMARY source: OSV.dev (keyless, free, covers npm/PyPI/Go/crates.io/Packagist/RubyGems + GHSA advisories federated). CVSS enrichment: NVD NIST (when OSV lacks score). Exploitation flag: CISA KEV (known-exploited-vulnerabilities catalog). Returns per-vuln CVE/GHSA IDs, severity, CVSS score, fixed version, and actionable upgrade recommendations. Relevant for EU NIS2 supply chain risk obligations, DORA, SOC 2 vendor assessments. Cache TTL 6h. Parallel OSV queries (concurrency=10). SLA <=30s p95.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Manifest type: "package_json"=npm, "requirements_txt"=pip, "go_mod"=Go modules, "cargo_toml"=Rust, "composer_json"=PHP, "gem_lock"=Ruby, "sbom_cyclonedx"=CycloneDX SBOM JSON.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`severity_min`	No	Minimum severity to include in results (default: "medium").
`manifest_content`	Yes	Raw text content of the manifest file to scan (e.g. full contents of package.json, requirements.txt, etc.).
`include_transitive`	No	Include transitive/indirect dependencies in results (default: true).

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`summary`	Yes
`ecosystem`	Yes
`quality_score`	Yes
`recommendations`	Yes
`vulnerabilities`	Yes
`dependencies_parsed`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, destructiveHint), the description adds significant behavioral context: data sources (OSV.dev, NVD, CISA KEV), cache TTL (6h), parallel query concurrency, SLA, and output details. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is information-dense and well-organized, but slightly verbose with regulatory references and SLA details. Could be trimmed without losing clarity, but still effectively communicates key points.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple modes, data sources, output schema), the description covers all essential aspects: purpose, input formats, output details, performance, and regulatory relevance. It complements the existing output schema well.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for each parameter. The description further clarifies mode mappings (e.g., 'package_json'=npm), explains async behavior (returns job_id), and notes defaults (severity_min=medium, include_transitive=true), adding value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an SCA tool that scans a dependency manifest and returns known vulnerabilities. It lists supported formats and data sources, making the purpose unambiguous and distinct from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit when-to-use scenarios (e.g., EU NIS2, DORA, SOC 2) and performance characteristics (cache TTL, SLA). It does not explicitly state when not to use or name alternatives, but the context is clear enough for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

discovery_prepC

Read-only

Inspect

Préparation discovery — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Discovery Salesforce × Airbus — VP Digital Marc Legrand · Signaux achat confirmés · +28 pts conversion demo. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`contact`	Yes
`ourOffer`	Yes
`prospect`	Yes
`meetingGoal`	No

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly and openWorld. Description adds server-side validation and deliverable nature, but no rate limits, speed, or restrictions beyond that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, but first is in French which may confuse agents, and the reference case is overly specific. Could be more concise and structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema and no description of return value structure; missing details on what the 'audited deliverable' contains, and parameter semantics are insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema description coverage, description should compensate but only vaguely refers to 'documented case fields' without explaining prospect, contact, ourOffer, or meetingGoal.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable for discovery preparation, and gives a reference case, but the action verb is implicit and not clearly distinguished from sibling tools like competitive_deep_dive or battle_cards_live.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Only guidance is server-side validation and sending documented case fields; no explicit when-to-use or when-not-to-use compared to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

diversity_inclusion_metricsC

Read-only

Inspect

Métriques diversité & inclusion — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: Cas démo — Métriques diversité & inclusion. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`ambitions`	Yes
`currentState`	Yes
`regulatoryContext`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description aligns by stating it returns a deliverable, implying no mutation. Adds that inputs are validated server-side and output is audited, but does not detail potential limitations or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no fluff, front-loading purpose and context. Could be slightly more structured but is efficiently brief.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 params, nested objects, no output schema), the description is too sparse. It does not clarify the deliverable's content, async behavior, or how to populate fields like ambitions. Leaves the agent underinformed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 17%, yet the description adds no parameter-specific guidance. It only says 'send the documented case fields' without explaining the meaning of required objects (company, currentState) or optional fields (focus, regulatoryContext).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a structured, audited deliverable for diversity and inclusion metrics, with reference to a demo case and C-suite expertise. This distinguishes it from many ESG-related siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. Mentions server-side validation and a reference case but lacks criteria for selection or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_tech_fingerprintB

Read-only

Inspect

Empreinte tech d'un domaine — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What is the tech stack of — frontend, CMS, analytics, CRM, CDN, hosting? · What buying signals does 's technology footprint reveal for sales prospecting? · Analyze for supply-chain technology risk and third-party vendor exposure. · What is the best outreach angle for a sales rep targeting based on their detected stack? · Run a CISO-style technology fingerprint on — identify legacy tech, missing security headers, and vendor risk. · Has recently changed their marketing or analytics stack — any vendor adoption signals? Reference case: velora-payments.io · Next.js + Cloudflare + Stripe + GA4 + HubSpot · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	Yes		standard
`focus`	Yes		tech-buying
`target_domain`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true. The description adds that the tool returns a structured deliverable and validates inputs server-side, but does not detail rate limits, performance, or side effects. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat verbose, listing many questions and a reference case. It is front-loaded with the main purpose but could be more concise. The structure is clear but not optimal.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of annotations and no output schema, the description adequately conveys the tool's purpose but lacks details on parameter choices (e.g., depth levels, focus options) and the expected output format. It is sufficient but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 25%, and the description does not explain the target_domain, focus, or depth parameters beyond what is in the schema. The async parameter is described in schema but not elaborated in description. No additional semantics provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns a structured audit of a domain's tech stack, buying signals, and supply-chain risk. It distinguishes from siblings by focusing on technology fingerprinting and sales prospecting insights, though some sibling tools like 'competitive_deep_dive' may overlap.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through a list of questions it answers (e.g., 'What is the tech stack...?'). It provides a reference case but lacks explicit when-to-use or when-not-to-use guidance compared to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dora_metrics_deep_diveA

Read-onlyIdempotent

Inspect

Analyzes DORA metrics (Deployment Frequency, Mean Time to Recovery, Change Failure Rate) with deep correlation to code review patterns. Designed for CTOs to identify bottlenecks in software delivery pipelines. Inputs include GitHub repository identifiers and optional time ranges. Outputs structured metrics with trend analysis and code review depth insights.

ParametersJSON Schema

Name	Required	Description
`repo`	Yes	GitHub repository in format 'owner/repo'
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`since`	No	Start date for analysis (ISO 8601)
`until`	No	End date for analysis (ISO 8601)
`branch`	No	Branch name to analyze (default: main)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`metrics`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, openWorldHint. The description adds that outputs include trend analysis and code review insights, but does not elaborate on behavioral traits like async handling or rate limits beyond what annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with four sentences, front-loaded with the key action. No wasted words, though it could be slightly tighter.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and complete parameter descriptions, the description covers the main purpose and audience adequately. It does not need to explain return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description mentions repo and optional time ranges, matching the schema. It adds no additional semantic meaning beyond the parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes DORA metrics with correlation to code review patterns, specifies the target audience (CTOs), and the goal (identify bottlenecks). This distinguishes it from sibling tools like code_review_depth_optimizer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use (for CTOs to identify bottlenecks) but does not explicitly mention when not to use or compare to alternatives among siblings. Clear but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dora_operational_resilience_stress_tesA

Read-onlyIdempotent

Inspect

Assess DORA operational resilience by simulating ICT failure scenarios for financial entities. Designed for legal/compliance teams to evaluate ICT risk management under DORA Article 25. Inputs include failure scenario parameters (e.g., ICT service type, duration, impact radius) and entity profile. Outputs structured resilience scores, regulatory gaps, and mitigation recommendations with EUR-Lex/FTC enforcement references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entityType`	Yes
`impactRadius`	Yes
`ictServiceType`	Yes
`existingMitigations`	No
`failureDurationHours`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	No
`regulatoryGaps`	Yes
`resilienceScore`	Yes
`simulationTimestamp`	No
`recommendedMitigations`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint. The description adds value by stating it simulates scenarios and outputs structured scores, gaps, and recommendations with legal references, aligning with the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the main verb, and contains no redundant information. Every sentence adds value: purpose, target users, and outputs.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the moderate complexity (6 params, 4 required) and the presence of an output schema, the description adequately covers what the tool does and what it returns. The mention of regulatory references completes the context for the intended use case.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 6 parameters with only 17% description coverage. The tool description mentions failure scenario parameters (ictServiceType, duration, impactRadius) and entity type, but does not detail existingMitigations or the async parameter. It adds some meaning but leaves several parameters unexplained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: assessing DORA operational resilience by simulating ICT failure scenarios for financial entities. It specifies the regulatory context (DORA Article 25) and distinguishes itself from siblings like 'dora_metrics_deep_dive' through its focus on stress testing and simulation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description targets legal/compliance teams for ICT risk management evaluation. It provides clear context but does not explicitly state when not to use or name alternative tools. The focus on DORA Article 25 implies its specific applicability.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dpdp_consent_artifact_generatorA

Read-onlyIdempotent

Inspect

Generates structured consent artifacts compliant with India's Digital Personal Data Protection Act (DPDP). Designed for legal teams to verify or create consent records with timestamped logs, purpose limitation, and data subject rights. Accepts data subject details, processing purpose, and legal basis as inputs. Returns a signed artifact with audit trail and validation status.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`legalBasis`	Yes	Legal basis for processing under DPDP
`dataSubjectId`	Yes	Unique identifier for the data subject
`dataCategories`	No	Categories of personal data being processed
`processingPurpose`	Yes	Specific purpose for data processing
`retentionPeriodDays`	No	Retention period in days

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`artifact`	No
`warnings`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and openWorldHint=true. The description adds behavioral details: the output is a 'signed artifact with audit trail and validation status'. This goes beyond the annotations by specifying what the tool returns. However, it does not explicitly confirm the read-only nature, leaving slight ambiguity with the word 'create'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, covering what the tool does, its target users, inputs, and output. It is concise and front-loaded with the core purpose. No redundant information is present.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters (including optional ones), full schema coverage, and an output schema, the description adequately covers the legal context (DPDP), target users, and key inputs. It mentions the output contains an audit trail and validation status, which is helpful. It does not discuss prerequisites or limitations (e.g., that the tool does not persist data), but overall it is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema already documents all parameters. The description adds that it 'accepts data subject details, processing purpose, and legal basis as inputs', mirroring the required parameters. It does not detail the optional parameters (dataCategories, retentionPeriodDays) or explain the async parameter, so it provides marginal additional value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool 'Generates structured consent artifacts compliant with India's Digital Personal Data Protection Act (DPDP)'. It specifies the target users (legal teams) and the type of resource (consent artifacts with timestamped logs, purpose limitation, data subject rights). Among siblings, there is no other DPDP-specific tool, so it is well-differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'designed for legal teams to verify or create consent records', providing context for use. However, it does not explicitly state when to avoid using this tool or mention alternatives like privacy_compliance_audit for broader compliance. The usage is implied but not contrasted with other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dual_use_export_risk_mapperA

Read-onlyIdempotent

Inspect

As a COO, quickly assess export compliance risks for components in your supply chain. This tool analyzes bills of materials (BOMs) against EU dual-use export control lists and ICAO/IMO restricted items data. Input a list of part numbers, descriptions, or HS codes to receive a risk assessment with actionable insights. Output includes risk levels, applicable regulations, and source references for audit trails.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`bomItems`	Yes
`includeSources`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`results`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint, which signal safe, repeatable behavior. The description adds value by detailing output composition risk levels, regulations, source references. This provides behavioral context beyond the annotations, such as the audit trail feature, without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with the primary purpose. Every sentence adds essential information audience, tool function, inputs, outputs without redundancy. There is no verbose fluff, making it highly efficient for an AI agent to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and the presence of an output schema, the description covers key aspects: target user, input type, analysis scope, and output details. It lacks mention of the async parameter for long-running jobs, but that is partially covered by the schema. Overall, it is sufficiently complete for typical use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 33% (only async has a description), but the description compensates by explaining the bomItems parameter as a list of part numbers, descriptions, or HS codes. It also alludes to includeSources via 'source references,' adding meaning beyond the schema. However, the async parameter is omitted, slightly reducing completeness.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes BOMs against EU dual-use and ICAO/IMO restricted items to assess export compliance risks. It specifies inputs (part numbers, descriptions, HS codes) and outputs (risk levels, regulations, source references). This provides a precise verb+resource+scope, effectively distinguishing it from siblings like dual_use_tech_diversion_monitor.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies using the tool when a COO needs quick export compliance risk assessment for components, but it lacks explicit guidance on when not to use it or alternatives. No mention of prerequisites or complementary tools, leaving the agent to infer context from the tool name alone. A score of 3 reflects adequate but not explicit usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dual_use_tech_diversion_monitorA

Read-onlyIdempotent

Inspect

Asynchronous T5-level tool for COO persona to detect unauthorized diversion of dual-use technologies. Cross-references shipment manifests, EU sanctions lists, and ICAO/IMO transport data to identify suspicious transfers. Inputs: shipment IDs, company identifiers, or geographic routes. Outputs structured diversion risk assessment with source provenance. Requires async:true to avoid 402 timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`route`	No
`companyId`	No	Company registration number or tax identifier
`shipmentId`	No	Unique shipment identifier (e.g., bill of lading number)
`techCategory`	No	Dual-use technology category

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`matches`	No
`sources`	No
`warnings`	No
`diversionRisk`	No	Calculated diversion risk score (0-100)

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant context beyond annotations: it states the tool is asynchronous (requires async:true), T5-level, and outputs structured risk assessments with source provenance. It does not contradict annotations (readOnlyHint, idempotentHint). The description reinforces that it is a read-only monitoring tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the main purpose, and each sentence adds value. No redundant or vague language. It efficiently conveys key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, output schema exists), the description covers the async requirement, input types, data sources, and output nature. It does not explain the output schema, but that is acceptable since an output schema is provided. Minor gaps: no mention of techCategory parameter or parameter optionality, but schema handles that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description lists input types (shipment IDs, company identifiers, routes) but does not detail each parameter beyond what the schema provides. Since schema coverage is high (80%), the baseline is 3. The description adds grouping context but no novel parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool detects unauthorized diversion of dual-use technologies, specifying the verb 'detect' and resource. It mentions cross-referencing and output types. However, it does not explicitly differentiate from the sibling 'dual_use_export_risk_mapper', which may overlap in purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for COO persona and monitoring diversion, and it notes the async requirement to avoid timeouts. However, it does not provide explicit guidance on when not to use this tool or mention alternatives among the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

earnings_reviewerB

Read-only

Inspect

Earnings Reviewer — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Salesforce Q3 FY2026 — call transcript + 10-Q + guidance → analyst note. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`quarter`	Yes
`analystFocus`	No
`secFilingContext`	No
`transcriptExcerpt`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states it returns a structured, audited deliverable, adding context to the readOnlyHint and openWorldHint annotations. It does not contradict annotations. However, it does not elaborate on potential variability or other behavioral traits beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise but includes jargon ('Gapup agent-payable') and a reference case that, while useful, adds length. The structure is acceptable but could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 parameters, nested objects, no output schema), the description is incomplete. It does not describe the return format, how inputs relate to the output, or how to interpret the deliverable. The reference case provides some context but is insufficient for full understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 17%, the description adds minimal parameter meaning beyond 'send the documented case fields.' It does not explain the purpose of individual parameters like company, quarter, or transcriptExcerpt, which is inadequate for a tool with 6 parameters including nested objects.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an 'Earnings Reviewer' for C-suite fundraising expertise, and states it returns a structured, audited deliverable. The reference case provides concrete context. However, it does not explicitly distinguish from the sibling tool 'earnings_transcript_signals'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description hints at use cases (fundraising, C-suite) and mentions inputs are validated server-side, but lacks explicit guidance on when to use versus alternatives or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

earnings_transcript_signalsA

Read-only

Inspect

Earnings call transcript signal extractor for equity research analysts, catalyst-driven hedge funds, and BD teams. Parses earnings transcripts (fetched or provided) to surface:

• signals (P0/P1/P2): guidance raise/cut, miss/beat vs consensus, buyback, dividend change, new product, executive change, capex shift, M&A intent, regulatory risk, competitive threat, supply chain, hiring • kpis_mentioned: Revenue, EBITDA, EPS, FCF, Gross Margin, Operating Margin with YoY/QoQ % • guidance: raised / maintained / cut / new_initiated items extracted • q_and_a_topics: top Q&A themes detected (AI strategy, China exposure, M&A pipeline, macro, etc.) • overall_tone: bullish / neutral / bearish

Sources fetched automatically: SEC EDGAR 8-K filings, Yahoo Finance earnings news, Motley Fool transcripts. If no transcript can be retrieved from any source, returns status:'failed' with an explicit warning and empty signals — never fabricated data. Accepts transcript_text override for direct analysis. Supports multilingual transcripts (de/fr/es/zh). European tickers (SAP.DE, BMW.DE) mapped to EDGAR-compatible equivalents automatically.

ParametersJSON Schema

Name	Required	Description
`lang`	No	Language hint for the transcript. Affects mock transcript language when fetch fails.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`quarter`	No	Fiscal quarter in format Q1-2026. Defaults to the most recent past quarter.
`transcript_text`	No	If provided, skips all external fetches and analyses this text directly. Minimum 100 characters.
`company_or_ticker`	Yes	Company name or ticker symbol (e.g. 'Tesla', 'TSLA', 'SAP', 'SAP.DE', 'Sanofi', 'SNY'). European tickers (SAP.DE, BMW.DE) are mapped to their ADR equivalents for EDGAR lookup.

Tool Definition Quality

A4.3/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it explains fallback behavior (returns 'failed' status with explicit warning if no transcript found), multilingual support, European ticker mapping, and the never-fabricated-data policy. Annotations already indicate read-only and open world, but the description enriches this.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bullet points for outputs, making it easy to scan. It is front-loaded with the main purpose. However, it is somewhat lengthy, but every part adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (many signal types, 5 parameters, no output schema), the description is remarkably complete. It covers inputs, sources, fallback, output structure, and edge cases. Provides all necessary information for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 5 parameters. The description adds extra meaning: minimum 100 characters for transcript_text, effect of lang on mock transcripts, default quarter behavior, and European ticker mapping for company_or_ticker. This goes beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool extracts signals from earnings call transcripts, listing specific outputs like signals, KPIs, guidance, etc. It is distinguishable from siblings by its focus on transcript signals, but does not explicitly differentiate from tools like 'earnings_reviewer'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies target users (analysts, hedge funds, BD teams) and explains when to use it (e.g., for analyzing earnings transcripts). It does not explicitly mention when not to use it or provide alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

economic_indicatorA

Read-only

Inspect

Return a precise macroeconomic indicator for a country — the exact figure for a market-sizing, finance or strategy workflow. Indicators: gdp_usd, gdp_per_capita, gdp_growth, inflation, unemployment, population. Source: World Bank. When to use: an agent's analysis needs an authoritative country-level economic figure. Inputs: country (ISO-2 or ISO-3 code) and indicator name.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	Yes	Country code, ISO-2 or ISO-3 (e.g. FR, USA)
`indicator`	Yes	Macroeconomic indicator name

Output Schema

ParametersJSON Schema

Name	Required	Description
`year`	Yes
`value`	Yes
`source`	Yes
`country`	Yes
`indicator`	Yes
`source_url`	No
`indicator_code`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint. The description adds value by disclosing the data source (World Bank) and implying the tool returns a single authoritative figure, enhancing transparency beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, using only three sentences with a clear structure: purpose, supported indicators, and usage context. Every sentence adds value; no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and the presence of an output schema, the description adequately covers the necessary behavioral and usage context. It mentions the exact figure returned and the source, which is sufficient for a simple data retrieval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the schema descriptions are already clear. The description repeats the parameter information (country code and indicator name) but adds listing the indicator options, which provides some additional clarity. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a precise macroeconomic indicator for a specific country, lists the supported indicators and source, and differentiates it from sibling tools by focusing on authoritative country-level economic data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a clear 'When to use' clause specifying the context for using the tool. It does not explicitly mention when not to use it or provide alternatives, but the context is sufficient for the intended use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

email_domain_health_checkA

Read-onlyIdempotent

Inspect

Comprehensive email domain health check: MX routing, SPF authentication, DKIM signing, DMARC policy enforcement, DNSBL blacklist status (Spamhaus/SpamCop/Barracuda), TLS certificate validity, and WHOIS registration age. Aggregates a reputation score 0-100 and generates P0/P1/P2 deliverability signals. Accepts a domain (stripe.com) or email address (info@stripe.com). Detects role-based addresses (info@, support@, admin@, noreply@) that have higher bounce rates. Detects email provider (Google Workspace, Microsoft 365, Amazon SES, etc.). P0 signals: blacklisted / no MX / TLS expired / no SPF + DMARC none. P1 signals: SPF soft-fail / no DKIM selector / DMARC no reporting. P2 signals: role-based address / TLS expires <30d / domain age <90 days. All checks are keyless (no API keys required). Cache TTL 1h. SLA <=10s p95.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`email`	No	Full email address for additional checks: format validity, role-based detection (e.g. "ceo@stripe.com").
`checks`	No	Subset of checks to run. Defaults to all 8: ["mx","spf","dkim","dmarc","blacklist","whois","tls","reputation"]. Use a subset for faster responses (e.g. ["mx","spf","dmarc","reputation"] for quick scoring).
`domain`	Yes	Domain to check (e.g. "stripe.com" or "@stripe.com"). If an email address is provided here, the domain is extracted automatically.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mx`	Yes
`spf`	Yes
`tls`	No
`dkim`	Yes
`dmarc`	Yes
`whois`	No
`domain`	Yes
`status`	Yes
`sources`	Yes
`blacklist`	Yes
`email_valid`	No
`quality_score`	Yes
`reputation_score`	Yes
`email_is_role_based`	No
`deliverability_signals`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnly, idempotent, non-destructive) are complemented with details: cache TTL 1h, SLA <=10s p95, async option, keyless checks, and signal hierarchy (P0/P1/P2). No contradictions; adds significant behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is thorough but slightly verbose; every sentence adds value. Could be tightened without losing critical information. Well-structured, front-loaded with main purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given tool complexity (multiple checks, signals, async, caching), description covers all aspects: input options, output characteristics (signals, score), performance (SLA, cache), and dependencies (keyless). Output schema likely covers return fields.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds crucial context: domain accepts email addresses with auto-extraction, checks parameter for subset usage, async for slow requests, and email for role-based detection. Extends meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Comprehensive email domain health check' and lists specific checks (MX, SPF, DKIM, DMARC, blacklist, TLS, WHOIS). It also mentions reputation score and deliverability signals. Distinguishes from siblings as no other tool on the server targets email domain health.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context for use: checking email domain health, detecting issues like blacklisting and TLS expiry. Mentions keyless operation, cache TTL, and SLA. Lacks explicit when-not-to-use or comparison to alternatives, but coverage is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

enps_autoC

Read-only

Inspect

eNPS automatisé — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: BlaBlaCar — eNPS pulse mensuel · 700 FTE 8 pays · segments × tenure × manager · plays correctifs ciblés. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`context`	Yes
`toolStack`	Yes
`segmentation`	Yes
`presenterScript`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the description's claim of returning a deliverable is consistent and non-contradictory. The description adds that inputs are validated server-side, but overall behavioral detail beyond annotations is minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a reference case and some French text that may not be essential. It is not excessively long but could be more efficient and front-loaded with core information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, nested objects, no output schema), the description omits critical details like the exact return format, the meaning of 'audited deliverable', and prerequisites for use. It relies heavily on the implicit understanding of eNPS concepts.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 14% schema description coverage, the description fails to compensate by explaining the intended meaning of the many undocumented fields. It merely says 'send the documented case fields' without adding context to the schema properties.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's for automated eNPS analysis targeting CHROs and returns a structured deliverable. It includes a concrete reference case (BlaBlaCar) that illustrates the tool's scope. However, it does not explicitly distinguish it from sibling tools, though no direct competitor is evident.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a reference case but lacks explicit guidance on when to use this tool versus alternatives, such as other HR analytics tools in the sibling list. No when-not-to-use or alternative tool mentions are present.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

esg_audit_multiA

Read-only

Inspect

Multi-mode ESG intelligence for ESG analysts, sustainability officers and impact investing fund managers. Aggregates live data from CDP, SBTi, Wikipedia, Yahoo Finance and web search across five modes: • company_score — ESG score 0-100 with E/S/G breakdown + heuristic rating (AAA-CCC), from CDP grade + SBTi + sector profile • controversy_check — controversies detected via web search, classified P0/P1/P2 by type (greenwashing, emissions fraud, labour, governance) • emissions — GHG Scope 1/2/3 estimates, SBTi validation flag, net-zero target year, carbon intensity per M€ revenue • esrs_readiness — CSRD gap across 12 standards (E1-E5, S1-S4, G1-G3): readiness % + gap list + CSRD deadline + effort man-days • sfdr_classification — suggested SFDR Article 6/8/9 with rationale and sustainability indicators met

Signals: P0=critical (controversy/score<40), P1=significant (score<55/SBTi missing/ESRS<50%), P2=watch. Cache 24h.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Company name, ticker, ISIN or LEI (e.g. "Microsoft", "Sanofi", "Volkswagen").
`pillar`	No	ESG pillar filter (optional, default: all).
`framework`	No	ESG framework filter (optional, default: all).

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`emissions`	No
`company_score`	No
`controversies`	No
`quality_score`	Yes
`esrs_readiness`	No
`sfdr_classification`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and destructiveHint=false. Description adds substantial context: cache duration (24h), data sources, and signal levels, which go beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with bullet points and clear sections. Front-loads the purpose and then lists modes. While detailed, it is efficient for the complexity of the tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers modes, signals, sources, and caching. Output schema is present, so return values are not needed. Could mention error handling or default modes, but overall complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with descriptions for all parameters. The tool description does not add further meaning beyond what is in the schema, so baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly identifies itself as 'Multi-mode ESG intelligence' and lists five distinct modes with brief explanations. Differentiates from sibling ESG tools by specifying the modes and data sources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implicitly suggests usage through mode descriptions but lacks explicit guidance on when to use this tool over alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

esrs_narrative_builderC

Read-only

Inspect

Architecte du narratif ESRS / CSRD — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: L'Oréal France — narratif ESRS E1+E5 + S1 + G1 · CSRD reporting 2025-2026 · double-matérialité chiffrée. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`scope`	Yes
`company`	Yes
`context`	Yes
`presenterScript`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint; the description adds that it returns a 'structured, audited deliverable' and that inputs are validated server-side. This provides moderate context but no details on response format, error handling, or cost implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise and front-loaded with the purpose, but includes a long reference case example that could be shortened without losing meaning. No wasted words, but some noise from the example.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 6 parameters, no output schema, many siblings), the description is incomplete. It does not explain the return value structure, error scenarios, or how the optional async parameter behaves. The reference case provides domain context but does not cover operational details needed for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at only 17%, the description barely explains parameters. It only mentions 'send the documented case fields' but does not elaborate on purpose or format of company, scope, context, focus, or presenterScript, leaving agents to rely solely on field names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool builds an ESRS/CSRD narrative deliverable and references a specific case (L'Oréal). It is strong on verb+resource but does not explicitly differentiate from sibling tools like sustainability_report or sustainability_reporting_pilot, though the ESRS/CSRD specificity provides some distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description does not mention conditions, prerequisites, or when not to use it, leaving the agent without context for appropriate invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

event_marketingC

Read-only

Inspect

Marketing événementiel — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Pennylane (€120k/an budget événements) — 7 événements sélectionnés · coût-MQL -38% vs année précédente. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`teamSize`	Yes
`geography`	Yes
`objectives`	Yes
`currentEvents`	Yes
`targetAudience`	Yes
`annualBudgetEur`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint, indicating safe read behavior and potential entity creation. The description adds that inputs are validated server-side and the deliverable is audited, but does not clarify the nature of the deliverable or any side effects beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description front-loads the purpose and includes a reference case as an example. While the reference case is useful, it adds length without clarifying usage. Overall, it is fairly concise and structured with a clear lead.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, no output schema), the description is incomplete. It does not explain the output format, what the deliverable contains, or how inputs map to the result. The mention of 'documented case fields' assumes prior knowledge not provided.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, meaning most parameters lack descriptions. The description merely says to 'send the documented case fields' without explaining what each parameter (company, targetAudience, etc.) means or how to fill them. This fails to compensate for the poor schema documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is for event marketing ('Marketing événementiel') and that it returns a structured, audited deliverable. However, it does not differentiate itself from sibling marketing tools like customer_marketing or marketing_roi_dashboard, which could have overlapping purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no guidance on when to use this tool versus alternatives. No mention of prerequisites, when not to use it, or how it compares to other tools. The description only states what it does, not the context of use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

executive_comp_peer_benchmarkA

Read-onlyIdempotent

Inspect

As a Chief Human Resources Officer (CHRO), benchmark executive compensation packages against peer companies using public SEC filings and private compensation data from Equilar and Bloomberg. Inputs include executive name, title, company ticker, and peer group criteria. Outputs structured compensation metrics (base salary, bonus, equity, total compensation) with source attribution and confidence scores.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`peerGroup`	No
`fiscalYear`	No
`companyTicker`	Yes
`executiveName`	Yes
`executiveTitle`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`compensation`	No

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description exceeds the annotation hints by detailing data sources (public SEC filings and private Equilar/Bloomberg data) and output characteristics (source attribution, confidence scores). This adds significant behavioral context beyond the readOnlyHint, openWorldHint, and idempotentHint.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences that front-load the role and action, then detail inputs and outputs. Every sentence contributes value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and the presence of an output schema, the description provides sufficient context: purpose, inputs, data sources, and output structure. It leaves no critical gaps for a benchmarking tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Although schema description coverage is only 17%, the description explains the purpose of key parameters (executive name, title, company ticker, peer group criteria). It adds meaning beyond the raw schema, but omits details for fiscalYear and async parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: benchmarking executive compensation packages against peer companies. It specifies the role (CHRO), the data sources (SEC filings, Equilar, Bloomberg), and the output metrics. This distinguishes it from sibling tools that focus on other aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool (CHRO benchmarking) but does not explicitly state when not to use it or compare it to alternatives like comp_benchmark_geo_delta. The persona-based framing helps but lacks explicit exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

financial_model_3statementA

Read-only

Inspect

Pure-compute 3-statement financial model builder (Income Statement + Balance Sheet + Cash Flow). Feed assumptions (revenue growth, COGS%, OpEx, CapEx, working capital, tax rate, depreciation, debt schedule) → receive a full 3-5 year projection with integrated DCF valuation. Supports IFRS / US_GAAP / PRC_GAAP (中国会计准则) norms with bilingual ZH+EN labels for PRC. Modes: build (full 3-statement model) | scenario_analysis (base/bull/bear ±20% growth) | sensitivity (1 KPI × 1 input, 5-point grid). No external data needed — all computed from assumptions. ICP: VC due diligence, M&A analysts, CFO SMB, startup founders pitching investors, biotech/SaaS modeling. Returns balance_check_ok per year, DCF enterprise/equity value, and coherence warnings.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	build = full 3-statement model \| scenario_analysis = base/bull/bear \| sensitivity = 1 KPI × 1 input
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`assumptions`	Yes	Financial assumptions for the model
`sensitivity_kpi`	No	KPI to observe in sensitivity mode.
`sensitivity_input`	No	Assumption param to vary in sensitivity mode. E.g. 'growth_rates_pct[0]' or 'cogs_pct_of_revenue'.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`norms`	Yes
`status`	Yes
`sources`	No
`warnings`	Yes
`cash_flow`	No
`scenarios`	No
`sensitivity`	No
`balance_sheet`	No
`quality_score`	Yes
`valuation_dcf`	No
`income_statement`	No

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explicitly states 'Pure-compute' and 'No external data needed — all computed from assumptions,' which aligns with the readOnlyHint annotation. It also describes return values (balance_check_ok, DCF values, coherence warnings), adding behavioral context beyond the annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-structured, starting with the core function, then detailing modes, accounting standards, and ICP. Every sentence provides useful information, though it could be slightly more concise without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested assumptions, multiple modes, output schema), the description is complete. It covers all necessary aspects: inputs, modes, return values, and target users, making it easy for an AI agent to understand and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions for each parameter. The tool description adds value by summarizing the purpose of the assumptions object and the modes, but does not repeat schema details unnecessarily.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool builds a 3-statement financial model from assumptions, specifying the outputs (3-5 year projection with DCF valuation) and supporting multiple accounting standards. It distinguishes itself from sibling tools by its specific financial modeling focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists ideal customer profiles (VC due diligence, M&A analysts, etc.) and explains the three modes (build, scenario_analysis, sensitivity). While it does not explicitly say when NOT to use, the context is sufficient for selective invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fraud_detectorC

Read-only

Inspect

Détecteur de fraude — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: TechManu SAS — Industriel FR €32M CA, 148 FTE · 30j · 21 anomalies · €487k à risque. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`analysisPeriodDays`	Yes
`transactionVolumes`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns an 'audited deliverable' and inputs are validated server-side, but does not elaborate on side effects, authorization, or dynamic behavior. Adds some value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short but includes unnecessary jargon and mixed languages. Could be more concise and front-loaded with critical information. Every sentence has some value but could be improved.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool is moderately complex with nested objects and no output schema. The description lacks information on data sources, synchronicity handling (async parameter), latency, and expected output fields. The reference case provides partial context but is insufficient for full comprehension.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' has description). The description does not detail parameters; it vaguely says 'send the documented case fields' and provides a reference case example. Low compensation for missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it's a fraud detector that returns a structured, audited deliverable, with a concrete reference case. The verb 'detects fraud' and resource 'company data' are clear, but the jargon 'Gapup agent-payable C-suite expertise (RISK)' is obscure. It somewhat distinguishes from specialized siblings but not explicitly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like x402_payment_fraud_detector. No explicit when-to-use or when-not-to-use conditions. Only mentions server-side validation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_business_ideasA

Read-only

Inspect

Return vetted, automation-scored business ideas from the FTG idea bank — each with an autonomy score, monetization model and conservative/median/optimistic MRR projections. When to use this tool: an agent or founder wants ranked, buildable business ideas. Input: optional category and limit.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`category`	No	Optional category filter

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`ideas`	Yes

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the description does not need to repeat safety traits. It adds value by describing the output structure (scores, MRR projections), but does not disclose any additional behavioral nuances beyond what annotations convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—only two sentences. The first sentence delivers the core purpose and output details, and the second adds usage context and inputs. Every sentence earns its place with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown), the description sufficiently covers the tool's objectives, input optionality, and output content. It omits mention of the async parameter, but that is likely a common pattern across tools. Overall, it is complete for a read-only idea discovery tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 67%, and the description adds that inputs are 'optional category and limit.' The async parameter is already well-described in the schema. The description confirms optionality but does not elaborate on limit or category beyond what the schema provides, keeping it at baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'vetted, automation-scored business ideas' with specific output details (autonomy score, monetization model, MRR projections). This differentiates it from general business idea tools, though it doesn't explicitly contrast with siblings like ftg_opportunity_scout.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'When to use this tool: an agent or founder wants ranked, buildable business ideas.' This provides direct usage context, but it lacks when-not-to guidance or explicit alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_business_planA

Read-only

Inspect

Return the business plan for a market-gap opportunity — direct-trade or local-production, with CAPEX, OPEX, ROI, payback period, automation level and the full plan. Cache-first: returns the stored plan when available, otherwise reports that generation is required (the FTG platform produces plans on demand). When to use this tool: an agent has an opportunity_id (from ftg_market_gap) and needs the investable plan. Input: an opportunity_id.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`opportunity_id`	Yes	Opportunity id obtained from ftg_market_gap

Output Schema

ParametersJSON Schema

Name	Required	Description
`plans`	No
`status`	Yes
`message`	No
`plan_count`	No
`opportunity_id`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint) are consistent. Description adds important transparency about caching ('Cache-first: returns the stored plan when available, otherwise reports that generation is required') and the on-demand nature. This explains the tool's behavior beyond what annotations convey, though the output schema also provides structural detail.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Extremely concise: two sentences plus a short usage note and input line. No wasted words. Front-loaded with the core purpose and key components.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and clear annotations, the description covers necessary context: how it integrates with ftg_market_gap, caching behavior, and the specific financial metrics returned. No obvious gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both 'async' and 'opportunity_id'. The description only restates that input is an opportunity_id, adding no new meaning. Baseline score of 3 is appropriate as the schema already documents parameters adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a business plan for a market-gap opportunity, specifying included components (CAPEX, OPEX, ROI, payback period, automation level). It distinguishes from siblings by noting that the opportunity_id comes from ftg_market_gap, making the tool's role in the workflow explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use guidance: when the agent has an opportunity_id from ftg_market_gap and needs the investable plan. It also notes cache-first behavior. It does not explicitly state when not to use or provide alternative tool names, but the context is clear enough for correct selection among many siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_country_regulationsA

Read-only

Inspect

Return import, trade and production regulations for a country — category, title, summary and source. When to use this tool: an agent checks regulatory or compliance requirements before trading or producing in a market. Input: a country, with an optional category.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	Yes	Country ISO code or name
`category`	No	Optional regulation category filter

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`regulations`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint, so the description need not repeat those. The description adds value by disclosing the return structure (category, title, summary, source) and that category is optional. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the first states the return value; the second gives usage context and input. Every sentence is informative and there is no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and the presence of an output schema, the description provides adequate coverage of inputs and outputs. It could briefly mention the async parameter for long-running queries, but the overall information is sufficient for a standard lookup tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is high (75%) with descriptions for async, country, and category. The description adds minimal extra semantics beyond 'optional category', essentially restating the schema. It does not explain async or limit, but those are covered by the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns import, trade, and production regulations for a country with specific fields (category, title, summary, source). It also provides context for when to use it (regulatory/compliance checks before trading/producing), distinguishing it from sibling tools like ftg_country_study or ftg_market_gap.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly mentions when to use the tool ('agent checks regulatory or compliance requirements before trading or producing in a market') and specifies the input (country with optional category). It does not mention when not to use or alternatives, but the context is clear enough for an AI agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_country_studyA

Read-only

Inspect

Return the in-depth FTG country study — multi-part structured analysis of a country's trade and production landscape. When to use this tool: an agent needs deep country context before a sourcing, export or investment decision. Input: a country.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	Yes	Country ISO code or name

Output Schema

ParametersJSON Schema

Name	Required	Description
`note`	No
`parts`	Yes
`country`	Yes
`part_count`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, indicating safe, read-only behavior with dynamic output. The description adds that the study is 'multi-part structured analysis', which hints at complexity but does not elaborate on specific behaviors like error handling, response size, or data freshness. With annotations covering the core safety profile, the description provides incremental but limited value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences plus a brief 'Input: a country' line. Every sentence adds value: first defines the output and its structure, second states when to use it. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema (not shown, but present), the description does not need to detail return values. It covers purpose, usage, and input. However, it omits any mention of the async parameter's behavior (job polling), which is partially covered by the schema but could be mentioned briefly. Still, overall complete enough for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%: both parameters (async and country) have descriptions in the schema. The description merely restates 'Input: a country,' which adds no new meaning beyond what the schema already provides for the country parameter. For the async parameter, the schema description is comprehensive, so the description does not need to add more. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns an 'in-depth FTG country study' with specific context: 'multi-part structured analysis of a country's trade and production landscape'. This verb+resource pairing is distinct from sibling FTG tools like ftg_country_regulations or ftg_market_gap, which cover different aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use the tool: 'an agent needs deep country context before a sourcing, export or investment decision.' This is clear context for usage, though it does not provide negative guidance or alternative tool names for when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_investor_directoryA

Read-only

Inspect

Return investors from the FTG directory — VC, PE and impact funds with type, firm, website, ticket-size range, sectors and stages of interest. When to use this tool: an agent builds a fundraising shortlist. Input: optional country and limit.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	No	Optional country ISO code or name

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`investors`	Yes

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

ReadOnlyHint and openWorldHint already declare safety and non-destructive behavior. The description adds what fields are returned but does not disclose other behavioral traits like rate limits or result size. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: first sentence defines the tool's output, second sentence gives usage context and inputs. No wasted words, every sentence earns its place, and key info is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the essential functionality for a read-only investor directory lookup. Output schema exists, so return format details are not needed. Could mention pagination or handling of large limits, but still adequate for its simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 67% coverage; the description only mentions 'optional country and limit' without adding detail beyond the schema. The 'limit' parameter lacks description in schema, and the description does not compensate by explaining its purpose or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns investors from FTG directory with specific attributes like type, firm, website, ticket-size, sectors, and stages. The verb 'Return' is specific and distinguishes it from write tools, but does not explicitly differentiate from sibling tools like 'investor_list' or 'investor_shortlist'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'When to use this tool: an agent builds a fundraising shortlist', providing clear context. However, it does not mention when not to use it or name alternative tools for different use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_market_gapA

Read-only

Inspect

Return the import/production market-gap opportunities for a country — commodities where local demand outpaces local supply. Each opportunity carries the gap value (USD/year), the gap volume (tonnes/year), a 0-100 opportunity score and the potential margin. When to use this tool: an agent needs to know what a country structurally under-produces or over-imports, for trade sourcing, import/export or local-production investment decisions. Input: a country (ISO-2 code or name).

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum opportunities to return (default 20)
`country`	Yes	Country ISO-2 code (e.g. 'SN', 'KE') or name (e.g. 'Senegal')

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`country`	Yes
`opportunities`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and openWorldHint. Description adds context on returned fields (gap value, volume, score, margin) beyond annotations. No side effects or limitations mentioned, but adequate for a read tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose, then returns fields, usage guidelines, and input. Every sentence adds value; no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists, so description doesn't need to detail return format. Covers purpose, usage, and input adequately. No significant gaps for this tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. Description mentions country ISO-2 or name and limit parameter, but adds minimal new meaning beyond schema. Does not compensate for gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns market-gap opportunities for a country with specific details (gap value, volume, score, margin). It uses a specific verb ('Return') and resource, and distinguishes from siblings like ftg_opportunity_scout which is more general.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: when agent needs to know a country's structural under-production or over-imports for trade decisions. Does not explicitly state when not to use or name alternatives, but the clear use case suffices.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_opportunity_scoutA

Read-only

Inspect

Rank the best countries for a given commodity — where the market gap, opportunity score and potential margin are highest. Cross-country scouting. When to use this tool: an agent has a commodity and needs to know WHERE to sell, export to or set up local production. Input: a commodity name.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum countries to return (default 20)
`commodity`	Yes	Commodity name (e.g. 'rice', 'soybean', 'poultry')

Output Schema

ParametersJSON Schema

Name	Required	Description
`note`	No
`count`	Yes
`commodity`	Yes
`countries`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the description adds little behavioral detail beyond stating it is cross-country. No contradictions. A score of 3 is appropriate as the description does not contradict and adds minimal context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the main purpose, and includes usage guidance. No unnecessary words, highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and sibling tools are numerous, the description covers the core purpose, when to use, and input. It lacks output details but the output schema fills that gap. Overall complete for an agent's selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description only repeats the input commodity name without adding semantic depth beyond the schema. Baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it ranks countries for a commodity based on market gap, opportunity score, and potential margin. The verb 'rank' and resource 'countries for commodity' are specific. It distinguishes from sibling tools by focusing on cross-country scouting.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says when to use this tool: 'an agent has a commodity and needs to know WHERE to sell, export to or set up local production.' It provides clear context but does not mention alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_production_economicsA

Read-only

Inspect

Return production cost benchmarks (CAPEX/OPEX per unit, value ranges, scenarios, quality tiers) and agronomic yields (t/ha, cycles per year) for a commodity. When to use this tool: an agent sizes the economics of producing a commodity. Input: a commodity, with an optional country.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	No	Optional country ISO code or name
`commodity`	Yes	Commodity name or slug

Output Schema

ParametersJSON Schema

Name	Required	Description
`yields`	Yes
`commodity`	Yes
`cost_benchmarks`	Yes

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint, so the description's addition of return content adds some value but doesn't disclose behavioral traits like async behavior or rate limits. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences, front-loading the purpose and usage guidelines. Every sentence is meaningful and adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations, the description is sufficiently complete. It covers the essential inputs and purpose, though it could mention the async capability for completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 75%, and the description only reinforces the commodity and country parameters. It does not describe the 'limit' parameter (which has no schema description) or add deeper semantic meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns production cost benchmarks and agronomic yields for a commodity. It specifies the verb 'return', the resource, and the context for use, effectively distinguishing it from siblings like ftg_production_methods.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('an agent sizes the economics of producing a commodity'), providing clear context. However, it does not mention when not to use it or suggest alternatives, which would have made it a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_production_methodsA

Read-only

Inspect

Return the production methods for a commodity — each with a description, ordered process steps, pros/cons and a popularity rank. Methods are commodity-canonical: one curated set per commodity, reusable across every country. When to use this tool: an agent evaluates HOW a commodity is produced or processed, compares techniques, or builds a production plan. Input: a commodity slug or name.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`commodity`	Yes	Commodity slug or name (e.g. 'rice', 'tomato', 'cashew')

Output Schema

ParametersJSON Schema

Name	Required	Description
`note`	No
`methods`	Yes
`commodity`	Yes
`method_count`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare `readOnlyHint: true` (safe read operation) and `openWorldHint: true`. The description adds behavioral context beyond annotations by noting that methods are 'commodity-canonical: one curated set per commodity, reusable across every country'. This informs the agent that results are consistent and not location-dependent, which is not captured in annotations. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficiently structured: the first sentence states the primary output and key attributes, followed by usage guidance. It is only four sentences long, with no redundancy. Every sentence adds value, making it highly concise and well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists (so return values are documented elsewhere), the description adequately covers the tool's purpose, input, and key output details (description, process steps, pros/cons, popularity rank). It also provides usage context. For a tool with only 2 parameters, the description is complete and sufficient for an agent to decide when to use it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage, with both parameters (`async`, `commodity`) fully described in the schema. The description only repeats 'Input: a commodity slug or name', adding no new meaning beyond what the schema provides. Thus, it meets the baseline for a well-documented schema but does not enhance understanding further.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states what the tool does: 'Return the production methods for a commodity'. It specifies the resource ('production methods'), the action ('return'), and includes a unique characteristic ('commodity-canonical') that distinguishes it from sibling tools like `ftg_country_regulations` or `ftg_production_economics`. The purpose is specific and well-defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'When to use this tool' section, providing clear context: evaluating how a commodity is produced, comparing techniques, or building a production plan. However, it does not explicitly state when not to use it or mention alternatives among the many sibling tools, which would earn a 5. The guidance is clear but lacks exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_seller_catalogA

Read-only

Inspect

Return seller catalogues registered on FTG — exporters and producers with their commodity, monthly capacity, certifications and target export markets. When to use this tool: an agent builds a supplier or sourcing shortlist. Input: optional seller country and commodity.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	No	Optional seller country ISO code or name
`commodity`	No	Optional commodity filter

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`sellers`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the read-only nature is known. The description adds behavioral detail by specifying the returned attributes (commodity, capacity, certifications, markets), which goes beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences, front-loading the purpose followed by usage guidance. Every word adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (4 optional params, output schema exists), the description adequately covers what the tool does and when to use it. The schema handles parameter details, so completeness is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 75% (3 of 4 parameters documented). The description mentions 'optional seller country and commodity' but does not add details beyond what the schema provides. The async parameter is not mentioned in description but is covered by schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Return' and the resource 'seller catalogues registered on FTG', with specific details about contents (exporters, producers, commodity, monthly capacity, certifications, target markets). It distinguishes itself from siblings like ftg_sourcing_buyers by focusing on seller-side catalogues.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'When to use this tool: an agent builds a supplier or sourcing shortlist.' This provides clear context for when to invoke it, though it does not explicitly mention when not to use it or alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_sourcing_buyersA

Read-only

Inspect

Return verified local buyers in a country — companies sourcing a given commodity, with buyer type, city, website, annual volume range and certification requirements. When to use this tool: an agent builds a sourcing or export shortlist, or needs real B2B demand contacts in a market. Input: a country and an optional commodity filter.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum buyers to return (default 20)
`country`	Yes	Country ISO-2 code or name
`commodity`	No	Optional commodity slug to filter buyers by

Output Schema

ParametersJSON Schema

Name	Required	Description
`buyers`	Yes
`country`	Yes
`commodity`	No
`buyer_count`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint. Description adds 'Return verified local buyers' consistent with readOnly. No contradictions, and provides additional context about output fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two well-structured sentences. First sentence explains what the tool does with key details; second sentence provides usage guidance. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 4 parameters and an output schema, the description covers purpose, usage, and key inputs. Could mention async behavior or pagination but overall sufficient for typical use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds meaning by listing output fields (buyer type, city, website, volume range, certifications). This goes beyond the input schema and helps the agent understand what is returned.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Return verified local buyers in a country' with specific details like buyer type, city, website, volume range, and certification requirements. This distinguishes it from sibling tools like ftg_seller_catalog or ftg_investor_directory.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'agent builds a sourcing or export shortlist, or needs real B2B demand contacts in a market.' Lacks mention of when not to use or alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

funding_hunterC

Read-only

Inspect

Chasseur de financements — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: PME deeptech cleantech FR €8M CA — top 30 dispositifs BPI+France2030+EU+VC. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`project`	Yes
`financials`	Yes
`preferences`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, which the description does not contradict. The description adds 'returns a structured, audited deliverable' and 'inputs are validated server-side.' However, it omits the critical async behavior hinted by the 'async' parameter in the schema, reducing transparency for actual use.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (two sentences) but mixes French and English, includes a reference case that may be distracting, and lacks a clear, structured outline of functionality. It could be more concise and front-loaded for agent parsing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested input (5 top-level parameters) and no output schema, the description is incomplete. It fails to explain the deliverable format, async handling, or how to interpret the results, leaving critical gaps for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema description coverage, the description adds no parameter details. It says 'send the documented case fields' but never explains what those fields are or how to provide them, leaving the agent reliant entirely on the schema without additional context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Chasseur de financements' (funding hunter) and mentions it returns a 'structured, audited deliverable' for C-suite expertise, clearly indicating it identifies funding opportunities. However, it relies on jargon and a French reference case, making it slightly less clear than an explicit English purpose statement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus the many sibling tools (e.g., capital_strategy, investor_list). The description only says to 'send the documented case fields' but does not explain context, prerequisites, or cases where this tool is inappropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fx_rateA

Read-only

Inspect

Get the current or historical foreign-exchange rate for any currency pair — the exact exchange rate, FX rate or conversion rate an agent needs to convert a currency amount or feed a finance, trading, invoicing or pricing workflow. Covers EUR/USD, USD/JPY, GBP/EUR and every ISO-4217 currency pair. Returns the latest spot rate, or a historical rate by date. Use when a workflow needs a precise live or past currency exchange rate, or to convert money between two currencies. Source: European Central Bank reference rates via Frankfurter. Inputs: from/to ISO-4217 currency codes, optional date (YYYY-MM-DD).

ParametersJSON Schema

Name	Required	Description
`to`	Yes	Quote currency, ISO-4217 (e.g. USD)
`date`	No	Optional YYYY-MM-DD for a historical rate
`from`	Yes	Base currency, ISO-4217 (e.g. EUR)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.

Output Schema

ParametersJSON Schema

Name	Required	Description
`to`	Yes
`from`	Yes
`rate`	Yes
`as_of`	Yes
`source`	Yes
`source_url`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns the latest spot rate or a historical rate by date, and sources from ECB via Frankfurter. This provides useful behavioral context beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, efficiently front-loaded with purpose, and no redundant information. Every sentence adds value, making it easy to scan.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists (so return values need not be detailed), the description covers purpose, usage guidance, inputs, and source. For a simple lookup tool, it is fully complete with no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all four parameters (from, to, date, async). The description briefly restates 'from/to ISO-4217 currency codes, optional date (YYYY-MM-DD)' but adds no new meaning beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool retrieves current or historical foreign-exchange rates, specifying the verb 'Get' and resource 'foreign-exchange rate' with scope of any ISO-4217 currency pair. It distinguishes from sibling tools focused on other financial functions, making its purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use when a workflow needs a precise live or past currency exchange rate, or to convert money between two currencies,' providing clear context for when to use. However, it does not mention when not to use or list alternative tools, which is a minor gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geographic_expansionC

Read-only

Inspect

Expansion géographique — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Expansion 4 marchés (DE/UK/ES/NL) · €1.8M budget · ARR cible €3.2M Y2. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`product`	Yes
`financials`	No
`constraints`	No
`targetMarkets`	Yes
`preferredEntryMode`	No
`expansionHorizonMonths`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description adding 'returns a structured audited deliverable' is consistent but not additive. No side effects disclosed, and openWorldHint is not addressed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is only two sentences plus a reference case, which is concise. However, it could be better structured with a clear verb and parameter overview.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the nested object schema and no output schema, the description is inadequate. The reference case provides some context but does not fully describe the tool's behavior or return value.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, and the description does not explain parameters. It generically says 'send the documented case fields', leaving the agent to guess which fields are required.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool is for geographic expansion and returns a structured audited deliverable, with a reference case. The verb is implied rather than explicit, but the purpose is clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus siblings like market_entry_strategist or geo_logistics_intel. It only mentions 'send the documented case fields' without context on alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_logistics_intelA

Read-only

Inspect

Geospatial logistics intelligence for supply chain, maritime and transport agents. Four modes: (1) geocode_batch — resolve up to 50 addresses to lat/lon with confidence scores (OSM Nominatim + Open-Meteo fallback, 1 req/s rate-limit respected); (2) routing — road/cycling/walking route with distance_km, duration_seconds and ETA ISO timestamp between two addresses or lat/lon points (OSRM public, keyless, global); (3) port_congestion — congestion status for any UN/LOCODE port (e.g. NLRTM, SGSIN, CNSHA) with waiting vessel count, severity (low/medium/high/extreme) and average wait hours; (4) ship_tracking — AIS position, speed, course, destination and ETA for a vessel by its 9-digit MMSI. No API key required for geocode/routing/port. Optional env: AIS_STREAM_API_KEY for live ship data (otherwise MarineTraffic scrape best-effort). SLA: <=25s p95. Cache: 24h geocoding / 1h routing / 30min port / 5min ship. Quality score 0-100. Status: final/partial/failed.

ParametersJSON Schema

Name	Required	Description
`to`	No	routing only: destination address or 'lat,lon'
`from`	No	routing only: origin address or 'lat,lon'
`mode`	Yes	'geocode_batch': address -> lat/lon. 'routing': route + ETA. 'port_congestion': UN/LOCODE port state. 'ship_tracking': vessel by MMSI
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Primary input: address for geocode/routing, UN/LOCODE (e.g. NLRTM) for port_congestion, 9-digit MMSI for ship_tracking
`addresses`	No	geocode_batch only: up to 50 addresses (overrides query if provided)
`mode_transport`	No	routing only: transport mode. Default: driving

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`routing`	No
`sources`	Yes
`geocode_batch`	No
`quality_score`	Yes
`ship_tracking`	No
`port_congestion`	No

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and destructiveHint=false, confirmed by the description's read-only nature. The description adds significant behavioral context: rate limits (geocode 1 req/s), SLA ≤25s p95, cache durations (24h geocoding, 1h routing, 30min port, 5min ship), quality score (0-100), status (final/partial/failed), and fallback behavior for ship data. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single dense paragraph but well-organized with numbered modes and clear lists of details (SLA, cache, status). Every sentence adds value. It could be slightly improved by breaking into sections, but it is concise for the amount of information conveyed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (four modes, external services, caching, quality scoring, partial results), the description covers all essentials: mode selection, parameter requirements, rate limits, SLA, API key, cache durations, quality score, and status. An output schema exists (not shown but indicated) to clarify return values. The description leaves no critical gaps for an agent to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for each parameter. The description adds value by explaining mode-specific parameter usage (e.g., 'geocode_batch only: up to 50 addresses (overrides query if provided)') and clarifying that 'from' and 'to' are routing-only. It enriches the schema descriptions with practical details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool as geospatial logistics intelligence with four distinct modes: geocode_batch, routing, port_congestion, ship_tracking. Each mode has a precise verb+resource pair (e.g., 'resolve up to 50 addresses to lat/lon', 'road/cycling/walking route'). It distinguishes itself from the many sibling tools by its specific domain and functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use each mode, including input requirements (address, UN/LOCODE, MMSI) and optional parameters like transport mode. It mentions rate limits and API key needs, but does not explicitly state when not to use the tool or suggest alternatives. The guidance is adequate for an agent to select the correct mode.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

global_salary_inflation_adjusterA

Read-onlyIdempotent

Inspect

Adjusts salary benchmarks for local inflation using OECD, IMF, and World Bank data. Designed for CHROs to normalize compensation across regions with accurate inflation adjustments. Inputs include country codes, base salary, and reference year. Outputs inflation-adjusted salary with data sources and warnings.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`baseSalary`	Yes
`targetYear`	No
`countryCode`	Yes
`referenceYear`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`targetYear`	No
`countryCode`	No
`inflationRate`	No
`referenceYear`	No
`adjustedSalary`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as read-only, open world, and idempotent. The description adds behavior by naming specific data sources (OECD, IMF, World Bank) and output details (sources, warnings), providing useful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with no redundancy. It front-loads the action ('adjusts') and quickly covers audience, inputs, outputs, and data sources. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and annotations are present, the description covers the main purpose, data sources, and output nature. However, it omits mention of the optional targetYear parameter and could clarify the async functionality.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description mentions country codes, base salary, and reference year as inputs but provides no detail on format, constraints, or the optional targetYear and async parameters. With only 20% schema description coverage, the description should compensate more but falls short.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool 'adjusts salary benchmarks for local inflation' using specific data sources (OECD, IMF, World Bank), targeting CHROs. This verb+resource combination is unique among sibling tools, providing clear purpose differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes intended use (normalize compensation across regions) but does not explicitly state when not to use or compare with alternatives like comp_benchmark_geo_delta. Usage guidance is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gl_reconcilerB

Read-only

Inspect

GL Reconciler — Réconciliation grand livre — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Identify the root causes of the GL breaks in 's ledger for — cluster them and rank by materiality. · For Q close: which accounts have unreconciled items over €? Provide a sign-off routing and resolution plan. · Run an automated GL reconciliation for — AR/AP/intercompany entries — flag open items, suggest journal entries. · What are the top 5 systemic control weaknesses causing recurring GL breaks at ? Recommend preventive controls. · Generate a month-end close reconciliation report for — breaks by account type, aging analysis, sign-off assignments. Reference case: Acme SaaS Q4 2026 — 47 breaks GL, €1.4M variance non postée. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`entity`	Yes
`ledgerContext`	Yes

Tool Definition Quality

B3.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, meaning the tool is read-only. The description adds context that the tool returns a 'structured, audited deliverable' and mentions async behavior and server-side validation. There is no contradiction with annotations. The description provides additional behavioral details beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is excessively verbose, containing a list of example queries and a reference case. It lacks a clear structure; the essential information is diluted by examples. Conciseness is poor, as many sentences could be condensed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 parameters, nested objects, no output schema), the description is incomplete. It does not specify the return format beyond 'structured deliverable', nor does it describe parameter behavior or result handling. The openWorldHint annotation suggests flexible input, but the description fails to guide the agent on what fields are expected.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only 'async' has a description). The tool description does not explain the meaning or usage of any parameters, despite complex nested objects (entity, ledgerContext). It only vaguely references 'documented case fields', leaving the agent without clear guidance on how to populate parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: GL reconciliation, identifying root causes of breaks, and generating structured deliverable. It lists specific use cases. However, it does not explicitly differentiate from sibling tools like margin_doctor_finance or budget_variance_ai, which deal with related but distinct financial processes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage scenarios through example questions (e.g., 'Identify the root causes of the <N> GL breaks'). It mentions that inputs are validated server-side and async behavior. However, it lacks explicit guidance on when to use this tool versus alternatives, and no when-not-to-use conditions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gov_procurement_multiA

Read-only

Inspect

Aggregate public procurement tenders (calls for tender / appels d'offres) from multiple government sources simultaneously: TED Europa v3 (27 EU countries, keyless API), BOAMP France (opendatasoft, keyless), UK Contracts Finder (OCDS standard, keyless), SAM.gov United States (requires SAM_GOV_API_KEY env var), and bund.de Germany (HTML scraping, partial). Returns structured tender records with buyer authority, EU CPV sector code, estimated contract value converted to EUR via live FX rates, submission deadlines, and direct notice URLs. Use when: a B2G agent needs to find government contract opportunities matching keywords across multiple jurisdictions; building a pipeline of public tenders for bid/no-bid qualification; monitoring a domain by CPV code; market sizing public sector spend. Key inputs: query (keywords), countries (ISO-2 array), cpv_codes (EU standard codes, e.g. 72000000=IT services, 45000000=construction, 79000000=business services), min_value_eur (filter), published_after (ISO date, defaults to 30 days ago). SLA: <=25s p95 (all sources fetched in parallel, 8s budget per source). Optional env var SAM_GOV_API_KEY enables US federal tenders (free key at api.sam.gov). Quality score: 25 pts if TED EU retrieved, 15 pts per other source retrieved (max 60), 10 pts if >= 10 tenders returned, 5 pts if aggregates computed. Status: failed < 30 / partial 30-59 / final >= 60.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keywords to search for tenders (e.g. "cybersecurity audit", "construction", "consulting AI")
`countries`	No	Countries to search. Defaults to ["EU","US","FR","UK","DE"]. Use "EU" for all 27 EU member states via TED Europa.
`cpv_codes`	No	EU Common Procurement Vocabulary codes (e.g. ['72000000'] for IT services, ['45000000'] for construction). Optional.
`min_value_eur`	No	Minimum contract value in EUR. Tenders below this are excluded. Optional.
`published_after`	No	ISO date YYYY-MM-DD. Only return tenders published after this date. Defaults to 30 days ago.

Output Schema

ParametersJSON Schema

Name	Required	Description
`query`	Yes
`status`	Yes
`sources`	Yes
`tenders`	Yes
`by_source`	Yes
`by_country`	Yes
`quality_score`	Yes
`countries_searched`	Yes

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses important behavioral traits beyond annotations: parallel fetching from multiple sources, SLA of ≤25s p95 with 8s budget per source, quality scoring (points per source retrieved, return count, aggregation), and status categories (failed/partial/final). Annotations indicate readOnlyHint and openWorldHint, which align with the read and broad search nature. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is information-rich but somewhat verbose, covering multiple aspects in a dense paragraph. It mixes usage guidance, technical details, and quality scoring. While well-organized, it could be more concise by separating sections or removing repetitive details (e.g., 'keyless API' for each source).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-source, parallel execution, quality scoring, SLA), the description is complete. It covers parameter semantics, return structure, error states, performance expectations, and environmental dependencies (SAM_GOV_API_KEY). Output schema exists but the description still lists return fields, which is helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions for all 6 parameters. The description adds meaningful context: examples for cpv_codes (72000000=IT services), countries enum meanings ('EU' for all 27 member states), and default for published_after (30 days ago). This provides value beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool aggregates public procurement tenders from multiple specific government sources (TED Europa, BOAMP, UK Contracts Finder, SAM.gov, bund.de) simultaneously. It explicitly lists the output fields (buyer authority, CPV code, estimated value, deadlines, URLs) and distinguishes itself from potentially similar tools like 'rfp_tender_architect' by focusing on multi-source aggregation with quality scoring.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes explicit 'Use when' scenarios (B2G agent finding opportunities, building pipeline, monitoring by CPV, market sizing). It provides context for when to use this tool but does not explicitly state when not to use it or list alternatives. The guidance is clear and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

growth_path_architectC

Read-only

Inspect

Architecte de croissance — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Pennylane (€30M ARR) — 3 voies de croissance · Mix recommandé : Organique + Geo EU · ARR cible €120M en 36 mois. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`constraints`	Yes
`growthTarget`	Yes
`currentDrivers`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds only that it returns a 'structured, audited deliverable' and mentions server-side validation, but no additional behavioral traits like rate limits, authentication needs, or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a verbose reference case that might not be universally useful. It is front-loaded with the tool's purpose, but could be more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested input schema and lack of output schema, the description should provide more context on the deliverable's structure. It only gives a vague reference case, leaving agents without enough information to understand what the tool returns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%. The description does not explain any parameters; it merely says 'send the documented case fields.' This adds no meaning beyond the input schema, which is insufficient given the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it is a growth architect tool returning a structured, audited deliverable, with a reference case. It clearly identifies the verb (architect) and resource (growth path). However, it does not explicitly differentiate from sibling tools like market_entry_strategist or strategic_options_analyzer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The only usage-related statement is about input validation, which does not help an agent decide when to invoke this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hallucination_confidence_meterA

Read-onlyIdempotent

Inspect

Evaluates the likelihood of hallucination in LLM responses by comparing against HuggingFace model confidence scores. Designed for risk assessment personas to quantify response reliability. Accepts text snippets or model outputs, returns confidence metrics and potential hallucination warnings. Cross-references with top-performing models from the HuggingFace leaderboard.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	The LLM-generated text to evaluate for hallucination risk
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`model_id`	No	Optional specific HuggingFace model ID to use for evaluation
`threshold`	No	Confidence threshold below which hallucination warnings are triggered

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`confidence_scores`	No
`hallucination_likelihood`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and idempotentHint, so the description adds value by detailing input types (text snippets or model outputs), output contents (confidence metrics and warnings), and the cross-referencing behavior. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (three sentences), front-loaded with the primary purpose, and well-structured. Minor redundancy in 'Designed for risk assessment personas' could be trimmed, but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers key aspects: purpose, input, output, and method. With an output schema present, explaining return values is not necessary. It is sufficiently complete for an agent to understand core functionality.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add significant meaning beyond the schema; it mentions input types but aligns with the 'text' parameter description. No additional nuance for model_id or threshold.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates hallucination likelihood using HuggingFace confidence scores, specifying the resource (LLM responses), method, and target audience. It differentiates from related siblings by mentioning cross-referencing with top-performing models.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates the tool is designed for risk assessment personas, implying a usage context, but it does not explicitly state when to use versus alternatives or provide exclusion criteria. No explicit when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

historical_price_seriesA

Read-onlyIdempotent

Inspect

Fetch historical OHLCV price series for any ticker: stocks (AAPL, SAP.DE, 7203.T), ETFs, indices, commodities (GC=F for gold) or cryptocurrencies (BTC-USD). Returns a full date-indexed series of open/high/low/close/volume plus pre-computed statistics: total return, annualised return (CAGR), annualised volatility, max drawdown and Sharpe estimate (rf=4%). Automatically detects crypto tickers (→ CoinGecko) vs traditional assets (→ Yahoo Finance primary, Stooq fallback). Adjusts for dividends and splits when adjusted=true (default). Use cases: backtesting, factor analysis, performance attribution, charting, financial modelling. Sources: Yahoo Finance, CoinGecko, Stooq. All keyless. Optional env: AICI_RESEARCH_PROXY_URL for Bright Data routing (lifts Yahoo 429), TWELVE_DATA_API_KEY for higher Twelve Data quota.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`period`	No	Look-back period. Default: 1y.
`ticker`	Yes	Yahoo Finance ticker symbol. Examples: AAPL (US stock), SAP.DE (Frankfurt), 7203.T (Tokyo), BTC-USD (Bitcoin), GC=F (gold futures), ^GSPC (S&P 500).
`metrics`	No	Subset of fields to include (informational — all fields always returned).
`adjusted`	No	Adjust close prices for dividends and splits. Default: true.
`interval`	No	Bar interval. Default: 1d (daily).

Output Schema

ParametersJSON Schema

Name	Required	Description
`stats`	Yes
`period`	Yes
`series`	Yes
`status`	Yes
`ticker`	Yes
`sources`	Yes
`currency`	Yes
`interval`	Yes
`data_points`	Yes
`quality_score`	Yes
`splits_detected`	No
`resolved_exchange`	No
`dividends_detected`	No

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint, destructiveHint=false, idempotentHint. The description adds value by detailing data sources (Yahoo Finance, CoinGecko, Stooq), keyless access, optional proxy/env, dividend/split adjustment (adjusted parameter default), and auto-detection logic. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (5-6 sentences), front-loaded with the purpose, and each sentence adds value: ticker examples, use cases, data sources, env vars. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown but confirmed), the description covers all aspects: purpose, ticker formats, parameters, data sources, use cases, and optional configuration. It is complete for a data retrieval tool with good annotations and schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3. The description adds meaning beyond the schema by providing examples for ticker (AAPL, SAP.DE, 7203.T, BTC-USD), explaining period and interval options, clarifying that metrics parameter is informational, and noting the async parameter for long-running requests. This compensates for the high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches OHLCV price series for various asset types, with explicit examples (stocks, ETFs, indices, commodities, crypto). It distinguishes from siblings by focusing on historical price data retrieval, which is unique among the listed sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists specific use cases (backtesting, factor analysis, performance attribution, etc.) and explains ticker format auto-detection for crypto vs. traditional assets. It does not explicitly state when not to use this tool or suggest alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hr_benefits_esg_alignerA

Read-onlyIdempotent

Inspect

Asynchronous tool for Chief Human Resources Officers (CHROs) to align employee benefits packages with ESG (Environmental, Social, Governance) goals. Uses Eurostat HR data, MSCI ESG ratings, and Sustainalytics metrics to generate actionable recommendations. Inputs include company location, industry, and current benefits structure. Outputs ESG-aligned benefits adjustments with sustainability impact scores. Requires async:true to avoid timeout errors.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`esgFocus`	No	Primary ESG pillars to prioritize
`industryCode`	Yes	NACE or ISIC industry classification code
`companyLocation`	Yes	ISO 2-letter country code of company headquarters
`currentBenefits`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`recommendations`	No
`overallESGAlignmentScore`	No

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint) already indicate no side effects. The description adds behavioral context: asynchronous execution, data sources (Eurostat, MSCI, Sustainalytics), and timeout avoidance. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences with no redundancy. Front-loaded with purpose and target user, followed by data sources and constraints. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, the description adequately covers purpose, inputs, async requirement, and data sources. It is missing details on the esgFocus parameter but remains complete for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 80% with parameter descriptions. The description summarizes three key inputs (location, industry, benefits) but does not elaborate on esgFocus or async parameters beyond the schema. Given high coverage, baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool aligns employee benefits with ESG goals, targeting CHROs. It specifies key inputs and outputs, but does not explicitly differentiate from other ESG-alignment tools like action_plan_esg or procurement_okr_esg_aligner, which are present among siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides guidance on synchronous vs. asynchronous usage via 'Requires async:true to avoid timeout errors.' However, it lacks explicit when-to-use and when-not-to-use criteria, and does not mention alternative tools for context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

incident_response_evidence_collectorA

Read-onlyIdempotent

Inspect

As a CTO, gather forensic evidence (logs, network flows, MITRE TTPs) from public breach reports and threat intelligence sources to support incident response post-mortems. Inputs include incident identifiers, date ranges, or MITRE technique IDs. Outputs structured evidence with attack patterns, indicators of compromise, and source references. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_range`	No
`incident_id`	Yes	Unique identifier for the incident (e.g., CVE, GitHub Advisory ID)
`mitre_technique_ids`	No	List of MITRE ATT&CK technique IDs (e.g., T1059)
`include_network_flows`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`timeline`	No
`warnings`	No
`indicators`	No
`incident_id`	No
`network_flows`	No
`attack_patterns`	No

Tool Definition Quality

A4.2/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it explains the async requirement to avoid x402 timeouts, confirms the read-only nature (gathering from public sources), and describes output structure. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise, front-loaded sentences with no wasted words. The second sentence lists inputs, and the third mentions outputs. The async requirement is integrated cleanly. Perfectly structured for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, inputs, outputs, and a critical behavioral note (async requirement). With an output schema present, details on return format are not needed. However, it could mention the include_network_flows parameter or clarify that date_range is optional, but overall it is sufficiently complete for a tool with this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 60% schema description coverage, the description merely lists three input types (incident_id, date_range, mitre_technique_ids) already covered by schema descriptions. It adds no new constraints or context for these parameters, and ignores include_network_flows and async (though async has its own schema description). Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: gather forensic evidence from public sources for incident response. It specifies inputs (incident IDs, date ranges, MITRE IDs) and outputs (structured evidence with attack patterns, IoCs, references), distinguishing it from siblings like ai_act_incident_response by focusing on post-mortem evidence collection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a strong usage guideline about requiring async=true to avoid timeouts, but lacks guidance on when to use this tool vs alternatives. No explicit when-not or comparison to sibling tools like ai_act_incident_response is given, limiting the agent's ability to choose correctly.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

india_market_dataA

Read-only

Inspect

Indian capital market intelligence for the IN diaspora (30M+) and investors. Covers NSE, BSE, and MCA corporate registry across four modes:

• company — full company profile: name, CIN, exchange, NSE/BSE tickers, industry, incorporation date, paid-up capital, registered office, status, directors • market_quote — real-time quote: price (INR), change%, volume, market cap, P/E ratio. Sources: Yahoo Finance (primary), BSE API, NSE API (proxy-gated) • sector_overview — Nifty/Sensex sector snapshot: top 5 companies by market cap. Supported sectors: it, banking, pharma, energy, auto, fmcg, realestate, metals, telecom, consumer • mca_filing — Ministry of Corporate Affairs filings. Requires CIN for direct lookup.

Input formats accepted: • NSE ticker (e.g. 'RELIANCE', 'TCS.NS') • BSE 6-digit code (e.g. '500325' for Reliance) • CIN 21-char (e.g. 'L17110MH1973PLC019786') • Company name EN (e.g. 'Reliance Industries', 'Tata Consultancy') • Sector keyword (e.g. 'IT services', 'banking', 'pharma')

ENV: AICI_RESEARCH_PROXY_URL with country-in routing unlocks NSE direct API and MCA.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	NSE/BSE ticker, CIN (21 chars), company name (EN), or sector keyword.
`exchange`	No	Exchange filter. Default: all.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`company`	No
`sources`	Yes
`mca_filings`	No
`market_quote`	No
`quality_score`	Yes
`sector_overview`	No

Tool Definition Quality

A4.8/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description fully discloses the read-only nature (confirmed by annotations), sources (Yahoo Finance, BSE, NSE, MCA), async behavior via the 'async' parameter, and the environment variable requirement for proxy access. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Despite being detailed, the description is well-structured with bullet points for modes and input formats, front-loaded with the primary purpose. Every sentence adds value, no redundancy. The structure aids quick scanning by an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description need not detail return values. It covers all essential aspects: modes, inputs, async polling, data sources, and environment configuration. The addition of example inputs (e.g., 'RELIANCE', '500325') enhances clarity. It appears fully complete for a data retrieval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description significantly enriches parameter understanding beyond the schema. It explains each mode's output (e.g., company profile details, real-time quote fields), clarifies that 'query' accepts multiple formats (NSE ticker, BSE code, CIN, company name, sector keyword), and details the async option. This goes far beyond the bare schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides Indian capital market intelligence, specifying coverage of NSE, BSE, and MCA across four distinct modes. The verb 'covers' combined with the detailed mode descriptions and explicit input formats leaves no ambiguity. It distinguishes itself from the sibling 'china_market_data' by geographic focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use the tool (for Indian market data) and explains each mode's purpose and inputs. However, it does not explicitly state when not to use it or suggest alternatives, though the geographic focus is implied by the detailed input formats.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

industry_classifier_naics_sicA

Read-only

Inspect

Classificateur d'industrie NAICS/SIC/NACE — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What is the NAICS code for a company that does ? · Give me NAICS + SIC + NACE classification for this company description. · Which industry sector (GICS) does this company belong to for equity analysis? · What HS code applies to products manufactured by this company? · For EU procurement compliance, what NACE Rev. 2 code applies to this company? · Classify this business into NAICS + SIC + ISIC + GICS + NACE + HS with hierarchy and confidence. · I need to segment my ICP list by NAICS 4-digit subsector — classify these company descriptions. Reference case: Helios Cold Chain EU — Freight forwarding maritime réfrigéré · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company_url`	No
`company_name`	No
`company_description`	Yes
`focus_classifications`	No
`primary_revenue_source`	No

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint, so the description supplements this by stating it returns a 'structured, audited deliverable' and mentions server-side validation. It does not detail output structure or mention rate limits/auth, but the annotations cover the safety profile. The description adds moderate context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately concise, combining a brief intro with a list of example queries and a reference case. While the example list is lengthy, each example adds value by clarifying scope. The mix of French and English may reduce clarity but does not significantly harm conciseness. It is well-structured for readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple classification systems, hierarchy, confidence) and lack of output schema, the description provides useful context through examples and mentions 'structured, audited deliverable'. However, it does not specify the exact output format or fields returned, which would help the agent process results. The openWorldHint from annotations hints at external data use, but the description could be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only 'async' parameter described). The description compensates somewhat by illustrating usage examples that imply parameters (e.g., company_description, focus_classifications) but does not systematically explain all 6 parameters (e.g., company_url, company_name, primary_revenue_source). More explicit parameter guidance is needed for a low-coverage schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an industry classifier for NAICS/SIC/NACE and related codes (GICS, HS, ISIC). The name and examples (e.g., 'Give me NAICS + SIC + NACE classification') exactly match the tool's function. It distinguishes from siblings by its unique focus on classification codes, which no other sibling tool offers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides multiple concrete example queries showing when to use the tool, such as asking for NAICS code for a company activity or classifying for EU procurement compliance. It includes a reference case for context. However, it lacks explicit instructions on when not to use or alternatives, though the examples effectively cover the primary use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

infra_blueprint_designerB

Read-only

Inspect

Architecte infra cloud — Gapup agent-payable C-suite expertise (CTO). Returns a structured, audited deliverable. Answers: Design a cloud infrastructure blueprint for a app with expected traffic and requirements. · What is the recommended AWS vs GCP vs Azure architecture for a SaaS multi-tenant app with EU data residency and SOC2? · How should I architect my cloud infra to stay under €5k/month with GDPR compliance and a junior DevOps team? · What cloud services do I need for a with load — compute, DB, cache, CDN, observability? · Give me an end-to-end cloud architecture with scaling plan, security baseline, and IaC tool recommendation. Reference case: Spinora fintech B2B SaaS — saas-multi-tenant · medium load (1k-100k req/d) · eu-west · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`team_size`	No
`expected_load`	Yes
`workload_type`	Yes
`business_context`	No
`cloud_preference`	No
`region_preference`	Yes
`budget_monthly_eur`	No
`compliance_required`	No

Tool Definition Quality

B3.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and openWorldHint=true. The description adds context: 'returns a structured, audited deliverable' and 'Inputs are validated server-side'. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the title and key phrase, and includes bulleted examples. It is somewhat long but well-structured. Minor redundancy could be trimmed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (9 parameters, no output schema, low schema coverage), the description provides useful examples and a reference case. However, it lacks details on return format, error handling, and behavior for invalid inputs, leaving gaps for a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (11%), with only the 'async' parameter described. The description mentions variables like <workload_type> and <load> but does not explain each parameter's meaning or usage beyond the examples. It fails to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool designs cloud infrastructure blueprints and provides example questions. It is specific about the resource (cloud infra blueprint) and verb (design). However, it does not explicitly differentiate from sibling tools like capacity_planning or cloud_cost_ri_optimizer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage through example questions (e.g., when needing cloud architecture design), but it does not provide explicit guidance on when to use this tool versus alternatives, nor does it state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

insurance_coverage_analyzerC

Read-only

Inspect

Analyseur de couvertures d'assurance — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Gapup Hub — 3 polices · €24k prime · Score 58/100 · 3 gaps critiques · RFP template. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`arrEur`	Yes
`sector`	Yes
`objectives`	Yes
`companyName`	Yes
`riskProfile`	Yes
`jurisdiction`	Yes
`employeeCount`	Yes
`currentPolicies`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true (non-destructive) and openWorldHint=true (input may be incomplete). The description adds that the tool 'returns a structured, audited deliverable' and that 'inputs are validated server-side.' This gives some behavioral context beyond annotations but does not detail side effects, rate limits, or output format details. A score of 3 is appropriate as the description provides some extra value but is not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (three sentences), but it is not optimally front-loaded. The first sentence is the purpose, but it is in French. The second sentence mixes jargon ('Gapup agent-payable C-suite expertise') and the third gives a reference case. While concise, some phrases are unclear, and every sentence could be more informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, nested objects, no output schema), the description is incomplete. It mentions a structured deliverable but does not specify the output fields or structure. The reference case gives an idea but is not a full specification. The description answers some questions but leaves many gaps, especially for an agent trying to understand input requirements and expected output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 11%, meaning most parameters lack schema-level descriptions. The tool description does not describe any parameters, merely saying 'send the documented case fields.' This adds no semantic value. For a tool with 9 parameters and nested objects, this is critically inadequate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an insurance coverage analyzer ('Analyseur de couvertures d'assurance') and that it returns a structured, audited deliverable. The reference case gives concrete context. However, the tool name is in English while the description is in French, and the mention of 'Gapup agent-payable C-suite expertise (RISK)' is somewhat cryptic. There is no explicit differentiation from sibling tools, which are numerous but none directly comparable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no guidance on when to use this tool versus alternatives. The description does not mention prerequisites, conditions, or exclusions. The only note is that inputs are validated server-side, which is not usage guidance. Given the large set of sibling tools, lack of direction is a significant gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

interest_rateA

Read-only

Inspect

Return a precise reference interest rate — the exact figure an agent injects into a treasury, lending, valuation or trading model. Available rates: fed_funds, sofr, us_10y, us_2y, us_3m, ecb_main, euribor_3m. Source: FRED (Federal Reserve Bank of St. Louis). When to use: an agent's computation needs a current benchmark rate as a precise input.

ParametersJSON Schema

Name	Required	Description	Default
`rate`	Yes	Reference rate name
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.

Output Schema

ParametersJSON Schema

Name	Required	Description
`rate`	Yes
`unit`	Yes
`as_of`	Yes
`value`	Yes
`source`	Yes
`series_id`	No
`source_url`	No

Tool Definition Quality

A4.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the safety profile is covered. The description adds context about precision and source but does not go into details like rate update frequency, potential delays, or error responses. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences: purpose, available rates, source and usage. It is front-loaded with the key verb and resource, and every sentence adds necessary information without waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 params, output schema present, annotations), the description covers purpose, usage, available values, and source. It is complete for an agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema provides basic parameter descriptions. The description adds value by listing all rate names and their source (FRED), and by clarifying that the tool returns an exact figure for model injection. This helps agents choose the right rate beyond the enum labels.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a precise reference interest rate, lists all available rates, and specifies the source (FRED). It distinguishes itself from potential sibling tools like fx_rate or economic_indicator by focusing on interest rates for model inputs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: 'When to use: an agent's computation needs a current benchmark rate as a precise input.' It does not explicitly state when not to use or mention alternatives, but the usage scenario is well-defined.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

internal_communicationC

Read-only

Inspect

Communication interne — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Cas démo — Communication interne. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`context`	Yes
`audienceSegments`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint, and description confirms it returns a deliverable. It adds that inputs are validated server-side. However, no details on potential side effects, authorization needs, or rate limits are given. The description does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but includes unnecessary jargon ('Gapup agent-payable C-suite expertise') that obscures meaning. It could be more straightforward. It is front-loaded with the tool title but not optimally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 5 parameters, no output schema), the description is incomplete. It does not explain the return format, how to construct nested inputs, or reference any case beyond a vague mention. It leaves significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, leaving most parameters undocumented. The description does not add any parameter-level details, only saying 'send the documented case fields.' It fails to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a 'structured, audited deliverable' and references a demo case, which gives some purpose. However, it uses vague jargon like 'Gapup agent-payable C-suite expertise (CHRO)' without clear explanation. It does not clearly differentiate from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The instruction to 'send the documented case fields' implies input validation, but no comparison to siblings or usage context is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

investor_listB

Read-only

Inspect

Liste d'investisseurs + warm intros — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Agicap Série D — 25 VCs matchés · Tier A: Balderton/Accel/Partech · Warm intro path chaque investisseur. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`round`	Yes
`company`	Yes
`existingInvestors`	No

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns a 'structured, audited deliverable' with warm intro paths. This adds some context but does not disclose rate limits, authentication needs, or other behavioral traits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise, front-loading the core purpose and adding a helpful reference case. However, the reference case example adds some length and might be less essential for tool selection.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 4 parameters, no output schema), the description is adequate but incomplete. It mentions return of a 'structured, audited deliverable' and warm intro paths but does not detail the response format or parameter semantics sufficiently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only async has a description). The description merely says 'send the documented case fields' without explaining the parameters or their meanings. With low coverage, the description fails to compensate, leaving the agent to rely on the sparse schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists investors and warm intros for fundraising (FUNDRAISING). It references a specific case (Agicap Serie D) to illustrate output. However, it does not explicitly differentiate from the sibling tool investor_shortlist, which likely overlaps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for fundraising by capitalizing FUNDRAISING and referencing a Series D case. But it offers no explicit guidance on when to use this tool versus alternatives like investor_shortlist or when not to use it. The context is clear but not prescriptive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

investor_shortlistA

Read-only

Inspect

Shortlist d'investisseurs ciblés — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Aleph AI — Series B €30M · 60 investisseurs EU/US matchés par stage/thèse · fit score + warm intro path + first message angle. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`round`	Yes
`company`	Yes
`preferences`	Yes

Tool Definition Quality

A3.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description adds that it 'Returns a structured, audited deliverable' which goes beyond the annotation. It provides context about the output quality and validation but does not mention the async parameter or polling behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a few sentences mixing French and English, with a reference case that adds length. It is front-loaded with purpose but could be more concise and structured, perhaps separating purpose from usage details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, nested objects, no output schema), the description lacks details on return format, async handling, or what the 'structured, audited deliverable' contains beyond the reference case. It is incomplete for an agent to fully understand the tool's behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (20%) and the description does not explain any parameters. It only says 'send the documented case fields', leaving users to infer parameter meanings from the schema alone. With nested objects and many fields, this is insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it creates a targeted shortlist of investors for fundraising, with a concrete reference case (Aleph AI). It distinguishes itself from siblings like 'investor_list' or 'funding_hunter' by emphasizing 'ciblés' and 'fit score + warm intro path + first message angle'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for fundraising C-suite expertise but does not explicitly state when to use this tool versus alternatives. It mentions 'Inputs are validated server-side' which hints at proper usage but lacks clear when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ip_contract_clause_extractorA

Read-onlyIdempotent

Inspect

For CHRO use: analyzes employment contract text to identify and extract IP-related clauses such as invention assignment, confidentiality, non-compete, and patent rights. Returns structured data with clause types, risk levels, and relevant legal context. Ideal for contract review workflows, compliance checks, and IP protection strategy. Sources: USPTO PatFT and EPO Espacenet public datasets. Keywords: employment contract, IP clause, invention assignment, confidentiality agreement, non-compete, patent rights, CHRO tool.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`contractText`	Yes	Full text of the employment contract to analyze
`jurisdiction`	No	Country/state jurisdiction for legal context (e.g., 'US-CA', 'DE')
`includeContext`	No	Whether to include legal context for each clause

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`clauses`	Yes
`sources`	No
`summary`	Yes
`warnings`	Yes

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint, so the description's disclosure burden is lower. It adds value by describing the output (structured data with clause types, risk levels, context) and mentioning data sources (USPTO, EPO). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a front-loaded purpose, output description, use cases, and sources. However, the 'Keywords' line is somewhat redundant and could be removed for better conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema and annotations, the description sufficiently covers the tool's functionality, output format, and use context. It mentions clause types, risk levels, legal context, and data sources, making it complete for an AI agent to understand when and how to use it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Parameter descriptions in the input schema already have 100% coverage. The tool description adds no additional meaning beyond what is in the schema (e.g., async, jurisdiction, includeContext are not elaborated). Baseline score applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: analyzing employment contract text to extract IP-related clauses. It provides specific examples (invention assignment, confidentiality, non-compete, patent rights) and explicitly targets CHRO users, distinguishing it from general legal clause extractors or patent tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions ideal use cases ('contract review workflows, compliance checks, IP protection strategy') but does not explicitly state when not to use the tool or suggest alternatives from the sibling list. Usage context is implied rather than clearly delimited.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ip_employee_invention_trackerA

Read-onlyIdempotent

Inspect

For CHROs: tracks employee patent filings and flags unassigned inventions. Input employee name or ID to retrieve their patent applications from USPTO and WIPO databases. Returns list of inventions with assignment status, filing dates, and potential ownership gaps. Useful for IP audits, inventor onboarding, and compliance checks. Keywords: patents, IP ownership, employee inventions, USPTO, WIPO.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	No	Filter patents filed before this date (YYYY-MM-DD)
`startDate`	No	Filter patents filed after this date (YYYY-MM-DD)
`employeeId`	No	Internal employee ID (optional if name provided)
`companyName`	Yes	Exact legal name of company for assignment check
`employeeName`	Yes	Full name of employee to track (e.g., 'John Doe')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`patents`	Yes
`sources`	No
`warnings`	Yes
`employeeId`	No
`companyName`	Yes
`employeeName`	Yes
`totalPatents`	Yes
`unassignedPatents`	Yes

Tool Definition Quality

A3.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly and idempotent. Description adds value by specifying external data sources (USPTO, WIPO) and the output fields (assignment status, filing dates, gaps). No behavioral quirks disclosed, but the description complements annotations well.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise (under 100 words) and front-loaded with purpose. The keyword list adds utility but is slightly redundant. Overall efficient, but could be tightened by integrating keywords into prose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main purpose and output, but omits mention of optional date range and async parameters (described in schema). For a tool with 6 parameters, the description should at least hint at filtering capabilities. With output schema present, return values are sufficiently described.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with descriptions. The description mentions 'employee name or ID' but adds no new semantic detail beyond the schema (e.g., no examples or format hints). Baseline 3 is appropriate given high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it tracks employee patent filings and flags unassigned inventions, specifying input (employee name/ID) and databases (USPTO, WIPO). It includes keywords but does not explicitly differentiate from sibling tools like 'patent_ownership_audit' or 'patent_landscape', making it clear but not uniquely distinguishing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Identifies target user (CHROs) and use cases (IP audits, onboarding, compliance), which provides context. However, it lacks explicit guidance on when to use this tool over alternatives—e.g., not stating that this is specific to employee-level tracking vs broader patent searches.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ip_protection_pilotC

Read-only

Inspect

Pilote de protection IP — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Carbios SA — Deeptech FR recyclage PET enzymatique · 14 brevets EP/US/FR · 5 concurrents · licensing €2-8M potentiel. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`competitors`	Yes
`targetMarkets`	Yes
`patentPortfolioSummary`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true. Description adds that inputs are validated server-side and it returns a deliverable, but lacks details on async behavior, error handling, or what the deliverable contains. Does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, relatively concise. The reference case is specific but adds length without clarifying purpose. Front-loads basic info but could be more efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, and description does not explain return format or async result polling. With 6 parameters and many IP sibling tools, the description does not provide complete guidance for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is very low (17%). Only 'async' has a description in schema. The description does not explain any parameters beyond the reference case hint, failing to compensate for the schema gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a 'structured, audited deliverable' for IP protection pilot, and gives a reference case. However, the verb is unclear (pilot is a noun) and it does not explicitly differentiate from sibling tools like patent_landscape.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. The term 'pilot' suggests initial assessment but not stated. No when-not or exclusion criteria provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

jailbreak_attempt_detectorA

Read-onlyIdempotent

Inspect

Detects potential LLM jailbreak attempts by analyzing user input against NIST AI Risk Management Framework adversarial patterns. Designed for persona risk assessment, this tool evaluates text for common jailbreak techniques such as prompt injection, role-playing, or obfuscation. Inputs include the user message and optional context, returning a risk assessment with confidence scores and pattern matches. Ideal for real-time moderation in chat applications or API gateways.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`context`	No	Optional conversation context for better pattern matching
`message`	Yes	User input text to analyze for jailbreak attempts
`threshold`	No	Confidence threshold for flagging attempts

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No	Confidence score of jailbreak attempt
`patternsMatched`	No	List of detected adversarial patterns
`isJailbreakAttempt`	No	Whether the input exceeds the risk threshold

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds value by detailing the analytical approach (NIST patterns, common techniques) and the return format (risk assessment with confidence scores and pattern matches), enhancing the agent's understanding beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the core function, and includes key details without extraneous text. It is concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists (not shown but mentioned), the description sufficiently covers purpose, usage, behavior, and return type. The tool has moderate complexity, and the description provides all necessary context for an agent to select and invoke it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters described. The description adds 'Inputs include the user message and optional context' but doesn't provide additional meaning beyond the schema. Baseline score is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool detects LLM jailbreak attempts, specifies techniques (prompt injection, role-playing, obfuscation), and references the NIST AI RMF framework. It distinguishes itself from sibling tools by focusing on jailbreak detection for persona risk assessment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly recommends use cases: real-time moderation in chat applications or API gateways. While it doesn't state when not to use or list alternatives, the context is clear and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_postings_intelligenceA

Read-only

Inspect

Agrégation d'offres d'emploi publiques pour inférer les tendances de recrutement. Trois modes : (1) company_hiring — analyse des postings d'une société : volume, fonctions (engineering/sales/marketing/ops/finance/hr), seniorité, géographie, croissance vs période précédente, signaux stratégiques inférés ; (2) role_market — volume marché global pour un rôle (open positions estimate, top employeurs, compétences demandées, médiane seniorité) ; (3) competitor_hiring_comparison — comparaison multi-sociétés (total postings, growth%, focus areas). Sources : Adzuna (ADZUNA_APP_ID/KEY env), RemoteOK (keyless), Himalayas (keyless), baseline statique 40 top employeurs. Usages : due diligence VC, intelligence compétitive, benchmarks RH, signaux pivots stratégiques. Cache 6h. SLA ≤15s.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Mode d'analyse : 'company_hiring' \| 'role_market' \| 'competitor_hiring_comparison'
`role`	No	Intitulé de poste à analyser (pour role_market, ex. 'data scientist', 'compliance officer')
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	No	Nom de la société (pour company_hiring ou comme 1er concurrent)
`location`	No	Pays ou ville (ex. 'France', 'United States', 'London')
`competitors`	No	Liste de sociétés à comparer (pour competitor_hiring_comparison, min 2)
`period_days`	No	Fenêtre d'analyse en jours (défaut 30)

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`role_market`	No
`quality_score`	Yes
`company_hiring`	No
`competitor_comparison`	No

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds behavioral context: cache duration (6h), SLA (≤15s), and the async parameter option. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense but not overly long. It front-loads the core purpose and then details modes. However, it could be better structured (e.g., bullet points for modes) for easier scanning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 parameters, 3 modes, output schema), the description is comprehensive: it covers sources, usage scenarios, performance bounds, and mode-specific outputs. The output schema exists, so return values are not required.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, and the description adds value by explaining the three modes and their analysis scope, which helps understand parameters like 'mode', 'company', 'role', and 'competitors'. It goes beyond the schema's parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool aggregates public job postings to infer hiring trends, with three distinct modes ('company_hiring', 'role_market', 'competitor_hiring_comparison'), each with specific outputs like volume, functions, seniority, and growth. It differentiates from sibling tools by focusing specifically on job posting intelligence rather than general competitive analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists explicit use cases: 'due diligence VC, intelligence compétitive, benchmarks RH, signaux pivots stratégiques.' It also mentions sources and caching, but does not explicitly state when not to use this tool or provide alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_resultA

Read-onlyIdempotent

Inspect

Poll the result of any tool called with async:true. Returns status=pending while running, status=completed with the full result once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h).

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by an async tool call

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds valuable context beyond annotations: explains four statuses (pending, completed, failed, not_found) and TTL of 24h, with no contradiction to readOnly, idempotent, or destructive hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences; first states purpose, second details statuses. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple polling tool with one parameter and output schema present, the description covers behavior, statuses, and TTL completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description does not add significant extra meaning beyond what the schema already provides for job_id.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it polls the result of any tool called with async:true, distinguishing it from siblings that poll results of specific tools like competitive_deep_dive_result.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Describes when to use (after calling an async tool) and details status outcomes, but does not explicitly state when not to use or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

knowledge_base_autoC

Read-only

Inspect

Base de connaissance automatique — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Klarna — knowledge base auto · Slack+Notion+Drive · 12 articles seed + structure 8 catégories. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`sources`	Yes
`topPainPoints`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating no mutations. The description adds that it 'returns a structured, audited deliverable', which is consistent. However, it does not disclose other important behaviors such as whether the knowledge base is persisted, what the 'audited' nature implies, or any authorization requirements beyond server-side validation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat verbose with marketing language ('Gapup agent-payable C-suite expertise (COO)') and a reference case that is informative but not essential. The purpose is front-loaded but the text could be trimmed for better conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, nested objects, no output schema), the description is incomplete. It does not explain the deliverable's format, what 'audited' means, how to handle the response, or any additional context for using the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is only 20% (only the 'async' parameter is documented in the schema). The description adds no information about parameters, merely saying 'send the documented case fields'. It fails to explain the structure of required nested parameters like 'company' and 'sources', leaving the agent to rely solely on the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it is a knowledge base tool for C-suite expertise, returning a structured, audited deliverable. It includes a reference case, which helps clarify the scope. However, it lacks a specific verb like 'create' or 'generate', relying on 'returns', and does not explicitly distinguish it from siblings beyond its unique purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternatives. It mentions server-side validation and a reference case but does not specify prerequisites, exclusions, or conditions for use. The agent is left to infer usage from the purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyc_screenerC

Read-only

Inspect

Screening KYC / AML / Sanctions — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Q4 2026 onboarding — 8 entités (UBO chain LLC + SPV offshore), sanctions/PEP/adverse media. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entities`	Yes
`riskAppetite`	Yes		standard
`screeningScope`	Yes
`onboardingPacket`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true, so the description does not need to emphasize safety. It adds that it returns a structured, audited deliverable, which is useful. However, it does not disclose any additional behavioral traits like rate limits or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short (3 sentences) but includes unnecessary jargon ('Gapup agent-payable C-suite expertise') and a specific reference case that may not be universally helpful. It is front-loaded with the main purpose but contains some fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 5 parameters, 4 required) and lack of output schema, the description is insufficient. It mentions 'audited deliverable' but does not specify the structure or contents of the return value. No guidance on pagination or error handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, so the description must compensate. It gives a reference case (UBO chain, SPV offshore) but does not explain the meaning or constraints of individual parameters like 'entities' or 'screeningScope'. The async parameter is documented in schema, but others are not.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool screens KYC/AML/Sanctions. The verb 'Screening' and resource are present, but the jargon 'Gapup agent-payable C-suite expertise (RISK)' is ambiguous and does not effectively differentiate it from sibling tools like 'sanctions_screener_multi'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Only mentions that inputs are validated server-side and to send documented case fields. No explicit guidance on when to use this tool versus alternatives (e.g., kyc_screener_batch) or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyc_screener_batchA

Read-only

Inspect

Async batch variant of kyc_screener. Accepts 1-100 names and returns immediately (<300ms) with a job_id. The screening runs in the background (up to 10 parallel KYC calls). Poll the result with kyc_screener_batch_result(job_id) after the eta_seconds hint. Each entry can specify name, type (person/company/any), and an optional birthdate hint. Use for bulk client onboarding, UBO list screening, or periodic AML refresh batches. Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`names`	Yes	List of entities to screen (1-100). Each entry requires at minimum a name.

Output Schema

ParametersJSON Schema

Name	Required	Description
`job_id`	Yes	Unique job identifier — pass to kyc_screener_batch_result
`status`	Yes
`batch_size`	Yes	Number of names queued for screening
`eta_seconds`	Yes	Estimated seconds until result is ready
`submitted_at`	Yes	ISO-8601 submission timestamp

Tool Definition Quality

A4.8/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Announces return time (<300ms), parallel call limit (10), eta_seconds hint, webhook registration option. Adds substantial behavioral context beyond annotations (readOnlyHint, openWorldHint). No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is slightly long but well-structured: purpose, usage, parameters, async behavior, alternatives. Key info front-loaded. No redundant sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema existing, description covers return value (job_id, eta_seconds hint), async workflow, polling, webhooks. Comprehensive for a batch async tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good descriptions. Description adds context for each field (name, type, birthdate) but does not provide new semantics beyond schema. However, it explains async parameter utility in broader workflow (polling vs webhooks), justifying a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it is an async batch variant of kyc_screener, accepts 1-100 names, returns job_id immediately, screens in background. Lists specific use cases: bulk client onboarding, UBO list screening, periodic AML refresh batches. Distinct from siblings like kyc_screener (sync) and kyc_screener_batch_result (polling).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly guides when to use this tool (bulk screening) and alternatives: poll with kyc_screener_batch_result after eta_seconds hint, or register webhook via webhooks_manage. Clearly distinguishes from synchronous kyc_screener.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyc_screener_batch_resultA

Read-onlyIdempotent

Inspect

Poll the result of a kyc_screener_batch job. Returns status=pending while running, status=completed with the full array of KYC results once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by kyc_screener_batch.

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by kyc_screener_batch (prefix: kycb_)

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly and idempotent. Description adds TTL of 24h and status behaviors, which are beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences: first states purpose, second details statuses and usage hint. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Fully covers polling behavior, possible statuses, TTL, and when to call. Output schema presumably details result structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear description of job_id. Description does not add significant extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it polls KYC batch results, distinguishes from sibling tools like kyc_screener_batch by specifying it returns statuses and results.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to call after eta_seconds from kyc_screener_batch. No explicit when-not-to-use, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

labor_law_alert_geoA

Read-onlyIdempotent

Inspect

Provides CHROs with daily alerts on new labor law changes by jurisdiction (state/country). Inputs include jurisdiction (ISO country/state code) and optional date range. Outputs structured legislative updates with summaries, effective dates, and source links. Useful for compliance monitoring, risk assessment, and policy adjustments. Keywords: labor law, compliance, legislation, jurisdiction, CHRO, HR policy.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`since`	No	Optional start date for changes (YYYY-MM-DD). Defaults to 7 days ago.
`until`	No	Optional end date for changes (YYYY-MM-DD). Defaults to today.
`jurisdiction`	Yes	ISO 3166-1 alpha-2 country code or ISO 3166-2 state/province code (e.g., 'US-CA', 'FR')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`changes`	Yes
`sources`	Yes
`warnings`	Yes
`last_updated`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, open-world, and idempotent behavior. The description adds value by detailing the output structure (summaries, effective dates, source links) and the daily alert nature, which goes beyond the annotations. No contradiction is present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear first sentence establishing purpose, followed by input/output details and use cases. Keywords are appended for searchability. It is concise and front-loaded, with no unnecessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (4 parameters, output schema exists, annotations cover key traits), the description is fairly complete. It explains the purpose, input format, output content, and use cases. It could be slightly improved by mentioning the async parameter or providing more detail on the date range defaults.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description reiterates the jurisdiction format and optional date range, but does not add new meaning beyond what the schema already provides. It does not explain the 'async' parameter, though the schema covers it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool provides daily alerts on new labor law changes by jurisdiction, specifying the target audience (CHROs) and the resource (labor law changes). It distinguishes itself from the long list of sibling tools by its unique function, and there are no directly similar tools in the sibling list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions use cases like compliance monitoring, risk assessment, and policy adjustments, but does not provide explicit guidance on when to use this tool versus alternatives, nor does it state when not to use it. The absence of alternative tool references limits its effectiveness in guiding the agent's selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ld_architectC

Read-only

Inspect

Architecte formation & développement — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Pennylane (180 FTE) — Catalogue 8 formations · 3 parcours individuels · ROI €480k · Payback 7 mois. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`team`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`budget`	Yes
`company`	Yes
`learningNeeds`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the description is not required to state safety. It adds context about server-side validation and a reference case for expected ROI, but does not detail async behavior (though schema includes an 'async' parameter). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five sentences, including a marketing reference case that adds little functional value. It could be more concise by removing the reference case and focusing on operation. The structure places a French title first, then purpose, then output, which is acceptable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, nested objects, no output schema), the description lacks crucial details: explanation of return format, async behavior, and parameter meanings. The reference case provides some context but not operational completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only 20% of schema properties have descriptions (the 'async' field). The description does not explain any input parameters (company, team, learningNeeds, budget) beyond a vague 'send the documented case fields'. This is insufficient for correct invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as an architect for training and development for C-suite/CHRO. It returns a structured, audited deliverable. However, it does not explicitly distinguish itself from similar sibling tools like lnd_ai_skill_forecast or lnd_skill_taxonomy_builder.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, no prerequisites, and no conditions for avoiding misuse. It only implies sending documented case fields.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lead_magnetsC

Read-only

Inspect

Aimants à leads — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Spendesk — Guide trésorerie startup SaaS B2B FR/EU (2024). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`icp`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`leadMagnet`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint true, so the description's mention of 'validated server-side' adds some behavioral context but does not go beyond what annotations imply. No additional behavioral traits disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no redundant information. Efficient for the limited content, but could be more front-loaded with purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema and complex nested input, the description fails to describe the deliverable's structure or what the agent can expect. The reference case is helpful but insufficient for a complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only async parameter described). The description does not elaborate on the required nested objects or fields, relying on 'send the documented case fields' which adds no meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states the tool returns 'a structured, audited deliverable' for lead magnets, but the verb 'Returns' is generic and the purpose is not explicitly stated as generating or creating. It doesn't distinguish from siblings like competitive_deep_dive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives or any exclusions. The description fails to provide context for appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

legal_clause_extractorA

Read-onlyIdempotent

Inspect

Structured extraction of clauses, obligations and deadlines from legal documents (SaaS contracts, NDAs, employment agreements, loan agreements, leases, M&A deals, IP licences). Complements contract_risk_scanner with granular per-clause output.

ICP: legal ops, M&A lawyers, paralegals, contract managers, compliance officers.

Capabilities: • Auto-detects document type (7 types) and language (EN/FR/DE/ES/PT) • Extracts parties with roles (buyer, seller, licensor, employee, etc.) • Splits document into sections and classifies 16+ clause types • Per-clause: 20 obligation patterns (EN/FR/DE), 10 deadline patterns, 18 risk detectors • Document-level: red flags (liability cap, auto-renewal, IP overreach, etc.), missing clauses per doc type • Global deadline calendar with P0/P1/P2 severity • Cross-reference map between sections • Cache: 7 days (legal docs stable once provided)

100% pure compute — no external fetch required. Accepts 10k–100k char documents.

ParametersJSON Schema

Name	Required	Description
`lang`	No	Optional. Language hint (e.g. 'en', 'fr', 'de'). Defaults to auto-detection.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`document_text`	Yes	Full text of the legal document (10k–100k chars typical). Plain text or lightly HTML-formatted. EN/FR/DE/ES/PT supported.
`document_type`	No	Optional. Document type hint. Defaults to auto-detection. Use "auto" or omit to let the tool detect from content.
`target_clauses`	No	Optional. Filter extraction to specific clause types. E.g. ["term", "termination", "liability", "ip", "confidentiality", "governing_law", "indemnification"]. If omitted or empty, all clauses are extracted.

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`red_flags`	Yes
`word_count`	Yes
`jurisdiction`	No
`governing_law`	No
`lang_detected`	Yes
`quality_score`	Yes
`effective_date`	No
`cross_references`	Yes
`parties_detected`	Yes
`clauses_extracted`	Yes
`key_deadlines_global`	Yes
`document_type_detected`	Yes
`missing_clauses_expected`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, destructiveHint) are consistent. Description adds pure compute claim, cache duration, auto-detection, async option, and document size constraints, going well beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with bullet points and clear sections. Front-loaded with purpose and ICP. Each sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 5 params and output schema, description covers all essential aspects: document types, extraction capabilities, output details (clauses, red flags, calendar), caching, and async pattern. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

100% schema coverage with clear parameter descriptions. Description adds value by explaining auto-detection behavior, typical document length, and filtering for target_clauses, enriching understanding beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool extracts clauses, obligations, and deadlines from legal documents, specifies 7 document types and 16+ clause types. Distinguishes from sibling contract_risk_scanner by mentioning complementary granular output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies ICP (legal ops, lawyers, etc.) and complements contract_risk_scanner, but lacks explicit when-not-to-use or alternative tools beyond one sibling reference.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lgpd_data_subject_rights_automatorB

Read-onlyIdempotent

Inspect

Automates LGPD Data Subject Access Requests (DSARs) for legal teams, handling Brazil-specific data retention, erasure, and access workflows. Accepts user identifiers, request type (access/rectification/deletion), and optional scope filters. Returns structured response with compliance status, warnings, and source references to Brazilian LGPD and CNIL decisions.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`scope`	No	Optional list of data categories to limit the request
`urgency`	No	Priority level for processing
`requestType`	Yes	Type of LGPD request
`userIdentifier`	Yes	CPF, email, or other unique identifier for the data subject

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`dataCategories`	No
`erasureDeadline`	No
`complianceStatus`	No
`retentionPeriodDays`	No

Tool Definition Quality

B3.1/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description contradicts annotations: it describes erasure and deletion workflows, implying state modification, while annotations have readOnlyHint=true. This is a serious inconsistency. Additional behavioral details (e.g., async processing) are omitted.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, efficiently covering purpose, inputs, and output. It is front-loaded and avoids redundancy. Slight improvement could include structured breakdown.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having output schema and annotations, the description fails to address the contradiction with readOnlyHint, omits async behavior details, and does not clarify the implications of the 'openWorldHint' annotation. This leaves gaps in understanding the tool's full behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description reiterates parameter types (user identifiers, request type, scope filters) but adds no new semantic meaning beyond what the schema provides. The mention of the output structure adds context but does not enhance parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool automates LGPD DSARs for legal teams, specifying Brazil-specific workflows. It uses strong verbs and precise resource identification, distinguishing it from generic data subject request tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for LGPD DSARs by listing accepted inputs, but lacks explicit guidance on when to use versus alternatives or when not to use. No reference to sibling tools is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lnd_ai_skill_forecastA

Read-onlyIdempotent

Inspect

Forecasts AI skill demand trends for CHROs by analyzing patent filings (USPTO PatFT) and job postings (BLS API). Returns 12-month skill demand projections with confidence scores, helping HR leaders prioritize workforce upskilling. Inputs: target AI skills (e.g., 'machine learning', 'NLP'), geographic focus (US state/country), and forecast horizon. Outputs include skill growth rates, patent filing trends, and job posting volumes. Keywords: AI workforce planning, skill gap analysis, talent strategy, patent trends, labor market data.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes	Geographic focus (US state code or 'US' for national, e.g., 'CA', 'US')
`skills`	Yes	List of AI-related skills to forecast (e.g., ['machine learning', 'computer vision'])
`horizon_months`	No	Forecast horizon in months (3-24, default 12)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`forecast`	No
`metadata`	No
`warnings`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds behavioral context: it uses patent and job posting data, returns 12-month projections with confidence scores, and targets HR leaders. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, each serving a purpose: function/data source, outputs/benefit, inputs, outputs detail, keywords. No redundant information, and it is front-loaded with the core action. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple data sources, spatiotemporal parameters) and the presence of an output schema, the description adequately covers purpose, inputs, and outputs. It could mention data freshness or limitations, but the output schema likely handles return value details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, providing descriptions for all parameters. The description adds meaning by linking parameters to outputs (e.g., skills and region lead to growth rates and filing trends). It explains the role of horizon_months as a forecast length. This adds value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Forecasts') and resource ('AI skill demand trends'), specifies the target audience (CHROs), and details data sources (USPTO PatFT, BLS API). It distinguishes from sibling tools by its unique focus on AI skill demand forecasting, even without explicit sibling differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for workforce planning and upskilling prioritization, but lacks explicit guidance on when to use this tool over alternatives like 'talent_intelligence' or 'job_postings_intelligence'. No when-not-to-use or alternative comparisons are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lnd_skill_taxonomy_builderA

Read-onlyIdempotent

Inspect

Generates a dynamic skill taxonomy for CHROs by cross-referencing patent filings (USPTO), job postings (BLS), and learning & development data (OECD). Inputs include industry codes, job roles, or skill clusters; outputs structured skill hierarchies with demand trends and competency gaps. Essential for workforce transformation, talent pipeline optimization, and future-proofing organizational capabilities. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`jobRole`	No	Target job role or occupation (e.g., 'Data Scientist')
`industry`	Yes	NAICS industry code or sector name (e.g., '541511' for IT services)
`timeRange`	No	Time range for trend analysis
`skillCluster`	No	Optional skill cluster to focus taxonomy (e.g., 'AI/ML')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`skillTaxonomy`	No
`industryTrends`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds behavioral context: it is computationally heavy (x402 timeout) and requires async mode for reliable execution. It also describes the multi-source cross-referencing process. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that fronts the purpose, then covers usage context, and ends with the async requirement. It is not overly verbose, though the phrase 'Essential for...' is slightly promotional. Overall, it is well-structured for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description need not detail return values, but it does mention the type of output (skill hierarchies, demand trends, competency gaps). It covers data sources, inputs, and the critical async requirement. For a moderately complex tool, it provides sufficient context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by noting that async must be true to avoid timeout, which is a crucial requirement not in the schema description. It also paraphrases input types, but does not elaborate on enum values or format beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it generates a dynamic skill taxonomy for CHROs by cross-referencing specific data sources (USPTO, BLS, OECD). It specifies outputs (structured skill hierarchies with demand trends and competency gaps) and mentions valid inputs (industry codes, job roles, skill clusters). While there is a sibling tool 'lnd_ai_skill_forecast' that may overlap, the description is specific enough to stand alone.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for use (workforce transformation, talent pipeline optimization) and explicitly requires passing async:true to avoid timeouts. However, it does not mention when not to use the tool or compare it to alternatives like 'lnd_ai_skill_forecast'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

logistics_esg_incident_trackerB

Read-onlyIdempotent

Inspect

Tracks real-time ESG incidents in logistics networks for COOs, including supply chain disruptions, regulatory violations, and sustainability risks. Inputs: geographic region, incident type (e.g., emissions, labor, deforestation), and time range. Outputs: structured incident data with severity, location, and source verification. Uses CDP open data and UNCTAD STAT for comprehensive coverage. Keywords: ESG, logistics, supply chain, sustainability, compliance, risk management.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes	Geographic region filter (e.g., 'Europe', 'Asia', 'Global')
`endDate`	No	End date for incident search (ISO 8601)
`severity`	No	Minimum severity level to include
`startDate`	No	Start date for incident search (ISO 8601)
`incidentType`	Yes	Type of ESG incident to track

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`summary`	No
`warnings`	No
`incidents`	No

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds data sources (CDP, UNCTAD STAT) and output structure, but does not disclose other behavioral traits like pagination or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that is efficient and front-loaded with the main purpose. It lists inputs, outputs, and data sources without excess words, though keywords at the end could be omitted.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of annotations and output schema, the description is fairly complete but misses mentioning the async parameter and details on severity scale. It adequately covers purpose and data sources.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds little beyond listing a subset of inputs (region, incidentType, time range). It does not explain the async parameter or severity enum details, which are already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it tracks real-time ESG incidents in logistics networks, listing inputs and outputs. It identifies target users (COOs) and gives examples, but does not explicitly differentiate from sibling tools like supplier_esg_audit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use (ESG incident tracking for logistics), but lacks explicit guidance on when not to use or alternatives. No exclusions or comparative statements.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ma_arbitrage_hunterA

Read-onlyIdempotent

Inspect

As a CFO, identify cross-border M&A arbitrage opportunities by comparing target company valuations across different jurisdictions. Inputs include target company ticker, primary and secondary jurisdictions, and valuation metrics. Outputs include valuation gaps, FX-adjusted multiples, and jurisdiction-specific premiums/discounts. Uses real-time ECB FX rates, Yahoo Finance market data, and SEC EDGAR filings for public companies. Ideal for quick assessment of potential arbitrage in M&A scenarios.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`sector`	No	Industry sector for peer comparison (e.g., 'Technology')
`targetTicker`	Yes	Target company ticker symbol (e.g., 'AAPL')
`valuationMetric`	No	Valuation multiple to use for comparison
`primaryJurisdiction`	Yes	Primary jurisdiction for valuation comparison (e.g., 'US')
`secondaryJurisdiction`	No	Secondary jurisdiction for valuation comparison (e.g., 'DE')

Output Schema

ParametersJSON Schema

Name	Required	Description
`fxRate`	No
`status`	Yes
`sources`	No
`warnings`	No
`valuationGap`	No
`peerMultiples`	No
`targetCompany`	No
`primaryValuation`	No
`secondaryValuation`	No
`jurisdictionPremium`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds value by explaining the tool uses real-time data from ECB, Yahoo Finance, and SEC EDGAR, and is suitable for quick assessments. It does not contradict annotations; behavioral traits are adequately covered for a read-only, idempotent tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, with the first sentence front-loading the core purpose (verb, resource, inputs, outputs). No wasted words; every sentence adds value. The structure is clean and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, output schema exists, high schema coverage), the description provides sufficient context: it explains the problem, inputs, outputs, data sources, and use case. The existence of an output schema means the agent doesn't need return value details here, so the description is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema provides 100% coverage with descriptions for all 6 parameters. The overall description contextualizes parameters (e.g., comparing jurisdictions) but does not add significant detail beyond the schema for individual parameters like 'async' or 'sector'. Per the rules, with high schema coverage, a baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: identifying cross-border M&A arbitrage opportunities by comparing valuations across jurisdictions. It specifies inputs (ticker, jurisdictions, valuation metrics), outputs (gaps, multiples, premiums/discounts), and data sources (FX rates, market data, SEC filings), distinguishing it from sibling tools like ma_tax_efficiency_mapper or ma_deal_screener.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it's 'ideal for quick assessment of potential arbitrage in M&A scenarios,' providing clear context. However, it does not explicitly state when not to use it or suggest alternative tools, such as ma_deal_screener for full deal screening or ma_tax_efficiency_mapper for tax-focused analysis, which are available among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ma_deal_screenerC

Read-only

Inspect

M&A Deal Screener — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Salesforce M&A targets — 12 cibles screened · fit score + valuation + integration risk. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`acquirer`	Yes
`criteria`	Yes

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds only that it returns a deliverable and validates inputs server-side, which is minimal. It does not disclose behavioral traits like rate limits, what 'audited' entails, or error handling beyond validation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise at three sentences, but the first sentence is jargon-heavy and unclear. Some structure is present but could be better organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 4 parameters, no output schema), the description is missing critical information about the output format, how to interpret results, and prerequisites. It does not sufficiently guide an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (25%) – only `async` has a description. The description fails to explain the key parameters (acquirer, criteria) beyond saying 'send the documented case fields,' presuming external knowledge.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific language like 'M&A Deal Screener' and 'returns a structured, audited deliverable' with a reference case, making the purpose clear. However, jargon like 'Gapup agent-payable C-suite expertise' obscures meaning, and there is no differentiation from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus the many sibling tools (e.g., ma_tax_efficiency_mapper). It offers no when/when-not scenarios or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

manufacturing_esg_compliance_mapperA

Read-onlyIdempotent

Inspect

As a COO, quickly identify ESG compliance gaps across manufacturing facilities using EPA TRI emissions data and GRI sustainability standards. Input facility identifiers or geographic regions to receive a prioritized remediation roadmap with risk scores, regulatory violations, and suggested corrective actions. Ideal for sustainability reporting, regulatory risk assessment, and operational improvement planning. Keywords: ESG compliance, manufacturing facilities, EPA TRI, GRI standards, sustainability reporting, regulatory risk.

ParametersJSON Schema

Name	Required	Description
`year`	No	Reporting year (default: current year - 1)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	No	Geographic region (state, county, or ZIP code) for facility search
`includeGri`	No	Include GRI standards analysis (default: true)
`facilityIds`	Yes	List of EPA facility identifiers (e.g., TRIFID)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`summary`	No
`warnings`	Yes
`facilities`	Yes

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the tool is known to be safe. The description adds behavioral context: it uses EPA TRI and GRI data, outputs a prioritized roadmap with risk scores and violations. This goes beyond the annotations, though it does not cover all edge cases (e.g., missing facilities).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, front-loaded paragraph that efficiently conveys purpose and use case. It avoids redundancy, though the keyword list could be considered slightly extraneous. Overall, it is concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters and an output schema, the description provides adequate context: it explains what inputs are used (facility IDs or regions), what data sources (EPA TRI, GRI), and what output to expect (roadmap with scores, violations, actions). It is sufficiently complete for an AI agent to understand the tool's role.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description does not add significant meaning beyond parameter names and descriptions. It mentions 'facility identifiers or geographic regions' which maps to facilityIds and region, but these are already described in the schema. No additional constraints or format details are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool identifies ESG compliance gaps using EPA TRI and GRI standards, and outputs a remediation roadmap with risk scores. It is specific about the resource and action, but does not explicitly differentiate from sibling tools like sustainability_report or supplier_esg_audit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions ideal use cases (sustainability reporting, regulatory risk assessment) but does not provide explicit guidance on when not to use this tool or suggest alternative tools. Usage context is implied rather than explicitly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

manufacturing_waste_heatmapA

Read-onlyIdempotent

Inspect

Generates manufacturing waste heatmaps for COOs using EPA TRI and FAOSTAT data. Input manufacturing site identifiers or geographic regions to analyze waste streams, emissions, and resource inefficiencies. Outputs include waste intensity maps, circular economy opportunity rankings, and cost-saving potential. Ideal for sustainability strategy and operational efficiency improvements. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`year`	Yes	Analysis year (2010-2023)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	No	Geographic region (country code or sub-national region) for aggregated analysis
`site_ids`	No	List of manufacturing site identifiers (EPA TRI IDs or FAO facility codes)
`waste_types`	No	Specific waste types to analyze (e.g., ['metals', 'chemicals', 'energy'])

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`heatmap_data`	No
`opportunities`	No
`benchmark_data`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint) already indicate safety, but the description adds value by revealing async behavior ('Pass async:true to avoid timeout') and data sources (EPA TRI, FAOSTAT), which imply latency and freshness. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single focused paragraph, front-loaded with the primary action. It is concise with no redundant sentences, though the async note could be integrated more naturally.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations, the description adequately covers key aspects: purpose, inputs, outputs, and timeout mitigation. It could mention how results are returned (e.g., job_id polling) but is otherwise complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions. The description restates input types (sites, regions, waste types) but adds no new semantic nuance beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: generating manufacturing waste heatmaps using EPA TRI and FAOSTAT data. It specifies inputs (site identifiers or regions) and outputs (waste intensity maps, circular economy rankings, cost-saving potential), distinguishing it from siblings like 'manufacturing_esg_compliance_mapper' and 'procurement_six_sigma_waste_hunter'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the ideal use case ('sustainability strategy and operational efficiency improvements') but does not explicitly compare to alternatives or state when not to use. Sibling tools with overlapping ESG focus lack exclusion criteria, leaving usage decisions implicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

margin_doctorC

Read-only

Inspect

Marge par deal — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — 8 deals pipeline · €28k ARR sous-marge détecté · Récupération €4.2k/an · Playbook 4 scénarios. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deals`	Yes
`company`	Yes
`product`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description adds moderate value by stating it returns a structured deliverable and that inputs are validated server-side. But no mention of output format, async behavior, or potential side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short but includes a lengthy reference case that may not be essential. It front-loads the purpose but could be more concise and structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 4 params, no output schema), the description is incomplete. It does not explain the return structure, how to interpret the deliverable, or how this tool fits into the broader workflow.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 25% (only async described). Description does not compensate: it mentions 'documented case fields' but doesn't explain what company, product, or deals mean or how they are used. Low coverage requires more descriptive help.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it analyzes deals to find margin deficits and returns an audited deliverable. 'Marge par deal' and reference case give context. However, it's somewhat jargon-heavy and could be clearer about the exact output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs siblings like margin_doctor_finance. No when-not-to-use or alternative recommendations. The reference case implies a typical use but lacks explicit direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

margin_doctor_financeC

Read-only

Inspect

Médecin des Marges — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Alan — ARR €60M · marge brute 68% → 79% · €3,2M fuites identifiées · Rule of 40 : 14→38. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`costBreakdown`	Yes
`marginTargets`	Yes
`unitEconomics`	Yes
`incomeStatement`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and openWorldHint; description adds 'validated server-side' but does not contradict or significantly extend behavior beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise at two sentences plus a reference case. The case adds specificity but may distract. Front-loaded with purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Complex tool with 6 parameters including nested objects, no output schema, and no explanation of the deliverable format or caveats. The brief description leaves significant gaps for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (async documented). The description does not explain key parameters (company, incomeStatement, etc.) beyond a generic reference to 'documented case fields'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides CFO-level margin analysis ('Médecin des Marges', 'C-suite expertise (CFO)') and returns a structured deliverable, with a concrete reference case. However, it does not explicitly distinguish it from the sibling tool 'margin_doctor'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs siblings like 'margin_doctor' or other financial analysis tools. Only a vague instruction to 'send the documented case fields' without context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_entry_strategistB

Read-only

Inspect

Stratégie d'entrée marché — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: OpenAI Inde 2026 — entrée marché 1.4Md utilisateurs · 5 forces Porter + 4 entry modes + 18-month roadmap + risk register. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`preferences`	Yes
`targetMarket`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns a 'structured, audited deliverable' and validates inputs server-side. This provides some behavioral context but does not fully disclose the tool's AI-generated nature or potential external data usage beyond what the annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences and a reference example. It front-loads the purpose. However, it could benefit from a more structured format (e.g., bullet points) to enhance readability for an agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with nested parameters and no output schema, the description provides a high-level understanding of the deliverable but lacks detail on return format, limitations, or error handling. The reference case helps but is not sufficient for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is very low (20%), with only the async parameter described. The description offers no explanation of the required parameters (company, targetMarket, preferences) or their nested structure, leaving the agent to infer meaning from the schema alone. This is insufficient for correct invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides a market entry strategy deliverable with specific components (Porter's five forces, entry modes, roadmap, risk register) and gives a concrete reference case. However, it does not explicitly differentiate itself from similar sibling tools like geographic_expansion or growth_path_architect, leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks guidance on when to use this tool versus alternatives. It only mentions that inputs are validated server-side and to send documented case fields, but no explicit conditions or exclusions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

marketing_roi_dashboardC

Read-only

Inspect

Dashboard ROI marketing — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — H1 2026 · 5 canaux · ROI 3.2× · Attribution W-shaped · Budget €60k. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`arpuEur`	Yes
`channelData`	Yes
`companyName`	Yes
`periodLabel`	Yes
`totalRevenueAttribEur`	Yes
`targetAttributionModel`	Yes
`currentAttributionModel`	Yes
`totalMarketingBudgetEur`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and description adds validation context ('Inputs are validated server-side'), but no additional behavioral traits (e.g., permissions, side effects) are disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but starts with somewhat cryptic phrases ('Gapup agent-payable C-suite expertise') rather than a clear purpose statement. It is not front-loaded with critical info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 9 parameters (8 required) and no output schema, the description lacks details on output format, parameter usage, or expected values. The reference case is helpful but insufficient for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 11%; only the 'async' parameter has a description. The description does not explain required fields like 'companyName', 'channelData', or attribution models, offering minimal help beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description mentions 'Dashboard ROI marketing' and 'Returns a structured, audited deliverable', hinting at ROI computation, but it does not state the exact function (e.g., compute ROI) and lacks differentiation from sibling tools like 'market_entry_strategist'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. The reference case provides an example but does not explain ideal scenarios or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_research_briefA

Read-only

Inspect

Generate a structured, sourced market research brief on any market, sector or industry. Returns a machine-readable note with six sections: an executive overview, a market-size estimate (with assumptions and sources — no invented figures), key players, demand & technology trends, risk factors, and a traceable source list. When to use this tool: an agent needs to assess a new market, validate a business opportunity, prepare a pitch, or benchmark a sector before a strategic decision. Data is assembled live from keyless public sources: Wikipedia (sector context), World Bank (macro GDP/population for market sizing), REST Countries (geo context). Fields that cannot be sourced are marked 'unavailable' rather than estimated. Inputs: topic (required), geo and sector (optional refinements).

ParametersJSON Schema

Name	Required	Description
`geo`	No	Optional geography to scope the brief (country name, region, or continent — e.g. 'France', 'Southeast Asia')
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`topic`	Yes	Market or sector to research (e.g. 'electric vehicle batteries', 'B2B SaaS CRM Europe', 'telemedicine Africa')
`sector`	No	Optional parent sector to disambiguate the topic (e.g. 'healthcare', 'energy', 'software')

Output Schema

ParametersJSON Schema

Name	Required	Description
`geo`	Yes
`risks`	Yes
`topic`	Yes
`sector`	Yes
`trends`	Yes
`sources`	Yes	All sources consulted, with URL and retrieval status
`overview`	Yes	Executive summary of the market
`key_players`	Yes
`generated_at`	Yes	ISO-8601 timestamp of generation
`market_size_estimate`	Yes	Market size estimate with hypotheses. All figures sourced or marked unavailable.

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses live data sources (Wikipedia, World Bank, REST Countries) and honesty policy ('unavailable' rather than estimated). This adds significant context beyond annotations (readOnlyHint, openWorldHint), which already indicate safe read behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single focused paragraph that front-loads purpose and lists sections efficiently. Minor improvement would be breaking into bullet points for readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, output schema exists), the description covers purpose, sourcing behavior, output format, and missing data handling comprehensively. The output schema handles return values, so this is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good individual parameter descriptions. The tool description merely restates which parameters are required/optional, adding no extra semantic value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates a structured market research brief with six specific sections, distinguishing it from sibling tools like 'market_sizing' or 'competitive_deep_dive' which focus on narrower aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit 'when to use' scenarios are provided (assess new market, validate opportunity, prepare pitch, benchmark sector). However, it does not explicitly state when not to use this tool versus alternatives, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_sizingB

Read-only

Inspect

Dimensionnement marché TAM/SAM/SOM — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — TAM/SAM/SOM IA décisionnelle C-suite Europe · TAM €48Md · SOM €280M Year-3. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`target`	Yes
`horizon`	No
`product`	Yes
`approach`	No
`competitorComps`	No

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns a structured, audited deliverable and inputs are validated server-side. This provides some behavioral context but does not fully disclose the output format or potential side effects beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short and front-loaded with the main purpose. It includes a useful reference case. However, the mix of French and English ('Dimensionnement marché') may reduce clarity for some agents. Overall, it is efficient but could be polished.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 params, nested objects, no output schema), the description provides a high-level purpose and validation hint but lacks details on expected output structure, exact usage steps, or parameter relationships. The reference case offers some context, but completeness is moderate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only 'async' has a description). The main description mentions 'send the documented case fields' but does not elaborate on parameter semantics. For a tool with 6 parameters including nested objects, the description provides insufficient guidance on how to populate fields like 'target', 'product', 'approach', etc.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs market sizing (TAM/SAM/SOM) and returns a structured deliverable. The verb 'Dimensionnement marché' is specific, but it does not explicitly differentiate from siblings like 'market_entry_strategist' or 'market_research_brief'. However, the focus on TAM/SAM/SOM is distinct enough.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when you have 'documented case fields' and provides a reference case, but no explicit guidance on when to use vs alternatives, prerequisites, or when not to use. It is implicit rather than directive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ma_tax_efficiency_mapperA

Read-onlyIdempotent

Inspect

For CFOs evaluating cross-border M&A deals: analyzes tax efficiency by mapping withholding tax rates, transfer pricing regulations, and permanent establishment risks across specified jurisdictions. Inputs include acquirer/target jurisdictions, deal structure, and transaction value. Outputs jurisdiction-specific tax exposure, efficiency scores, and risk flags. Uses World Bank Tax Rates API, IMF SDR data, and SEC EDGAR filings for corporate tax disclosures. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deal_structure`	No	Type of M&A transaction structure
`transaction_value`	No	Deal value in USD millions
`target_jurisdiction`	Yes	ISO 3166-1 alpha-3 country code of the target entity
`acquirer_jurisdiction`	Yes	ISO 3166-1 alpha-3 country code of the acquiring entity
`include_transfer_pricing`	No	Whether to analyze transfer pricing risks

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`tax_treaties`	No
`efficiency_score`	No
`target_tax_rates`	No
`acquirer_tax_rates`	No
`transfer_pricing_risk`	No
`permanent_establishment_risk`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations: it mentions external data sources ('World Bank Tax Rates API, IMF SDR data, and SEC EDGAR filings') and the async requirement to avoid timeout. Annotations already declare readOnlyHint=true, idempotentHint=true, and openWorldHint=true, and the description is consistent with these. The description also outlines output types (tax exposure, efficiency scores, risk flags).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: one paragraph that front-loads the target audience and purpose. Every sentence adds value (audience, function, inputs, outputs, data sources, async requirement). Minor improvement could be separating the async note as a warning, but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, external APIs, async requirement) and the presence of annotations and output schema, the description covers purpose, inputs, outputs, data sources, and a critical usage constraint. It does not explain output interpretation in detail, but that is likely handled by the output schema. Completeness is high for the available structured information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description lists some parameters (acquirer/target jurisdictions, deal structure, transaction value) but does not add semantic detail beyond what the schema provides. For example, it does not explain the enum values for deal_structure or the unit for transaction_value beyond what is in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'analyzes tax efficiency by mapping withholding tax rates, transfer pricing regulations, and permanent establishment risks across specified jurisdictions.' It uses specific verbs ('analyzes', 'maps') and resources ('tax efficiency across jurisdictions'), distinguishing it from sibling tools like ma_arbitrage_hunter and tax_optimization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: 'For CFOs evaluating cross-border M&A deals' and lists required inputs (acquirer/target jurisdictions, deal structure, transaction value). It also includes a critical usage note: 'pass async:true REQUIRED to avoid x402 timeout.' However, it does not explicitly state when not to use this tool or suggest alternatives, which would have made it a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

meddic_scoringC

Read-only

Inspect

Scoring MEDDIC du pipeline — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Pipeline 8 deals · €2.1M · MEDDIC score moyen 62/100 · 3 deals at-risk. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deals`	Yes
`company`	Yes
`product`	Yes
`salesCycle`	No
`targetWinRate`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description adds moderate value by mentioning server-side validation and structured output. However, it does not detail return format, auth needs, or any edge cases beyond the reference case.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise with four sentences. The purpose is front-loaded, but the reference case could be considered extraneous. Overall, it is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested input (6 params, 3 required) and no output schema, the description is insufficient. It does not explain the MEDDIC methodology, how to structure the deal objects, or what the deliverable contains beyond 'audited'.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, yet the description provides no parameter details beyond 'send the documented case fields'. It fails to explain how to populate the complex nested 'deals' array or the meaning of MEDDIC dimensions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose (scoring MEDDIC for a pipeline) and mentions it returns a structured deliverable. It provides a specific reference case. However, it does not differentiate from sibling tools like 'deal_coach' or 'sales_pipeline_forecast'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description lacks explicit context for usage, such as prerequisites or scenarios. It does not mention when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

model_behavior_drift_monitorA

Read-onlyIdempotent

Inspect

Monitors AI model output drift by comparing current model responses against MLCommons safety benchmarks. Designed for risk and compliance personas to detect behavioral deviations that may indicate safety or alignment issues. Accepts model outputs or identifiers and returns structured drift metrics with statistical significance. Sources data from MLCommons public benchmark APIs.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`threshold`	No	Drift threshold for alerting
`currentOutputs`	No	Recent model outputs to analyze for drift
`baselineMetrics`	No
`modelIdentifier`	Yes	Unique identifier for the model being monitored

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`driftMetrics`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, and open-world hints. The description adds context about data source (MLCommons), async polling via parameter, and acceptance of model outputs or identifiers, providing useful behavioral details beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3 sentences), front-loaded with purpose, and efficiently covers audience, inputs/outputs, and data source without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, nested objects, async support) and presence of output schema, the description is largely complete. It covers purpose, audience, data source, and input options, though it could mention error handling or offline behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 80%, but the tool description does not elaborate on parameter meanings. The baselineMetrics parameter lacks a description in both schema and description, leaving some ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it monitors AI model output drift against MLCommons safety benchmarks, specifying the target persona (risk and compliance) and the output structure. It distinguishes from siblings by focusing on drift detection with a specific benchmark source.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies the intended users (risk and compliance) and the use case (detecting behavioral deviations), but lacks explicit guidance on when not to use or alternatives among sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

model_safety_certification_checkerA

Read-onlyIdempotent

Inspect

Verifies AI model safety certifications against MLCommons and IEEE 7000 standards. Designed for risk management personas to assess model compliance with established safety benchmarks. Accepts model identifiers or certification IDs and returns structured verification results with source references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`model_id`	Yes	Unique identifier for the AI model
`standard`	No	Safety standard to check against
`certification_id`	No	Specific certification ID to verify

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`compliance`	No
`last_verified`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds that it returns structured verification results with source references, which goes beyond the annotations by clarifying output format and content.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, with the first sentence stating the core purpose. All sentences are necessary and tightly written, with no fluff or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of detailed input schema and output schema, the description covers the tool's purpose, input types, supported standards, and output characteristics sufficiently for an agent to understand when and how to invoke it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds minimal value. It reiterates that the tool accepts model identifiers or certification IDs, but does not clarify the relationship between the two parameters beyond what the schema already conveys.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies AI model safety certifications against MLCommons and IEEE 7000 standards, with a specific verb and resource. It distinguishes itself from siblings by focusing on safety certification, a unique capability not offered by other tools in the list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates it is designed for risk management personas and for assessing model compliance, providing some context. However, it does not explicitly mention when not to use this tool or suggest alternatives, leaving out exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

monte_carlo_portfolioA

Read-only

Inspect

Pure-compute Monte Carlo portfolio simulation using Geometric Brownian Motion (GBM). Models a multi-asset portfolio across time with contributions, withdrawals, and annual rebalancing. Returns full probability distribution of terminal wealth, percentile paths, drawdown stats, and Sharpe ratio. Modes: simulate (full Monte Carlo) | glide_path (lifecycle 110-age target-date allocation) | stress_test (4 historical crises: 2008 GFC / 2000 dotcom / 1970s stagflation / 2020 COVID). No external data needed — all computed from asset assumptions. Ticker defaults built-in: SPY/VOO/VTI 7%/15%, QQQ 9%/20%, TLT/BND 3%/6%, GLD 5%/18%, BTC 30%/70%. ICP: asset managers, family offices, retail wealth advisors, robo-advisor agents, retirement planners. 10k simulations × 30 years runs in <3s on V8 JIT.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	simulate = full Monte Carlo GBM \| glide_path = lifecycle target-date allocation \| stress_test = 4 historical crisis scenarios
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`assets`	Yes	Portfolio assets. Weights must sum to 1.0 (auto-normalized if not).
`simulations`	No	Number of Monte Carlo simulations (1000-100000). Default 10000.
`horizon_years`	Yes	Investment horizon in years (1-50).
`target_value_eur`	No	Target terminal portfolio value in EUR. Used to compute probability_target_achieved.
`confidence_intervals`	No	Percentiles to compute in the output distribution. Default [5, 25, 50, 75, 95].
`initial_investment_eur`	Yes	Initial capital in EUR (e.g. 100000 for €100k).
`withdrawals_annual_eur`	No	Annual withdrawal amount in EUR for decumulation phase (e.g. 50000 for €50k/yr).
`contributions_annual_eur`	No	Annual contribution in EUR (e.g. 12000 for €1000/month).

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it specifies computational properties (pure-compute, no external data, performance timing), output details (probability distribution, percentile paths, drawdown stats, Sharpe ratio), and modes. No contradiction with annotations (readOnlyHint=true, openWorldHint=true).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise given the complexity, front-loading the core purpose and modes. Every sentence adds value, though it could be slightly tightened. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 10 parameters and no output schema, the description is remarkably complete: it explains all modes, provides examples, mentions output types, performance, and default tickers. It covers edge cases and usage intent well.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does add some value by explaining modes and ticker defaults not in schema, but it does not elaborate on parameter syntax or constraints beyond what the schema provides. Thus adequate but not exceptional.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a 'Pure-compute Monte Carlo portfolio simulation using Geometric Brownian Motion (GBM)' and lists specific modes (simulate, glide_path, stress_test). It is unique among siblings, many of which are unrelated or from different domains, so no confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use this tool (e.g., 'No external data needed', '10k simulations × 30 years runs in <3s'), and lists target users. However, it does not explicitly contrast with sibling tools or say when not to use it, so a slight deduction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mttr_breakdown_analyzerB

Read-onlyIdempotent

Inspect

As a CTO, analyze your team's incident response efficiency by breaking down Mean Time To Recovery (MTTR) into root causes: code defects, infrastructure failures, or process bottlenecks. This tool ingests GitHub issue and pull request data alongside Snyk vulnerability reports to provide a detailed breakdown of MTTR components, helping you identify systemic weaknesses in your incident resolution pipeline. Input your GitHub repository details and time range to receive a structured analysis of MTTR contributors with actionable insights.

ParametersJSON Schema

Name	Required	Description
`repo`	Yes	Full GitHub repository name (owner/repo)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`since`	Yes	Start date for analysis (ISO 8601)
`until`	Yes	End date for analysis (ISO 8601)
`snykToken`	No	Snyk API token for vulnerability data (optional)
`githubToken`	Yes	GitHub personal access token for API access

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`breakdown`	No
`topContributors`	No

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=true, openWorldHint=true, idempotentHint=true) are consistent. Description adds context about data sources (GitHub, Snyk) and breakdown categories. However, it does not mention the async parameter's existence or potential latency, which is a gap given the tool's complexity.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, reasonably concise, and front-loaded with audience and action. Could be tighter but no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has an output schema, so detailed return format is not needed. However, the description lacks mention of external API usage, potential rate limits, or the async feature, which are relevant for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and each parameter is described. The description does not add new meaning beyond the schema; it only reiterates 'GitHub repository details and time range'. The async parameter is not mentioned in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: analyze MTTR by breaking down into root causes (code defects, infrastructure failures, process bottlenecks) using GitHub and Snyk data. It is specific about inputs and outputs, but does not explicitly differentiate from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use (analyzing incident response efficiency) but lacks guidance on when not to use or alternatives. No comparison to similar tools like incident_response_evidence_collector or change_failure_root_cause_classifier.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

nis2_supply_chain_dependency_mapA

Read-onlyIdempotent

Inspect

Generates a visual dependency map of supply chain relationships under the NIS2 Directive, scoring criticality based on regulatory sources like EUR-Lex and CNIL decisions. Designed for legal and compliance teams to identify high-risk third-party dependencies. Inputs include organization identifiers and optional scope filters. Outputs structured dependency data with criticality scores and regulatory references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Dependency chain depth to analyze
`scope`	No	Analysis scope: full supply chain or critical dependencies only
`sector`	No	NIS2 sector classification (e.g., 'energy', 'transport')
`organizationId`	Yes	Unique identifier for the organization (e.g., VAT number or LEI)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`dependencies`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and openWorldHint, establishing safety and idempotency. The description adds context about the output (structured data, criticality scores, regulatory references) and the tool's analytical approach, which is consistent with annotations and provides additional behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loading the core purpose and method in the first sentence, and target audience and output in the second. Every sentence adds value with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the annotations and the presence of an output schema, the description covers the essential aspects: purpose, regulatory sourcing, target audience, and output characteristics. It is sufficient for an agent to understand the tool's role and invocation context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description briefly mentions 'organization identifiers and optional scope filters' but does not add significant meaning beyond the schema. It does not compensate for any missing parameter details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool generates a visual dependency map for NIS2 supply chain relationships, scoring criticality based on regulatory sources. It distinguishes from sibling tools like supplier_esg_audit and supply_chain_fx_exposure_dashboard by its unique regulatory focus and output structure.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states it is designed for legal/compliance teams to identify high-risk dependencies, implying usage context. However, it lacks explicit guidance on when not to use this tool versus alternatives, nor does it mention any prerequisites or exclusion criteria. The guidance is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

observability_log_pattern_minerA

Read-onlyIdempotent

Inspect

As a CTO, extract anomalous log patterns from public breach reports (e.g., Verizon DBIR) and MITRE ATT&CK techniques to optimize SIEM rules and observability pipelines. Inputs include threat actor groups, MITRE tactics (e.g., 'TA0005'), or log sources (e.g., 'AWS CloudTrail'). Outputs structured patterns with MITRE mappings, prevalence scores, and detection recommendations. Ideal for reducing false positives and improving breach detection coverage. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`tactic`	Yes	MITRE ATT&CK tactic ID (e.g., 'TA0005')
`technique`	No	MITRE ATT&CK technique ID (e.g., 'T1059')
`log_source`	No	Log source type (e.g., 'AWS CloudTrail', 'Windows Event Log')
`max_results`	No
`threat_actor`	No	Threat actor group name (e.g., 'APT29')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`metadata`	No
`patterns`	Yes
`warnings`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds behavioral context about potential timeouts and async usage: 'Pass async:true to avoid timeout.' This goes beyond annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that is dense yet clear, front-loading the purpose and adding supporting details efficiently. Every sentence contributes value, with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, output schema, multiple inputs), the description covers purpose, inputs, output structure, and a usage tip. While the output schema is not described in detail, its existence is noted, and the description is sufficient for an agent to understand the tool's role in SIEM optimization.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description maps business inputs (threat actor groups, MITRE tactics, log sources) to parameters and adds context for async. With high schema coverage (83%), the description enriches understanding by explaining the tool's domain-specific use beyond the schema's technical descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'extract anomalous log patterns from public breach reports and MITRE ATT&CK techniques'. It specifies the verb 'extract' and the resource 'anomalous log patterns', and differentiates from siblings like 'observability_metric_anomaly_detector' by focusing on log patterns from breach reports rather than metrics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for when to use the tool ('optimize SIEM rules', 'reduce false positives') and mentions input types (threat actor groups, MITRE tactics, log sources). However, it does not explicitly contrast with alternatives like 'observability_metric_anomaly_detector' or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

observability_metric_anomaly_detectorA

Read-onlyIdempotent

Inspect

As a CTO, quickly identify anomalous cloud metrics (CPU, latency, memory) by comparing your infrastructure against AWS public benchmarks and CVE-linked hardware risks. Input your observed metrics (e.g., CPU utilization, request latency) and receive a risk assessment with potential root causes. Ideal for performance troubleshooting, security hardening, and capacity planning. Keywords: cloud observability, anomaly detection, CVE hardware risks, AWS benchmark comparison.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	No
`metricType`	Yes
`instanceType`	No
`observedValue`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`cveRisks`	No
`warnings`	No
`anomalyScore`	No
`benchmarkValue`	No
`deviationPercent`	No

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, openWorldHint=true, idempotentHint=true, so the description does not need to reiterate read-only behavior. It adds 'anomaly detection' context but lacks details on output format or limitations (e.g., what if metrics are out of range). Minimal extra value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately concise but includes unnecessary elements like 'As a CTO' and keyword tags. The main action is front-loaded, but could be shorter without fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 5 parameters and an output schema, the description provides high-level purpose but lacks details on how benchmarks and CVE risks are used, input format for optional parameters, and the nature of the risk assessment. It is adequate but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%. The description explains metricType and observedValue (examples given) but does not cover region, instanceType, or async. Given low coverage, more parameter guidance was needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool identifies anomalous cloud metrics by comparing to AWS benchmarks and CVE risks. It lists specific metric types (CPU, latency, memory) and output (risk assessment, root causes). This uniquely distinguishes it from siblings like 'observability_log_pattern_miner' or 'cloud_cost_ri_optimizer'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions use cases: 'performance troubleshooting, security hardening, and capacity planning'. While clear, it does not explicitly state when not to use the tool or compare to alternatives, missing some guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onboarding_salariesC

Read-only

Inspect

Onboarding opérationnel des salariés — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Pennylane (FR fintech SaaS, ~250 FTE) — 5 parcours 30/60/90 jours · Engineering / Sales / CS / Design / People Ops. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`roles`	Yes
`company`	Yes

Tool Definition Quality

C2.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, and the description adds context about returning an audited deliverable and server-side validation, which aligns with read-only behavior. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short (3 sentences), but includes a reference case that may be unnecessary for general usage. It is front-loaded with the title but could be more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested input schema and no output schema, the description does not explain what the deliverable contains or its format. It leaves significant gaps for an agent to understand the full behavior and output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 25%, and the description adds no information about the parameters (async, focus, roles, company). It only says 'send the documented case fields' without explaining any parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a structured, audited deliverable for operational onboarding of employees, referencing a specific case. However, it does not differentiate from sibling tools like 'comp_benchmark_geo_delta' or 'global_salary_inflation_adjuster', which may also deal with salaries.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it is for C-suite (COO) and that inputs are validated server-side, but provides no guidance on when to use this tool over alternatives, nor any exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

operational_dashboardsC

Read-only

Inspect

Dashboards opérationnels — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Qonto (5 départements · 12 KPIs) — 4 dashboards live en 3 semaines · time-to-décision -55%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`techStack`	Yes
`departments`	Yes
`kpiRequests`	Yes
`primaryDashboardTool`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and openWorldHint. Description adds that it returns a structured deliverable and validates inputs server-side. This supplements annotations modestly but does not contradict them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph mixing French and English, moderately concise. Could be better structured but is not overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of nested objects and no output schema, the description is incomplete. It does not describe the deliverable structure, return values, or usage nuances.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, and the description does not explain any parameters beyond 'send the documented case fields'. It fails to compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a structured, audited deliverable for operational dashboards targeting C-suite (COO). The reference case provides concrete context. However, it does not explicitly distinguish from sibling tools, which is acceptable given the unique focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description implies it's for operational dashboard creation, but lacks when-not-to-use or alternative suggestions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

oss_dependency_velocity_trackerA

Read-onlyIdempotent

Inspect

As a CTO, track the update velocity of your project's open-source dependencies to assess their impact on DORA metrics like deployment frequency and lead time. This tool fetches release history and version adoption data from npm registry and libraries.io, providing insights into dependency freshness, update frequency, and potential risks. Input a list of package names and optional version ranges to analyze. Outputs structured dependency velocity metrics and warnings about stale or rapidly changing packages.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`packages`	Yes
`lookbackDays`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`metrics`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint, covering safety and idempotency. The description adds value by disclosing external data sources (npm, libraries.io) and output types (structured metrics, warnings), which is beyond what annotations provide. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with context, and each sentence adds distinct value (user, purpose, data sources, inputs, outputs). No redundant or verbose language.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (3 parameters, output schema exists), the description covers key aspects: inputs, outputs, data sources, and connection to DORA metrics. It omits error handling and data freshness, but the presence of an output schema and annotations compensates.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (33%), with only 'async' documented. The description clarifies the 'packages' parameter expects a list of names and optional version ranges, partially compensating. However, 'lookbackDays' remains undocumented in both schema and description, leaving ambiguity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: tracking update velocity of open-source dependencies to assess impact on DORA metrics. It specifies target user (CTO), data sources (npm registry, libraries.io), and outputs (structured metrics, warnings). This distinguishes it from sibling tools like dependency_vulnerability_scan, which focus on security, not velocity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context (CTO, DORA metrics) implying when to use, but it does not explicitly state when to use this tool versus alternatives. It lacks exclusions or comparisons to siblings like ossf_scorecard_trend_analyzer, leaving the agent to infer usage scope.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ossf_scorecard_trend_analyzerA

Read-onlyIdempotent

Inspect

As a CTO, analyze OSSF Scorecard trends for your top 10-50 dependencies to identify security regressions or deteriorating project health. Input GitHub repository names (owner/repo), get structured trend data including score deltas, check failures, and risk flags. Uses OSSF Scorecard API and GitHub Archive for historical context. Ideal for proactive dependency management and risk assessment.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`lookbackDays`	No	Number of days to analyze trends for
`repositories`	Yes	List of GitHub repositories in owner/repo format
`minScoreThreshold`	No	Minimum acceptable score to flag as risky

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`results`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint, idempotentHint) already indicate safe, non-destructive behavior. The description adds valuable details: uses OSSF Scorecard API and GitHub Archive for historical context, and outputs structured trend data with score deltas, check failures, and risk flags. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the purpose and scope. Every sentence adds value: first defines the tool's role, second elaborates on inputs, outputs, and data sources. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the rich input schema (100% coverage) and presence of an output schema, the description provides sufficient context. It mentions the output elements (deltas, failures, flags) and targets CTOs. Could briefly mention the time range parameter, but lookbackDays is already in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the input schema already describes all parameters thoroughly. The description reinforces that repositories are in owner/repo format and mentions 'top 10-50 dependencies', but does not add significant new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'analyze', the resource 'OSSF Scorecard trends', and the scope 'top 10-50 dependencies'. It explicitly mentions identifying security regressions and deteriorating project health, which distinguishes it from sibling tools like dependency_vulnerability_scan and oss_dependency_velocity_tracker.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description labels it as 'Ideal for proactive dependency management and risk assessment', giving clear context for use. It does not explicitly state when not to use or name alternatives, but the context is sufficient for most agents to infer appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

outbound_sequencerC

Read-only

Inspect

Séquences outbound — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub → CFO + CRO B2B SaaS France — Séquence 6 touches multi-canal · Taux réponse +180%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`icp`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`offer`	Yes
`excludedAngles`	No
`targetAccounts`	No

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns an 'audited deliverable', but does not disclose additional behavioral traits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but includes an unnecessary reference case that may confuse. It mixes languages and does not front-load the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 5 params, no output schema), the description is insufficient. It does not clarify what the deliverable contains or how to structure inputs beyond vague 'case fields'.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20% (only 'async' has a description). The description does not explain the meaning or usage of any parameters, including the required nested objects 'offer' and 'icp'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description mentions 'Séquences outbound' and references a case for reaching C-suite executives, suggesting it generates outbound sequences. However, it lacks a clear verb+resource statement; the purpose is implied but not explicitly defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus its siblings, which include 'abm_architect' and 'sales_enablement_architect'. The description does not specify context or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

paid_ads_optimizerC

Read-only

Inspect

Optimiseur de publicités payantes — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Spendesk (Google + LinkedIn · €45k/mo) — €9k/mo gaspillés identifiés · ROAS LinkedIn ×2.4. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`campaigns`	Yes
`targetMetric`	Yes
`audienceDescription`	Yes
`totalMonthlyBudgetEur`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint: true and openWorldHint: true, so the description need not re-state them. It adds that inputs are validated server-side, which is useful but lacks further behavioral traits like rate limits or data handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a pair of sentences but includes a reference case example that adds length without essential clarity. It is somewhat verbose for a concise purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 6 parameters, no output schema), the description fails to explain the return deliverable structure or behavior in detail. The reference case provides a glimpse but not general completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (17%), and the description does not explain parameter meanings beyond what the schema already provides. It only vaguely mentions 'documented case fields' without specifics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool optimizes paid ads and returns a structured deliverable. However, it does not differentiate from sibling tools like programmatic_attribution_calibrator or retail_media_attribution_bridge, which have overlapping purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no explicit guidance on when to use this tool versus alternatives. The description only mentions sending documented case fields, omitting when-not-to-use or prerequisite conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

partnership_synergiesA

Read-onlyIdempotent

Inspect

Identify and rank strategic partnership opportunities for a company. Returns 5-12 high-fit partnership targets, each scored on revenue lift, time-to-impact, integration complexity and regulatory risk, with a rationale and a recommended first-step outreach playbook. When to use this tool: the user wants business-development or alliance ideas, or M&A target screening before deeper due diligence. Inputs: the user's own company and the strategic axis to unlock through partnership (e.g. enter a new market via distribution, add AI infrastructure without rebuilding). Delivered by Antoine, the AI CSO of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`constraints`	No
`selfCompany`	Yes
`strategicAxis`	Yes	What strategic axis to unlock through partnership (e.g. 'enter US market via distribution', 'leverage AI infra without rebuild')
`currentPartnerships`	No	Existing alliances to factor in

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles
`sources`	No
`recommendations`	No	Prioritised next steps
`executiveSummary`	Yes	Board-ready partnership opportunity overview
`partnershipTargets`	Yes	5-12 ranked partnership targets

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, no destructiveness, and idempotent. The description adds value by detailing the scoring metrics (revenue lift, time-to-impact, integration complexity, regulatory risk) and output structure (rationale, first-step playbook). It also mentions the async parameter behavior implicitly. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-organized: it opens with the core purpose, then details output, usage guidelines, inputs, and a closing note. Every sentence adds value, and there is no redundancy. It is appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, an output schema) and the richness of annotations, the description covers the main aspects: purpose, usage, key parameters, and output characteristics. It does not explain all parameters (e.g., focus, constraints) but the output schema handles return values. Overall, it provides sufficient context for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 50% (3 of 6 parameters have descriptions). The description adds context beyond the schema: it explains that 'selfCompany' and 'strategicAxis' are the key inputs, gives an example for strategicAxis, and mentions that 'currentPartnerships' are existing alliances to factor in. This meaningfully augments the schema's sparse descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with a specific verb-resource pair ('Identify and rank strategic partnership opportunities') and provides concrete output details (5-12 targets, scored on four dimensions). It distinguishes from siblings by stating when to use: business-development, alliance ideas, or M&A target screening before deeper due diligence.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool: for business-development or alliance ideas, or M&A target screening before deeper due diligence. It also specifies required inputs (selfCompany and strategicAxis). It does not provide explicit when-not-to-use or alternative tool names, but the context is clear enough given the sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_landscapeA

Read-only

Inspect

Search, analyze and map patent landscapes across major jurisdictions (US, EP, WO, CN, JP, KR). Three modes: (1) search — find patents by keywords, company name or inventor name; (2) landscape — aggregate distributions: top assignees, top inventors, CPC class breakdown, filings by year, citation leaders, white-space innovation opportunities; (3) lookup — retrieve a specific patent by number (e.g. US10000000B2, EP3456789A1, WO2023/123456). Primary source: WIPO PatentScope (WO PCT, keyless). Optional sources: USPTO PatentsView (US, env PATENTSVIEW_API_KEY), EPO OPS (EP/WO, env EPO_OPS_CONSUMER_KEY + EPO_OPS_CONSUMER_SECRET), Lens.org (global, env LENS_API_TOKEN). Use cases: freedom-to-operate (FTO) analysis, R&D gap identification, VC due diligence IP audit, competitor patent portfolio mapping, inventor network analysis. SLA: <=24s p95 (parallel fetches, 8s per source). Cache: 24h TTL (patent data stable). Quality score: 30 pts per retrieved source (max 90), +10 if >=10 patents, +10 bonus for landscape mode with non-empty top_assignees.

ParametersJSON Schema

Name	Required	Description
`mode`	No	search: keyword/inventor/assignee search; landscape: aggregate distributions; lookup: fetch by patent number. Default: "search"
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keywords, company/inventor name, or patent number (e.g. "machine learning", "Tesla Inc", "US10000000B2")
`date_to`	No	ISO date YYYY-MM-DD — latest filing date
`date_from`	No	ISO date YYYY-MM-DD — earliest filing date
`max_results`	No	Max patents to return (5-50). Default: 20
`jurisdictions`	No	Jurisdictions to include. Default: ["US","EP","WO"]

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`patents`	Yes
`sources`	Yes
`landscape`	No
`quality_score`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, destructiveHint), the description discloses SLA (<=24s p95), cache TTL (24h), quality scoring mechanism, required environment variables per source, and parallel fetch behavior. This provides rich context for safe and effective use.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-structured, starting with purpose, then modes, sources, use cases, and performance characteristics. Every sentence adds useful information, though it could be slightly more compact without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, output schema, multiple modes and sources), the description covers all critical aspects: required environment variables, SLA, caching, quality scoring, and use cases. It leaves no significant gaps for an agent to misuse the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description adds value by explaining how parameters like 'query' are used across modes and listing default jurisdictions. It also clarifies the 'async' parameter's behavior. While schema already covers basics, the description enhances practical understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose with a specific verb ('Search, analyze and map') and resource ('patent landscapes'). It distinguishes three modes (search, landscape, lookup) and lists supported jurisdictions, making it distinct from sibling tools like patent_landscape_async.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases (FTO analysis, R&D gap identification, VC due diligence, etc.) and explains when each mode is appropriate. It also hints at async usage for slow queries via the 'async' parameter. However, it does not explicitly state when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_landscape_asyncA

Read-only

Inspect

Async extended variant of patent_landscape. Supports max_results up to 200 (vs 50 in sync mode) and an optional include_citation_graph flag that enriches each patent with its 2-level citation graph (parent patents that cite this one + child patents cited by this one). Returns immediately (<300ms) with a job_id. Poll the result with patent_landscape_result(job_id) after eta_seconds (~180s). Use for deep R&D white-space analysis, freedom-to-operate (FTO) audits, VC due diligence IP mapping, or large-scale competitor portfolio analysis. Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter.

ParametersJSON Schema

Name	Required	Description
`mode`	No	search / landscape / lookup. Default: "search"
`query`	Yes	Keywords, company/inventor name, or patent number (e.g. "machine learning", "Tesla Inc")
`date_to`	No	ISO date YYYY-MM-DD — latest filing date
`date_from`	No	ISO date YYYY-MM-DD — earliest filing date
`max_results`	No	Max patents to return (5-200). Default: 20
`jurisdictions`	No	Jurisdictions to include. Default: ["US","EP","WO"]
`include_citation_graph`	No	If true, enriches each patent with a 2-level citation graph (parents + children). Adds significant processing time — use for deep analysis only. Default: false.

Output Schema

ParametersJSON Schema

Name	Required	Description
`job_id`	Yes	Unique job identifier — pass to patent_landscape_result
`status`	Yes
`eta_seconds`	Yes
`submitted_at`	Yes

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses async behavior, immediate return of job_id, polling via patent_landscape_result, and optional citation graph impact on processing time. Annotations (readOnlyHint=true) align with description. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single paragraph that covers all essential aspects without redundancy. While dense, it remains readable. Could be slightly improved with bullet points, but current structure is effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (async, 7 params, output schema exists), the description covers all necessary aspects: purpose, usage, behavior, parameters, and even mentions webhook integration. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions. Description adds value by specifying max_results limit difference from sync version and explaining the citation graph enrichment. All parameters are well-documented both in schema and description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Explicitly states it's an async variant of patent_landscape, lists supported features (max_results up to 200, optional citation graph), and enumerates specific use cases (R&D, FTO, VC due diligence). Clearly distinguishes from sibling patent_landscape.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit contexts for use: deep analysis, large-scale portfolio analysis. Mentions async nature and alternatives like polling or webhook. Contrasts with sync version (max_results 50).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_landscape_resultA

Read-onlyIdempotent

Inspect

Poll the result of a patent_landscape_async job. Returns status=pending while running, status=completed with the full patent landscape report once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by patent_landscape_async (~180s).

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by patent_landscape_async (prefix: patl_)

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, idempotentHint), description adds specific status values (pending, completed, failed, not_found) and TTL 24h, enriching behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences, front-loaded with purpose, no wasted words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present, description adequately explains return statuses, error conditions, and TTL. Complete for a polling tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and schema description already explains job_id format. Description doesn't add extra meaning, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool polls the result of an async job, specifies it's for patent_landscape_async, and distinguishes it from sibling tools by being the result retrieval counterpart.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to call (after ~180s based on eta_seconds hint) and implies not to call before that. No alternatives needed as it's the only polling tool for the async job.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_ownership_auditA

Read-onlyIdempotent

Inspect

Audits patent ownership for employees or contractors, identifying gaps where inventors may not have properly assigned patent rights to the company. Designed for CHROs to ensure IP compliance and mitigate legal risks. Inputs: employee/contractor names or IDs, optional date range. Outputs: list of patents, ownership status, flagged gaps, and assignment details. Sources: USPTO PatFT and EPO Espacenet public records. Keywords: patent audit, IP compliance, employee inventions, contractor agreements, CHRO.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`dateRange`	No	Optional date range for patent filings
`employeeIds`	No	List of employee or contractor IDs (optional if names provided)
`employeeNames`	Yes	List of employee or contractor full names to audit

Output Schema

ParametersJSON Schema

Name	Required	Description
`gaps`	No
`status`	Yes
`patents`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds value by specifying data sources (USPTO and EPO public records) and output details (list of patents, ownership status, flagged gaps). There is no contradiction with annotations; all behavioral traits are consistent and well-communicated.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (4-5 sentences) and well-structured: purpose, audience, inputs, outputs, sources, keywords. Every sentence is informative and earned its place. No redundancy or unnecessary words. Front-loaded with the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, output schema, async option), the description covers the main aspects: purpose, audience, inputs, outputs, and data sources. It lacks details on pagination or rate limits, but the output schema likely covers return format. OpenWorldHint and annotations compensate for some missing context. Overall adequate for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description reiterates 'employee/contractor names or IDs, optional date range' which matches schema properties but adds no new detail beyond the schema. It does not elaborate on the 'async' parameter or date format, which are already described in the schema. Thus, minimal additional semantic value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool audits patent ownership for employees/contractors, identifying gaps in assignment. It specifies the target audience (CHROs) and distinguishes from similar sibling tools like 'ip_employee_invention_tracker' by focusing on ownership gaps rather than tracking inventions. The verb 'audits' and resource 'patent ownership' are specific and distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly targets CHROs for IP compliance and legal risk mitigation, providing clear inputs and outputs. It does not explicitly mention when not to use or compare to alternatives, but the context of siblings and keywords implies its niche. Some guidance on when to choose this over 'patent_landscape' would improve, but current information is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

payment_rails_cost_analyzerA

Read-onlyIdempotent

Inspect

As a CFO, compare cross-border payment rail costs (SWIFT, SEPA, local ACH, stablecoins) with FX conversion fees and settlement times. Input source/destination countries and amount, receive cost breakdown, FX rates, and settlement time estimates. Uses ECB FX rates and World Bank remittance price data for accurate cost analysis. Ideal for optimizing international payment strategies and reducing transaction expenses.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`amount`	Yes	Transaction amount in source currency
`source_country`	Yes	ISO 3166-1 alpha-2 country code of payment origin
`source_currency`	No	ISO 4217 currency code of source amount
`destination_country`	Yes	ISO 3166-1 alpha-2 country code of payment destination
`destination_currency`	No	ISO 4217 currency code of destination amount

Output Schema

ParametersJSON Schema

Name	Required	Description
`amount`	No
`status`	Yes
`fx_rate`	No
`sources`	No
`warnings`	No
`total_cost`	No
`source_country`	No
`settlement_time`	No
`source_currency`	No
`destination_country`	No
`destination_currency`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, openWorld, and idempotent. The description adds behavioral context: it uses ECB FX rates and World Bank remittance data, outputs cost breakdowns and settlement time estimates. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of ~50 words, front-loaded with the main purpose. It is efficient but includes a slight persona flourish ('As a CFO') not necessary for function. Overall concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 params, output schema exists), the description adequately covers purpose, data sources, and outputs. It does not explain the async parameter or provide details on output schema, but these are partially covered by schema descriptions and annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (all 6 parameters have descriptions). The description reuses 'source/destination countries and amount' but adds no new semantics beyond the schema. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: compare cross-border payment rail costs (SWIFT, SEPA, local ACH, stablecoins) with FX fees and settlement times. It specifies inputs (source/destination countries, amount) and outputs (cost breakdown, FX rates, settlement estimates), distinguishing it effectively from all sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a clear use case ('As a CFO, compare...') and states it's ideal for optimizing international payment strategies. However, it lacks explicit when-not-to-use guidance or alternatives, though no close siblings require distinction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pentest_scope_estimatorB

Read-only

Inspect

Estimateur de scope pentest — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: For a pentest on with assets, what is the effort and cost estimate? · How much should I budget for a web application + API penetration test for SOC 2 Type II compliance? · What is the standard engagement plan (PTES phases + deliverables) for a pentest? · Which engagement type (black-box/grey-box/white-box/red-team) is recommended for my context? · What are the prerequisites and risks for a pentest engagement on my cloud infrastructure? Reference case: Acme SaaS Inc — Fintech B2B EU · web-app + API REST · 12 microservices Node.js AWS · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`scope_type`	Yes
`tech_stack`	Yes
`asset_count`	No
`target_geos`	No
`engagement_type`	No
`retest_included`	No
`business_context`	Yes
`compliance_frameworks`	No

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, which are consistent with the description stating it returns a structured deliverable and supports async polling. The description adds useful details about server-side validation and async job_id option, but does not discuss rate limits or authentication. No contradiction found.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose with multiple example questions and promotional text. While it front-loads the main function, the length could be reduced by removing redundant examples and marketing language. It is adequately structured but not optimally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, no output schema), the description lacks details on the deliverable format and full parameter usage. It mentions 'structured, audited deliverable' but does not specify format (JSON, PDF, etc.). The reference case helps but is not sufficient for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 11% schema description coverage, the description attempts to compensate by referencing parameters in examples (scope_type, tech_stack, asset_count) and a reference case. However, many parameters (async, target_geos, retest_included, compliance_frameworks) are not explained individually. The description provides incomplete compensation for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a pentest scope estimator that returns structured deliverable with effort, cost, and recommendations. It lists example questions that solidify its purpose. However, it includes promotional language ('Gapup agent-payable C-suite expertise (RISK)') and does not explicitly distinguish from sibling tools, but no direct competitors are present.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example questions indicating when to use the tool (budgeting, engagement plans, recommendations, prerequisites) but does not give explicit when-not-to-use scenarios or alternative tools. The mention of server-side validation hints at proper usage but lacks comprehensive guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pitch_deck_storylineA

Read-onlyIdempotent

Inspect

Build a complete investor pitch-deck storyline for a company. Returns an 8-20 slide narrative tailored to the target audience (seed-vc / series-a-vc / growth-vc / strategic / bank / grant) — each slide carrying a title, key points, a speaker note and a visual hint — plus a Q&A bank of 10-15 likely board questions and traps to avoid. Output is deck JSON ready to export to Google Slides, Notion or Pitch.com. When to use this tool: the user is preparing a fundraise, a board meeting, or an investor presentation. Inputs: the company profile and the target audience type. Delivered by Sarah, the AI Fundraising lead of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`audience`	Yes	Target audience — adapts tone + emphasis + Q&A bank
`keyFacts`	Yes	Hard facts to weave into the deck (traction numbers, milestones, awards)
`slideCount`	Yes	12 = standard VC deck, 15 = bank-friendly with annexes, 20 = growth/strategic

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles surfaced from keyFacts
`slides`	Yes	8-20 slide objects ready to export to Google Slides / Notion / Pitch.com
`qaBanks`	Yes	10-15 anticipated investor questions with recommended answers
`recommendations`	No	Fundraising preparation actions
`executiveSummary`	Yes	One-paragraph elevator pitch distilled from the deck

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true, destructiveHint=false, and idempotentHint=true. The description adds behavioral context: it returns deck JSON, mentions the async parameter returns job_id, and notes the output is ready for export. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but covers purpose, output, usage, and credits. It is not excessively long but could be more structured. Front-loads key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 5 parameters, output schema exists), the description fully explains the output format, audience adaptation, async behavior, and required inputs. It is complete and leaves no major gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 80%. The description adds meaning by explaining slideCount values (12=standard VC deck, etc.) and summarizing inputs as 'company profile and target audience type.' This surpasses the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool builds a complete investor pitch-deck storyline, listing specific output details (8-20 slides with titles, key points, speaker notes, visual hints, and Q&A bank) and target audiences. It is specific and distinct from sibling tools, avoiding tautology.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly tells when to use the tool ('user is preparing a fundraise, a board meeting, or an investor presentation') and lists required inputs. It does not mention when not to use or provide alternatives, but the guidance is clear and context-aware.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

positioning_strategistC

Read-only

Inspect

Stratège de positionnement — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub vs Tableau/Pigment/Looker — Angle de différenciation + 5 piliers messaging + battle plan. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`market`	Yes
`company`	Yes
`product`	Yes
`aspirations`	No
`competitors`	Yes
`customerPains`	Yes
`currentWeaknesses`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true, so the description does not need to reiterate those. The description adds that it 'Returns a structured, audited deliverable' and 'Inputs are validated server-side', which provides minor additional behavioral context. However, it does not disclose other traits like rate limits, authentication requirements, or what 'audited' entails. Baseline 3 with annotations is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderate in length, but contains some extraneous details like 'Gapup agent-payable C-suite expertise (CMO)' and a reference case example. While front-loaded with purpose, it could be more concise by removing the example or integrating it into a clearer benefit statement. It is adequate but not optimally structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 params, nested objects, no output schema), the description is incomplete. It does not describe the return value format or structure beyond 'structured, audited deliverable'. The reference case hints at output content but lacks formal specification. For an agent to invoke and interpret the result correctly, more details are needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (13%), meaning most parameters lack descriptions in the schema. The description does not compensate: it only says 'send the documented case fields' without explaining any parameter semantics, such as what 'documented case fields' means or how to fill in company, product, etc. The parameter meaning largely relies on the parameter names and required structure, which leaves interpretation ambiguous.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: positioning strategy for C-level executives in Gapup context, returning a structured deliverable. It references a specific case (Gapup Hub vs Tableau/Pigment/Looker) and outlines the output components (differentiation angle, 5 pillars messaging, battle plan). This distinguishes it from sibling tools like 'market_entry_strategist' or 'pricing_strategist', though the 'Stratège de positionnement' title could be more explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions 'Inputs are validated server-side' but gives no context on prerequisites, ideal scenarios, or when not to use it. There is no mention of alternative tools or disclaimers, leaving the agent without direction for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

press_influencerC

Read-only

Inspect

Presse & influenceurs — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Agicap (levée Série C €70M) — CP + 12 contacts presse Tier-1 · plan de diffusion 14 jours. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`budget`	No
`company`	Yes
`targetMedia`	Yes
`announcement`	Yes
`targetAudience`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true and openWorldHint=true, so the tool is non-destructive. The description adds that inputs are validated server-side and returns an audited deliverable, but does not elaborate on side effects or limitations. Adequate but minimal extra value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but includes a marketing reference case that may not be essential for tool understanding. It is front-loaded with the tool's domain but could be more direct. Adequate conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters, nested objects, no output schema, and low schema description coverage, the description is incomplete. It does not explain the deliverable's structure, how to interpret results, or handle the async parameter. The agent lacks sufficient context for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 17% (low), but the description only says 'send the documented case fields' without explaining any parameter beyond the schema's brief descriptions. The reference case implies fields like company and announcement but does not add meaning for the agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it is for press and influencer relations, returns a structured deliverable, and cites a reference case. It distinguishes from siblings by focusing on press/influencer outreach, though the verb (returns) is implicit. Clear enough for an agent to understand the tool's domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs. alternatives. Siblings include many PR/marketing tools, but the description does not differentiate or specify conditions. The agent must infer usage from the name and context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pricing_in_dealC

Read-only

Inspect

Pricing en Deal — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Agicap × Groupe Rocher — Deal €38k · stade négociation · contre-offre -30% · 3 scénarios pricing · ROI 12×. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`redLines`	Yes
`negotiationContext`	Yes

Tool Definition Quality

C2.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true, indicating safe read operation. The description adds 'returns a structured, audited deliverable' and notes server-side validation, but does not disclose potential behaviors beyond what annotations already imply. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but includes a reference case that adds context. However, the first sentence is jargon-heavy and confusing. The structure is adequate but could be more front-loaded with clear purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has complex nested input (5 params) and no output schema. The description omits what the deliverable contains, how to interpret results, or what 'audited' means. It is far from complete for an agent to use effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema description coverage, the description must compensate but fails entirely. It merely says 'send the documented case fields' without explaining any parameter meaning, format, or usage. Critical for complex nested parameters like deal, company, redLines.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured deliverable for pricing deals, but uses obscure terms like 'Gapup agent-payable C-suite expertise (CRO)' and lacks a clear verb indicating the main action. It is somewhat clear but not specific enough to distinguish from siblings like 'pricing_strategist'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The only usage hint is 'send the documented case fields', which does not help an agent decide contextually. Exclusions or when-not-to-use are absent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pricing_strategistC

Read-only

Inspect

Stratège de pricing — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Vercel Pricing 2026 — 4 tiers + usage metering · 3 scenarios pricing chiffrés · ARPU +28% target. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`competitors`	Yes
`currentPricing`	Yes
`valueProposition`	Yes

Tool Definition Quality

C2.7/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds little beyond annotations: readOnlyHint and openWorldHint already indicate it is a read-only tool with variable output. The description mentions server-side validation and an audited deliverable, but lacks details on rate limits, cost, or potential side effects. With annotations present, the bar is higher, and the description fails to provide meaningful behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences plus a reference case is concise and front-loaded. However, the reference case (Vercel Pricing) is somewhat specific and may not apply universally, taking space that could be used for parameter guidance. Still, overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 6 parameters (4 required), nested objects, no output schema, and many siblings, the description is far from complete. It does not explain the return structure, what 'audited deliverable' contains, or how to interpret results. The agent lacks essential context to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only async has a description). The description does not explain any parameters beyond 'send the documented case fields'. It does not help the agent understand what to provide for the required nested objects (company, competitors, etc.). This is a critical gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a structured, audited deliverable for pricing strategy, targeting C-suite expertise (CMO). The reference case (Vercel Pricing 2026) solidifies what it does, distinguishing it from siblings like pricing_in_deal or competitor_pricing_radar.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs. siblings. The description does not mention specific context, prerequisites, or scenarios where it should not be used. Given many pricing-related sibling tools, this gap hinders correct agent selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

privacy_compliance_auditC

Read-only

Inspect

Audit conformité vie privée — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Lemlist SAS — SaaS outreach B2B, transferts UE→US Schrems II, RGPD + CCPA + LGPD + UK GDPR. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`presenterScript`	No
`targetFrameworks`	Yes
`processingActivities`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description states it returns a deliverable with server-side validation. This aligns with a safe read operation, but the description does not elaborate on the deliverable's contents or any potential side effects beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with one sentence and a reference case, front-loading the purpose. It avoids redundancy but could be slightly more structured, e.g., by listing key capabilities.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema, many siblings), the description is incomplete. It does not specify return format, input constraints, or how results are structured, leaving ambiguity for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only 'async' described). The description says 'send the documented case fields' but does not explain the purpose of parameters like focus, presenterScript, or processingActivities. It adds minimal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs a privacy compliance audit and returns a structured deliverable. It provides a reference case, but does not differentiate from sibling tools like ai_governance_full_report_async or lgpd_data_subject_rights_automator.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks explicit guidance on when to use this tool versus alternatives. It mentions 'C-suite expertise (RISK)' and a reference case, but does not specify prerequisites, when not to use, or how it compares to similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

process_mappingC

Read-only

Inspect

Mapping des process opérationnels — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Decathlon France — process Retour produit en magasin · 1700 magasins · 200 retours/j/magasin · -30 à -50% temps cible. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`processes`	Yes
`presenterScript`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true, and the description confirms a read-only operation by returning a deliverable. It adds minimal behavioral details beyond what annotations offer, such as server-side validation, but no mention of side effects or state changes.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively brief, with a clear opening and a relevant example. However, it could be more structured by separating purpose, usage, and parameter hints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested schema and lack of output schema, the description is insufficient. It provides a high-level purpose but lacks details on input structure, optional fields, and expected output format, making it hard for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, and the description does not explain parameters or their meanings. The phrase 'send the documented case fields' is vague and does not compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it maps operational processes and returns a structured deliverable, with an example case. However, it does not differentiate from the sibling tool 'process_mining', missing an opportunity to clarify uniqueness.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like process_mining or other mapping tools. The description only mentions server-side validation but lacks context for appropriate use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

process_miningC

Read-only

Inspect

Mining des process — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Gapup Hub — 4 process · €320k gaspillage identifié · 3 quick wins · 5 automations. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`objectives`	Yes
`companyName`	Yes
`mainSystems`	Yes
`topProcesses`	Yes
`employeeCount`	Yes
`revenueLostEstimateEur`	No

Tool Definition Quality

C2.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description adds minimal value. It mentions the output type (structured, audited deliverable) and provides a reference case example, giving some behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but mixes French and English, and includes a lengthy reference case example that may not be essential. It could be more concise and clearly structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 7 parameters, no output schema, and low schema coverage, the description is insufficient. It provides a high-level idea but lacks specifics on inputs, outputs, and detailed behavior, leaving the agent with many unknowns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 14% schema description coverage, the description should compensate but does not. It vaguely says 'send the documented case fields' without explaining any parameter meanings or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Returns a structured, audited deliverable' and gives a reference case with waste identification and quick wins, but lacks a clear verb (e.g., analyze, mine) and does not distinguish from sibling 'process_mapping'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like process_mapping. Only mentions server-side validation and sending documented case fields, which implies a use case but does not specify context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

procurement_okr_esg_alignerA

Read-onlyIdempotent

Inspect

Aligns procurement OKRs with ESG targets for COOs using GRI standards and EU TED procurement benchmarks. Inputs include procurement objectives and ESG focus areas (e.g., carbon reduction, supplier diversity). Outputs structured alignment scores, gap analysis, and actionable recommendations. Essential for COOs integrating sustainability into procurement strategy. Keywords: procurement, ESG, GRI, EU TED, OKR alignment, sustainability metrics.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`esgFocusAreas`	Yes
`industrySector`	No
`procurementObjectives`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`alignmentScores`	No
`recommendations`	No
`benchmarkComparison`	No
`overallAlignmentScore`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, idempotentHint. Description does not contradict these and adds context about outputs (alignment scores, gap analysis, recommendations) and standards used. This adds value beyond annotations without conflicting.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences plus keywords, each serving a purpose: purpose, inputs/outputs, target user, and keywords. No redundancy, front-loaded with main action. Highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, description need not detail return values. It covers inputs, outputs, standards, and use case. Missing explicit mention of industrySector parameter, but overall sufficient for the complexity level.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (25%), and description partially compensates by naming esgFocusAreas and procurementObjectives with examples. However, it omits industrySector and async parameters. It adds some meaning but not full coverage for all params.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool aligns procurement OKRs with ESG targets for COOs using specific standards (GRI, EU TED). It lists inputs and outputs, distinguishing it from siblings like action_plan_esg or supplier_esg_audit by its focus on OKR alignment. The verb 'aligns' and resource 'procurement OKRs with ESG targets' are specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies usage for COOs integrating sustainability into procurement strategy, but does not explicitly state when not to use or provide alternatives. Given the many sibling tools, explicit exclusions would improve clarity. Nonetheless, the context is clear enough for agents.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

procurement_six_sigma_waste_hunterA

Read-onlyIdempotent

Inspect

Analyzes procurement waste for COOs using Six Sigma DMAIC framework and EU TED tender data. Identifies non-value-added activities, overprocessing, and inefficiencies in procurement workflows. Inputs include procurement category, time period, and organizational unit. Outputs waste classification, cost impact estimates, and process improvement recommendations. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`time_period`	Yes	Time period for analysis (e.g., '2023-01-01/2023-12-31')
`six_sigma_tool`	No		DMAIC
`include_ted_data`	No
`organizational_unit`	No	Specific business unit or department (e.g., 'EMEA', 'Global Operations')
`procurement_category`	Yes	Specific procurement category to analyze (e.g., 'IT hardware', 'facilities')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`ted_data_coverage`	No
`cost_impact_estimate`	No
`waste_classification`	No
`process_improvement_recommendations`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, so no contradiction. The description adds context about EU TED data and the DMAIC framework, and warns about timeout requiring async. It does not fully describe behavior like rate limits or data freshness, but adds meaningful information beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph plus an async note; it is relatively concise. The first sentence clearly states the purpose. It could be slightly front-loaded but overall efficient with no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description doesn't need to detail return values. It covers the core purpose and major inputs. The async note is crucial. It might be missing some parameter context but is adequate for a tool with 6 params and good schema coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 67%, so the schema documents most parameters. The description repeats some inputs but omits six_sigma_tool and include_ted_data. It adds the critical note that async:true is required to avoid timeout, which adds value. However, it doesn't fully compensate for missing parameter details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes procurement waste using Six Sigma DMAIC and EU TED data, outputs waste classification and recommendations. It specifies target users (COOs) and inputs, distinguishing it from sibling tools like procurement_spend_optim or procurement_okr_esg_aligner.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not explicitly state when to use this tool vs alternatives. It implies waste analysis but lacks conditions or exclusions. The async note is a technical hint, not usage context. No mention of prerequisites or alternative scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

procurement_spend_optimC

Read-only

Inspect

Optimisation des achats / Spend strategy — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Tech SaaS €60M ARR — 200 fournisseurs analysés · 20 leviers chiffrés · -€2.4M opex/an target. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`topSuppliers`	Yes
`spendCategories`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and mentions a reference case, but does not elaborate on behavior like async parameter usage or response time.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short but includes a reference case that is not essential. Could be more concise without losing meaning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested input schema and lack of output schema, the description is insufficient. It does not explain how to construct the input fields or what the deliverable contains, leaving the agent with incomplete instructions.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only async parameter described). The description does not explain the meaning of company, spendCategories, topSuppliers, or focus beyond their names, despite low coverage requiring compensation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Optimisation des achats / Spend strategy' and mentions returning a structured deliverable, but the exact nature of the output is vague. It does not differentiate from sibling procurement tools like procurement_okr_esg_aligner or procurement_six_sigma_waste_hunter.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description only notes server-side validation and to 'send the documented case fields,' which is not usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

programmatic_attribution_calibratorA

Read-onlyIdempotent

Inspect

For ad_revenue_ops persona: calibrates marketing mix models (MMM) by ingesting OpenRTB impression-level data from FreeWheel Marketplace and other programmatic sources. Accepts model parameters, date ranges, and impression IDs as input, returning structured calibration metrics and attribution adjustments. Useful for improving model accuracy with real-time bidding data and validating revenue attribution across programmatic channels.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	Yes	End date for impression data (ISO 8601)
`modelId`	Yes	Identifier of the MMM model to calibrate
`startDate`	Yes	Start date for impression data (ISO 8601)
`impressionIds`	No	List of OpenRTB impression IDs to include in calibration
`confidenceThreshold`	No	Confidence threshold for calibration metrics

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`calibrationMetrics`	No

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description is consistent with annotations (readOnlyHint, openWorldHint, idempotentHint) as it describes a read-only analysis that returns metrics and adjustments without modifying state. It adds context about data sources and purpose but does not disclose behavioral traits beyond what annotations already provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the persona, and each sentence adds value. No wasted words. It clearly states purpose, input/output, and use case.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters and an output schema, the description provides sufficient high-level context: it specifies the data sources, input types, and output summary. The output schema exists, so detailed return values are not needed. The description is complete for an AI agent to understand the tool's purpose and when to invoke it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description mentions 'model parameters, date ranges, and impression IDs,' which map to the parameters, but does not add significant meaning beyond the schema descriptions. It provides a summary but no additional detail.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'calibrates' with a clear resource 'marketing mix models' and explicitly targets the 'ad_revenue_ops' persona. It distinguishes itself from siblings like 'programmatic_brand_safety_auditor' by focusing on calibration and attribution adjustments rather than brand safety.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states it is 'for ad_revenue_ops persona' and 'useful for improving model accuracy with real-time bidding data and validating revenue attribution across programmatic channels,' providing clear context for when to use. It does not explicitly mention when not to use or alternatives, but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

programmatic_brand_safety_auditorA

Read-onlyIdempotent

Inspect

Evaluates programmatic ad inventory for brand safety risks using IAB Tech Lab's standards and GDPR-compliant tracking methods. Designed for ad revenue operations teams to assess inventory quality before bidding. Inputs include domain, page URL, and optional contextual signals. Outputs a structured brand safety score with risk categorization and compliance warnings.

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Full page URL being evaluated
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domain`	Yes	Root domain of the inventory (e.g., 'example.com')
`categories`	No	Optional IAB content categories for contextual analysis
`gdprConsent`	No	GDPR consent string (TCF v2.0)

Output Schema

ParametersJSON Schema

Name	Required	Description
`flags`	No
`score`	No	Brand safety score (0-100)
`status`	Yes
`sources`	No
`warnings`	No
`riskLevel`	No
`gdprCompliant`	No

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and idempotentHint. The description adds that the tool uses GDPR-compliant tracking methods and outputs a structured score with risk categorization. It does not fully disclose the async behavior (parameter 'async' exists in schema but not mentioned in description). Overall, good alignment and useful context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each with a clear purpose: first sentence states core function and standards, second sentence identifies target users and use case, third sentence lists inputs and outputs. No redundancy or extra words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (5 parameters, 2 required, output schema available), the description covers the essential context: purpose, standards, target users, inputs, and outputs. It is sufficient for an AI agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good descriptions. The description summarizes inputs as 'domain, page URL, and optional contextual signals,' which groups the parameters nicely but does not add significant new meaning beyond the schema. It helps in understanding the role of optional parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates programmatic ad inventory for brand safety risks using specific standards (IAB Tech Lab, GDPR). It targets ad revenue operations teams for pre-bid assessment, which distinguishes it from sibling tools like fraud_detector or programmatic_attribution_calibrator.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies when to use the tool ('before bidding') and its target audience. However, it does not explicitly mention when not to use it or provide alternative tools for similar tasks. The context is clear but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

proposal_generatorC

Read-only

Inspect

Générateur de propositions commerciales — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk × Gapup Hub — Proposition 7 sections · ROI 3Y €1.8M · Payback 4 mois. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`offer`	Yes
`company`	Yes
`prospect`	Yes
`dealContext`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint and openWorldHint, which the description does not contradict. The description adds that it returns an 'audited deliverable' and validates inputs server-side, but it does not cover performance, async behavior, or potential side effects beyond what annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with a mix of French and English, which is somewhat distracting. It front-loads the purpose but includes a lengthy reference case that could be more succinct. Not overly verbose but not optimally structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 5 params, no output schema), the description is insufficient. It does not explain the output format, async behavior, or how to interpret the 'audited deliverable.' The reference case gives a glimpse but is not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only async param documented). The description briefly mentions 'company, prospect, offer' as required and alludes to case fields through the reference case, but it does not add meaningful detail for each parameter to compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates the tool generates commercial proposals (propositions commerciales) with C-suite expertise and returns a structured deliverable. While the name is self-explanatory, the description adds a reference case but does not explicitly differentiate from sibling tools like pitch_deck_storyline or deal_coach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The reference case provides an example context, but there are no when-not-to-use instructions or comparisons to siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

qa_pre_flightC

Read-only

Inspect

Préparation Q&A investisseurs — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Agicap Série C €70M — 30 Q&A stratégiques · 8 questions pièges · Plan de préparation 21 jours. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`round`	Yes
`company`	Yes
`founderContext`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns a structured, audited deliverable, which is consistent. No contradictions, but minimal additional behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and includes a reference case, but it mixes languages and could be more concise. The key information is front-loaded, but the reference is somewhat extraneous for understanding the tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the nested input schema and no output schema, the description lacks detail on the deliverable's structure and content. The reference case is vague, leaving the agent without a complete understanding of what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%, and the description does not explain any parameters. It only states 'send the documented case fields,' which adds no meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool prepares investor Q&A materials for fundraising, with a reference case. The purpose is specific and distinct from sibling tools, though the verb is implied rather than explicit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or alternatives are provided. The fundraising context and reference case imply usage, but no guidance on when not to use or comparisons to similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

qbr_autoC

Read-only

Inspect

QBR automatique CSM — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub × Alan — QBR Q1 2026 · Health score 82/100 · Upsell €18k détecté · Renewal low risk. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`wins`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`period`	Yes
`company`	Yes
`metrics`	Yes
`customer`	Yes
`challenges`	Yes
`nextQuarterGoals`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description only adds minimal context about returning an audited deliverable and server-side validation. It does not disclose whether it calls external APIs (despite openWorldHint), any side effects, or behavioral traits beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat concise but includes extraneous information like a specific reference case and jargon. It could be streamlined to focus on the core functionality and parameters. It is front-loaded with the purpose but includes filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, no output schema), the description is insufficient. It does not explain the overall workflow, expected output format, error scenarios, or provide enough context for the agent to correctly invoke the tool without additional knowledge.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter (async) has a description. The schema coverage is 13%, and the description does not elaborate on any of the seven required parameters. It merely says 'send the documented case fields' without explanation of what each field means or how to fill them.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool generates a QBR automatically for CSM and returns a structured audited deliverable. It provides a concrete reference case to illustrate output. However, the use of jargon like 'Gapup agent-payable C-suite expertise (CRO)' slightly obscures the purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide any guidance on when to use this tool compared to alternatives. No explicit when-to-use or when-not-to-use criteria. Among many sibling tools, there is no differentiation mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

real_estate_intelA

Read-onlyIdempotent

Inspect

Real estate intelligence aggregator with a best-in-class French dataset (DVF — Demandes de Valeurs Foncières — 100% of FR transactions since 2019, public, keyless) plus UK Land Registry Price Paid (all UK transactions 1995+). Four modes: (1) property — full transaction history for a specific address; (2) comparables — median/std price/m² within a radius (default 500m); (3) market — annual price series, YoY change, volume, trend by commune; (4) valuation — two-method estimate (comparables median + hedonic regression if n≥30) with confidence scoring (high/medium/low). All sources are free and require no API key. ICP: PropTech agents, REITs, fund managers, family offices, insurance. SLA: ≤25s p95 (sources fetched in parallel, 8s budget each). Cache: 24h TTL (DVF data is stable). Quality score: 30 pts DVF retrieved, 20 pts geocoding, 20 pts UK LR retrieved, 15 pts if comparables count ≥10, 15 pts if method quality achieved. Status: failed/<60/≥60 → failed/partial/final. No env vars required.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	property: transactions at an address \| comparables: sample around a point \| market: commune/neighbourhood market stats \| valuation: price estimate for a given surface
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_to`	No	ISO date YYYY-MM-DD — latest transaction date
`location`	Yes	Location descriptor. One of: {address, city?, country?} \| {lat, lon, radius_m?} \| {insee_code} for FR communes.
`date_from`	No	ISO date YYYY-MM-DD — earliest transaction date
`max_results`	No	Maximum number of results to return (5–50, default 20)
`surface_max`	No	Maximum surface in m² (±20% tolerance applied for comparables)
`surface_min`	No	Minimum surface in m² (±20% tolerance applied for comparables)
`property_type`	No	Filter by property type (default: all)

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`market`	No	mode=market — commune-level market stats
`status`	Yes
`sources`	Yes
`property`	No	mode=property — transactions at the location
`valuation`	No	mode=valuation — price estimate
`comparables`	No	mode=comparables — aggregated comp stats
`quality_score`	Yes
`location_resolved`	Yes

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: async mode with polling, parallel source fetching with time budget, 24h TTL cache, quality score composition, and that no env vars are required. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is detailed and well-structured with bullet points for modes, but slightly long. It is front-loaded with the core purpose and datasets, making the key information easy to find.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple modes, nested location object, output schema exists), the description covers all necessary aspects: data sources, modes, async option, quality scoring, caching, SLA, and ICP. It is complete for effective usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents all parameters well. The description adds context about mode-specific behaviors (e.g., radius default 500m, surface tolerance ±20%) but mostly re-iterates schema descriptions. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a real estate intelligence aggregator with best-in-class French and UK datasets, and explicitly lists four distinct modes (property, comparables, market, valuation) which differentiate it from its many siblings on the server.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides extensive guidance on when to use each mode, specifies the ICP (PropTech agents, REITs, etc.), and details SLA, caching, and quality scoring. However, it lacks explicit 'when not to use' or direct alternatives, though the modes cover different use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realtime_data_streamsA

Read-only

Inspect

High-frequency real-time market data for trading agents, market-making bots and fintech analysts. Returns FX ticks (bid/ask/spread), intraday OHLCV candles, crypto orderbook snapshots (depth 5-50), recent trades with VWAP, and sovereign bond yields. All sources are keyless public REST APIs (Binance, Coinbase, Kraken, OKX, open FX feeds, worldgovernmentbonds.com). Ultra-short cache: 10s for ticks/trades, 60s for orderbook. Use when an agent needs live market data as precise numeric inputs for trading logic, arbitrage detection, or portfolio valuation.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Data stream type: fx_tick (latest FX bid/ask/mid/spread), fx_history_intraday (OHLCV candles), crypto_orderbook (order book snapshot), crypto_trades_recent (last 50 trades + VWAP), bond_yields (sovereign yield %)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Orderbook depth (levels each side) for crypto_orderbook mode (default: 20)
`period`	No	Candle period for fx_history_intraday mode (default: 5m)
`symbol`	Yes	Market symbol. FX: EURUSD, GBPUSD, USDJPY. Crypto: BTCUSDT, ETHUSDT, BTC-USD. Bonds: US10Y, US2Y, DE10Y, FR10Y, UK10Y, JP10Y, IT10Y

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`symbol`	Yes
`fx_tick`	No
`sources`	Yes
`fx_history`	No
`bond_yields`	No
`crypto_trades`	No
`quality_score`	Yes
`crypto_orderbook`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint. The description adds cache timing (10s for ticks/trades, 60s for orderbook) and source openness ('keyless public REST APIs'), which are beyond the annotations and helpful for understanding latency and access.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3 sentences), front-loads purpose, lists data types, includes cache notes, and ends with usage guidance. No redundant or filler content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of output schema and annotations, the description covers key aspects: data types, sources, cache, and use cases. It lacks mention of the async parameter for job polling, but overall it provides sufficient context for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions. The description reinforces mode types and symbol examples but does not add meaning beyond what the schema already provides. For high-coverage schemas, a baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool returns high-frequency real-time market data (FX ticks, OHLCV candles, crypto orderbook, trades, bond yields) for trading agents and analysts. It clearly identifies the resource and its scope, distinguishing it from the many sibling tools that deal with different data or analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases ('trading logic, arbitrage detection, portfolio valuation') and mentions cache durations, guiding when to use. However, it does not explicitly contrast with similar tools like fx_rate or historical_price_series, leaving some ambiguity about selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recruiting_architectC

Read-only

Inspect

Architecte du recrutement — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Stripe France — 12 postes Q3 2026 · sourcing multi-canaux + employer brand + frameworks d'entretien + parcours candidat · time-to-hire -45%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`roles`	Yes
`budget`	Yes
`company`	Yes
`preferences`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and openWorldHint=true. The description adds that it returns a structured, audited deliverable and inputs are validated server-side, but does not disclose additional behavioral traits beyond what annotations convey. It is consistent with annotations, so no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, includes a concrete reference case, and is front-loaded with the core purpose. It could be slightly tighter but is generally concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 6 parameters, nested objects, and no output schema. The description does not explain the return value format or what the 'structured, audited deliverable' contains, leaving significant gaps for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only 'async' has a description). The tool description does not explain the meaning or usage of key parameters like 'focus', 'roles', 'budget', 'company', or 'preferences', leaving the agent to rely on the schema alone, which lacks descriptions for most fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a recruitment architect for C-suite expertise (CHRO), returning a structured deliverable, with a reference case. It specifies a verb (architect) and resource (recruitment strategy), but does not explicitly differentiate from sibling tools like candidate_screening_ranking or talent_intelligence.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use when a structured recruitment strategy is needed but provides no explicit context on when to use this tool vs alternatives, nor any exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

re_deal_screenerB

Read-only

Inspect

Screener deal immobilier (EU) — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Screen this real estate deal: , , asking € — give me cap rate vs market, location score, risk flags, and deal recommendation. · Should I pursue this hotel investment at for € with keys? Run an EU deal screener with DVF comparables and Géorisques risk data. · What is the real estate market valuation for a at based on recent French DVF transactions? · Run a due diligence deal screen on this property: , €, sqm — flood risk, cap rate, price vs comparables. · Evaluate this commercial real estate deal for an investment committee: at , €, NOI €. Reference case: Hôtel boutique 45 keys · 12 rue de la Paix 75002 Paris · €12.5M · €277k/key · comp DVF €250-380k/key · location 92/100 · score 72 · pursue-with-conditions. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`address`	Yes
`deal_type`	Yes
`country_iso2`	Yes		FR
`units_or_keys`	No
`gross_area_sqm`	No
`current_noi_eur`	No
`asking_price_eur`	Yes
`investment_thesis`	No

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, and description confirms non-destructive nature ('returns structured deliverable'). However, description does not address async behavior (parameter 'async' allows immediate job_id); it implies synchronous results, which could mislead agents using async mode.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is verbose and starts with a confusing French phrase. While it front-loads purpose and provides multiple examples, it could be more succinct and better organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 9 parameters and no output schema, description fails to fully explain input parameters or return structure. Async behavior is not covered, and output is vaguely described as 'structured, audited deliverable'.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 11% (only 'async' has description). Description partially compensates by listing example fields (address, deal_type, price) and providing a reference case, but lacks explanations for parameters like 'investment_thesis', 'gross_area_sqm', etc.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly identifies that this tool screens EU real estate deals and returns structured analysis (cap rate, location score, risk flags, recommendation). However, it does not differentiate from sibling tool 'ma_deal_screener' or 'real_estate_intel', which could cause confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides example queries showing when to use (e.g., 'Screen this real estate deal: ...'), but lacks explicit guidance on when not to use or alternatives like 'real_estate_intel' for simpler market valuations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

renewal_optimizerC

Read-only

Inspect

Optimiseur de renouvellements — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Renewals 10 comptes · €89k ARR à 90j · 3 comptes at-risk · Playbook 6 scénarios. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`horizon`	No
`product`	Yes
`accounts`	Yes
`targetRenewalRatePct`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true. The description adds that the tool 'Returns a structured, audited deliverable', which aligns with the read-only nature. It also mentions server-side input validation. However, it does not elaborate on the deliverable's format, size, or any other behavioral details beyond what the annotations imply. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (3 sentences) but mixes French and English, and includes a reference case that is somewhat distracting. It front-loads the core purpose but the example could be trimmed or moved. It is adequate but not optimally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 parameters, nested objects, no output schema), the description is incomplete. It does not explain the return format, the role of the 'async' parameter, how to compose the 'accounts' array, or what the deliverable contains. Users are left with many unanswered questions, relying heavily on the schema which is itself poorly documented.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only the 'async' parameter has a description). The tool description does not explain any parameter meaning, nor does it provide context for the nested objects (company, product, accounts) or the horizon enum. Users must rely solely on the sparse schema comments, which lack detail for most fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a renewal optimizer ('Optimiseur de renouvellements') targeting CRO-level expertise, and specifies it returns a structured, audited deliverable. It references a concrete case, adding specificity. However, it does not explicitly distinguish itself from closely related sibling tools like churn_defender or save_plays, preventing a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description offers no guidance on when to use this tool versus alternatives. It merely instructs to 'send the documented case fields' and mentions server-side validation. There is no discussion of prerequisites, appropriate contexts, or exclusions, leaving the agent without decision-making support.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

repo_rate_arbitrage_scannerA

Read-onlyIdempotent

Inspect

Scans for arbitrage opportunities between repo rates (ECB) and short-term funding markets (Treasury Direct). Designed for CFOs to identify cost-effective funding strategies. Inputs include optional date ranges and currency filters. Outputs structured arbitrage opportunities with rate differentials and confidence scores.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	No
`currency`	No
`startDate`	No
`minDifferential`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`opportunities`	No

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint. The description adds that outputs are structured with rate differentials and confidence scores. No additional behavioral traits (e.g., data freshness, rate limits) are disclosed, but annotations cover safety profile.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with core purpose, no redundancy. Every sentence adds value: purpose, audience, inputs, output. Excellent conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, audience, optional inputs, and output structure. Given the tool has an output schema, the description needn't detail return values. However, the minDifferential parameter is undocumented, and async behavior is omitted. Mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 20% (only async described). The description adds meaning for date ranges and currency filters but does not explain minDifferential. Baseline is 3 due to low coverage, and description only partially compensates.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it scans for arbitrage opportunities between repo rates and short-term funding markets, and specifies the target audience (CFOs). However, it does not explicitly distinguish itself from sibling arbitrage tools like tariff_arbitrage_finder or ma_arbitrage_hunter, though the domain is specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for funding strategy identification but does not provide explicit guidance on when to use this tool versus alternatives, nor does it mention when not to use it. The context is implied but not actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reputation_engineC

Read-only

Inspect

Moteur de réputation — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: PayShield SaaS — Monitoring réputation Q2 2026. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`channels`	Yes
`industry`	Yes
`keywords`	Yes
`historicalCrises`	No

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint; the description adds that it returns a 'structured, audited deliverable' and mentions server-side validation. However, the meaning of 'agent-payable' is unclear and could imply cost behavior not covered by annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but includes extraneous details (reference case, jargon) that do not earn their place. It is front-loaded with the title but could be more efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters, no output schema, and low schema coverage, the description should provide more context on input requirements and output format. It fails to explain what the deliverable contains or how parameters influence results.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only async is described). The tool description does not explain the meaning or usage of any parameters, only vaguely referencing 'documented case fields'. No value added over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description mentions 'Moteur de réputation' and 'Monitoring réputation', indicating it's for reputation monitoring, but uses unclear jargon like 'Gapup agent-payable C-suite expertise (CMO)' and lacks a clear English purpose statement. It does not explicitly distinguish from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not specify when to use this tool versus alternatives. It only provides a reference case (PayShield SaaS) but no guidance on context or exclusion of other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

research_paper_qaC

Read-only

Inspect

Synthèse littérature scientifique (PaperQA2) — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: Conduct a literature review on — what does the evidence show across recent papers? · Evaluate the current hypothesis that — supporting and contradicting evidence with citations. · Map contradictions in the literature on — which camps exist, how many papers per side? · What is the state-of-the-art understanding of as of ? · Perform an interdisciplinary synthesis on — findings from and . Reference case: Gut-brain axis · Cognitive performance in healthy adults · OpenAlex+SemanticScholar+CORE · Evidence synthesis · DOI-verified citations · Contradictions + gaps mapped. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`max_papers`	Yes
`year_range`	No
`focus_domain`	Yes		all
`include_preprints`	Yes
`research_question`	Yes
`evidence_grade_required`	Yes		standard

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnly and openWorld. Description adds that it provides a structured, audited deliverable with DOI-verified citations and server-side validation. However, it does not mention async behavior, performance, or error handling. Description does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long and chaotic, mixing French and English, including a reference case and sources in a stream-of-consciousness style. Lacks front-loading and clear structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 7 parameters, no output schema, and low parameter coverage, the description omits return format, pagination, error handling, and async details (only in schema). Insufficient for an agent to use confidently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 14%, the description should compensate but does not explain individual parameters beyond the example queries. It lists sources (OpenAlex+SemanticScholar+CORE) but not how parameters affect search behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it's a literature review tool that returns a structured deliverable and provides example research questions, but it mixes languages and includes unclear phrases like 'Gapup agent-payable C-suite expertise (RISK).' It does not clearly differentiate from sibling tools like sci_literature_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Gives example use cases but no explicit guidance on when to use vs alternatives, no exclusions or prerequisites. Sibling sci_literature_search exists but is not addressed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

retail_media_attribution_bridgeA

Read-onlyIdempotent

Inspect

Provides unified attribution insights for retail media and programmatic campaigns by analyzing MMM signals from FreeWheel Marketplace and Common Crawl. Designed for ad revenue operations teams to bridge cross-channel performance gaps. Accepts campaign IDs, date ranges, and channel filters as input. Returns structured attribution data with source provenance and confidence scores.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	Yes	End date for attribution window (YYYY-MM-DD)
`channels`	No	Channels to include in analysis
`startDate`	Yes	Start date for attribution window (YYYY-MM-DD)
`campaignIds`	Yes	List of campaign identifiers to analyze
`confidenceThreshold`	No	Minimum confidence score for included signals

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`attribution`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. The description adds that it returns structured attribution data with source provenance and confidence scores, which is useful beyond annotations. No contradictions found.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the main purpose, and each sentence adds value. Efficient but could be more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and annotations are provided, the description covers the basics. However, it misses guidance on the async parameter and confidenceThreshold, which are important for correct usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description mentions campaign IDs, date ranges, and channel filters, which are already in schema, but does not add new meaning to parameters like async or confidenceThreshold.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides unified attribution insights for retail media and programmatic campaigns by analyzing MMM signals. It specifies the verb 'provides' and the resource 'unified attribution insights', and distinguishes from siblings like programmatic_attribution_calibrator and retail_media_esg_compliance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description notes it is designed for ad revenue operations teams to bridge cross-channel performance gaps, giving clear context. However, it does not explicitly state when not to use it or name alternatives, which would be a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

retail_media_esg_complianceA

Read-onlyIdempotent

Inspect

Audits retail media networks for ESG compliance by analyzing ad placements, tracking cookies, and verifying ethical advertising standards. Designed for ad_revenue_ops teams to ensure GDPR and sustainability compliance across digital retail platforms. Accepts domain lists or network identifiers as input and returns structured compliance reports with warnings and source references. Requires async:true to avoid timeout errors.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domains`	No	List of retail media network domains to audit
`checkESG`	No	Enable ESG advertising standards compliance check
`checkGDPR`	No	Enable GDPR cookie tracking compliance check
`networkIds`	No	List of retail media network identifiers

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`summary`	No
`warnings`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds performance-related behavior (async required) and return type (structured compliance reports with warnings and source references). Annotations already indicate readOnly and idempotent, so no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each carrying essential information: purpose, target use, input, output, and requirement. No redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, input types, team, output summary, and async requirement. With output schema present, it does not need to detail return fields; description is sufficiently comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so description adds little beyond what is already in the input schema. Mentions domain lists or network identifiers, but those are already defined in schema properties.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it audits retail media networks for ESG compliance, analyzing ad placements, tracking cookies, and verifying ethical advertising standards. Distinguishes from sibling tools by specifying target (retail media) and team (ad_revenue_ops).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly designed for ad_revenue_ops teams and requires async:true to avoid timeouts, providing clear usage context. Does not mention when not to use or list alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

revops_architectB

Read-only

Inspect

Architecte RevOps — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Qonto — ARR €200M · 200 reps · forecast ±35% · fuite €4,2M/an identifiée · plan RevOps 12 semaines. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`keyMetrics`	Yes
`objectives`	Yes
`revenueTeam`	Yes
`currentStack`	Yes
`horizonMonths`	Yes
`currentPainPoints`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side, but no further behavioral traits are disclosed, such as return format or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly concise with two sentences plus a reference case. However, the reference case is somewhat extraneous and could be shortened or moved to a separate field.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested parameters, no output schema, low schema coverage), the description lacks details about what the deliverable contains, prerequisites, or how to interpret results. It feels incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, yet the description does not explain any parameters beyond the general instruction to 'send the documented case fields'. With 8 complex parameters including nested objects, this is significantly insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates the tool generates a structured RevOps deliverable, using a reference case for context. However, it does not differentiate from sibling architect tools like abm_architect or ld_architect, so there is room for improvement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Usage context is implied by the name and reference case: use for RevOps gap analysis. However, no explicit when-to-use or when-not-to-use guidance is provided, and no alternatives are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rfp_tender_architectC

Read-only

Inspect

Architecte d'appels d'offres — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: AO DINUM — Plateforme IA souveraine. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`rfpType`	Yes
`rfpScope`	Yes
`budgetRange`	Yes
`deadlineISO`	Yes
`clientCompany`	Yes
`ourPositioning`	Yes
`compliancePoints`	No
`competitorsLikely`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns a structured deliverable, but does not disclose auth needs, rate limits, or specific behavioral traits beyond that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief with two sentences plus a reference case. It is mostly concise, though the reference case could be omitted or moved to an example. The information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 9 parameters (7 required), no output schema, and no sibling differentiation, the description is incomplete. It does not explain the deliverable's content, return value, or async behavior (though async is in schema). The reference case is not general enough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 11% schema description coverage (only async documented), the description says 'send the documented case fields' but does not elaborate on the other 8 parameters. It adds minimal meaning beyond the schema constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is an RFP tender architect that returns a structured, audited deliverable, referencing a specific case. However, it does not explicitly differentiate from similar siblings like proposal_generator or battle_plan, relying on the specific name.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description mentions 'C-suite expertise (COO)' but lacks explicit when-to-use, when-not-to-use, or alternative suggestions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rse_policy_builderC

Read-only

Inspect

Architecte de politique RSE — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: TechCorp SAS — Politique RSE 2025-2028 (500 FTE, €60M CA, SaaS B2B France). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`values`	Yes
`company`	Yes
`ambitions`	Yes
`targetLabels`	No
`currentInitiatives`	No
`targetStakeholders`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, but the description implies generation of a deliverable, which may be considered not purely read-only. No additional behavioral details (e.g., external data usage, auth needs, rate limits) beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Short, but not well-structured. Mixes French and English. Front-loads a vague 'Architecte de politique RSE' but lacks clear action details. Could be more concise and informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Very incomplete given the tool's complexity (8 params, nested objects, no output schema). Does not describe the deliverable format, audit process, or expected output. Leaves the agent with significant uncertainty.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%. The description adds no parameter meanings beyond what the schema provides. It lacks explanations for key parameters like company, values, ambitions, etc.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it builds a CSR policy (Architecte de politique RSE) and returns a structured deliverable. The name reinforces this. However, it does not explicitly differentiate from similar sibling tools like action_plan_esg or sustainability_report.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. No mentions of prerequisites, context, or exclusions. The description only says inputs are validated server-side.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sabbatical_policy_comparatorA

Read-onlyIdempotent

Inspect

Enables CHROs to benchmark their company's sabbatical policies against peer organizations using data from SHRM, Payscale, and Mercer. Inputs include company size, industry, and current policy details. Outputs structured comparison with cost impact analysis, eligibility criteria, and duration benchmarks. Ideal for strategic HR planning and policy optimization.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	Yes	Industry classification code (NAICS)
`peerGroup`	No	List of peer company names for direct comparison
`companySize`	Yes	Number of employees in the company
`currentPolicy`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`benchmark`	No

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint, openWorldHint, and idempotentHint, and the description matches by framing the tool as a benchmarking query. It adds value by detailing outputs (cost impact analysis, eligibility criteria, duration benchmarks), which aligns with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences: first defining the core function, second listing inputs and outputs. Every sentence is purposeful, and it is front-loaded with the key action and audience.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With annotations covering safety (readOnly) and output schema handling return structure, the description sufficiently explains the tool's outputs and data sources. It is complete for a benchmarking tool, covering inputs, process, and expected deliverables.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 80% (high), so baseline is 3. The description adds meaning by specifying data sources (SHRM, Payscale, Mercer) and output components, which goes beyond the schema's parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: enabling CHROs to benchmark sabbatical policies against peer organizations using specific data sources (SHRM, Payscale, Mercer). It distinguishes this tool from siblings like executive_comp_peer_benchmark or comp_benchmark by focusing exclusively on sabbatical policies.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates the tool is 'Ideal for strategic HR planning and policy optimization,' providing context for when to use it. However, it does not explicitly state when not to use it or mention alternative tools, though the specialization implies exclusivity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

safety_guardrail_breach_analyzerA

Read-onlyIdempotent

Inspect

Analyzes potential LLM guardrail breaches against IEEE 7000 ethical compliance standards. Designed for risk persona to evaluate safety violations in AI outputs. Accepts raw LLM responses or structured breach reports, returns compliance analysis with severity scoring and mitigation recommendations.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`context`	No	Contextual information about the prompt or conversation
`llmOutput`	Yes	Raw text output from LLM to analyze for guardrail breaches
`severityThreshold`	No	Minimum severity score to report (0-10 scale)
`includeMitigations`	No	Whether to include mitigation recommendations

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`breaches`	No
`warnings`	No
`complianceScore`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. Description adds value by specifying inputs (raw LLM responses or structured breach reports), outputs (compliance analysis, severity scoring, mitigation recommendations), and intended user, without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no wasted words: first sentence states purpose and standard, second identifies persona, third describes inputs and outputs. Information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given parameter count (5), existing output schema, and annotations, the description covers core purpose, inputs, outputs, and standard. Lacks explicit mention of async or context params, but these are clear from schema. Sufficient for tool selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 5 parameters. Description summarizes key parameters (llmOutput, severityThreshold, includeMitigations) but does not add new details beyond schema, meeting the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states verb 'Analyzes' and specific resource 'LLM guardrail breaches against IEEE 7000 ethical compliance standards', clearly differentiating it from generic safety analysis tools like safety_violation_incident_logger or jailbreak_attempt_detector by specifying the standard and output (severity scoring, mitigations).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description mentions 'risk persona' as intended user and input types, but does not explicitly state when to use this tool over siblings like ai_act_incident_response or bias_amplification_tracker. The context is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

safety_violation_incident_loggerC

Read-onlyIdempotent

Inspect

Logs AI safety violations for compliance reporting, targeting risk management personas. Accepts incident details such as violation type, severity, description, and timestamp. Returns structured data with compliance categorization based on NIST AI RMF guidelines. Ideal for automated incident tracking and regulatory reporting workflows.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`metadata`	No
`severity`	Yes
`timestamp`	Yes
`description`	Yes
`violationType`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`incidentId`	No
`nistReference`	No
`complianceCategory`	No

Tool Definition Quality

C2.5/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

There is a direct contradiction: annotations set readOnlyHint=true (indicating a read-only, non-mutating operation), while the description says 'Logs AI safety violations', which implies writing/creating records. This inconsistency severely misleads the agent. Beyond the contradiction, the description adds limited behavioral context—it mentions compliance categorization but does not clarify idempotency or effects of repeated calls. Score is 1 due to the contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is only three sentences, making it concise. It front-loads the main purpose. However, the phrase 'targeting risk management personas' is slightly extraneous and could be omitted for tighter focus. Overall, it is efficient but not maximally crisp.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, 4 required, 2 enums, nested objects, output schema), the description is insufficient. It mentions compliance categorization based on NIST AI RMF but does not detail the output structure, behavior for optional parameters, or how to handle errors. The presence of an output schema reduces the need to describe return values, but the input semantics and behavioral nuances are not covered.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only the 'async' parameter has a description). The description lists the parameters (violationType, severity, description, timestamp) but adds no further meaning beyond the schema's names and constraints (e.g., enum values, min/max lengths, format). It fails to explain how these parameters interact or any validation rules, so it does not compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool logs AI safety violations for compliance reporting, targeting risk management personas. It uses a specific verb-resource pair ('logs ... violations') and alludes to NIST AI RMF guidelines, which helps differentiate it from generic logging tools. However, it does not explicitly differentiate from close sibling tools like 'ai_act_incident_response' or 'model_safety_certification_checker', preventing a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it is 'Ideal for automated incident tracking and regulatory reporting workflows' but provides no guidance on when not to use this tool or what alternatives exist. Given the many sibling tools (e.g., 'bias_amplification_tracker', 'safety_guardrail_breach_analyzer'), explicit usage boundaries are missing, limiting its helpfulness for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sales_enablement_architectC

Read-only

Inspect

Architecte Sales Enablement — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk — 45 reps · attainment 67% · ramp 5 mois → 3 mois · programme 8 modules · +€2,1M ARR. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`gaps`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`salesTeam`	Yes
`objectives`	Yes
`currentEnablement`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the agent knows this is a safe read operation. The description adds that inputs are validated server-side and returns an audited deliverable. However, openWorldHint=true is unexplained, and there is no mention of side effects, rate limits, or determinism. With annotations present, this is adequate but not exemplary.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph with four sentences, each adding distinct information (purpose, output type, reference case, validation note). It is concise and to the point, though it could be more front-loaded with the action verb.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema), the description is incomplete. It lacks details about the output structure, error handling, and parameter constraints beyond a vague reference to a case study. The agent would struggle to invoke this tool correctly without external documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, so the description must compensate. The description only says 'send the documented case fields,' referencing a case study without explaining individual parameters (e.g., company, salesTeam, gaps). This provides insufficient guidance for correct parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it 'Returns a structured, audited deliverable' and positions it as C-suite expertise (CRO) for sales enablement architecture. It is clear what the tool does, but it does not explicitly distinguish it from sibling tools like 'battle_plans' or 'pitch_deck_storyline'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'Gapup agent-payable C-suite expertise (CRO)' and references a case study, but it does not provide explicit guidance on when to use this tool versus alternative tools. There is no clarification of prerequisites or conditions for use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sales_pipeline_forecastC

Read-only

Inspect

Prévision de pipeline commercial — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Doctolib Enterprise — pipeline Q2 2026 · 50 deals enterprise/mid-market · forecast confidence par deal + commit/best-case/worst-case. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`pipeline`	Yes
`historicalConversionByStage`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and the deliverable is structured and audited, but does not disclose any potential side effects or return format details. Given the annotations already cover safety, the description offers moderate additional context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences long, but includes unnecessary jargon ('Gapup agent-payable C-suite expertise (CRO)') and a specific reference case that adds length without general utility. It is front-loaded but lacks clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 5 parameters, no output schema, low schema coverage), the description is inadequate. It does not specify the output structure beyond 'structured, audited deliverable', and the reference case does not provide a complete template for agents to understand what the forecast response contains.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, and the description does not explain any of the five parameters (company, pipeline, async, focus, historicalConversionByStage). It merely says 'send the documented case fields', leaving the agent to infer parameter meanings from the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool produces a pipeline forecast ("Prévision de pipeline commercial") and returns a structured, audited deliverable with forecast confidence levels. However, the opening phrase 'Gapup agent-payable C-suite expertise (CRO)' is jargon-heavy and may obscure the purpose for some agents.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus its many siblings (e.g., meddic_scoring, deal_coach). It only references a specific case (Doctolib) but does not articulate criteria for when this tool is appropriate or when alternatives should be considered.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screener_multiB

Read-only

Inspect

Screening Sanctions Multi-listes — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: For , run full OFAC + EU + UK HMT + UN + SECO + Canada SEMA + PEP + adverse media screening with composite risk score and evidence trail. · Is <company/individual> on any major international sanctions list? · What is the composite AML risk score for across all major watchlists? · Screen this M&A target / supplier / LP against all major sanctions lists and give me a compliance recommendation. · Is a PEP or associated with a PEP? What Enhanced Due Diligence is required? Reference case: Veridian Trading Co. LLC (Cyprus) — 7 listes · PEP check · adverse media 2 ans · composite 52/100 · escalate-to-compliance → EDD requis. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`address`	No
`aliases`	No
`entity_name`	Yes
`entity_type`	Yes
`context_note`	No
`date_of_birth`	No
`jurisdiction_focus`	Yes		all
`country_of_registration`	No
`adverse_media_lookback_days`	Yes

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the agent knows this is a safe read operation. The description adds that it returns a 'structured, audited deliverable' and mentions server-side validation, but does not disclose other behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that front-loads the purpose but then includes a list of example questions and a reference case, making it somewhat verbose. It could be more concise by removing the example list and focusing on key guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 10 parameters (4 required) and no output schema. The description lacks details on parameter semantics (e.g., format of date_of_birth, country codes) and does not describe the return format beyond 'structured, audited deliverable'. This leaves significant ambiguity for agent invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 10% (only 'async' has a description). The description does not explain individual parameters such as 'date_of_birth' or 'country_of_registration'. It only says 'send the documented case fields' which is insufficient given the parameter count and low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool screens multiple sanctions lists (OFAC, EU, UK HMT, etc.) and returns a composite risk score and evidence trail. It includes example questions that define its purpose. However, it does not explicitly differentiate from sibling tools like kyc_screener or sharia_compliance_screener.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases via example questions (e.g., 'Is <entity> on any sanctions list?'), indicating when to use the tool. However, it does not specify when not to use it or mention alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

save_playsC

Read-only

Inspect

Plans de sauvetage clients — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Kyriba — Plan sauvetage 30j · ARR €11.988 · Champion parti · Script 6 actions · 3 concessions. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`account`	Yes
`company`	Yes
`product`	Yes

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns an audited deliverable, which is consistent. However, it does not disclose additional behavioral traits like whether the result is persisted, auth requirements, or rate limits. The description adds moderate value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but contains mixed-language jargon and an extended reference case that may obscure the core function. It is not overly verbose but could be more streamlined and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 parameters with nested objects, no output schema), the description is insufficient. It does not explain what the deliverable contains, how to interpret or use the result, or how errors are handled. The phrase 'returns a structured, audited deliverable' is too vague. The reference case offers some context but not enough for an agent to confidently use the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only 'async' has a description). The description does not add meaning for the three required parameters (account, company, product) beyond saying 'send the documented case fields.' It fails to explain what each object expects or how they influence the deliverable. With low coverage, the description should compensate, but it does not.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description indicates the tool returns a 'structured, audited deliverable' for client rescue plans, but the purpose is implicit and mixed with French/English jargon ('Gapup agent-payable C-suite expertise (CRO)'). It does not explicitly state a verb like 'generates' or 'creates', and the reference case is the only concrete example. Sibling tools like 'churn_defender' and 'renewal_optimizer' likely overlap, but no differentiation is provided.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is given on when to use this tool versus alternatives. There is no mention of prerequisites, exclusions, or context where other tools would be more appropriate. The description provides a case reference but does not help an agent decide usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sci_literature_searchA

Read-only

Inspect

Recherche bibliographique multi-sources sur la litterature scientifique. Sources : OpenAlex (200M+ works) · Semantic Scholar · arXiv · PubMed · CrossRef. Modes : search | meta_analysis | citation_network | expert_finder. Keyless / free tier. Cache LRU 12h.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Mode de recherche. Defaut: search
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keywords, titre, auteur, DOI (ex: 10.xxxx/xxxx accepte)
`domain`	No	Domaine scientifique. Defaut: all
`date_to`	No	Date ISO fin (YYYY-MM-DD)
`date_from`	No	Date ISO debut (YYYY-MM-DD)
`max_results`	No	5-50. Defaut: 20
`min_citations`	No	Nombre minimal de citations

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`papers`	Yes
`status`	Yes
`experts`	No
`sources`	Yes
`meta_analysis`	No
`quality_score`	Yes
`citation_network`	No

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds context about keyless/free tier and a 12h LRU cache, but does not detail rate limits, data freshness, or what happens with errors. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded with the main purpose. Bullet points for sources and modes are efficient. Every sentence adds value, though it could be slightly more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, output schema exists), the description adequately covers purpose, sources, modes, and key features. It does not need to explain return values due to output schema. A mention of async option would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description lists sources and modes which are already captured in the schema's mode enum. It does not add new meaning to parameters beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a multi-source bibliographic search on scientific literature, listing specific sources (OpenAlex, Semantic Scholar, arXiv, PubMed, CrossRef) and modes (search, meta_analysis, citation_network, expert_finder). This distinguishes it from sibling tools like research_paper_qa or web_search_multilang.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for literature searches but does not explicitly state when to use this tool versus siblings or when not to use it. No alternatives or exclusions are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sec_filing_decoderB

Read-only

Inspect

Décodeur de filing SEC — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Read the 10-K of and give me the material red flags, KPI movements, and a board-ready executive summary. · What has materially changed in 's risk profile in its latest annual filing? Flag any going-concern or auditor-change signals. · Is there any M&A signal or strategic review hint in 's most recent SEC filings? What's the evidence? · Prepare a due-diligence SEC filing brief for : financial snapshot, red flags, governance changes, and recommended next actions. · What is the sentiment of 's latest 10-K compared to its most recent 10-Q — bullish, neutral, or bearish? Reference case: SHOP · 10-K FY2024 · 4 red flags (1 critical: merchant concentration) · Revenue +24.7% YoY · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`cik`	No
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	Yes		all
`ticker`	No
`filing_types`	Yes
`lookback_months`	Yes

Tool Definition Quality

B3.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that the tool returns a 'structured, audited deliverable' and that inputs are server-validated, which provides context beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose, including marketing language ('Gapup agent-payable C-suite expertise'), multiple example questions, and a reference case. It could be significantly condensed while retaining core information. The length hinders quick scanning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters (3 required), no output schema, and low schema coverage, the description is incomplete. It lacks details on expected output structure, parameter constraints, and error handling, leaving the agent with insufficient information for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only 'async' described). The description does not explain parameters like filing_types, lookback_months, focus, or cik, and does not compensate for the low coverage. The agent lacks necessary detail to specify calls correctly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool decodes SEC filings, providing red flags, KPI movements, and executive summaries. It includes specific example queries that define its scope, distinguishing it from many siblings by its focus on SEC filings. The purpose is explicit and well-articulated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example queries that implicitly guide usage, but it lacks explicit when-to-use or when-not-to-use guidance, and does not mention alternatives among the many sibling tools. This leaves the agent without clear differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sentiment_news_pulseB

Read-only

Inspect

Pulse Média & Sentiment — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What is the current PR / brand sentiment for over the last 7 days? Show top headlines, trend signals, and recommended actions. · Is there a crisis building for ? Detect early-warning signals in press coverage and flag emerging negative narratives. · Track launch media coverage for — what is the press sentiment and which topics dominate the conversation? · Compare media sentiment between and its competitors over the past week. · What should our communications director prioritize in the next 48h based on current press coverage of ? Reference case: Velora Payments — Pulse média 7j · sentiment neutre (score +5) · crise émergente détectée · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_name`	Yes
`entity_type`	Yes		company
`sentiment_lens`	Yes		reputation
`date_range_days`	Yes
`language_filter`	Yes		en
`include_competitors`	No

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns a structured deliverable, but does not elaborate on permissions, rate limits, or other behavioral traits beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the title and core purpose, but the extensive list of example questions and the reference case make it verbose. Some sentences could be merged or removed to improve conciseness without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters with 5 required and no output schema, the description should provide a clearer picture of what the deliverable contains. It mentions 'structured, audited deliverable' but no specifics. The examples hint at outputs (headlines, trends, actions) but not the exact structure or fields, making it incomplete for an agent to fully understand the return value.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 14%, meaning most parameters lack descriptions in the schema. The description lists example use cases (e.g., 'company', 'brand', 'product') but does not explicitly explain the meaning, constraints, or relationships of the parameters like entity_name, entity_type, sentiment_lens, etc., missing the opportunity to compensate for poor schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides media pulse and sentiment analysis for entities, listing specific use cases like brand sentiment, crisis detection, launch monitoring, and competitor comparison. It distinguishes from siblings by its focus on structured, audited deliverables, but could be more concise.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives clear scenarios (e.g., 'What is the current PR / brand sentiment', 'Is there a crisis building') and even includes a reference case. However, it does not explicitly state when not to use this tool or mention alternatives among the sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

seo_cro_auditA

Read-only

Inspect

Full SEO + CRO audit of any public URL. Analyses technical SEO (HTTP status, HTTPS, title/meta/canonical/robots, H1-H2, JSON-LD structured data, sitemap, robots.txt, OG/Twitter cards), content SEO (word count, keyword density top-10, readability estimate, image alt coverage, internal/external links), performance signals (page size, estimated render time, inline scripts/styles, unoptimised images), and CRO (CTA detection, above-fold CTAs, forms, social proof, trust signals, pricing visibility). Optionally compares up to 5 competitor URLs. Returns 0-100 scores per dimension plus a prioritised (P0/P1/P2) recommendation list. ICP: marketing managers, SEO/CRO consultants, e-commerce ops, agency teams. Budget: 8s per URL. Cache TTL: 1h.

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Fully-qualified URL to audit (e.g. https://stripe.com/pricing)
`mode`	No	Audit scope — defaults to 'full'
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`compare_competitors`	No	Optional list of competitor URLs to compare (max 5)

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	Yes
`status`	Yes
`sources`	Yes
`audit_modes`	Yes
`content_seo`	Yes
`cro_signals`	Yes
`quality_score`	Yes
`technical_seo`	Yes
`overall_scores`	Yes
`recommendations`	Yes
`performance_signals`	Yes
`competitor_comparison`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and destructiveHint; the description adds valuable context: budget (8s per URL), cache TTL (1h), async mode for long runs, and specifics on returned scores and recommendations. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is comprehensive but not overly verbose; it is front-loaded with the main purpose and logically organized by audit dimensions. Each sentence adds value, though a slight reduction in length would improve conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, multiple audit modes, optional competitor comparison, output scores/priorities) and available annotations, the description covers all necessary aspects for correct invocation and interpretation. The output schema existence further reduces burden.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, so parameters are already well-documented. The description adds marginal extra (e.g., competitor comparison option is mentioned again), but does not significantly enhance understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs a 'Full SEO + CRO audit of any public URL', listing specific sub-dimensions (technical SEO, content SEO, performance signals, CRO). This distinctively separates it from sibling tools like competitive_deep_dive or seo_keyword_research.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies the target user personas (marketing managers, SEO/CRO consultants) and includes optional competitor comparison, but does not explicitly state when to avoid using this tool or recommend alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

seo_keyword_researchA

Read-only

Inspect

SEO keyword research from a seed keyword or topic. Uses Google Suggest (public, keyless) to discover related queries at 2 expansion levels, then clusters them by intent: informational / commercial / transactional / navigational — via heuristic pattern matching. Search volume is bucketed (very_high / high / medium / low / very_low) and clearly labelled as ESTIMATED — no fabricated precise numbers. Returns all keywords, intent clusters, quality scores (0-100), and top 10 opportunities. Supports country (gl) and language (hl) targeting. 100% keyless. Cache TTL 6h. ICP: SEO managers, content strategists, SaaS founders, agency teams.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	No	ISO 3166-1 alpha-2 country code for Google Suggest (e.g. 'US', 'FR', 'DE'). Defaults to 'US'.
`language`	No	BCP-47 language code for suggestions (e.g. 'en', 'fr', 'de', 'es'). Defaults to 'en'.
`seed_keyword`	Yes	The seed keyword or topic to research (e.g. 'invoice software', 'project management tool')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`country`	Yes
`clusters`	Yes
`language`	Yes
`warnings`	Yes
`all_keywords`	Yes
`seed_keyword`	Yes
`quality_score`	Yes
`total_keywords`	Yes
`top_opportunities`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, destructiveHint), description discloses key behavioral traits: uses Google Suggest (public, keyless), two expansion levels, heuristic intent clustering, volume bucketing as estimated, and output structure. All without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise, front-loaded with primary purpose, and every sentence adds specific value (e.g., keyless, cache TTL, ICP). No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, description covers all necessary aspects: purpose, source, parameters, output contents, and performance characteristics (cache TTL). Sufficient for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all 4 parameters with 100% description coverage. Description adds value by explaining that country and language target Google Suggest and that the tool is keyless. Also clarifies that the async parameter returns a job_id, aligning with schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it performs SEO keyword research from a seed keyword, using Google Suggest. It specifies the resource (seed keyword) and the action (discover, cluster, return opportunities). Distinguishes from sibling tools like content_discovery by focusing on keyword research and clustering by intent.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context on target users (SEO managers, content strategists, etc.) and features like keyless access, cache TTL, and country/language targeting. Does not explicitly state when not to use or compare to alternatives, but context strongly implies intended use case.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sharia_compliance_screenerA

Read-only

Inspect

Sharia compliance screening engine for Islamic banks, Sukuk issuers, Gulf sovereign funds, halal investment managers and MENA family offices. Zero competing MCP on this vertical.

Standards supported: AAOIFI (default) | MSCI_Islamic | S&P_Sharia | DJIM

Four modes: • company — Full Sharia screen of a listed company: business activity (halal/haram/mixed) + AAOIFI financial ratios (debt/market-cap <30%, interest-assets <30%, non-compliant revenue <5%) • instrument — Sukuk / halal fund classification by ISIN or name. Maps to known Sharia boards. • sector_screen — Industry classification (halal/haram/mixed) with rationale + examples. Static AAOIFI-based map covering 40+ sectors. • financial_ratios — AAOIFI ratio computation on fetched or provided financials.

Prohibited activities screened: alcohol, gambling, pork, weapons, pornography, tobacco, conventional banking (riba), conventional insurance, adult entertainment, embryonic stem cells.

Output includes compliance_status (halal/haram/doubtful_mixed/purification_required), purification_pct when applicable, P0/P1/P2 signals, quality_score, and sources.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Screening mode. company=full listed company screen, instrument=Sukuk/fund classification, sector_screen=industry halal/haram classification, financial_ratios=AAOIFI ratio check.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Entity to screen. Company name, ticker or ISIN (e.g. "Aramco", "AAPL", "tobacco", "XS1234567890").
`standard`	No	Sharia standard to apply. Default "AAOIFI" (most conservative, widely accepted by Islamic banks).

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`company`	No
`signals`	Yes
`sources`	Yes
`instrument`	No
`quality_score`	Yes
`sector_screen`	No
`standard_used`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint: true, openWorldHint: true, destructiveHint: false) are consistent with a read-only screening tool. The description adds behavioral details beyond annotations: it explains the async parameter, lists output fields (compliance_status, purification_pct, signals, etc.), and describes the four modes' behaviors. No contradictions found.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-organized with clear sections for standards, modes, and output. It is reasonably concise given the complexity, though some redundancy exists (e.g., listing prohibited activities separately). The front-loading of the main purpose immediately aids understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (four modes, multiple standards, async option, output fields), the description is thorough. It covers all modes, input constraints (query max/min length), output specifics, and even lists prohibited activities. An output schema exists, but the description still provides essential context for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters described. The description adds significant value by explaining the semantics of each mode in detail, providing default standard (AAOIFI), and giving examples for the query parameter (e.g., 'Aramco', 'AAPL', 'tobacco'). This reduces ambiguity even though the schema already defines allowed values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies a specific tool for sharia compliance screening in Islamic finance, listing four distinct modes (company, instrument, sector_screen, financial_ratios) with detailed use cases. It explicitly notes zero competing MCP tools on this vertical, effectively differentiating it from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies the target users (Islamic banks, Sukuk issuers, etc.) and contexts for each mode, including examples and supported standards. It does not provide explicit 'when not to use' instructions, but the specificity of the domain and modes makes usage largely self-evident.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

social_engagement_velocity_trackerA

Read-onlyIdempotent

Inspect

Tracks hourly social engagement velocity (likes, shares, comments) across Twitter, LinkedIn, and Reddit for CMOs. Inputs include platform handles/subreddits and time range. Outputs engagement metrics, velocity trends, and platform-specific insights. Ideal for real-time marketing performance monitoring and competitive benchmarking. Keywords: social media analytics, engagement tracking, marketing KPIs, CMO dashboard.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hours`	No
`platforms`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`trends`	No
`sources`	No
`warnings`	No
`engagement`	No

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint, so the description does not need to repeat these. The description adds that it tracks hourly data across specific platforms and outputs insights. It does not disclose rate limits, authentication needs, or behavior when data is unavailable, but given annotation coverage, this is acceptable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: 4 sentences plus keywords. It is front-loaded with the core action and adds relevant details efficiently. No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of output schema and annotations, the description is reasonably complete. It explains the tool's purpose, inputs, outputs, and use cases. It could be slightly more explicit about the time range parameter constraints, but overall it provides sufficient context for a read-only analytics tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema coverage at 33%, the description partially compensates by mentioning 'platform handles/subreddits and time range', which maps to the 'platforms' and 'hours' parameters. It does not mention the 'async' parameter. The description adds value but does not fully document all parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool tracks hourly social engagement velocity across Twitter, LinkedIn, and Reddit for CMOs. It specifies inputs (platform handles/subreddits, time range) and outputs (engagement metrics, trends, insights). This distinguishes it from sibling tools like social_influencer_fake_follower_detector.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says it is 'Ideal for real-time marketing performance monitoring and competitive benchmarking', providing implied usage context. However, it does not explicitly state when not to use this tool or mention alternative tools, which would be helpful given the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

social_influencer_fake_follower_detectorA

Read-onlyIdempotent

Inspect

Analyzes up to 10 social media influencers for fake followers by checking engagement velocity patterns (Trends24) and RSS feed anomalies. Returns authenticity scores, follower growth spikes, and suspicious activity flags. Optimized for CMOs evaluating influencer partnerships. Includes keywords: influencer marketing, fake follower detection, engagement analysis, social media audit.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`platform`	Yes	Social media platform of the influencers
`influencerHandles`	Yes	Array of up to 10 social media handles (e.g., ['@influencer1', 'user2'])

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`results`	Yes
`sources`	Yes
`summary`	No
`warnings`	Yes

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds behavioral context beyond annotations by specifying the analysis methods (Trends24, RSS feed anomalies) and output details (authenticity scores, follower growth spikes, suspicious activity flags). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise—two sentences plus a keyword list—with no superfluous words. It front-loads the primary action and context, making it easy for an AI agent to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (3 parameters, output schema present), the description adequately covers the core functionality and target audience. It could mention error cases or limitations beyond the 10-handle cap, but overall it is sufficient for initial understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all three parameters. The description does not add significant meaning beyond the schema; it only mentions the limit of 10 influencers, which is already captured in the schema's maxItems constraint.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Analyzes up to 10 social media influencers for fake followers' with specific methods (Trends24, RSS feed anomalies) and outputs (authenticity scores, follower growth spikes, suspicious activity flags). This distinguishes it from siblings like 'social_engagement_velocity_tracker' by focusing on fake follower detection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says it is 'Optimized for CMOs evaluating influencer partnerships,' which provides a target user but no explicit guidance on when to use this tool versus alternatives like social_engagement_velocity_tracker or fraud_detector. No exclusion criteria or conditions for not using it are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sovereign_data_breach_impactA

Read-onlyIdempotent

Inspect

Estimates financial impact of a data breach across three jurisdictions (US, EU, UK) for CFO strategic planning. Inputs include breach size, industry sector, and affected jurisdictions. Outputs include direct costs, regulatory fines, reputational damage, and cyber insurance premium adjustments. Ideal for cross-border risk assessment, financial contingency planning, and board-level reporting. Keywords: data breach cost, regulatory fines, cyber insurance, financial risk, cross-jurisdiction impact.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	No	Industry sector of the affected organization
`records_lost`	Yes	Number of records compromised in the breach
`jurisdictions`	Yes	Jurisdictions where the breach has legal or financial impact
`detection_time_days`	No	Time in days to detect the breach

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`total_cost_usd`	No	Estimated total financial impact in USD
`cost_per_record_usd`	No	Cost per compromised record in USD
`regulatory_fines_usd`	No
`cyber_insurance_impact`	No
`reputational_damage_usd`	No	Estimated reputational damage cost in USD

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. Description adds context on outputs (direct costs, fines, reputational damage, insurance adjustments) but does not disclose any additional behavioral traits beyond what annotations provide. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single paragraph of 4 sentences, front-loaded with purpose, then inputs/outputs and use cases. No redundant words; the keywords line is minor but does not detract. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present and input schema fully described, the description covers purpose, inputs, outputs, and use cases. It provides sufficient context for an agent to select and invoke the tool correctly, including listing output components not in schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 5 parameters. Description maps 'breach size' to records_lost, 'industry sector' to industry, and 'affected jurisdictions' to jurisdictions, but omits async and detection_time_days. The description adds contextual keywords but does not significantly enhance beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb (estimates), resource (financial impact of data breach), and specific scope (three jurisdictions: US, EU, UK). It distinguishes from sibling tools by focusing on cross-border financial estimation for CFO planning, while siblings cover unrelated areas like market entry or architecture.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states ideal use cases: cross-border risk assessment, financial contingency planning, and board-level reporting. Does not provide exclusions or alternatives, but given the specificity and lack of overlapping siblings, this is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sre_slo_breach_predictorA

Read-onlyIdempotent

Inspect

As a CTO, predict potential SLO breaches 24 hours in advance by analyzing public incident reports and MITRE ATT&CK techniques. Input your service's critical components and reliability thresholds to receive breach probability scores, top contributing TTPs, and recommended mitigations. Uses MITRE ATT&CK, GitHub Advisories, and Cloudflare Radar data. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`time_window_hours`	No
`service_components`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`incident_reports`	No
`breach_probability`	No
`recommended_actions`	No
`top_ttp_contributors`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnly, openWorld, and idempotent behavior. The description adds the important behavioral note about using async to avoid timeouts, which aligns with the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the purpose, and includes necessary details like async usage and data sources. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (predictive model, multiple data sources), the description covers inputs, outputs, async behavior, and data provenance. The presence of an output schema reduces the need to describe return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 33% (only async has description). The description explains async's purpose and briefly mentions service_components and time_window_hours implicitly through the input description, but does not detail their format or defaults beyond what is in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool predicts SLO breaches 24 hours in advance, using specific data sources like MITRE ATT&CK and GitHub Advisories. This distinguishes it from sibling tools, which cover different domains (e.g., abm_architect, fraud_detector).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies the target user (CTO), the input (critical components and reliability thresholds), and the async option for avoiding timeouts. However, it does not explicitly state when not to use this tool compared to similar prediction tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

strategic_options_analyzerB

Read-only

Inspect

Analyseur d'options stratégiques — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Aircall — 5 options stratégiques post-Série D (2023-2024). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`optionHypotheses`	Yes
`strategicContext`	Yes
`founderConstraints`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true, indicating no side effects, and openWorldHint=true, suggesting external data access. The description adds that inputs are validated server-side and returns a deliverable, but does not mention auth needs, rate limits, or other behaviors. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (three sentences) and front-loaded with the purpose. The reference case adds context, though the mixed French/English might confuse some agents. Overall, it is efficient but could be more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, and the description only says 'returns a structured, audited deliverable' without specifying format or content. Given the tool's complexity (5 parameters, nested objects), the description lacks details on what the agent can expect from the output or how to use the result.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, meaning most parameters lack descriptions. The description does not elaborate on any parameters beyond 'send the documented case fields,' providing no additional meaning. For a tool with complex nested objects, this is insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes strategic options and returns a structured, audited deliverable for C-suite (CSO). It provides a specific reference case (Aircall). However, it does not explicitly differentiate from sibling tools like market_entry_strategist or growth_path_architect, which could also be used for strategic analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description instructs to 'send the documented case fields' and mentions server-side validation, implying what inputs to provide. But it does not specify when to use this tool over alternatives or provide exclusions (e.g., when not to use it). The guidance is implicit rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

supplier_esg_auditC

Read-only

Inspect

Audit ESG des fournisseurs — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: TechCorp — Audit ESG fournisseurs 2025 (5 fournisseurs, €1.37M spend). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`suppliers`	Yes
`targetScore`	No
`auditCriteria`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the tool's non-destructive nature is known. The description adds that inputs are validated server-side and that a deliverable is returned, but does not elaborate on other behavioral aspects like rate limits or result availability.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short with 4 sentences, but the first sentence is somewhat vague and includes a subtitle. The reference case adds context but could be condensed. Overall acceptable but not optimally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, no output schema, many siblings), the description lacks information about the return format, scoring, or how to interpret results. It does not address the async polling mechanism or success criteria.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%. The description does not explain the parameters beyond the schema's minimal descriptions. The 'async' parameter is documented in the schema, but nested objects like 'company' and 'suppliers' have no additional guidance despite being complex.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs a supplier ESG audit and returns a structured deliverable. However, it does not differentiate from similar sibling tools like 'esg_audit_multi' or 'vendor_esg_blacklist_monitor'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description mentions server-side validation and to send documented case fields, but lacks context on prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

supply_chain_fx_exposure_dashboardA

Read-onlyIdempotent

Inspect

Provides real-time foreign exchange exposure dashboard for supply chain monitoring. Designed for COO persona to track currency risk across suppliers and regions. Inputs include supplier IDs, base currency, and target currencies. Outputs structured FX exposure data with risk indicators, exchange rates, and supplier impact analysis sourced from World Bank LPI and live FX rate APIs.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`supplierIds`	No	List of supplier identifiers to analyze
`baseCurrency`	Yes	Base currency code (ISO 4217) for exposure calculation
`riskThreshold`	No	Percentage threshold for high-risk exposure flagging
`targetCurrencies`	Yes	Target currency codes (ISO 4217) to compare against base

Output Schema

ParametersJSON Schema

Name	Required	Description
`data`	No
`status`	Yes
`sources`	No
`warnings`	No

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, open-world, and idempotent behavior. The description adds value by specifying real-time data, structured outputs, and data sources (World Bank LPI and live FX rate APIs), providing useful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loaded with the primary purpose, and every sentence adds value. No superfluous information, making it highly efficient for an AI agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, output schema exists), the description covers the persona, inputs, outputs, and data sources comprehensively. It provides enough context for an agent to understand what the tool does and what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter described in the schema. The description mentions key inputs (supplier IDs, base currency, target currencies) but does not add new meaning beyond the schema's descriptions, so baseline score is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides a real-time FX exposure dashboard for supply chain monitoring, with specific outputs (risk indicators, exchange rates, supplier impact analysis). It distinguishes itself from siblings like fx_rate and working_capital_fx_hedge_optimizer by specifying its scope and intended use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states it is designed for the COO persona to track currency risk across suppliers and regions, providing clear context. However, it does not explicitly list when not to use it or mention alternatives, which would have elevated the score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sustainability_reportC

Read-only

Inspect

Rapport de durabilité — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: GreenLoop Solutions — rapport durabilité B-Corp 2025 (95 FTE, €18M CA). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`pillars`	Yes
`stakeholders`	Yes
`targetLabels`	No
`existingLabels`	No
`audienceProfile`	Yes

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true but description mentions 'Returns a structured, audited deliverable' which could imply creation. No additional behavioral details beyond 'Inputs are validated server-side'. No disclosure of side effects, auth needs, or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is a single paragraph of 4 sentences, including a reference case. It is front-loaded with purpose but could be more concise. No wasted words but room for improvement.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, and the description only says 'structured, audited deliverable' without detailing return format. Async behavior is mentioned only in schema for async parameter, not in description. Incomplete for a tool with 8 parameters including nested objects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is very low (13%). The description says 'send the documented case fields' but does not explain parameters. Schema descriptions for some fields exist but description adds no semantic value for the many undocumented parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool produces a sustainability report: 'Rapport de durabilité — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable.' It distinguishes from siblings like 'sustainability_reporting_pilot' by specifying a full audited deliverable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives (e.g., 'sustainability_reporting_pilot' or 'action_plan_esg'). The description does not mention prerequisites or context for using this tool over others.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sustainability_reporting_pilotC

Read-only

Inspect

Pilote de reporting durabilité — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: AlphaTech Industries SAS — premier rapport CSRD wave 2 (exercice 2025). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`dataInputs`	Yes
`materiality`	Yes
`targetFrameworks`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint: true and openWorldHint: true. The description adds that the tool returns a structured audited deliverable and that inputs are validated server-side. It does not contradict annotations but also does not disclose significant extra behavioral traits like performance or special limitations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (one sentence plus a reference case) but includes marketing jargon like 'Gapup agent-payable C-suite expertise (RISK)' that adds little value. It is concise but lacks structured information; every sentence should earn its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters, 4 required, nested objects, and no output schema, the description is minimal. It does not describe the deliverable's structure, how to handle the async parameter, or what the output contains. The reference case is illustrative but does not complete the picture.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (17%), and the description does not explain any parameters. It only generically says 'send the documented case fields'. Without parameter details, the agent cannot effectively construct inputs beyond the schema's minimal information.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Pilote de reporting durabilité' and 'Returns a structured, audited deliverable', indicating it produces a sustainability report. However, it does not differentiate from a sibling tool 'sustainability_report' nor specify the exact scope or format of the deliverable. The purpose is somewhat clear but lacks precision.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Inputs are validated server-side — send the documented case fields', implying correct fields should be provided. There is no explicit guidance on when to use this tool versus alternatives, no context on prerequisites or exclusions. Usage context is only implied.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

syndicated_loan_covenant_breach_alertA

Read-onlyIdempotent

Inspect

Monitors syndicated loan covenants for potential breaches by analyzing Tradeweb market data. Designed for CFOs to proactively identify financial compliance risks in loan agreements. Accepts loan identifiers, covenant thresholds, and reporting period as inputs. Returns structured breach alerts with market context and severity indicators.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`loanId`	Yes	Unique identifier for the syndicated loan
`currency`	No	ISO currency code for financial values
`reportingPeriod`	Yes	Time period for covenant compliance check
`covenantThresholds`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`breaches`	No
`warnings`	No

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the description adds value by stating it analyzes Tradeweb market data and returns structured breach alerts with severity indicators. It discloses behavioral context beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences efficiently cover purpose, audience, inputs, and output. No extraneous words. Front-loaded with the core action. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested object, multiple parameters, output schema exists), the description is adequate but misses details like async behavior and interpretation of severity indicators. The output schema handles return values, so completeness is acceptable but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is high (80%). The description adds context by listing input categories (loan identifiers, covenant thresholds, reporting period) but does not elaborate on specific parameters like 'currency' or the properties inside 'covenantThresholds'. The schema itself provides most parameter meaning, so the description adds marginal value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool monitors syndicated loan covenants for breach alerts using Tradeweb data. It specifies inputs and output type. However, it does not explicitly differentiate from similar sibling tools like 'bond_covenant_monitor' or 'syndicated_loan_pricing_benchmark', leaving slight ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for CFOs to identify compliance risks but does not specify when to use this tool versus alternatives, nor does it provide any prerequisites or exclusions. It lacks explicit guidance on when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

syndicated_loan_pricing_benchmarkA

Read-onlyIdempotent

Inspect

Provides CFOs with peer benchmarking for syndicated loan pricing by comparing current loan terms against market data from Tradeweb and FRED. Inputs include loan amount, tenor, credit rating, and currency. Outputs structured pricing benchmarks with spread, yield, and fee comparisons. Ideal for quick validation of loan competitiveness or negotiation preparation.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`tenor`	Yes	Loan tenor (e.g., '5Y', '3Y')
`region`	No	Region for benchmarking (e.g., 'US', 'EU')
`currency`	Yes	Currency code (e.g., 'USD', 'EUR')
`loanAmount`	Yes	Loan amount in millions
`creditRating`	Yes	Borrower credit rating (e.g., 'BBB', 'BB+')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`benchmarks`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, openWorldHint, and idempotentHint. The description adds value by specifying data sources (Tradeweb and FRED) and outputs (spread, yield, fee comparisons), beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (three sentences) and front-loaded with the core purpose. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has an output schema and annotations; the description covers purpose, inputs, outputs, and use case. It omits mentioning the 'region' and 'async' parameters, but given the rich context from schema and annotations, it is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are fully documented. The description lists the inputs but does not add new semantic meaning beyond the schema descriptions. Baseline score of 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: providing CFOs with peer benchmarking for syndicated loan pricing by comparing terms against market data from Tradeweb and FRED. It specifies the verb (benchmarking), resource (syndicated loan pricing), and distinguishes from siblings like 'syndicated_loan_covenant_breach_alert' and 'pricing_in_deal'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for use: 'Ideal for quick validation of loan competitiveness or negotiation preparation.' However, it lacks explicit guidance on when not to use it or alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_contract_risk_mapperA

Read-onlyIdempotent

Inspect

For CHROs: analyzes employee contracts for non-compete, IP assignment, and confidentiality clauses, comparing against state labor laws and jurisdiction-specific precedents. Returns risk levels, conflicting statutes, and suggested revisions. Uses USPTO PatFT, CourtListener, and EUR-Lex for legal cross-referencing. Ideal for contract reviews, compliance audits, or policy updates.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`jurisdiction`	Yes	State or country jurisdiction (e.g., 'California', 'Germany')
`contract_text`	Yes	Full text of the employee contract or clause section to analyze
`employee_role`	No	Job title or role classification (e.g., 'Software Engineer', 'Executive')
`effective_date`	No	Contract effective date (YYYY-MM-DD)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`risk_summary`	No
`suggested_revisions`	No
`conflicting_statutes`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, openWorldHint) are reinforced by description's mention of cross-referencing legal databases and returning suggestions. Does not contradict annotations; adds behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with audience ('For CHROs'), no fluff. Efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, usage, and key behaviors. Output schema exists, so return values are documented elsewhere. Slightly lacking on potential limitations or scope, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% description coverage, so baseline is 3. Description mentions contract_text and jurisdiction but adds no meaning beyond schema for the other params.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it analyzes employee contracts for specific clauses (non-compete, IP assignment, confidentiality) against laws, returning risk levels and revisions. Distinguishes from siblings like 'talent_intelligence' and 'contract_risk_scanner' by focusing on legal compliance for CHROs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Targets CHROs and recommends use for contract reviews, compliance audits, policy updates. Lacks explicit when-not-to-use or direct alternatives, but context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_intelligenceA

Read-only

Inspect

HR tech intelligence for CHROs, recruiters, VC teams, comp & benefits leads and workforce planners. Four modes powered by ESCO, O*NET, BLS OES and crowd-sourced salary data:

• salary_benchmark — cash-only salary medians (p25/median/p75) for 54+ roles across US/EU/Asia. Covers tech, finance, compliance, healthcare, marketing, ops and C-suite. Data from BLS OES, Levels.fyi and StackOverflow Developer Survey 2024. • skills_taxonomy — maps a skill to its ESCO URI, O*NET codes, skill type (hard/soft/knowledge/cert), 8 related skills with similarity scores and typical roles. • job_market_trends — YoY growth %, open positions estimate, top employers and leading skills per job category × country. Static 2024 data with BLS baseline fallback. • adjacent_roles — up to 6 roles adjacent to a source role with ESCO taxonomy adjacency: similarity score, salary delta % and skills overlap %.

All salary data is cash-only (excludes equity/RSU/bonus). Cache TTL: 24h (stable labour market data). Optional env ONET_API_KEY for authenticated O*NET lookups (free registration at onetcenter.org).

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode: salary_benchmark=compensation data, skills_taxonomy=ESCO/O*NET mapping, job_market_trends=market growth and demand, adjacent_roles=career path recommendations.
`role`	No	Job title (required for salary_benchmark, job_market_trends, adjacent_roles). Examples: "Senior Software Engineer", "Compliance Officer", "Data Scientist", "CFO".
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`skill`	No	Skill to classify (required for skills_taxonomy mode). Examples: "Python", "transformer architecture", "GDPR", "Kubernetes", "leadership".
`country`	No	ISO 2-letter country code. Default: US. Examples: US, FR, DE, GB, SG.
`seniority`	No	Seniority level. Default: senior. Affects salary benchmark ranges.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`quality_score`	Yes
`adjacent_roles`	No
`skills_taxonomy`	No
`salary_benchmark`	No
`job_market_trends`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. Description adds behavioral details: data is cash-only (excludes equity), has a 24h cache TTL, and requires an optional ONET_API_KEY for authenticated lookups. This provides useful context beyond annotations, though async behavior is described only in parameter schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with the overall purpose ('HR tech intelligence') and structured with bullet points for each mode. It is dense but not overly verbose; each sentence adds value (e.g., data sources, cache TTL). Could be slightly more concise by merging the optional API key note into a single sentence.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 modes, 6 parameters) and presence of an output schema, the description is thorough. It covers data sources (ESCO, O*NET, BLS, crowd-sourced), limitations (cash-only, static 2024 data), caching, and optional API key. This makes the tool's behavior and data understand without needing to infer from output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers all parameters with descriptions, achieving 100% coverage. The description explains the four modes (mapping to the 'mode' parameter) and data sources but does not add significant semantic detail beyond what's in the schema. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly identifies the tool as HR tech intelligence with four distinct modes (salary_benchmark, skills_taxonomy, job_market_trends, adjacent_roles), each with specific data sources and use cases. This differentiates it from sibling tools like 'onboarding_salaries' or 'job_postings_intelligence' by emphasizing its multi-mode, comprehensive coverage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description provides clear context for each mode's purpose (e.g., 'salary_benchmark — cash-only salary medians'), but lacks explicit guidance on when to use this tool over similar siblings. It mentions cache TTL and data stability, implying suitability for periodic data needs, but no direct comparisons or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_legal_dashboardA

Read-onlyIdempotent

Inspect

Generates a real-time legal risk dashboard for CHROs, covering contracts, intellectual property, and labor law compliance. Inputs include jurisdiction, employee count, and risk thresholds; outputs include risk scores, actionable alerts, and source citations. Ideal for proactive legal risk management and compliance monitoring. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`includeIP`	No
`jurisdiction`	Yes
`employeeCount`	Yes
`riskThreshold`	No
`includeLaborLaw`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`alerts`	No
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No
`lastUpdated`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint, so the description's behavioral burden is lower. It adds the async timeout note ('Pass async:true to avoid timeout'), which is valuable beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, then inputs/outputs, then ideal use. No redundancy or wasted words. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, inputs, outputs, use case, and async recommendation. With an output schema present, return values are handled. It lacks differentiation from similar tools but is otherwise sufficient for a read-only dashboard with good annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (17%), so the description should compensate. It lists jurisdiction, employee count, and risk thresholds, but omits includeIP, includeLaborLaw, and async. The schema itself describes async well, but the description adds only marginal value for other params. A 3 is appropriate given the gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates a real-time legal risk dashboard for CHROs, covering contracts, IP, and labor law. The verb 'Generates' and specific resources (dashboard, contracts, IP, labor law) make the purpose distinct from siblings like talent_contract_risk_mapper or talent_litigation_exposure.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'Ideal for proactive legal risk management and compliance monitoring,' implying use cases but lacks explicit guidance on when not to use or how it differs from alternatives. Given many similar sibling tools, more explicit differentiation would help.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_litigation_exposureA

Read-onlyIdempotent

Inspect

Estimates litigation exposure risk for CHROs by analyzing past employee lawsuits, settlement amounts, and industry benchmarks. Inputs include company location, industry code, and employee count range. Returns exposure score, average settlement amounts, lawsuit frequency trends, and risk factors. Ideal for legal risk assessment, HR strategy planning, and board-level reporting. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry_code`	Yes	NAICS industry code (e.g., '541511' for IT services)
`employee_count`	No	Current number of employees
`lookback_years`	No	Number of years to analyze
`company_location`	Yes	State or region where company operates (e.g., 'CA', 'New York')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`avg_settlement`	No	Average settlement amount in USD
`exposure_score`	Yes	Normalized risk score (0-100)
`historical_trend`	No
`top_risk_factors`	No
`lawsuit_frequency`	No	Lawsuits per 1000 employees per year
`industry_benchmark`	No	Industry average exposure score

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint, idempotentHint) already indicate safe, read-only behavior. The description adds behavioral context by noting that the tool can be slow and suggesting async usage to avoid timeouts. It also describes the return values, which is consistent with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading the purpose and including essential details (inputs, outputs, use cases). It is concise and well-structured, though slightly redundant in listing inputs both in the first sentence and then again explicitly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown), the description covers all necessary aspects: purpose, inputs, outputs, ideal use cases, and one behavioral note (async). For a tool of this complexity (5 parameters, output schema), it is complete enough for an AI agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so all parameters have descriptions. The tool description lists the key inputs but does not add significant meaning beyond what the schema already provides. Baseline 3 is appropriate given full schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool estimates litigation exposure risk for CHROs, specifying inputs (company location, industry code, employee count range) and outputs (exposure score, average settlement amounts, etc.). It effectively distinguishes from sibling tools like 'talent_contract_risk_mapper' and 'talent_poaching_risk' by focusing on litigation exposure with specific data points.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage contexts: 'Ideal for legal risk assessment, HR strategy planning, and board-level reporting.' It also advises passing async:true to avoid timeout. However, it does not explicitly mention when not to use this tool or suggest alternatives among the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_poaching_riskA

Read-onlyIdempotent

Inspect

Analyzes employee poaching risk for CHROs by evaluating LinkedIn profile activity (job searches, profile views) and comparing compensation against BLS benchmarks. Returns a ranked list of high-risk employees with risk scores and suggested retention actions. Ideal for proactive talent retention strategies. Keywords: employee retention, poaching risk, compensation benchmark, LinkedIn activity, CHRO analytics.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`location`	No	Geographic location filter (e.g., 'San Francisco, CA')
`department`	Yes	Department filter (e.g., 'Engineering', 'Sales')
`min_tenure_months`	No	Minimum tenure in months to include in analysis
`benchmark_job_title`	No	Specific job title for compensation benchmarking

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`risk_assessment`	No
`department_avg_risk`	No
`benchmark_comparison`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds behavioral context by detailing data sources (LinkedIn activity, BLS benchmarks) and output format (ranked list with scores and actions), which goes beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences plus keywords, front-loaded with the core function, and every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the moderate complexity, presence of an output schema, and annotations, the description covers main purpose and data sources. It mentions output format but could provide more context on why to use this over sibling tools, though it's not a major gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add specific parameter meaning beyond what the schema already provides; it only summarizes overall function.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes employee poaching risk using LinkedIn activity and compensation benchmarks, returning a ranked list with risk scores and retention actions. It is specific and distinguishes from sibling tools by focusing on poaching risk.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies the target user (CHROs) and ideal use case (proactive retention strategies), but does not provide explicit when-to-use or when-not-to-use guidance relative to alternative tools like talent_intelligence.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tariff_arbitrage_finderA

Read-onlyIdempotent

Inspect

As a COO, identify tariff reclassification opportunities to reduce import costs. Analyzes product HS codes against WTO TFA and USA Trade Online data to find lower-duty classifications. Inputs: product description, current HS code, country of origin, and annual import volume. Outputs: potential duty savings, alternative HS codes, and compliance considerations.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`annualVolume`	No
`currentHsCode`	Yes
`countryOfOrigin`	Yes
`currentDutyRate`	No
`productDescription`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`opportunities`	No

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, openWorldHint=true, idempotentHint=true, covering safety and idempotency. The description adds that it analyzes WTO TFA and USA Trade Online data, but does not reveal any additional behavioral traits beyond what annotations provide. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with purpose and context, then specifics. No redundant words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the many sibling tools, the description covers the core functionality, data sources, and outputs. With an output schema present, return values don't need detailing. It could mention limitations (e.g., US-focused data) or async behavior, but overall it's adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (17%, only async has a description). The description compensates by listing 4 of 6 parameters and their purpose (product description, HS code, country, annual volume). It also explains the outputs, helping understand parameter usage. However, it omits currentDutyRate and async, which could reduce clarity.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: identify tariff reclassification opportunities to reduce import costs. It specifies the target user (COO), data sources (WTO TFA, USA Trade Online), and outputs. This distinguishes it from siblings like tariff_impact_simulator or trade_finance_eligibility.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for reducing import costs via reclassification, but does not explicitly contrast with other tariff tools or state when not to use it. The sibling list includes several tariff-related tools, so some guidance would help, but the context is still clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tariff_impact_simulatorB

Read-onlyIdempotent

Inspect

As a COO, model how proposed tariff changes affect landed costs for imported goods. Inputs: HS code, current tariff rate, proposed tariff rate, product value, shipping cost, and country of origin. Outputs: detailed cost breakdown including new duties, taxes, and total landed cost impact. Sources include WTO TFA and US Census trade data.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hsCode`	Yes
`productValue`	Yes
`shippingCost`	No
`countryOfOrigin`	Yes
`currentTariffRate`	Yes
`proposedTariffRate`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`costImpact`	No
`currentDuty`	No
`proposedDuty`	No
`dutyDifference`	No
`currentLandedCost`	No
`proposedLandedCost`	No
`costImpactPercentage`	No

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only, open-world, and idempotent. The description adds context about modeling and data sources but does not disclose behavioral specifics like response structure, error handling, or rate limits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, using two sentences and a list. It front-loads the key action and then enumerates inputs and outputs without superfluous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, the description does not need to explain return values. It covers the main use case, inputs, outputs, and data sources, providing adequate context for selecting and using the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema description coverage is low (14%), so the description must compensate. It lists most input parameters (HS code, tariff rates, product value, country, shipping cost) but does not specify formats, units, or constraints beyond what the schema provides. It adds some value but not comprehensive.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states a specific verb-modelling tariff changes-and resource-impact on landed costs. It specifies user role (COO), inputs, outputs, and data sources, making the purpose distinct from related tools like tariff_arbitrage_finder, though it does not explicitly differentiate from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide guidance on when to use this tool versus alternatives or when not to use it. There is no mention of prerequisites, limitations, or explicit comparison to sibling tools, leaving the agent to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tax_compliance_multiA

Read-onlyIdempotent

Inspect

Multi-jurisdiction tax compliance data for international SaaS, cross-border marketplaces and expat services. Five modes: (1) vat_lookup — validate EU VAT numbers live via VIES SOAP (27 EU countries) or UK VRN via HMRC; (2) sales_tax — US state sales tax rates, nexus thresholds (post-Wayfair 2018), digital goods taxability for all 50 states + DC; (3) gst — APAC GST/SST/consumption-tax rates for IN, SG, AU, NZ, MY, JP, KR, TH, ID, PH, VN with reduced rates and registration thresholds; (4) oss_ioss_eligibility — EU One-Stop-Shop and Import-OSS eligibility analysis (EUR 10k OSS threshold, EUR 150 IOSS per-consignment); (5) transfer_pricing_benchmark — OECD/JTPF operating-margin benchmarks by industry and country (20+ sectors, country-specific adjustments). Returns P0/P1/P2 compliance signals: P0=invalid VAT used for zero-rating, P1=taxable digital goods detected/audit risk, P2=filing deadlines/nexus alerts. Keyless — no API key required. Optional env: HMRC_VAT_API_KEY for UK VAT live validation. Cache TTL 24h.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Tax mode to invoke.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Mode-specific query: vat_lookup -> VAT number with country prefix (e.g. 'FR40303265045'); sales_tax -> US state code or name (e.g. 'CA', 'California'); gst -> ISO country code (e.g. 'SG', 'IN', 'AU'); oss_ioss_eligibility -> annual EU B2C revenue in EUR or keyword (e.g. '5000', 'below'); transfer_pricing_benchmark -> industry name (e.g. 'manufacturing', 'saas', 'r&d').
`country`	No	ISO 3166-1 alpha-2 country code. Required for gst when query is ambiguous. Used in transfer_pricing_benchmark for country-specific OECD adjustments.
`transaction_type`	No	Transaction type for signal generation. 'digital' triggers GST/sales-tax digital goods warnings.

Output Schema

ParametersJSON Schema

Name	Required	Description
`gst`	No
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`oss_ioss`	No
`sales_tax`	No
`vat_lookup`	No
`quality_score`	Yes
`transfer_pricing`	No

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description matches annotations (readOnlyHint, idempotentHint, destructiveHint false). It discloses data sources (VIES SOAP, HMRC), keyless access, optional env var for UK VAT, cache TTL 24h, and async mode. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is thorough but efficiently structured with numbered modes and clear headings. While long, every sentence adds information. Minor improvement possible by grouping modes more compactly, but overall well-organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 modes, global coverage, async, caching, optional auth), the description covers all necessary aspects: mode selection, query format, country parameter, transaction type, output signals (P0/P1/P2), and performance notes. Output schema exists, so return values are handled. Complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with detailed descriptions for all 5 parameters. The description adds value beyond schema by providing concrete query examples (e.g., 'FR40303265045' for VAT), mode-specific nuances, and transaction type implications. This greatly aids correct invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides multi-jurisdiction tax compliance data for SaaS, marketplaces, and expat services, and explicitly describes five distinct modes (vat_lookup, sales_tax, gst, oss_ioss_eligibility, transfer_pricing_benchmark). This specificity differentiates it from sibling tools like tax_optimization or other compliance tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides detailed usage context: international SaaS, cross-border marketplaces, expat services, and explains each mode's target (e.g., EU VAT, US sales tax, APAC GST). It does not explicitly state when not to use, but the mode-based structure gives clear direction. Compared to siblings, it covers compliance rather than optimization.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tax_optimizationB

Read-only

Inspect

Optimisation fiscale — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Pennylane — Fiscalité optimisée · CIR €1.2M · IP Box France 10% · Économie totale €2.4M/an. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`ipAssets`	No
`activities`	Yes
`financials`	Yes
`jurisdictions`	Yes
`currentTaxOptimizations`	No

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, and the description confirms it returns a deliverable (not mutating data). It adds that inputs are validated server-side, but lacks details on permissions, rate limits, or what 'audited' entails. With annotations covering the core behavioral trait, the description provides marginal extra context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, relatively concise. It front-loads the main purpose (tax optimization) and ends with a reference case and instruction. The first sentence could be clearer ('Gapup agent-payable C-suite expertise' is jargon), but overall it is not verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite complex nested input (7 parameters, including objects), the description does not specify the return format or structure of the 'structured, audited deliverable.' No output schema exists, so the agent lacks information about what to expect from the tool's response. The reference case gives numbers but not a schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14% (only async has a description). The description does not explain the purpose or structure of any parameter (company, financials, jurisdictions, etc.). It simply says 'send the documented case fields' without clarifying what those fields are, failing to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs tax optimization and returns a structured, audited deliverable. The reference case with specific numbers (CIR €1.2M, IP Box 10%, total savings €2.4M/year) provides concrete examples. It distinguishes from sibling tools like tax_compliance_multi which focus on compliance rather than optimization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives (e.g., tax_compliance_multi, ma_tax_efficiency_mapper). The instruction to 'send the documented case fields' is vague and does not specify prerequisites or scenarios where this tool is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

term_sheet_negotiationC

Read-only

Inspect

Négociation term sheet — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Agicap Série C €50M — 8 clauses analysées · 3 rouges · Score fondateur 62/100 → plan pour 81. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`round`	Yes
`company`	Yes
`termSheetClauses`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the description adds some context by stating it 'Returns a structured, audited deliverable' and that inputs are validated server-side. However, it doesn't elaborate on the auditing process or any other behavioral traits. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise (one sentence plus a reference case), but includes jargon ('Gapup agent-payable C-suite expertise') and French terms that may reduce clarity. It is front-loaded with the purpose but contains some unnecessary details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the input schema (nested objects, no output schema), the description is incomplete. It does not explain the return format beyond 'structured, audited deliverable', nor does it provide typical usage examples or expected outputs. The low schema coverage and lack of output schema make this description insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only 'async' is described). The description does not explain any of the parameters ('company', 'round', 'termSheetClauses') beyond saying 'send the documented case fields'. For a tool with nested objects and required fields, this is severely lacking.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for 'Négociation term sheet' with expertise in fundraising, and mentions it returns a structured audited deliverable. The reference case gives concrete context. However, it does not explicitly differentiate from sibling tools like 'deal_coach' or 'capital_strategy', but the name is self-explanatory enough.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. The description implies it's for fundraising term sheet analysis, but doesn't mention when not to use it or provide any selection criteria. The agent would have to infer usage from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tool_recommendA

Read-only

Inspect

Cross-tool recommendation system: given a free-text intent, returns the most appropriate tools from the 170+ Gapup MCP catalogue, ranked by confidence, with pre-filled input suggestions and an optimal multi-tool chain when applicable. Use this first when you are unsure which tool to call — it navigates the full catalogue for you. Supports 15+ static pre-designed chains for frequent intents (M&A due diligence, sanctions screening, ESG 360, AI Act compliance, FTO patent clearance, crypto wallet tracking, etc.). Domains: compliance | finance | intel | legal | content | data | trade | infra. Pure compute — $0.01/call, no external fetch. Ideal as a first call in any multi-step agent workflow.

ParametersJSON Schema

Name	Required	Description
`lang`	No	Optional ISO 639-1 language hint (fr, en, de, zh, es …). Used for language-aware boosting.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domain`	No	Optional domain hint to boost tools in this category.
`intent`	Yes	Free-text description of what you want to accomplish. E.g. 'Run a full M&A due diligence on Acme Corp' or 'Je veux vérifier qu'un fournisseur n'est pas sous sanctions OFAC'. FR/EN/DE/ZH supported.
`max_results`	No	Max number of recommendations returned (1-10). Default 5.
`include_chain`	No	Whether to include a suggested_chain of tools in the optimal sequence. Default true. Chain is always included for well-known intents (M&A, compliance, ESG, etc.).

Output Schema

ParametersJSON Schema

Name	Required	Description
`intent`	Yes
`status`	Yes
`sources`	No
`not_covered`	No
`quality_score`	Yes
`recommendations`	Yes
`suggested_chain`	No
`alternative_paths`	No

Tool Definition Quality

A4.8/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds 'Pure compute — $0.01/call, no external fetch' beyond the readOnlyHint and openWorldHint annotations, providing cost and execution context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three paragraphs are slightly longer but front-loaded with the main purpose. Contains useful details without excessive verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists so return values are covered. Description includes intent, domains, chain feature, cost, and ideal usage context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds value by explaining language support, pre-filled suggestions, and chain inclusion behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it is a cross-tool recommendation system that returns appropriate tools based on free-text intent. It distinguishes from siblings by being a meta-tool that navigates the full catalogue.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this first when you are unsure which tool to call' and 'Ideal as a first call in any multi-step agent workflow', providing clear when-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

trade_finance_eligibilityA

Read-onlyIdempotent

Inspect

Evaluates trade finance eligibility for CFOs by analyzing counterparty risk and jurisdiction using World Bank and BIS data. Inputs include counterparty country code (ISO 3166-1 alpha-3) and industry sector. Returns risk scores, eligibility flags, and financing terms. Ideal for assessing letters of credit, export credit agency guarantees, and other trade finance instruments. Keywords: trade finance, counterparty risk, jurisdiction risk, letters of credit, ECA guarantees.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industrySector`	Yes
`annualTradeVolumeUSD`	No
`counterpartyCountryCode`	Yes
`counterpartyCreditRating`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`eligibility`	No
`financingTerms`	No
`countryRiskScore`	No
`maxFinancingAmountUSD`	No
`recommendedInstruments`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint, so the bar is lower. The description adds value by mentioning data sources (World Bank, BIS) and return types, but does not disclose behavioral traits like latency or rate limits. It is consistent with annotations and provides additional context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the core action, inputs, returns, and use case. It includes relevant keywords and has no redundancy or wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters (2 required), an async option, and an output schema, the description explains the purpose, data sources, returns, and typical use. It does not cover the async parameter or potential errors, but the presence of an output schema reduces the need to detail return values. Overall, it is moderately complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (2 out of 5 parameters are mentioned in description). The description mentions counterpartyCountryCode and industrySector, adding format context for country code (ISO 3166-1 alpha-3), but entirely omits annualTradeVolumeUSD, counterpartyCreditRating, and async. With low coverage, the description should compensate more thoroughly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates trade finance eligibility for CFOs by analyzing counterparty risk and jurisdiction using World Bank and BIS data. It specifies the return types (risk scores, eligibility flags, financing terms) and typical use cases (letters of credit, ECA guarantees), distinguishing it from sibling tools that may focus on different aspects like ESG or preferences.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Ideal for assessing letters of credit, export credit agency guarantees, and other trade finance instruments,' providing some context on when to use. However, it does not explicitly state when not to use this tool or mention alternatives, leaving usage guidance implicit rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

transcribe_chapterize_mediaA

Read-onlyIdempotent

Inspect

Transcription and chapterization of long-form media (YouTube, podcasts, direct audio/video) for content marketing teams, podcast publishers, edu tech, journalists and accessibility/compliance.

Pipeline: • YouTube → timedtext captions (keyless) + oEmbed metadata + native timecode chapters from description • Podcast RSS → episode description + duration + timecodes if embedded in show notes • Direct media → partial (requires Whisper API via OPENAI_API_KEY + force_whisper:true) • Chapters: native YouTube timecodes preferred; heuristic TF-IDF segmentation as fallback • Summary: extractive TF-IDF top-sentences (no LLM required) • Language detection: character-set heuristic (CJK→zh, kana→ja, hangul→ko, accents→fr/de/es)

Output formats: json (full structured object) | text (plain transcript) | srt | vtt

SLA: ≤15s budget total. Cache: 24h TTL.

ParametersJSON Schema

Name	Required	Description
`url`	Yes	YouTube URL, podcast RSS feed URL, or direct MP3/MP4 URL. Example: "https://www.youtube.com/watch?v=jNQXAC9IVRw"
`lang`	No	ISO 639-1 language hint (e.g. "en", "fr", "de"). Default "auto".
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`chapters_max`	No	Maximum number of chapters. Default 8.
`output_format`	No	Transcript format. Default "json".
`include_summary`	No	Include extractive summary. Default true.

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`summary`	No
`chapters`	Yes
`segments`	Yes
`key_topics`	Yes
`transcript`	Yes
`source_type`	Yes
`lang_detected`	Yes
`quality_score`	Yes
`duration_seconds`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only, idempotent, non-destructive. Description adds details: cache TTL (24h), SLA (≤15s), chapterization methods (native vs heuristic), summary extraction (TF-IDF, no LLM), language detection heuristic. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured with headings (Pipeline, Output formats, SLA) and bullet points. It is detailed but not overly verbose. Front-loaded with main purpose. Some details like language detection heuristic could be considered extra but are relevant.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description covers inputs (URL, optional parameters), process (pipeline for each source type, chapterization method, summary, language detection), outputs (formats json/text/srt/vtt), and performance constraints (SLA, caching). With an output schema present, return values are covered. Comprehensive for such a complex tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for all 6 parameters. The description does not add significant extra meaning beyond the schema, but it provides context on how the URL parameter interacts with the pipeline. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Tool name 'transcribe_chapterize_media' and description clearly state it performs transcription and chapterization of long-form media from YouTube, podcasts, and direct audio/video. It specifies the verb and resource, and its function is distinct from the many business analysis sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description provides explicit target users (content marketing, podcast publishers, edu tech, accessibility) and explains pipeline for different source types (YouTube, RSS, direct). It does not explicitly state when not to use, but the coverage of alternatives is implied, e.g., need for Whisper API for direct media.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

treasury_optimizerC

Read-only

Inspect

Optimiseur de trésorerie — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Alan — Trésorerie €380M post-Série F · Allocation optimale 4 instruments · Yield +145bp · +€5.5M/an. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`horizon`	No
`constraints`	Yes
`cashPosition`	Yes

Tool Definition Quality

C2.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description's mention of returning a deliverable aligns. It adds server-side validation context but does not detail behavioral traits like side effects or response structure. The description does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (three sentences) and front-loaded with the tool's purpose. However, the case study adds noise and could be omitted for clarity. Still, it is reasonably concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, nested objects, no output schema), the description is insufficient. It does not explain the deliverable's structure, how to populate nested fields, or what the output contains. The annotations and schema partly compensate, but the description lacks operational context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' has a description). The tool description does not explain any parameters beyond 'send the documented case fields'. With low coverage, the description must compensate but fails to add meaning, leaving 4 parameters with no semantic aid.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Optimiseur de trésorerie' and 'Returns a structured, audited deliverable' indicate a treasury optimization report, but the marketing language and case study obscure the core function. It lacks a clear verb+resource statement, making it less specific than needed for precise tool selection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description does not mention scenarios, prerequisites, or exclusions. The sibling tools include financial tools like working capital optimizers, but no differentiation is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

trend_watcherA

Read-onlyIdempotent

Inspect

Monitor emerging trends, regulatory shifts and adoption signals for a given market sector. Returns 5-12 trend cards, each with a momentum score (rising/stable/declining), a 3-month and 12-month outlook, opportunity windows, and recommended actions. When to use this tool: the user asks what is heating up in a market, wants to time a product roadmap or content calendar, or needs an early read on a sector. Inputs: a sector to monitor and 3-8 keywords defining the watch perimeter. Delivered by Manue, the AI CMO of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No	Optional context (geography, language target, comparator window, etc.)
`sector`	Yes	Sector to monitor (e.g. 'B2B SaaS productivity', 'EU fintech', 'climate-tech hardware')
`keywords`	Yes	3-8 keywords describing the watch perimeter

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles
`trends`	Yes	5-12 trend cards for the sector
`recommendations`	No	Prioritised strategic recommendations
`executiveSummary`	Yes	Board-ready sector overview prose

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, etc. The description adds behavioral details such as returning 5-12 trend cards with specific fields (momentum score, outlooks, actions). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is about 4 sentences, front-loaded with purpose and output details. The final sentence 'Delivered by Manue...' is unnecessary fluff but does not significantly detract from conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description adequately covers purpose, usage, inputs, and output structure. It does not discuss limitations or costs, but annotations cover safety and idempotency.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already describes parameters thoroughly. The description adds minimal extra meaning, mainly restating 'sector' and 'keywords' as 'input a sector and 3-8 keywords defining the watch perimeter.'

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool monitors emerging trends, regulatory shifts, and adoption signals for a given market sector, with specific output details (5-12 trend cards with momentum scores and outlooks). This distinguishes it from sibling tools like market_research_brief or competitive_deep_dive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'user asks what is heating up in a market, wants to time a product roadmap or content calendar, or needs an early read on a sector.' It does not explicitly mention alternatives or when not to use, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ugc_moderation_classifierA

Read-onlyIdempotent

Inspect

Multi-language UGC content moderation for marketplaces, social platforms and comment systems. Detects policy violations in text content across 9 policies and 12 languages without external API calls.

Policies checked: • hate — hate speech, slurs, dehumanization (50+ terms × 12 languages) • sexual — explicit sexual content, pornography references, nudity solicitation • violence — threats, weapon references, graphic violence • self_harm — suicidal ideation, self-injury, eating disorder promotion • harassment — doxxing, stalking, cyberbullying, blackmail • scam — phishing, investment fraud, romance scam, lottery fraud • spam — bots, keyword stuffing, excessive caps, emoji storms, suspicious URLs • copyright — piracy, leaked content, serial keys, streaming fraud • minor_safety — grooming signals, CSAM references, minor + adult content combos

Languages: en / fr / de / es / it / pt / nl / zh / ja / ko / ar / ru (auto-detected)

Output includes severity (low/medium/high/severe), confidence (0-100), matched patterns, excerpt, recommended action, age appropriateness (adult/teen/child), and signals.

No API key required. Stateless — no content is stored or logged.

ParametersJSON Schema

Name	Required	Description
`lang`	No	Language override. If omitted, language is auto-detected.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`content`	Yes	Text content to moderate (comment, review, post, chat message).
`policies`	No	Policies to check. Default: all 9 policies.
`content_type`	No	Type of content. Affects recommended_action heuristic. Default: comment.

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`signals`	Yes
`sources`	Yes
`violations`	Yes
`lang_detected`	Yes
`quality_score`	Yes
`age_appropriate`	Yes
`content_preview`	Yes
`policies_checked`	Yes
`recommended_action`	Yes

Tool Definition Quality

A4.2/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description aligns with annotations (readOnlyHint, idempotentHint, destructiveHint) and adds valuable context: stateless, no content storage or logging, no external API calls, output details (severity, confidence, etc.). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with an opening sentence, bullet list of policies, and output summary. It is slightly lengthy but front-loaded and complete.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, the description still provides essential context like languages, policies, statelessness, and output fields. It leaves no major gaps for an AI agent to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds little beyond what parameter descriptions already provide. The listing of policies in the description is redundant with the enum values in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a multi-language UGC content moderation classifier that detects policy violations in text. It lists 9 specific policies and 12 languages, distinguishing it from sibling tools which are unrelated to moderation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states it requires no API key and is stateless, implying it should be used when privacy is a concern, but it does not explicitly guide when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

upsell_hunterC

Read-only

Inspect

Chasseur d'upsell — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Upsell 8 comptes · €127k potentiel · Top 3 : Alan+Qonto+Pennylane · Playbook 5 étapes. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`horizon`	No
`product`	Yes
`accounts`	Yes
`targetUpsellEur`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds that it returns a 'structured, audited deliverable', consistent with read-only behavior, and mentions server-side validation. It does not introduce any behavioral traits beyond what annotations convey, nor does it contradict them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with three sentences. It starts with the title and a reference case, which is helpful. However, the first sentence mixes French and jargon, reducing clarity. Every sentence serves a purpose, but the structure could be more logical.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has complex nested parameters and no output schema, yet the description does not explain the output format or the behavior of the 'async' parameter. The reference case provides some context but leaves significant gaps for the agent to operate effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (17%), and the tool description does not explain parameter meanings beyond saying 'send the documented case fields'. This lacks clarity for the agent, as parameters like 'company', 'product', and 'accounts' are not described in the description. The schema's minimal descriptions are insufficient on their own.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description 'Chasseur d'upsell' indicates an upsell analysis tool, and 'Returns a structured, audited deliverable' clarifies the output. However, the phrase 'Gapup agent-payable C-suite expertise (CRO)' is vague and does not precisely define the tool's function. No explicit differentiation from sibling tools is given, but the purpose is generally clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It only states that inputs are validated server-side and to send documented case fields, which is procedural rather than contextual. No when-not-to-use or sibling comparisons are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

usdc_x402_payments_intelA

Read-only

Inspect

Real-time analytics on x402 protocol USDC micropayments for MCP endpoints on Base network. Unique competitive advantage: aggregates internal production telemetry (our own traffic data) with on-chain USDC Transfer events and Bazaar marketplace listings — data no external competitor can access. Four modes: (1) facilitator_stats — Coinbase x402 facilitator settlement statistics (volume, count, top payees/payers). Uses Coinbase CDP API if COINBASE_X402_API_KEY is set; falls back to Base mainnet RPC scan of USDC transfers to known facilitator addresses. (2) endpoint_intel — Per-MCP-endpoint analytics: tx count, USDC volume, unique callers, success rate, catalog size. For gapup-mcp.io endpoints: reads internal JSONL telemetry (richest data source, unique). (3) agent_caller_profile — Anonymous profile of a calling agent wallet: tx count, USDC spent, top endpoints, inferred persona (depth-seeker / bulk-scanner / generalist / researcher / explorer). Wallet anonymised via SHA-256. (4) price_radar — USDC price distribution by tool category (data_lookup / synthesis / compliance / competitive) from Bazaar + internal catalog. Returns median, P25, P75. Network: Base mainnet. USDC contract: 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913. Cache: 30 min LRU. Timeout per source: 8s. Optional env: COINBASE_X402_API_KEY (higher-fidelity facilitator stats).

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analytics mode: facilitator_stats=network-wide settlements \| endpoint_intel=per-URL analytics \| agent_caller_profile=per-wallet analytics \| price_radar=price distribution by category
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`category`	No	Tool category for price_radar mode. Defaults to all.
`period_days`	No	Lookback window in days (5-90, default 30)
`endpoint_url`	No	MCP endpoint URL for endpoint_intel mode (e.g. https://mcp.gapup.io/mcp)
`wallet_address`	No	EVM wallet address for agent_caller_profile mode (0x...)

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`price_radar`	No
`quality_score`	Yes
`endpoint_intel`	No
`facilitator_stats`	No
`agent_caller_profile`	No

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond annotations by detailing data sources (internal telemetry, on-chain USDC transfers, Bazaar), caching (30 min LRU), timeouts (8s per source), fallback behavior (CDP API vs RPC scan), anonymization (SHA-256), and async mode. No contradictions with readOnlyHint=true and destructiveHint=false.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose (over 200 words) but well-structured with clear sections and bullet points for modes. Information is front-loaded with the main purpose. Could be slightly more concise, but every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (multiple modes, data sources, env dependency, async option), the description covers all essential behavioral aspects: data freshness, fallback, timeout, anonymization, and job_id handling. Output schema exists, so return values are covered elsewhere. The description is comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions for all 6 parameters. The description adds contextual meaning by explaining how parameters relate to each mode (e.g., category for price_radar, endpoint_url for endpoint_intel) and default values (period_days default 30). This significantly helps an agent understand parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states it provides real-time analytics on x402 protocol USDC micropayments for MCP endpoints on Base network, and lists four distinct modes with brief explanations. This clearly distinguishes it from any sibling tools, which are unrelated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains each mode and when to use it (e.g., 'facilitator_stats' for network-wide settlements, 'endpoint_intel' for per-URL analytics). However, it lacks explicit when-not-to-use guidance or comparisons to alternatives, though alternatives are not obvious.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vendor_esg_blacklist_monitorA

Read-onlyIdempotent

Inspect

As a COO, quickly check if a vendor is blacklisted for ESG non-compliance using CDP and GRI data. Input the vendor's legal name or identifier to receive their ESG risk score, blacklist status, and compliance violations. Returns structured data including CDP disclosure score, GRI alignment, and any regulatory flags. Ideal for vendor due diligence, risk assessment, and sustainability reporting. Keywords: ESG, vendor risk, compliance, CDP, GRI, sustainability, blacklist.

ParametersJSON Schema

Name	Required	Description
`year`	No	Reporting year (default: current year)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`vendorId`	No	Optional identifier (e.g., LEI, DUNS)
`vendorName`	Yes	Legal name of the vendor to check

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`vendorId`	No
`warnings`	Yes
`griAligned`	No
`vendorName`	Yes
`violations`	No
`blacklisted`	Yes
`esgRiskScore`	No
`cdpDisclosureScore`	No

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, openWorld, and idempotent hints. Description adds context about using CDP and GRI data and returning structured fields, but does not disclose additional behaviors (e.g., data freshness, API limits). With good annotation coverage, the description provides moderate value beyond annotation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is three sentences plus keyword tags, well front-loaded with purpose. It is concise and avoids unnecessary detail, though the keyword block could be integrated or omitted. Still, it efficiently conveys key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of output schema and annotations, description covers core purpose, inputs, and return types. It does not explain output schema details (handled by schema) or mention rate limits, but is complete enough for a quick blacklist check tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. Description adds minimal extra meaning by mapping 'vendorName' to 'legal name' and 'vendorId' to 'identifier' (e.g., LEI, DUNS). This clarifies usage but is not substantial beyond schema; baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool checks if a vendor is blacklisted for ESG non-compliance using CDP and GRI data, and lists specific outputs (ESG risk score, blacklist status, compliance violations). It distinguishes from similar tools by emphasizing speed and specific data sources, effectively differentiating from siblings like 'supplier_esg_audit'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description provides clear use cases ('vendor due diligence, risk assessment, sustainability reporting') and targets a specific user (COO). However, it does not explicitly state when to avoid using this tool or mention alternative tools for more comprehensive ESG checks, missing an opportunity for clearer guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vendor_esg_diversity_scannerA

Read-onlyIdempotent

Inspect

For COOs: scans vendor ESG reports to identify suppliers lacking diversity disclosures in GRI or CDP filings. Input a supplier name or identifier to receive a structured assessment of gender, ethnicity, and board diversity metrics. Returns compliance gaps, missing data flags, and source references from CDP open data and GRI standards. Ideal for vendor risk assessment and ESG compliance tracking.

ParametersJSON Schema

Name	Required	Description
`year`	No	Reporting year to check (default: current year)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`supplierId`	No	CDP or GRI identifier for the supplier (e.g., CDP company ID)
`supplierName`	Yes	Exact or partial name of the supplier to scan

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`reportLinks`	No	URLs to relevant ESG reports
`supplierName`	Yes
`complianceScore`	Yes	Percentage compliance with diversity disclosure standards
`diversityDisclosures`	Yes

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint, indicating a safe, non-destructive operation. The description adds context about returning compliance gaps and missing data flags but does not disclose additional behavioral nuances beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the target audience and core purpose. Every sentence adds value: audience, input, output, and use case. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description covers what the tool does, what it returns (compliance gaps, missing data flags, source references), and when to use it. It is complete for an agent to understand the tool's role.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 4 parameters have schema descriptions (100% coverage). The description enriches meaning by linking 'supplierName' and 'supplierId' to CDP/GRI identifiers and the scanning context, going beyond the bare schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool scans vendor ESG reports to identify suppliers lacking diversity disclosures in GRI or CDP filings, specifying the input (supplier name/identifier) and output (structured assessment with gaps and flags). It effectively distinguishes from sibling tools by focusing on diversity metrics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'Ideal for vendor risk assessment and ESG compliance tracking' but does not explicitly state when not to use this tool or provide alternatives among the many related siblings (e.g., supplier_esg_audit, vendor_esg_blacklist_monitor). More explicit guidance would improve agent selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vendor_managementC

Read-only

Inspect

Gestion des fournisseurs — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Qonto (12 fournisseurs · €2.4M/an) — €290k économies identifiées · 4 renegociations prioritaires. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`vendors`	Yes
`objectives`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description aligns with annotations (readOnlyHint true, openWorldHint true) by stating it returns a deliverable. It adds the mention of server-side validation but does not disclose additional behavioral traits such as auth needs, rate limits, or what the 'audited deliverable' entails beyond annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but includes marketing-like language ('Gapup agent-payable C-suite expertise (COO)') which adds little value. It conveys the core purpose efficiently but could be more direct and remove fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 4 parameters, no output schema, many siblings), the description is incomplete. It lacks details on the deliverable format, return structure, and how to apply the reference case. An AI agent would lack sufficient context to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 25%, the description should compensate but does not. It only mentions 'send the documented case fields' without explaining the parameters or their purpose. The reference case provides an example but no semantic detail for company, vendors, or objectives.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: vendor management returning a structured, audited deliverable. The reference case adds concreteness. However, it does not explicitly distinguish from sibling vendor-related tools like vendor_risk_assessor or vendor_esg_blacklist_monitor, leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions inputs are validated server-side and to send documented case fields, but it provides no guidance on when to use this tool versus alternatives. No explicit when/when-not or context for selection among the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vendor_risk_assessorC

Read-only

Inspect

Évaluateur de risque fournisseurs — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Gapup Hub — 15 fournisseurs · €1.8M spend · 3 critiques · Heatmap + plan de remédiation. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`vendors`	Yes
`riskFramework`	No
`assessmentPurpose`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and returns a structured deliverable, but does not disclose significant behavioral traits beyond these. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences and a reference case. It is front-loaded and contains no wasted words, though it could benefit from a more structured format.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 5 params, no output schema), the description lacks key elements. It does not explain the output format in detail, the async behavior, or the purpose of risk frameworks and assessment purposes. The reference case helps but is not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (20%). The description does not explain individual parameters beyond the schema. The reference case provides an example of typical inputs, but this is insufficient to compensate for the lack of parameter-level guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a vendor risk assessor that returns a structured, audited deliverable. It uses specific terms and a reference case, but does not explicitly differentiate from sibling tools like supplier_esg_audit or vendor_management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides minimal guidance on when to use this tool. It mentions that inputs are validated server-side and to send documented case fields, but lacks explicit when-to-use or alternative recommendations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vertical_ai_agent_governanceA

Read-onlyIdempotent

Inspect

Generates a comprehensive vertical AI agent workforce integration plan for CHROs, including governance frameworks, human-AI collaboration metrics, and upskilling recommendations. Inputs: industry vertical, workforce size, and current AI adoption level. Outputs: role-specific AI integration roadmaps, skill gap analysis, and performance benchmarks. Uses O*NET skill taxonomies and Gartner AI adoption trends. For best results with large datasets, pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	Yes
`target_roles`	No
`workforce_size`	Yes
`ai_adoption_level`	No
`include_benchmarks`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`skill_gap_analysis`	No
`integration_roadmap`	No
`collaboration_metrics`	No
`governance_recommendations`	No

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, openWorldHint=true, idempotentHint=true. The description adds no new behavioral insights beyond these annotations, though it notes a timeout concern addressed by async parameter.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, consisting of two main sentences plus a brief tip. Information is front-loaded with the primary purpose, and every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 parameters, output schema exists) and sibling tools, the description provides a good overview but lacks explicit guidance on when to use this tool versus alternatives. It covers main use but misses some details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description adds meaning for 3 of 6 parameters (industry, workforce_size, ai_adoption_level) but omits details for target_roles, async, and include_benchmarks. It partially compensates for low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool generates a comprehensive vertical AI agent workforce integration plan for CHROs, covering governance frameworks, metrics, and upskilling. It lists specific inputs and outputs, making the purpose distinct and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions inputs and suggests using async for large datasets, but does not explicitly differentiate this tool from sibling tools like ai_governance_full_report_async. Usage context is implied but not clearly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vuln_exploitability_forecastA

Read-onlyIdempotent

Inspect

As a CTO, assess the exploitability risk of CVEs using EPSS scores and cloud asset exposure data. Input a CVE ID (e.g., CVE-2021-44228) to receive exploitability likelihood, affected cloud services, and threat intelligence context. Returns structured risk metrics for prioritization. Sources: CVE NVD, OpenCVE, GitHub Advisories. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`cveId`	Yes
`cloudProvider`	No
`includeDetails`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`cveId`	Yes
`status`	Yes
`sources`	Yes
`warnings`	Yes
`epssScore`	No
`lastUpdated`	No
`cloudExposure`	No
`epssPercentile`	No

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, idempotent, and openWorld. The description adds value by specifying data sources (EPSS, cloud exposure, NVD, OpenCVE, GitHub Advisories) and the async behavior to avoid timeouts, which is beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with four sentences, each adding value: role, purpose, output description, sources, and async note. No wasted words, though structure could be improved with bullet points for params.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the output schema exists and annotations cover safety, the description omits important details about optional parameters (cloudProvider, includeDetails). This could lead to incomplete understanding, especially given the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (25%) and the description only explains the cveId parameter via an example and mentions async. It does not describe cloudProvider or includeDetails, failing to compensate for the missing schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: assessing exploitability risk of CVEs using EPSS scores and cloud asset exposure data. It provides a specific verb-resource combination and distinguishes from siblings like cve_security_lookup by focusing on exploitability and cloud context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for prioritization and mentions async to avoid timeout, but does not explicitly state when to use this tool versus alternatives like cve_security_lookup or vuln_patch_priority_engine. No exclusions or context switching guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vuln_patch_priority_engineA

Read-onlyIdempotent

Inspect

As a CTO, quickly prioritize unpatched CVEs by combining exploitability scores (EPSS) with cloud asset criticality. Input a list of CVE IDs and your AWS service types (e.g., EC2, RDS) to receive a ranked patching order with risk scores and estimated cloud impact. Uses public NVD, OpenCVE, and AWS pricing data. Ideal for vulnerability management and cloud security posture improvement.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`cveIds`	Yes	List of CVE identifiers to analyze (e.g., ["CVE-2021-44228", "CVE-2023-3824"])
`maxResults`	No	Maximum number of prioritized CVEs to return (default: 10)
`awsServices`	No	AWS service types affected by these CVEs (e.g., ["EC2", "RDS", "Lambda"])

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`prioritizedCves`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate safety (readOnlyHint, idempotentHint), and the description adds value by disclosing data sources (NVD, OpenCVE, AWS pricing) and output features (risk scores, cloud impact). This provides useful context beyond the structured metadata, though it does not detail all behavioral traits (e.g., rate limits or error handling).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with purpose in three concise sentences. No waste; every sentence earns its place by explaining inputs, process, and ideal use. Structure allows quick scanning for key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (combining multiple data sources) and the presence of an output schema (handling return values), the description is adequately complete. It explains core logic and data sources, leaving output details to the schema. Slight room for improvement: mention that results can be polled if async=true, but not a major gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, so the baseline is 3. The description does not add semantic detail beyond the schema; it reiterates cveIds and awsServices but omits async or maxResults nuances. Given the high schema coverage, the description's additional value is minimal, staying at baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: prioritizing unpatched CVEs by combining EPSS exploitability scores with cloud asset criticality. It specifies inputs (CVE IDs, AWS services) and outputs (ranked patching order, risk scores, cloud impact), effectively distinguishing it from sibling tools like cve_security_lookup or vuln_exploitability_forecast.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies the ideal use case ('vulnerability management and cloud security posture improvement'), implying appropriate contexts. However, it lacks explicit guidance on when not to use this tool or direct comparisons to sibling alternatives, which slightly reduces clarity for an AI agent choosing among similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_climate_intelA

Read-only

Inspect

Physical climate intelligence for insurance underwriting, agritech, logistics, energy trading and ESG/climate risk disclosure. Three modes: (1) forecast — 14-day daily weather forecast with temperature, precipitation, wind and humidity; (2) historical — daily records and monthly aggregates for any date range since 1940, with anomaly detection (P90/P95 heat events, extreme precipitation days); (3) climate_risk — long-term physical risk scoring combining CMIP6 ensemble projections (2020-2050), altitude, FEMA flood zones (US) and historical baselines. Risk dimensions: flood, heat (days >35°C/year), drought (SPI), wildfire, sea-level. Overall score 0-100 (100 = severe). Location: city string or lat/lon coordinates. Sources: Open-Meteo (keyless, global, 1940→2050), Open-Elevation, FEMA NFHL (US), NOAA CDO (optional NOAA_API_KEY env var for US+global station data). SLA: ≤25s p95. Cache: 1h forecast / 24h historical / 7d climate_risk.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	'forecast' (14 days), 'historical' (date range since 1940), 'climate_risk' (long-term physical risk score)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_to`	No	ISO date YYYY-MM-DD — end of date range (required for historical/climate_risk)
`metrics`	No	Weather metrics to include. Default: all metrics.
`location`	Yes	Geographic location. Provide either {city, country?} or {lat, lon}.
`date_from`	No	ISO date YYYY-MM-DD — start of date range (required for historical/climate_risk)

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`forecast`	No
`location`	Yes
`historical`	No
`climate_risk`	No
`quality_score`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations, including data sources (Open-Meteo, FEMA, etc.), SLA (≤25s p95), cache durations (1h/24h/7d), and the async parameter behavior. Annotations already indicate read-only and non-destructive, so there is no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with an opening purpose statement, bulleted modes, and key details like sources and SLA. It is slightly verbose but efficiently packed with information, making it easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (three modes, multiple parameters, external data sources, output schema exists), the description covers all necessary context: modes, parameters, location, metrics, async option, data sources, SLA, caching, and risk dimensions. It is complete for agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the description adds minimally beyond schema definitions. It reiterates that date_from/date_to are required for historical/climate_risk, which is already in the schema. The location format is clarified but not uniquely valuable.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides physical climate intelligence for specific industries and explicitly defines three distinct modes (forecast, historical, climate_risk). It distinguishes itself from sibling tools by its focus on climate data and risk scoring.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use each mode (forecast for 14-day, historical for date ranges, climate_risk for long-term risk) and mentions sources and SLA. However, it does not explicitly state when not to use the tool or refer to alternative sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

webhooks_manageAInspect

Manage HTTP webhook callbacks for async tools (T5/T6 batch flagships). Instead of polling every 5s, register a callback URL — Gapup posts the job result to your endpoint the moment it completes. Supported events: job.completed | job.failed | monitoring.alert | quota.threshold. Modes: register (add endpoint), list (view active webhooks), revoke (soft-delete), test (fire a test payload to verify your receiver), history (last 20 fires). Security: every delivery is signed with HMAC-SHA256 on the body — verify the X-Gapup-Signature header against sha256(secret, body).

ParametersJSON Schema

Name	Required	Description
`url`	No	(register) HTTPS/HTTP endpoint that will receive POST callbacks. Must return 2xx within 10s.
`mode`	Yes	register — add a webhook endpoint. list — view your active webhooks. revoke — soft-delete a webhook by webhook_id. test — fire a test payload to verify the receiver is alive. history — last 20 delivery attempts for a webhook.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`events`	No	(register, optional) Events to subscribe to. Defaults to all events if omitted.
`secret`	No	(register, optional) A secret string used to sign deliveries with HMAC-SHA256. Store it safely — verify X-Gapup-Signature header on your receiver.
`webhook_id`	No	(revoke / test / history) The webhook_id returned from register.
`caller_hash`	No	Optional caller identity override. If omitted, uses the internal session hash.

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.8/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Provides extensive behavioral details: HMAC-SHA256 signing, each mode's action, event types, timeout requirement (10s), and security header. This adds significant value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured: opens with purpose, then rational, modes, events, security. Every sentence adds value with no redundancy. Appropriate length given complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all modes, event types, security, and operational details. With output schema present (not shown but assumed), the description sufficiently equips an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Adds meaningful descriptions beyond schema: e.g., url requires HTTPS/HTTP, must return 2xx within 10s; events default to all; secret used for HMAC. Schema coverage is 100%, yet description enriches each parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it manages HTTP webhook callbacks for async tools, lists modes (register, list, revoke, test, history), and distinguishes from polling. The purpose is specific and well-defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly explains when to use: 'Instead of polling every 5s, register a callback URL'. Lists modes and events, providing context for usage. However, lacks explicit when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_search_multilangA

Read-only

Inspect

Multi-language, multi-source web search that goes beyond Anglo-centric results. Supports 15 languages (fr/de/es/it/pt/nl/ja/zh/ko/ar/ru/sv/pl/tr/en) with automatic detection. Aggregates results from Mojeek (independent search engine, multilang) and Wikipedia (native multilang API), with DDG and HN as English-language complements. Returns deduplicated results ranked by cross-engine consensus. Use when you need non-English search results, when DDG fails, or for geographically-biased queries. Phase 2 #7 of the geo/lang expansion plan. Note: Brave/Bing/Searx are blocked from DO IPs — configure AICI_RESEARCH_PROXY_URL for residential proxy.

ParametersJSON Schema

Name	Required	Description
`lang`	No	2-letter language code. If omitted, auto-detected from query characters and lexical markers.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Search query in any language
`country`	No	ISO-3166-1 alpha-2 country code for geographic bias (e.g. FR, DE, JP, BR). Optional.
`max_results`	No	Maximum number of results to return (default 10).

Output Schema

ParametersJSON Schema

Name	Required	Description
`query`	Yes
`status`	Yes
`results`	Yes
`sources`	Yes
`by_engine`	Yes
`lang_used`	Yes
`country_used`	No
`quality_score`	Yes
`total_unique_results`	Yes

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnly, openWorld), description reveals automatic language detection, aggregation from multiple engines, deduplication, cross-engine consensus ranking, and async behavior. Also notes blocked engines and proxy requirement, adding critical behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise (3-4 sentences) and front-loaded with purpose. Every sentence adds value: supported languages, sources, deduplication, use guidance, and configuration note. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, output schema exists, multiple sources), description covers use cases, behavioral traits, limitations (blocked engines), and async option. It's complete enough for an agent to understand when and how to use it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good descriptions. Description adds value by explaining auto-detection for lang parameter and async behavior. For other parameters (query, country, max_results), it doesn't add much beyond schema, but the schema already provides adequate detail.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly defines the tool as a multi-language, multi-source web search that goes beyond Anglo-centric results. It lists supported languages and sources, distinguishing it from sibling tools like web_search (likely English-only) or other search tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit guidance: 'Use when you need non-English search results, when DDG fails, or for geographically-biased queries.' Also mentions proxy configuration for blocked engines, providing clear when-to-use and when-not-to-use context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

win_loss_decoderC

Read-only

Inspect

Analyse Win/Loss deals — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Win/Loss 32 deals Q1 2026 · Win rate 41% → 68% potentiel · Playbook 8 actions CRO. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deals`	Yes
`company`	Yes
`product`	Yes
`topCompetitors`	No
`primaryChallenge`	No
`salesCycleTargetDays`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, which is consistent with the description of returning a deliverable. The description adds that inputs are validated server-side, but doesn't elaborate on behavioral traits like performance or scope.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief and front-loaded with the purpose. The reference case adds some context but is not essential for functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 params, nested objects, no output schema), the description lacks detail on input formatting, return structure, and usage of async. It is not complete enough for an agent to reliably invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14% (async), and the description does not explain the meaning of the main parameters (company, product, deals). It vaguely says 'send the documented case fields', which is insufficient for an agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes win/loss deals and returns a structured deliverable. However, it does not differentiate from sibling tools like 'deal_coach' or 'battle_plan', lacking specific positioning.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use vs. alternatives. The marketing language and reference case do not provide practical context for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workflow_orchestratorA

Read-only

Inspect

Meta-tool that CHAINS multiple MCP tools sequentially into a named workflow — delivering a composite output in a single call. 10 predefined workflows: compliance_full_audit (6 steps: KYC+sanctions+AI_gov+privacy+ESRS+CSRD), deal_due_diligence (7 steps: deep_dive+registry+court+patents+KYC+financials+M&A), market_entry_brief (6 steps: country_study+regulations+procurement+tax+AGOA+market_brief), competitor_intelligence_pack (5 steps: deep_dive+intel+patents+earnings+pitch_deck), esg_360 (5 steps: ESG_audit+carbon+CSRD+ESRS+supplier_esg), ip_freedom_to_operate (4 steps: patent_search+async_deep+IP_audit+competitive), climate_property_assessment (3 steps: climate_risk+real_estate+geo), pharma_target_screen (4 steps: trials+adverse_events+patents+meta_analysis), sanctions_360 (5 steps: KYC+Russian_sec+registry+crypto_wallet+court_filings), talent_market_brief (4 steps: salary+trends+adjacent_roles+skills_taxonomy). Returns steps_executed, consolidated P0/P1/P2 signals, overall_status, estimated_cost_usd, and raw outputs per step. Cache: 1h LRU per (workflow, target). Budget: 60s global timeout → partial if exceeded. Use when an agent needs a composite liverable without orchestrating tools manually.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`params`	No	Optional overrides passed to sub-tools. Keys depend on workflow (e.g., country, sector, role, drug, technology, wallet_address, acquirer).
`target`	Yes	The entity to analyze. A company name for most workflows; location for climate_property_assessment; role+country for talent_market_brief.
`workflow`	Yes	Named workflow to execute. Each workflow chains 3-7 tools sequentially.
`skip_failed_steps`	No	Default true: continue on step failure. Set false to abort on first error.

Output Schema

ParametersJSON Schema

Name	Required	Description
`target`	Yes
`outputs`	Yes
`summary`	Yes
`workflow`	Yes
`overall_status`	Yes
`steps_executed`	Yes
`total_duration_ms`	Yes
`estimated_cost_usd`	Yes
`consolidated_signals`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint true; description adds significant behavioral context: reveals 60s global timeout with partial results, 1h LRU cache, async option with job polling, and skip_failed_steps behavior. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is dense but well-structured: begins with purpose, lists workflows with steps, then returns, cache, timeout, and usage. Every sentence adds value; could be slightly more concise but effective for a complex meta-tool.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high complexity (10 workflows, 5 parameters, meta-tool behavior), description covers all critical aspects: composite output, cache policy, timeout handling, async option, error continuation, and parameter semantics. Output schema exists, so return details are not required. Complete for agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but description adds value by explaining `async` polling mechanism, `params` override flexibility, `target` entity interpretation per workflow, and `skip_failed_steps` default behavior. Enhances understanding beyond schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly identifies the tool as a meta-tool that chains multiple MCP tools into named workflows with composite output. Lists 10 specific workflows with step counts, distinguishing it from individual sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'when an agent needs a composite liverable without orchestrating tools manually.' Also describes cache, timeout, async fallback, and skip_failed_steps behavior, providing clear context for usage. Does not explicitly list when not to use, but purpose is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

working_capitalB

Read-only

Inspect

Optimiseur du BFR — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Agicap — BFR optimisation · DSO 52→38j · Cash libéré +€2.8M · 3 quick wins immédiats. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`industry`	No
`challenges`	Yes
`financials`	Yes
`topCustomers`	No
`topSuppliers`	No

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations set readOnlyHint=true, indicating no state mutation. Description adds that inputs are validated server-side and returns a deliverable. However, the async parameter (present in schema) is not mentioned in the description, leaving behavior for long-running calls unclear. The reference case is promotional rather than behavioral.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but includes a reference case line that is not essential for tool invocation. The core purpose is stated, but the marketing-like content adds noise. Adequate but not maximally efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 7 parameters, no output schema), the description is incomplete. It does not specify the return format or structure of the deliverable, nor does it mention the async parameter for pacing expectations. The agent lacks guidance on what to expect as output or how to handle the asynchronous option.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 14% schema description coverage, the description does not compensate. It mentions required fields (company, financials, challenges) but does not explain their meaning, constraints, or how to populate them. Parameters like industry, topCustomers, topSuppliers are undocumented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a working capital optimizer (BFR = besoin en fonds de roulement) for CFO-level expertise, returning a structured deliverable. It distinguishes from siblings like working_capital_esg_impact_rater and working_capital_fx_hedge_optimizer by specifying its general optimization focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for working capital optimization but provides no explicit guidance on when to use this tool versus siblings or alternatives. It mentions CFO expertise but does not define scenarios or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

working_capital_esg_impact_raterB

Read-onlyIdempotent

Inspect

As a CFO, assess how ESG factors (Environmental, Social, Governance) influence working capital efficiency using IMF SDR and BIS data. Inputs include company sector, geographic exposure, and ESG risk scores. Outputs provide a quantitative impact rating on working capital metrics like days sales outstanding (DSO) and inventory turnover, alongside IMF SDR-aligned liquidity risk indicators.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes	Primary geographic exposure (e.g., 'EU', 'APAC')
`sector`	Yes	Industry sector (e.g., 'manufacturing', 'energy')
`currency`	No	Reporting currency (ISO 4217 code, e.g., 'USD', 'EUR')
`esgRiskScore`	Yes	Aggregate ESG risk score (0-100)
`workingCapitalRatio`	No	Current working capital ratio (current assets / current liabilities)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`impactRating`	No	ESG impact on working capital efficiency (-100 to +100)
`esgFactorBreakdown`	No
`liquidityRiskIndicator`	No	IMF SDR-aligned liquidity risk score (0-1)
`workingCapitalAdjustment`	No	Projected adjustment to working capital ratio (%)

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and idempotentHint. The description adds context about data sources (IMF SDR, BIS) and output types (rating, liquidity indicators), but does not disclose additional behavioral traits like caching, rate limits, or side effects beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a moderate-length paragraph that front-loads purpose and audience. However, it slightly repeats information already in the schema (inputs list) and could be more concise by omitting obvious mappings.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers high-level purpose and outputs, but lacks details on parameter interactions, constraints, or edge cases. Given the presence of an output schema, the description is adequate but not complete for a 6-parameter tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description repeats some required inputs (sector, region, ESG score) but does not explain how optional parameters (currency, workingCapitalRatio) affect results or provide deeper meaning beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: assessing ESG factors' influence on working capital efficiency using specific data sources (IMF SDR, BIS). It lists inputs and outputs, distinguishing it from sibling tools like 'working_capital' by its ESG focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies a target user ('CFO') and implies when to use the tool, but it does not explicitly state when not to use it or provide alternatives. No guidance on prerequisites or differentiation from other ESG-related sibling tools is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

working_capital_fx_hedge_optimizerA

Read-onlyIdempotent

Inspect

For CFOs managing multinational working capital, this tool analyzes real-time ECB and FRED foreign exchange rates to recommend optimal hedging strategies. Input base currency, target currencies, and working capital amounts to receive forward contract suggestions, natural hedge opportunities, and cost-benefit analysis of various hedging instruments (forwards, options, swaps). Outputs include hedge ratios, estimated cost savings, and risk reduction metrics.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`baseCurrency`	Yes	ISO 4217 code of the company's functional currency (e.g., 'USD', 'EUR')
`riskAppetite`	No	Company's risk tolerance for currency fluctuations	balanced
`timeHorizonDays`	No	Planning horizon in days (default: 90)
`targetCurrencies`	Yes	ISO 4217 codes of currencies to hedge against (e.g., ['EUR', 'GBP', 'JPY'])
`workingCapitalAmounts`	Yes	Working capital amounts in each target currency (e.g., { EUR: 5000000, GBP: 3000000 })

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`recommendations`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, openWorldHint=true, idempotentHint=true. The description adds behavioral context: 'analyzes real-time ECB and FRED foreign exchange rates' (confirming open-world), explains that it recommends strategies (non-destructive), and lists specific instruments. This adds value beyond the annotations by disclosing data sources and output types.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences long, front-loading the core purpose and data sources in the first sentence. The second sentence lists inputs and outputs efficiently. Every sentence adds value; no redundant or filler content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (financial hedging with multiple instruments and metrics), the description covers the essential inputs, data sources, and outputs. An output schema exists to explain return values. Minor gaps include lack of prerequisites (e.g., required user permissions) and error conditions, but these are mitigated by the comprehensive schema and annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all 6 parameters well. The description mentions baseCurrency, targetCurrencies, and workingCapitalAmounts, aligning with the required parameters. It does not repeat schema details but adds high-level context. For a tool with 100% coverage, this is adequate, but no additional semantic depth is provided beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: it analyzes ECB and FRED FX rates to recommend hedging strategies. It specifies the target user (CFOs), inputs (base currency, target currencies, amounts), and outputs (forward suggestions, natural hedges, cost-benefit analysis). This distinguishes it from sibling tools like 'working_capital' and 'working_capital_esg_impact_rater' by focusing specifically on FX hedging.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context ('For CFOs managing multinational working capital') and lists inputs, but provides no explicit guidance on when not to use this tool versus alternatives like 'treasury_optimizer' or 'supply_chain_fx_exposure_dashboard'. The usage is clear from the description, but lacks exclusions or comparative guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x402_liquidity_monitorA

Read-onlyIdempotent

Inspect

Monitors real-time x402-USDC liquidity depth across 12 decentralized and centralized exchanges, providing slippage alerts and depth analysis for CFO liquidity risk assessment. Inputs include slippage thresholds and exchange selection; outputs liquidity depth, price impact estimates, and warning flags. Essential for optimizing trade execution and managing liquidity exposure. Keywords: liquidity monitoring, slippage analysis, DEX/CEX depth, x402-USDC pair, CFO financial tooling.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`exchanges`	No	List of exchanges to monitor (defaults to all 12 if empty)
`depthLevels`	No	Liquidity depth levels to analyze (percentage from mid-price)
`slippageThreshold`	Yes	Maximum acceptable slippage percentage (0-100)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`midPrice`	No	Current x402-USDC mid-price
`warnings`	Yes
`priceImpact`	No
`liquidityDepth`	Yes
`slippageAlerts`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, openWorldHint. Description adds that it monitors real-time data and outputs depth, price impact, and warning flags. No contradictions. Since annotations cover safety, description adds useful context beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose and outputs. Every sentence adds value. No fluff or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema available, description adequately covers inputs and outputs. Mentions key outputs (liquidity depth, price impact, warning flags). Could briefly mention async behavior but that is in schema. Overall complete for decision making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description mentions inputs generically (slippage thresholds, exchange selection) but does not add meaning beyond what the schema already provides via parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it monitors real-time liquidity depth across 12 exchanges for the x402-USDC pair, providing slippage alerts and depth analysis for CFO risk assessment. It is distinct from sibling tools like x402_payment_flow_analyzer or usdc_x402_payments_intel.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly mentions it is essential for optimizing trade execution and managing liquidity exposure, giving clear context. However, it does not mention when not to use it or explicitly name alternative tools for other use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x402_payment_flow_analyzerA

Read-onlyIdempotent

Inspect

As a CTO, analyze USDC payment flows involving x402 addresses to assess counterparty risk, trace transaction paths, and evaluate regulatory exposure. Input a wallet address or transaction hash to receive risk scores, flow diagrams, and compliance flags from Chainalysis and TRM Labs public APIs. Ideal for due diligence, fraud detection, and compliance reporting. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Hops to trace in payment flow
`txHash`	No	USDC transaction hash to trace
`address`	Yes	Ethereum wallet address to analyze
`includeRiskScore`	No	Include counterparty risk scoring

Output Schema

ParametersJSON Schema

Name	Required	Description
`flowId`	No	Unique identifier for this payment flow analysis
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No	Counterparty risk score (0-100)
`complianceFlags`	No
`exposureSummary`	No
`transactionPath`	No

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only, idempotent, and open-world. The description adds valuable behavioral details: it uses Chainalysis and TRM Labs public APIs, produces risk scores, flow diagrams, and compliance flags, and supports asynchronous execution. This goes beyond the annotations without contradicting them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three short sentences, each earning its place: first states purpose and outputs, second names the data sources, third gives async advice. Front-loaded and efficient with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's purpose, inputs, outputs, data sources, and async capability. With an output schema present, return value details are not required. It is sufficiently complete for a complex tool, though it could briefly mention the depth and includeRiskScore parameters for added clarity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description explicitly mentions the two primary input options (wallet address or transaction hash) and the async flag, reinforcing schema content but not adding substantial new semantic meaning beyond what is already documented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes USDC payment flows for counterparty risk, transaction path tracing, and regulatory exposure. It specifies inputs (wallet address or tx hash) and outputs (risk scores, flow diagrams, compliance flags), but does not differentiate from closely related sibling tools like x402_payment_fraud_detector or x402_liquidity_monitor.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context that it is ideal for due diligence, fraud detection, and compliance reporting, and advises passing async:true to avoid timeout. However, it does not offer explicit guidance on when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x402_payment_fraud_detectorA

Read-onlyIdempotent

Inspect

Risk-focused tool that analyzes x402-USDC payment transactions for fraud patterns using on-chain forensics. Takes a transaction hash or wallet address as input and returns risk scores, suspicious indicators, and historical patterns. Designed for risk management teams to quickly assess payment legitimacy. Includes keywords: fraud detection, USDC risk, blockchain forensics, transaction monitoring. pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`walletAddress`	No
`includeHistory`	No
`amountThreshold`	No
`transactionHash`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`riskScore`	Yes
`isSuspicious`	Yes
`sanctionsMatch`	No
`fraudIndicators`	No
`transactionHistory`	No

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint. The description adds that the tool returns risk scores and indicators, and mentions async usage to avoid timeout, which is helpful but not extensive beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with several sentences covering purpose, inputs, outputs, audience, and a usage tip. It is front-loaded with the main purpose, though keywords add minor clutter.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, an output schema exists, and annotations cover safety, the description is sufficiently complete for a risk-focused tool, though missing some parameter details reduces completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 20% schema description coverage, the description partially compensates by noting transactionHash or walletAddress as acceptable inputs and async behavior, but does not explain includeHistory or amountThreshold parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes x402-USDC payment transactions for fraud patterns using on-chain forensics, with specific inputs and outputs. It distinguishes itself from sibling tools by focusing on payment fraud detection with risk scores and on-chain data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description targets risk management teams and advises using async=true to avoid timeouts, but does not explicitly compare to alternative tools like fraud_detector or x402_payment_flow_analyzer, nor provide when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.

Discussions

No comments yet. Be the first to start the discussion!

Try in Browser

Your Connectors

Resources

Need Help?