gapup-mcp

Author: getgapup

by io.github.getgapup

Server Details

271 agent-payable tools: competitive intel, finance, KYC, compliance, ESG. x402 per-call.

Status: Healthy
Last Tested: 2026-07-18 10:33
Transport: Streamable HTTP
Repository: getgapup/gapup-mcp-public
GitHub Stars: 1
Server Listing: gapup-mcp

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client -> Glama -> MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

C (2.6/5.0)

Tool Descriptions: C

Average 3.6/5 across 241 of 271 tools scored. Lowest: 1.5/5.

Server Coherence: C

Disambiguation: 2/5

With 271 tools, there is significant overlap in tool purposes, especially among the many 'Gapup agent-payable C-suite expertise' tools and multiple tools for compliance, ESG, and finance. Many tools have similar descriptions, making it difficult for an agent to distinguish the best tool for a given task.

Naming Consistency: 2/5

Tool names mix English and French, with no consistent pattern. Some are short role labels (abm_architect), others are longer descriptive phrases (carbon_footprint_calculator); both examples are snake_case, so the inconsistency is in wording rather than casing. The many async/result pairs do follow a pattern, but overall naming lacks a clear verb_noun structure.

Tool Count: 1/5

271 tools is excessive for a single server. This suggests a kitchen-sink approach with too many specialized tools, making the server unwieldy and difficult to navigate. Most servers should have 3-15 well-scoped tools; this server far exceeds that range.

Completeness: 3/5

The server covers an extremely broad range of domains (compliance, finance, HR, content, etc.), but the coverage is uneven. Some areas have many tools while others have gaps (e.g., no content creation tools despite a content catalog). The sheer number suggests both over-coverage and missing essentials.

Available Tools

271 tools

abm_architect (grade C)

Read-only

Architecte ABM — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — ABM 20 comptes nommés · Budget €120k · Tier 1×5 + Tier 2×15 · Playbooks 3 niveaux. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`product`	Yes
`salesTeam`	No
`icpCriteria`	Yes
`abmBudgetEur`	No
`targetAccounts`	Yes
`currentChannels`	No
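
The `async` flag above implies a submit-then-poll pattern. A minimal sketch of that loop follows, under stated assumptions: `call_tool` is a hypothetical callable standing in for whatever MCP client is in use, and the `job_id` response key and the `"pending"` status value are guesses at the response shape, which this listing does not document. Only the `async` argument and the `job_result` tool name come from the listing.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client hook: (tool_name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_async_tool(
    call_tool: ToolCaller,
    name: str,
    arguments: Dict[str, Any],
    poll_interval: float = 1.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a tool call with async=True, then poll job_result until it finishes."""
    # Ask the server for a job handle instead of waiting for the deliverable.
    started = call_tool(name, {**arguments, "async": True})
    job_id = started["job_id"]  # ASSUMPTION: key name is not documented in the listing

    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        # ASSUMPTION: an unfinished job reports status == "pending".
        if result.get("status") != "pending":
            return result
        sleep(poll_interval)

    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays; the polling interval and poll limit are illustrative defaults, not values recommended by the server.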

Tool Definition Quality

C (2.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and that it returns a deliverable, which aligns with read-only behavior. However, it does not disclose other traits like performance, auth requirements, or output structure beyond 'structured, audited deliverable'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (three sentences including the reference case) and front-loaded with the core purpose. The reference case adds concrete context without excessive verbosity. However, the first sentence is a noun phrase rather than an imperative verb, slightly reducing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, no output schema, low schema coverage), the description omits essential details: what the deliverable contains, how to interpret results, and how to handle edge cases. The reference case is helpful but does not compensate for the lack of comprehensive guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, meaning most parameters lack schema descriptions. The description does not explain the required fields (company, product, targetAccounts, icpCriteria) or their meaning; it merely says to send 'documented case fields'. The reference case provides an example but does not map to schema parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
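
As an illustration of the coverage figure quoted above, the sketch below computes the share of input properties that carry a non-empty `description`. The scorer's exact formula is not documented in this listing, so this is one plausible reading; the property names are taken from the parameter table, and their types are omitted because the listing does not show them.

```python
from typing import Any, Dict


def description_coverage(input_schema: Dict[str, Any]) -> float:
    """Return the fraction of schema properties with a non-empty description."""
    properties = input_schema.get("properties", {})
    if not properties:
        return 0.0  # ASSUMPTION: treat an empty schema as zero coverage
    described = sum(
        1 for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# abm_architect as listed: eight parameters, only `async` has a description.
abm_architect_schema = {
    "type": "object",
    "properties": {
        "async": {"description": "If true, returns a job_id immediately."},
        "company": {},
        "product": {},
        "salesTeam": {},
        "icpCriteria": {},
        "abmBudgetEur": {},
        "targetAccounts": {},
        "currentChannels": {},
    },
}

coverage = description_coverage(abm_architect_schema)
print(f"{coverage:.1%}")
```

One described parameter out of eight gives 12.5%, which is consistent with the 13% reported if the scorer rounds half up.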

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description identifies the tool as an ABM architect for C-suite expertise, returning a structured audited deliverable. The verb is implied ('architect') rather than explicit, and the reference case provides context. It distinguishes from siblings like 'abm_lookalike_account_finder' but could be clearer on the specific action (e.g., 'generates an ABM plan').

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like 'abm_lookalike_account_finder' or 'account_expansion_mapper'. The reference case hints at typical usage but does not provide criteria for selection or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

abm_lookalike_account_finder (grade A)

Read-only, Idempotent

As a CMO, discover 50 B2B accounts that closely match your top 10 customers' tech stacks and firmographics. This tool analyzes public web data including robots.txt and OpenGraph metadata to identify lookalike accounts for targeted ABM campaigns. Input your top customer domains and desired firmographic filters to receive a ranked list of potential targets with matching technologies and company attributes.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`tech_stack_keywords`	No	Specific technologies to match in lookalike accounts
`firmographic_filters`	No
`top_customer_domains`	Yes	List of top 10 customer domains to use as seed accounts
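
For orientation, a call to this tool over MCP would be wrapped in a JSON-RPC `tools/call` request. The sketch below builds such a request body; the argument values are invented placeholders, and the assumption that `top_customer_domains` and `tech_stack_keywords` are lists of strings is inferred from their descriptions rather than confirmed by the listing.

```python
import json
from typing import Any, Dict


def build_tools_call(request_id: int, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request body for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


request = build_tools_call(
    1,
    "abm_lookalike_account_finder",
    {
        # Placeholder seed domains; the only required argument per the table.
        "top_customer_domains": ["example.com", "example.org"],
        # ASSUMPTION: a list of keyword strings.
        "tech_stack_keywords": ["kubernetes"],
    },
)
print(json.dumps(request, indent=2))
```

Transport details (session headers, payment handling) are deliberately left out, since the listing does not specify them.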

Output Schema

Parameters

Name	Required	Description
`stats`	No
`status`	Yes
`sources`	Yes
`warnings`	Yes
`lookalike_accounts`	Yes
`matched_technologies`	No
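
Because the output schema marks four fields as required, a client can cheaply check a result before using it. The following is a minimal sketch of such a check; the field names come from the table above, while the sample result values are invented for illustration.

```python
from typing import Any, Dict, List

# Fields marked "Yes" in the Required column of the output table.
REQUIRED_OUTPUT_FIELDS = ("status", "sources", "warnings", "lookalike_accounts")


def missing_required_fields(result: Dict[str, Any]) -> List[str]:
    """Return the required output fields that are absent from a tool result."""
    return [field for field in REQUIRED_OUTPUT_FIELDS if field not in result]


# Invented sample result: `lookalike_accounts` is deliberately left out.
sample = {"status": "ok", "sources": [], "warnings": []}
print(missing_required_fields(sample))
```

This only checks for presence; it does not validate the shape of each field, which the listing does not describe.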

Tool Definition Quality

A (4.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint, idempotentHint) are consistent with a read-only, idempotent tool. The description adds behavioral details (analyzes public web data, returns ranked list) beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (three sentences), front-loaded with the primary purpose, and avoids unnecessary details. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's function, inputs, data sources, and output. Given the presence of an output schema and annotations, it is fairly complete. However, it could briefly note the async option for slow operations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 75% (3 of 4 params described). The description adds meaning by linking tech_stack_keywords and firmographic_filters to the tool's purpose. The async parameter is not mentioned in the description, but its schema description is clear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: discovering 50 B2B lookalike accounts based on top customers' tech stacks and firmographics. It specifies the data sources (robots.txt, OpenGraph metadata) and output (ranked list). This distinguishes it from siblings like 'account_expansion_mapper' or 'competitive_deep_dive'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for ABM campaigns ('As a CMO...for targeted ABM campaigns') and specifies inputs (top customer domains, firmographic filters). However, it does not explicitly state when to prefer this tool over siblings, e.g., what scenarios warrant lookalike finding vs account expansion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

account_expansion_mapper (grade C)

Read-only

Mapping d'expansion comptes — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Notion B2B Enterprise — top 30 strategic accounts · expansion plays NRR 130%+ target · Snowflake/Shopify/Vercel/Stripe analyzed. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`accounts`	Yes
`ownership`	Yes

Tool Definition Quality

C (2.8/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description aligns with annotations (readOnlyHint, openWorldHint) by stating it returns a deliverable and referencing a case. It does not contradict, but also does not add significant behavioral context beyond what annotations convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately concise but includes extraneous detail like the reference case (Notion, Snowflake, etc.). It front-loads the purpose, but could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, no output schema), the description is insufficient. It does not describe the output format, how to use results, or any constraints. The openWorldHint is not explained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, and the description does not explain any of the 5 parameters beyond 'send the documented case fields.' No value is added for understanding complex nested fields like company, accounts, or ownership.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable for account expansion mapping, with a specific reference case (Notion B2B Enterprise). However, it mixes French and English and does not differentiate from sibling tools like growth_path_architect or upsell_hunter.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only says to send documented case fields and mentions server-side validation. It provides no guidance on when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

action_plan_esg (grade B)

Read-only

Plan d'action ESG — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: TechCorp SAS — Plan ESG 36 mois (500 FTE, €60M CA, score 54→76/100). Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`horizon`	Yes		36 mois
`ambitions`	Yes
`targetLabels`	No
`currentScores`	No
`availableResources`	Yes

Tool Definition Quality

B (3.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include 'readOnlyHint: true' and 'openWorldHint: true', and the description's claim of returning a deliverable aligns with a read operation. However, the description does not elaborate on the 'agent-payable' aspect, side effects, or any limitations beyond server-side validation. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences plus a reference case) and front-loaded with the tool's purpose. However, it could be more structured by separating the usage instruction or parameter hints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 8 parameters, nested objects, and no output schema, the description is insufficient. It does not describe the output format, constraints, or prerequisites beyond validating inputs server-side. The reference case provides an example but does not cover general usage scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (13%). The description does not explain any parameter semantics; it merely instructs to 'send the documented case fields' without detailing what those fields are. This forces the agent to rely solely on the schema, which lacks descriptions for many properties.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a structured, audited ESG action plan. It uses specific language like 'Plan d'action ESG' and references a concrete case, but does not explicitly differentiate it from similar tools like 'esg_audit_multi' or 'carbon_roadmap'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by asking to 'send the documented case fields' but provides no explicit guidance on when to use this tool over alternatives. The reference case gives context but no when-not or comparative advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

adversarial_input_stress_tester (grade A)

Read-only, Idempotent

An asynchronous risk assessment tool that evaluates AI model resilience against adversarial inputs following NIST AI Risk Management Framework (RMF) red-teaming protocols. Designed for security and compliance personas, it accepts model outputs or decision boundaries and returns structured risk scores, failure modes, and adversarial examples. Requires async:true to avoid timeout errors. Outputs include status, warnings, and source references.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`maxTests`	No	Maximum number of adversarial tests to run
`modelOutput`	Yes	The AI model's output or decision to be stress-tested
`adversarialDataset`	No	Optional custom adversarial inputs to test
`sensitivityThreshold`	No	Threshold for flagging high-risk adversarial examples

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No	Normalized risk score from adversarial testing
`failureModes`	No
`adversarialExamples`	No

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behavioral traits: the tool is asynchronous, requires async:true, and outputs status, warnings, and source references. Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint, so the description adds value by specifying the NIST framework and red-teaming context, without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with four sentences, each serving a distinct purpose: purpose and framework, target audience with inputs and outputs, the async requirement, and the common output fields. It is front-loaded with key information, avoiding unnecessary verbosity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of annotations (readOnlyHint, openWorldHint, idempotentHint) and an output schema, the description provides sufficient context about the tool's purpose, usage constraints, and outputs. It covers all essential aspects for an AI agent to understand and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the input schema documents all parameters. The description adds meaningful context by highlighting the need for async:true, which complements the schema's description of the async parameter. No parameter details are repeated, fulfilling the role of adding value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an asynchronous risk assessment tool for evaluating AI model resilience against adversarial inputs, following NIST AI RMF red-teaming protocols. It targets security and compliance personas, which distinguishes it from sibling tools like jailbreak_attempt_detector and safety_guardrail_breach_analyzer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states that async:true is required to avoid timeout errors, providing a clear usage directive. However, it does not mention when not to use this tool or suggest alternative tools for related tasks, which would improve guidance for an AI agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

affiliate_fraud_clickstream_detector (grade A)

Read-only, Idempotent

Analyzes affiliate clickstream data from Common Crawl to flag potential fraud patterns (duplicate IPs, rapid clicks, device spoofing). Designed for CMOs to validate affiliate traffic quality and prevent budget waste. Inputs: affiliate network name and date range. Outputs: fraud probability score, suspicious IP list, and pattern analysis. Keywords: affiliate fraud detection, clickstream analysis, marketing attribution, traffic validation.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`threshold`	No	Fraud probability threshold (0.1-0.99)
`date_range`	Yes
`affiliate_network`	Yes	Name of the affiliate network to analyze (e.g., 'CJ Affiliate', 'Rakuten')

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`suspicious_ips`	No
`fraud_probability`	No	Overall fraud probability score (0-1)
`patterns_detected`	No
`total_clicks_analyzed`	No
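
The listing gives a range for `threshold` (0.1-0.99) and for `fraud_probability` (0-1) but does not say how the server combines them. The sketch below shows one client-side interpretation, flagging a result when the probability meets or exceeds the threshold; the comparison rule and the 0.5 default are assumptions, not documented behavior.

```python
from typing import Any, Dict


def is_flagged(result: Dict[str, Any], threshold: float = 0.5) -> bool:
    """Return True when the reported fraud probability reaches the threshold."""
    # Range taken from the `threshold` parameter description.
    if not 0.1 <= threshold <= 0.99:
        raise ValueError("threshold must be within 0.1-0.99")
    probability = result.get("fraud_probability")
    if probability is None:
        # `fraud_probability` is optional in the output schema.
        return False
    return probability >= threshold  # ASSUMPTION: inclusive comparison


print(is_flagged({"status": "ok", "fraud_probability": 0.8}, threshold=0.7))
```

Treating a missing probability as "not flagged" is a design choice for this sketch; a stricter client might raise instead.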

Tool Definition Quality

A (3.7/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, openWorld, and idempotent hints. The description adds context about data source (Common Crawl) and output specifics, but does not disclose behavioral traits like data freshness or processing delays. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences plus a keyword list, front-loaded with the main action, and efficient. The keyword list at the end is somewhat redundant but does not hinder clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description adequately covers inputs, outputs, and purpose. Minor gaps: the role of the threshold parameter is not explained in relation to fraud detection sensitivity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is high (75%+), so baseline is 3. The description mentions inputs (affiliate network, date range) but does not add meaning beyond what the schema provides for the threshold parameter. No additional constraints or formats clarified.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes affiliate clickstream data to flag fraud patterns like duplicate IPs and rapid clicks. It names the specific data source (Common Crawl), target users (CMOs), inputs and outputs, and distinguishes itself from siblings like the general fraud_detector.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for validating affiliate traffic quality but does not explicitly state when not to use it or compare with sibling tools (e.g., fraud_detector). It lacks guidance on alternatives or exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_barrier_breaker (grade A)

Read-only, Idempotent

As a COO, analyze non-tariff trade barriers (NTBs) across African trade corridors using WITS and UNCTAD STAT data. Input origin/destination countries and product HS codes to receive barrier mapping with severity scores and actionable mitigation strategies. Returns structured risk assessment, regulatory compliance gaps, and supply chain optimization recommendations. Pass async:true to avoid timeout.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hs_code`	No	6-digit Harmonized System product code
`origin_country`	Yes	ISO 3-letter country code for export origin
`destination_country`	Yes	ISO 3-letter country code for import destination
`include_regulatory_details`	No	Whether to include detailed regulatory text in output
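
The parameter descriptions above state formats (3-letter country codes, a 6-digit HS code) that can be checked client-side before a paid call. A minimal sketch follows, assuming uppercase codes and an `hs_code` passed as a string; it checks format only and does not verify that a code actually exists.

```python
import re
from typing import Any, Dict, List

ISO3_PATTERN = re.compile(r"[A-Z]{3}")  # ASSUMPTION: uppercase letters
HS6_PATTERN = re.compile(r"[0-9]{6}")   # ASSUMPTION: hs_code is sent as a string


def validate_barrier_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of format problems in africa_trade_barrier_breaker arguments."""
    errors: List[str] = []
    # Both country fields are required per the parameter table.
    for field in ("origin_country", "destination_country"):
        value = args.get(field)
        if not isinstance(value, str) or not ISO3_PATTERN.fullmatch(value):
            errors.append(f"{field} must be a 3-letter ISO country code")
    # hs_code is optional, so only check it when present.
    hs_code = args.get("hs_code")
    if hs_code is not None:
        if not isinstance(hs_code, str) or not HS6_PATTERN.fullmatch(hs_code):
            errors.append("hs_code must be exactly 6 digits")
    return errors


print(validate_barrier_args(
    {"origin_country": "KEN", "destination_country": "NGA", "hs_code": "090111"}
))
```

Keeping `hs_code` as a string preserves leading zeros, which an integer representation would drop.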

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`barrier_summary`	Yes
`trade_flow_impact`	No
`regulatory_details`	No
`mitigation_strategies`	Yes

Tool Definition Quality

A (4.0/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, openWorld, idempotent. The description adds value by mentioning the async option to avoid timeout and specifying the return of structured risk assessment and recommendations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences that efficiently convey purpose, data sources, inputs, outputs, and a behavioral note. Front-loaded and no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the main aspects: data sources, inputs, outputs, and async option. Given the complexity, it is reasonably complete, though could mention limitations or output schema details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description does not add new meaning beyond what the schema provides, meeting the baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: analyzing non-tariff trade barriers using specific data sources (WITS, UNCTAD STAT). It distinguishes from siblings by focusing on NTBs and providing barrier mapping with severity scores and mitigation strategies.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context (COO, African trade corridors) and input requirements, but does not explicitly state when not to use this tool or mention alternatives among sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_finance_esg_rater (grade A)

Read-only, Idempotent

As a COO, evaluate ESG compliance of African trade finance providers using World Bank WITS trade statistics and CDP climate disclosure data. Input the financial institution's name or identifier, and receive an ESG rating with breakdown across environmental, social, and governance dimensions. Ideal for due diligence on trade partners or portfolio risk assessment. Pass async:true to avoid timeout.

Parameters

Name	Required	Description
`year`	No	Assessment year (2018-2023)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`countryCode`	No	ISO 2-letter country code (e.g., 'ZA' for South Africa)
`institutionName`	Yes	Full name of the trade finance provider (e.g., 'Standard Bank Group')

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`esgRating`	Yes
`socialScore`	No
`tradeVolume`	No	Annual trade finance volume (USD)
`carbonIntensity`	No	CO2 emissions per million USD financed (tons)
`governanceScore`	No
`environmentalScore`	No
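
The units given for `carbonIntensity` (tons of CO2 per million USD financed) and `tradeVolume` (annual USD) suggest a simple way to estimate total financed emissions. The sketch below works that arithmetic through with invented figures; it assumes both fields describe the same institution and year, which the listing does not state.

```python
def financed_emissions_tons(carbon_intensity: float, trade_volume_usd: float) -> float:
    """Estimate total CO2 (tons) from intensity per million USD and volume in USD."""
    # Convert the volume to millions of USD, then scale by tons per million.
    return carbon_intensity * trade_volume_usd / 1_000_000


# Invented example: 120 tons per million USD on 2.5 billion USD financed.
total = financed_emissions_tons(120.0, 2_500_000_000)
print(total)  # 300000.0
```

Here 2.5 billion USD is 2,500 million USD, and 2,500 x 120 gives 300,000 tons.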

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds context beyond annotations: mentions async execution to avoid timeout, data sources used. Annotations already indicate read-only, open-world, idempotent. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, no wasted words. First sentence states purpose, second describes input/output, third gives use case, fourth mentions async. Front-loaded with key info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complexity (4 params, output schema exists, many siblings), description covers rating dimensions, data sources, async option, and use case. Lacks detailed output explanation but output schema compensates.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% description coverage. Description mentions institutionName as input, but doesn't add new meaning beyond schema. Baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states that the tool evaluates ESG compliance of African trade finance providers using specific data sources (World Bank WITS, CDP) and outputs an ESG rating with a breakdown. Its focus on African trade finance distinguishes it from siblings like supplier_esg_audit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

States ideal for due diligence on trade partners or portfolio risk assessment, and mentions async:true to avoid timeout. Lacks explicit exclusions or comparison to similar ESG tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_preference_arbitrage (grade B)

Read-only, Idempotent

Analyzes AGOA (African Growth and Opportunity Act) and EBA (Everything But Arms) trade preference arbitrage opportunities for COOs evaluating export strategies. Compares tariff rates, trade volumes, and preference utilization across eligible African countries using WITS and OECD trade data. Returns structured analysis of potential duty savings, market access advantages, and compliance requirements. — pass async:true REQUIRED to avoid x402 timeout.

Parameters

Name	Required	Description
`year`	No	Reference year for trade data
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hs_code`	Yes	6-10 digit Harmonized System product code
`exporting_country`	Yes	ISO 2-letter country code of African exporter
`importing_country`	No	ISO 2-letter country code of target market (US/EU)
`preference_scheme`	No	Trade preference scheme to analyze

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`duty_savings_pct`	No	Estimated duty savings percentage under preference scheme
`trade_volume_usd`	No	Annual trade volume in USD for given HS code
`market_access_score`	No	Composite score of market access advantage (0-100)
`compliance_requirements`	No	List of compliance requirements for preference eligibility
`preference_utilization_rate`	No	Percentage of eligible exports utilizing preference

Tool Definition Quality

B (3.1/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as readOnly, openWorld, and idempotent, providing good safety disclosure. However, the description contradicts the schema by stating 'pass async:true REQUIRED' when the async parameter is optional in the schema. This misleading requirement harms transparency.
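
A mismatch of this kind can be caught mechanically. The sketch below is a rough heuristic, not the scorer's actual method: it assumes a tool definition shaped like an MCP tool object (`description` plus a JSON Schema under `inputSchema`) and flags any parameter that the description calls REQUIRED while the schema's `required` list omits it.

```python
import re
from typing import Any, Dict, List


def find_required_mismatches(tool: Dict[str, Any]) -> List[str]:
    """Return parameters the description calls REQUIRED but the schema leaves optional."""
    schema = tool.get("inputSchema", {})
    properties = schema.get("properties", {})
    required = set(schema.get("required", []))
    description = tool.get("description", "")

    mismatches = []
    for name in properties:
        # Heuristic: the parameter name, then "REQUIRED" within the same sentence
        # and at most 20 characters later (e.g. "async:true REQUIRED").
        pattern = rf"\b{re.escape(name)}\b[^.]{{0,20}}\bREQUIRED\b"
        if re.search(pattern, description) and name not in required:
            mismatches.append(name)
    return mismatches
```

Run against this tool's definition, the check would report `async`, since the schema marks only `hs_code` and `exporting_country` as required.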

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is mostly concise with three sentences, but the directive 'pass async:true REQUIRED' is misleading and adds unnecessary emphasis. Could be improved by removing the contradiction and streamlining the async guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of trade preference analysis, the description covers the tool's purpose, data sources, and output types (duty savings, market access, compliance). It adequately complements the existing output schema, but the async confusion detracts from completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameter details are fully provided. The description adds useful context about data sources (WITS, OECD) not in the schema, but does not significantly elaborate on individual parameters beyond the schema. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool analyzes AGOA and EBA trade preference arbitrage opportunities for COOs and details the outputs (tariff comparison, duty savings, etc.). Its focus on arbitrage distinguishes it from siblings like africa_trade_preference_optimizer and agoa_eba_intelligence, though it does not explicitly name them.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies the target user (COOs evaluating export strategies) and the general context, but does not provide explicit when-to-use vs. alternatives or exclusions. Usage is implied but not thoroughly guided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_preference_optimizer (grade A)

Read-only, Idempotent

As a COO, analyze AGOA/EBA duty savings opportunities with HS code-level trade route optimization. Input origin country, destination country, and HS code to receive duty savings estimates, optimal trade routes, and preference utilization recommendations. Uses UN Comtrade trade flow data, WCO tariff schedules, and African Union trade agreement rules. Ideal for export market evaluation, supply chain optimization, and trade agreement compliance analysis. Keywords: AGOA, EBA, duty savings, trade optimization, HS code, African trade, export strategy.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hsCode`	Yes	6-10 digit Harmonized System code (e.g., '010121' for live horses)
`quantity`	No	Estimated annual export quantity in units
`valueUsd`	No	Estimated annual export value in USD
`originCountry`	Yes	ISO 3166-1 alpha-3 country code of export origin (e.g., 'KEN' for Kenya)
`destinationCountry`	Yes	ISO 3166-1 alpha-3 country code of import destination (e.g., 'USA' for United States)
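
The constraints in this table (a 6-10 digit HS code and ISO 3166-1 alpha-3 country codes) can be checked on the client before a paid call is made. The following is a shape-only sketch: the function name is hypothetical, and it does not verify that a code is actually assigned in the HS nomenclature or in ISO 3166-1.

```python
import re
from typing import Any, Dict, List

HS_CODE = re.compile(r"[0-9]{6,10}")  # 6-10 digits, per the schema description
ALPHA3 = re.compile(r"[A-Z]{3}")      # shape of an ISO 3166-1 alpha-3 code


def validate_optimizer_args(args: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the arguments look well-formed."""
    errors = []
    for field in ("hsCode", "originCountry", "destinationCountry"):
        if field not in args:
            errors.append(f"{field} is required")
    if "hsCode" in args and not HS_CODE.fullmatch(str(args["hsCode"])):
        errors.append("hsCode must be 6-10 digits")
    for field in ("originCountry", "destinationCountry"):
        if field in args and not ALPHA3.fullmatch(str(args[field])):
            errors.append(f"{field} must be a 3-letter uppercase code")
    return errors
```

Note that the sibling tool africa_trade_preference_arbitrage documents 2-letter country codes instead, so the same arguments cannot be passed to both tools unchanged.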

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`dutySavings`	No	Estimated annual duty savings in USD under optimal preference program
`optimalRoute`	No
`alternativeRoutes`	No
`complianceWarnings`	No	Potential compliance risks or documentation requirements

Tool Definition Quality

A (3.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, which cover the key behavioral traits. The description adds context about data sources and outputs but does not disclose additional behaviors such as potential latency, data freshness, or prerequisites. Given the annotation coverage, the description provides moderate added value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that covers purpose, inputs, outputs, data sources, and use cases, but the keyword list at the end is redundant: its terms already appear in the preceding sentences. Omitting the list and focusing on the unique value proposition would make it more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema (not shown but present) and the description mentions specific outputs and data sources, the contextual information is fairly complete for a trade optimization tool. However, it does not mention the async parameter defined in the input schema, which could be relevant for long-running queries. Still, the description provides sufficient context for a reasonable understanding of the tool's functionality.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% parameter description coverage, so each parameter is already well-documented. The description only reiterates the main inputs at a high level without adding new semantic meaning or constraints beyond what the schema provides. Therefore, the description adds no significant value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: analyzing AGOA/EBA duty savings opportunities with HS code-level trade route optimization. It specifies the inputs (origin country, destination country, HS code) and outputs (duty savings estimates, optimal trade routes, recommendations). The mention of keywords and data sources (UN Comtrade, WCO, AU) helps distinguish it from siblings like 'africa_trade_barrier_breaker' or 'agoa_eba_intelligence'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage context by targeting a COO role and listing ideal use cases (export market evaluation, supply chain optimization, compliance analysis). However, it does not explicitly state when to use this tool over siblings, nor does it provide exclusions or alternatives. The guidance is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agoa_eba_intelligence (grade B)

Read-only

Trade preference intelligence for AGOA (US→Africa) and EBA/GSP (EU→Africa). Checks an African country's eligibility for preferential tariff programs and a product's eligibility by HS code, identifies the best Africa→US/EU export opportunities, and provides compliance rules (rules of origin, value added, documentation). Africa diaspora differentiator: 39 AGOA countries + 47 EBA LDCs encoded. Sources: AGOA.info · EU EBA · EU GSP+ · WTO Tariff · UN Comtrade.

Parameters

Name	Required	Description
`mode`	Yes	Analysis mode: 'country_eligibility' (AGOA/EBA/GSP status of an African country) | 'product_eligibility' (eligibility of a product by HS code) | 'trade_opportunity' (top Africa→US/EU export opportunities) | 'compliance_check' (rules of origin, value-added thresholds, documentation)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hs_code`	No	HS (Harmonized System) code, 6+ digits (required for product_eligibility). Examples: '620342' = men's cotton trousers, '090111' = unroasted arabica coffee, '060310' = fresh flowers.
`country_iso`	No	ISO 2-letter code of the African country (required for country_eligibility). Examples: KE=Kenya, NG=Nigeria, ZA=South Africa, ET=Ethiopia, LS=Lesotho, GH=Ghana.
`destination`	No	Destination market for trade_opportunity: 'US', 'EU', or 'both' (default). Ignored for other modes.
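
The schema marks only `mode` as required, yet two of the other parameters become mandatory depending on the mode chosen. A small sketch of how a caller might enforce those mode-dependent rules before invoking the tool is shown here; the helper name is hypothetical, and the rules are taken only from the parameter descriptions above.

```python
from typing import Any, Dict

MODES = {"country_eligibility", "product_eligibility", "trade_opportunity", "compliance_check"}

# Parameters that the schema text says are required for a given mode.
REQUIRED_BY_MODE = {
    "country_eligibility": "country_iso",
    "product_eligibility": "hs_code",
}


def build_agoa_eba_args(mode: str, **params: Any) -> Dict[str, Any]:
    """Assemble arguments for agoa_eba_intelligence, enforcing mode-dependent rules."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    needed = REQUIRED_BY_MODE.get(mode)
    if needed and not params.get(needed):
        raise ValueError(f"{needed} is required for mode {mode}")

    args = {"mode": mode, **{k: v for k, v in params.items() if v is not None}}
    if mode == "trade_opportunity":
        args.setdefault("destination", "both")  # documented default
    else:
        args.pop("destination", None)  # documented as ignored in other modes
    return args
```

Whether `compliance_check` needs `hs_code` or `country_iso` is not stated in the schema, so the sketch leaves both optional for that mode.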

Tool Definition Quality

B (3.3/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, indicating a safe, read-only operation using external data. The description confirms this by mentioning external sources (AGOA.info, EU EBA) and describing read-only checks. It adds minimal behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of four sentences, roughly 70 words. It is concise and front-loads the main action. Every sentence adds value (purpose, differentiator, sources). A minor improvement would be to list the four modes as bullet points.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, and the description does not explain what the tool returns (e.g., eligibility status, opportunity list, compliance rules). Given the complexity (5 parameters, 4 modes), the agent needs to know the output format to invoke and interpret results correctly. This is a significant gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add parameter-specific details beyond what is already in the schema (e.g., examples for hs_code and country_iso are in the schema). It provides general context but no enrichment of parameter meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: verifying AGOA/EBA/GSP eligibility, identifying export opportunities, and providing compliance rules. It uses specific verbs and resources, and includes a differentiator (39 AGOA countries + 47 LDCs EBA). Though it does not explicitly differentiate from sibling tools, the purpose is unambiguous and comprehensive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide guidance on when to use this tool versus alternatives (e.g., other trade preference tools). There are no explicit 'when to use' or 'when not to use' statements, leaving the agent to infer usage context from the description alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_act_incident_response (grade A)

Read-only, Idempotent

Generates EU AI Act incident response playbooks with regulator notification templates for risk management teams. Inputs include incident severity, AI system type, and affected stakeholders. Outputs structured playbook steps, regulator notification drafts, and compliance checklists. Essential for high-risk AI system breaches requiring formal EU notification — pass async:true REQUIRED to avoid x402 timeout. Keywords: AI Act compliance, incident response, regulator notification, risk management, ISO 27035, NIST SP 800-61.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`severity`	Yes
`incident_type`	Yes
`ai_system_type`	No
`incident_description`	No
`affected_stakeholders`	No

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`next_steps`	No
`playbook_steps`	No
`compliance_checklist`	No
`regulator_notification`	No

Tool Definition Quality

A (3.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds information about timeout avoidance via async mode, which is not covered by annotations. However, it does not disclose other behavioral traits beyond what annotations imply. The description adds marginal value, resulting in a moderate score.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, front-loading the purpose, and efficiently conveys essential information. The inclusion of keywords adds value without excessive verbosity. Minor redundancy (e.g., 'Keywords:' list) does not detract significantly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations, the description provides sufficient context: purpose, usage guidance, parameter hints, and async requirement. It does not need to explain return values because the output schema exists. The description is complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (17%), with only the 'async' parameter having a description. The tool description lists 'incident severity, AI system type, and affected stakeholders' as inputs, providing some semantic context for these parameters. However, it does not explain all parameters or their formats, only partially compensating for the schema gap.
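
The 17% figure is consistent with one described parameter out of six. A minimal sketch of that arithmetic follows, assuming coverage is simply the share of properties with a non-empty `description`; the scorer's exact formula is not published on this page, and treating an empty schema as fully covered is a further assumption.

```python
from typing import Any, Dict


def description_coverage(input_schema: Dict[str, Any]) -> float:
    """Fraction of schema properties that carry a non-empty description."""
    properties = input_schema.get("properties", {})
    if not properties:
        return 1.0  # ASSUMPTION: nothing to describe counts as full coverage
    described = sum(
        1 for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties)
```

For this tool, only `async` is described among six parameters, so the function returns 1/6, which rounds to 17%.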

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates EU AI Act incident response playbooks with regulator notification templates, specifying verb, resource, and audience. It lists inputs and outputs, making the purpose unambiguous. However, it does not explicitly differentiate from sibling tools like 'incident_response_evidence_collector' or 'ai_act_sandbox_regulatory_sandbox', so it does not achieve a top score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says it is 'essential for high-risk AI system breaches requiring formal EU notification' and mandates 'pass async:true REQUIRED to avoid x402 timeout.' This provides clear context for when to use the tool. It does not include explicit exclusions or alternatives, but the guidance is sufficient for the intended use case.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_act_sandbox_regulatory_sandbox (grade A)

Read-only, Idempotent

A legal-focused tool for simulating EU AI Act regulatory sandbox submissions. Provides structured feedback on compliance, risk levels, and required documentation based on EUR-Lex and OECD AI Policy Observatory sources. Accepts AI system descriptions, intended use cases, and technical specifications as input. Returns detailed assessment with warnings, citations, and actionable recommendations for legal teams and AI developers.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`sector`	No	Primary sector of application
`riskLevel`	Yes	Self-assessed risk level of the AI system
`intendedUse`	Yes	Primary and secondary use cases of the AI system
`documentation`	No	List of provided documentation types (e.g., 'technical', 'ethical', 'data')
`systemDescription`	Yes	Detailed description of the AI system including purpose, architecture, and data sources

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`assessment`	No

Tool Definition Quality

A (3.6/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description adds value by specifying that the tool returns a detailed assessment with warnings, citations, and recommendations. It discloses the nature of the output without contradictions. However, it could further clarify the behavioral scope (e.g., that it does not submit actual documents).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences long, front-loading the core purpose and then detailing sources, inputs, and outputs. Every sentence adds value, though it could be trimmed slightly (e.g., 'legal-focused tool' is redundant given the context). Still, it is efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains the tool's purpose, inputs, and output type. However, it omits guidance on the 'async' parameter (which is part of the schema) and does not address potential prerequisites or usage constraints. With an output schema present, the description is moderately complete but has notable gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds minimal meaning beyond the schema. It rephrases the required parameters ('AI system descriptions, intended use cases, and technical specifications') but does not clarify enum values, optional fields like 'async', or provide format examples. For a 100% coverage scenario, a score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: simulating EU AI Act regulatory sandbox submissions. It specifies the verb 'simulating' and the resource 'regulatory sandbox submissions', and mentions inputs and outputs. However, it does not explicitly distinguish itself from sibling tools like 'ai_act_incident_response' or 'ai_act_training_data_audit', so it falls short of a perfect score.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context ('for simulating EU AI Act regulatory sandbox submissions') but provides no explicit guidance on when to use this tool vs. alternatives, nor any exclusions or prerequisites. The agent must infer usage from the purpose alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_act_training_data_audit (grade A)

Read-only, Idempotent

As a CTO, audit AI training datasets for EU AI Act compliance with bias detection and regulatory risk assessment. Inputs: dataset identifier (Hugging Face ID or URL) and optional risk thresholds. Outputs: compliance score, bias metrics, regulatory warnings, and source references. Ideal for pre-deployment risk evaluation. Pass async:true to avoid timeout.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`dataset_id`	Yes	Hugging Face dataset identifier or direct URL to dataset
`risk_threshold`	No
`include_bias_metrics`	No

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`bias_metrics`	No
`compliance_score`	No
`dataset_metadata`	No
`regulatory_warnings`	No

Tool Definition Quality

A (3.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, idempotentHint. The description adds output details (compliance score, bias metrics, regulatory warnings, source references) and async behavior. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise at five short sentences, front-loading the purpose. Could be more structured (e.g., bullet lists) but is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers input, output, usage context, and async tip. Output schema exists, so return values are partially handled. Lacks mention of prerequisites or authentication, but annotations cover safety.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (missing descriptions for risk_threshold and include_bias_metrics). The description adds context for risk threshold and bias metrics but does not fully compensate for the gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool audits AI training datasets for EU AI Act compliance with bias detection and risk assessment, specifying input types and outputs. It distinguishes itself from siblings like ai_act_incident_response by focusing on pre-deployment audit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions 'Ideal for pre-deployment risk evaluation' and advises using async to avoid timeout, but does not explicitly exclude other use cases or compare with sibling tools like ai_act_incident_response.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_governance_full_report_async (grade A)

Read-only

Full EU AI Act audit (Regulation (EU) 2024/1689), native audit-grade implementation. Classifies the AI system into the 4 risk tiers (unacceptable/high_risk/limited_risk/minimal_risk/gpai) based on Annex III and Article 5. Produces: (1) tier classification + justification + applicable articles, (2) compliance checklist for Articles 9-15 + 50 + 53-55, (3) Annex IV documentation gaps, (4) ISO 42001 mapping, (5) EU AI Act deadlines 2025-2029, (6) cost and effort estimate, (7) top 10 recommendations ranked P0/P1/P2. Returns a job_id immediately (<300ms). Poll with ai_governance_full_report_result(job_id) after eta_seconds (~90s). Results are cached for 7 days for identical inputs. Async tool: register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter. DISCLAIMER: not a substitute for professional legal advice.

Parameters

Name	Required	Description
`company_size`	No	Company size: startup (≤50), smb (51-250), mid (251-1000), large (1001-5000), enterprise (>5000)
`data_sources`	No	Data sources used by the AI system
`affected_persons`	No	Categories of people affected by the system's decisions (e.g., candidates, employees, customers)
`geographic_scope`	No	Geographic deployment areas (e.g., 'EU', 'France', 'Global')
`intended_purpose`	Yes	Intended purpose of the AI system: what it is concretely used for
`deployment_context`	No	Deployment context: internal (employee use), public, B2B, B2C
`ai_system_description`	Yes	Detailed description of the AI system: what it does, how it works, which decisions it makes
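
The `company_size` buckets are defined by numeric thresholds, so a caller holding a raw headcount needs to map it to one of the five labels. A small sketch of that mapping follows; it assumes the thresholds refer to employee headcount, which the schema implies but does not state, and the function name is hypothetical.

```python
def company_size_bucket(headcount: int) -> str:
    """Map an employee headcount to the company_size label used by the schema."""
    if headcount < 1:
        raise ValueError("headcount must be positive")
    if headcount <= 50:
        return "startup"
    if headcount <= 250:
        return "smb"
    if headcount <= 1000:
        return "mid"
    if headcount <= 5000:
        return "large"
    return "enterprise"
```

Each boundary value falls in the lower bucket, matching the inclusive upper limits in the table (for example, 250 maps to `smb` and 251 to `mid`).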

Output Schema

ParametersJSON Schema

Name	Required	Description
`job_id`	Yes	Unique job identifier; pass it to ai_governance_full_report_result
`status`	Yes
`eta_seconds`	Yes	Estimated time before the result is available
`submitted_at`	Yes	ISO-8601 submission timestamp

Tool Definition Quality

A3.8/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description declares it returns a job_id and creates an audit, implying mutation, but annotations set readOnlyHint: true. This is a clear contradiction, misleading the agent about side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly concise given the complexity, but slightly verbose with French text. Front-loaded with main purpose and key usage details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all necessary context: async nature, polling/webhook alternatives, caching, disclaimer. The description is complete for the tool's complexity and provides actionable guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions, so baseline is 3. The tool description does not add parameter-specific semantics beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs a full EU AI Act audit, classifying AI systems and producing comprehensive outputs. It distinguishes itself from sibling tools like ai_governance_full_report_result by noting async submission and polling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly explains when to poll with the result tool or register a webhook for callbacks, giving clear alternatives for asynchronous workflow.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_governance_full_report_result (A)

Read-only · Idempotent

Poll the result of an ai_governance_full_report_async job. Returns status=pending while running, status=completed with the full EU AI Act governance audit report once done (risk_tier, compliance checklist Articles 9-15/50/53-55, Annex IV documentation gaps, ISO 42001 alignment, deadlines 2025-2029, cost estimate, top-10 recommendations P0/P1/P2, compliance_score), status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by ai_governance_full_report_async (~90s).
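
The submit-then-poll flow described here can be sketched as follows. This is an illustration under assumptions, not the server's client code: `call_tool` is a hypothetical stand-in for whatever MCP client function invokes a tool by name and returns its result as a dict, and `poll_interval` and `max_polls` are arbitrary values chosen for the sketch. The tool names, the `job_id` and `eta_seconds` fields, and the four status values come from the descriptions above.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_full_report(call_tool: ToolCaller, arguments: Dict[str, Any],
                    poll_interval: float = 10.0, max_polls: int = 30,
                    sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Submit the async audit, wait for the ETA hint, then poll until a terminal status."""
    job = call_tool("ai_governance_full_report_async", arguments)
    job_id = job["job_id"]
    # Wait for the hint before the first poll (about 90s per the description).
    sleep(float(job.get("eta_seconds", 90)))
    for _ in range(max_polls):
        result = call_tool("ai_governance_full_report_result", {"job_id": job_id})
        status = result.get("status")
        if status == "completed":
            return result
        if status in ("failed", "not_found"):
            # not_found also covers job_ids that expired after the 24h TTL.
            raise RuntimeError(f"job {job_id} ended with status={status}")
        sleep(poll_interval)  # status == "pending": try again later
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

The `sleep` parameter is injectable so the loop can be exercised in tests without real delays.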

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by ai_governance_full_report_async (prefix: aigfr_)

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, destructiveHint. Description adds status behavior, TTL of 24h, and job_id prefix, providing additional behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single concise paragraph with front-loaded purpose. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists, so description doesn't need full return details. Yet it provides a comprehensive list of report components, statuses, and TTL, making it very complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with description for job_id. Description reiterates the origin of job_id but adds minimal extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Poll the result of an ai_governance_full_report_async job' with specific verb and resource. Distinguishes from sibling tool ai_governance_full_report_async which is the async launcher.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises to call after the eta_seconds hint (~90s). Describes statuses (pending, completed, failed, not_found) to guide polling behavior.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_governance_pilot (C)

Read-only

Pilotage de gouvernance IA (AI governance steering): Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: TalentScope SAS, AI scoring of HR candidates (EU AI Act Annex III §4, high-risk). Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`aiUseCases`	Yes
`targetFrameworks`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns an audited deliverable, but does not disclose additional behaviors like rate limits or authentication needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences and a reference case. However, the use of French may obscure meaning for some users, and the structure could be more front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex input schema with nested objects and no output schema, the description lacks information about the return format and how the deliverable is structured. It does not address gaps in schema documentation or provide a complete picture for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, and the description does not explain any parameters or their constraints. The generic statement 'send the documented case fields' does not compensate for the lack of parameter documentation.
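
The 20% figure is consistent with counting described parameters over all parameters (1 of 5 here). A minimal sketch of that calculation, assuming this simple ratio is what the score means; the listing does not state the formula, and `description_coverage` is a hypothetical helper.

```python
from typing import Dict, Optional


def description_coverage(parameters: Dict[str, Optional[str]]) -> float:
    """Percentage of parameters that carry a non-empty description."""
    if not parameters:
        return 0.0
    described = sum(1 for text in parameters.values() if text and text.strip())
    return 100.0 * described / len(parameters)


# Parameter names from the ai_governance_pilot table; only `async` has a description.
ai_governance_pilot = {
    "async": "If true, returns a job_id immediately (<200ms) instead of waiting for the result.",
    "focus": None,
    "company": None,
    "aiUseCases": None,
    "targetFrameworks": None,
}

print(round(description_coverage(ai_governance_pilot)))  # 20
```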

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool returns a structured, audited deliverable for AI governance piloting, with a reference case about high-risk AI systems. However, it does not clearly distinguish from sibling tools like ai_governance_full_report_async.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The mention of 'agent-payable' and 'RISK' hints at cost or risk context, but no direct when-to-use or when-not-to-use advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

anti_demissions_hr (C)

Read-only

Bouclier anti-démissions (anti-resignation shield): Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Buffer Inc, detection of at-risk employees among 80 FTEs (Q1 2026). Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`signals`	Yes
`employees`	Yes

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true, so the description adds limited behavioral context beyond stating that it returns an 'audited deliverable' and that inputs are validated server-side. This is some additional value but not substantial.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (4 sentences) but includes cryptic jargon ('Gapup agent-payable C-suite expertise') and a reference case that may not be universally understood. It is front-loaded but could be clearer.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested input schema and lack of output schema, the description is incomplete. It does not explain the deliverable structure, async behavior, or how to interpret results, leaving significant gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not add any meaning to the parameters beyond what is in the schema. With only 20% schema description coverage, the description should compensate but fails to mention fields like company, signals, or employees.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description mentions 'Bouclier anti-démissions' and a reference case about detecting at-risk employees, which hints at HR attrition analysis. However, the purpose is vague and not clearly distinguished from sibling tools like churn_defender or talent_poaching_risk. The name 'anti_demissions_hr' is cryptic.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description does not mention prerequisites, scenarios, or exclusions. It simply describes the tool's function and input requirements.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arbitration_awards_lookup (A)

Read-only · Idempotent

Commercial arbitration intelligence for litigation lawyers, M&A due diligence teams, sovereign wealth funds and trade finance compliance. Covers 8 major institutions: ICC, AAA, LCIA, HKIAC, SIAC, CIETAC, DIAC, ICDR.

Three modes:
• party_lookup: find awards by party name (searches 20 landmark public awards + JusMundi best-effort)
• institution_index: browse awards and caseload stats per institution with date range filter
• clause_check: audit an arbitration clause for missing elements (institution, seat, language, arbitrator count, governing law, binding nature)

Note: Most arbitration awards are confidential. This tool surfaces public awards (Yukos, Crystallex, Achmea, etc.) plus redacted statistics from institutional annual reports. Private awards are not accessible.

Cache: 24h (arbitration data is very stable). No API key required.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	party_lookup: search by party name or keyword. institution_index: browse awards by institution + stats. clause_check: audit an arbitration clause for issues.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	For party_lookup: party name or keyword (e.g. "Yukos", "Russia"). For institution_index: institution name or keyword. For clause_check: full text of the arbitration clause to audit.
`date_to`	No	ISO date filter to (YYYY-MM-DD). Applied to award_date.
`date_from`	No	ISO date filter from (YYYY-MM-DD). Applied to award_date.
`institution`	No	Filter by institution. Default 'all'.
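
The parameter table above can be turned into a small client-side check before calling the tool. This is a sketch under assumptions: `build_arbitration_arguments` is a hypothetical helper, and the list of accepted `institution` values is not given in the listing, so that field is passed through unchecked.

```python
import re
from datetime import date
from typing import Any, Dict, Optional

MODES = ("party_lookup", "institution_index", "clause_check")
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def build_arbitration_arguments(mode: str, query: str, institution: str = "all",
                                date_from: Optional[str] = None,
                                date_to: Optional[str] = None) -> Dict[str, Any]:
    """Validate mode, query and YYYY-MM-DD filters, then return the arguments dict."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    if not query.strip():
        raise ValueError("query is required")
    args: Dict[str, Any] = {"mode": mode, "query": query, "institution": institution}
    for key, value in (("date_from", date_from), ("date_to", date_to)):
        if value is None:
            continue
        if not ISO_DATE.fullmatch(value):
            raise ValueError(f"{key} must use the YYYY-MM-DD format")
        date.fromisoformat(value)  # rejects impossible dates such as 2024-02-31
        args[key] = value
    # Same-format ISO dates compare correctly as strings.
    if date_from and date_to and date_from > date_to:
        raise ValueError("date_from must not be after date_to")
    return args
```

For example, `build_arbitration_arguments("party_lookup", "Yukos", date_from="2010-01-01")` yields a dict ready to pass as tool arguments, while an unknown mode fails locally instead of costing a paid call.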

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`awards`	No
`status`	Yes
`sources`	Yes
`clause_check`	No
`quality_score`	Yes
`institution_stats`	No

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false; the description confirms read-only behavior and adds caching (24h) and the fact that private awards are inaccessible. No contradictions and useful extra detail beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections for modes and notes. It is front-loaded with the purpose. However, it is somewhat lengthy and could be more concise without losing important context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple modes, institutional coverage, data limitations) and the presence of an output schema, the description covers all necessary aspects: data sources, mode behavior, caching, and API requirements. It is thorough and sufficient for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. The description adds significant value by explaining each mode's purpose in detail, especially clause_check (audit missing elements). The async parameter is also described. This goes beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool as 'commercial arbitration intelligence' for specific user roles, lists three distinct modes with examples, and covers 8 major institutions. It is specific and distinguishes from the many sibling tools which are unrelated to arbitration.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use each mode (party_lookup, institution_index, clause_check) and notes limitations (most awards confidential, only public awards). Caching and API key info are included. However, it does not explicitly state when not to use the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

attack_surface_monitor (C)

Read-only

Surveillance surface d'attaque (attack surface monitoring): Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: Which Internet-facing assets of combine a critical CVE, an exposed service, and no WAF — top findings to fix in 14 days? · What is the attack surface of : subdomains, open ports, SSL/TLS grades, and associated CVEs? · Give me a CISO-ready ASM report with blast radius estimate and SLA-driven remediation plan for . · What is the email phishing risk for ? Assess SPF/DMARC posture and recommend improvements. · During M&A due diligence, what are the top cyber exposures on 's Internet-facing infrastructure? Reference case: Velora Payments, 8 exposed assets · 2 critical (CVE-2023-44487 HTTP/2 RapidReset, open admin panel) · . Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`domain`	Yes
`exclusions`	No
`scope_cidrs`	No
`include_email_surface`	Yes

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=true, openWorldHint=true) are consistent. The description adds that it returns a structured deliverable and that inputs are validated server-side. However, it doesn't discuss performance, rate limits, or what happens with async mode beyond the parameter hint.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose, mixing French jargon with a long list of example questions. It lacks a clear, front-loaded summary. The examples are useful but make the description bloated and unstructured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, so the description should explain return values, but only says 'structured, audited deliverable'. With 6 parameters and multiple use cases (assets, email, M&A), the description is incomplete and does not address all dimensions adequately.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, yet the description does not explain most parameters (domain, focus, exclusions, scope_cidrs, include_email_surface). Only async gets indirect mention via example. The description fails to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it's about attack surface surveillance and returns a structured deliverable, but it lists multiple disparate questions (e.g., assets, phishing risk, M&A due diligence) without a single clear verb+resource. The title 'Surveillance surface d'attaque' helps, but the purpose is scattered across various use cases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example queries but no explicit guidance on when to use this tool over its many siblings (e.g., cve_security_lookup, domain_tech_fingerprint). No 'when not to use' or comparison to alternative tools is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

audit_pre_flight (C)

Read-only

Pré-audit comptable (accounting pre-audit): Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Spendesk, statutory auditor pre-audit · Readiness 74/100 · 4 critical findings · 18-document checklist. Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`audit`	Yes
`company`	Yes
`systems`	Yes
`financials`	Yes
`knownIssues`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations set readOnlyHint=true and openWorldHint=true. The description adds minimal behavioral context beyond stating inputs are validated server-side. It does not disclose side effects, authentication needs, rate limits, or what 'audited deliverable' entails beyond the annotation hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences plus a reference case. The first sentence is dense with jargon, and the reference case, while illustrative, adds length without core functional explanation. A cleaner structure focusing on verb+resource would improve efficiency.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema), the description is insufficient. It does not describe the output format, error handling, or how to interpret the 'audited deliverable'. The lack of detail undermines correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 17% and the description provides no parameter explanations, despite 6 complex parameters (5 required) including nested objects. The description only mentions 'documented case fields' without detailing what fields are needed, leaving the agent without critical parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it performs a 'Pré-audit comptable' and returns a 'structured, audited deliverable', which clearly indicates a pre-audit assessment. The reference case adds concrete context. However, the jargon 'Gapup agent-payable C-suite expertise (CFO)' is distracting and could be clearer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description provides a reference case but no conditions, exclusions, or mentions of sibling tools like 'qa_pre_flight'. The agent has no direction on appropriate use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

banking_fee_negotiator (A)

Read-only · Idempotent

As a CFO-focused tool, banking_fee_negotiator analyzes your bank's fee structures (account maintenance, wire transfers, credit lines) and provides data-driven negotiation recommendations. Input your current fees and bank details to receive benchmark comparisons from World Bank and ECB SDW, along with specific levers to reduce costs. Ideal for optimizing treasury operations and improving financial efficiency. Keywords: bank fees, cost optimization, treasury management, financial benchmarking, negotiation strategy.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	No	Industry classification (e.g., 'manufacturing', 'retail')
`bank_country`	Yes	ISO 2-letter country code of the bank
`credit_line_fee`	No	Current annual credit line fee percentage
`wire_transfer_fee`	No	Current domestic wire transfer fee in USD
`international_wire_fee`	No	Current international wire transfer fee in USD
`account_maintenance_fee`	Yes	Current monthly account maintenance fee in USD

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`negotiation_levers`	No
`credit_line_benchmark`	No	Industry benchmark for credit line fees percentage
`wire_transfer_benchmark`	No	Regional benchmark for domestic wire transfer fees in USD
`international_wire_benchmark`	No	Regional benchmark for international wire transfer fees in USD
`account_maintenance_benchmark`	No	Regional benchmark for account maintenance fees in USD
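
Once a response comes back, each submitted fee can be compared with its benchmark. The sketch below assumes the benchmark fields are plain numbers in the same unit as the matching input (USD, or a percentage for the credit line) and pairs input and output fields by name; neither assumption is confirmed by the listing, and `fee_gaps` is a hypothetical helper.

```python
from typing import Any, Dict

# Input field -> output benchmark field, paired by name (an inference from the two tables).
FEE_TO_BENCHMARK = {
    "account_maintenance_fee": "account_maintenance_benchmark",
    "wire_transfer_fee": "wire_transfer_benchmark",
    "international_wire_fee": "international_wire_benchmark",
    "credit_line_fee": "credit_line_benchmark",
}


def fee_gaps(request: Dict[str, Any], response: Dict[str, Any]) -> Dict[str, float]:
    """Return current fee minus benchmark for every pair present on both sides."""
    gaps: Dict[str, float] = {}
    for fee_key, benchmark_key in FEE_TO_BENCHMARK.items():
        fee = request.get(fee_key)
        benchmark = response.get(benchmark_key)
        if fee is None or benchmark is None:
            continue  # optional input not sent, or benchmark not returned
        gaps[fee_key] = round(fee - benchmark, 2)
    return gaps
```

A positive gap means the current fee is above the benchmark, which is where a negotiation would likely start; the tool's own `negotiation_levers` output remains the authoritative recommendation.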

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds relevant context about using external data sources (World Bank, ECB SDW) and returning benchmark comparisons and levers. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at ~70 words, front-loading the purpose and followed by use case and keywords. It is efficient but could be slightly more compact without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters and an output schema, the description adequately covers the main functionality and context. It mentions benchmark comparisons and levers, which aligns with the existence of an output schema. Minor omission: no mention of async parameter handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so each parameter is well-documented in the schema. The description mentions the main fee types and bank_country, but does not add significant new semantics beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: analyzing bank fee structures and providing data-driven negotiation recommendations. It specifies the resource (bank fees) and distinguishes itself from siblings by being CFO-focused and using World Bank and ECB SDW benchmarks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for when to use the tool (optimizing treasury operations, improving financial efficiency) and implies the target user (CFOs). However, it does not explicitly state when not to use it or name alternative tools among the siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

battle_cards_live (C)

Read-only

Fiche de combat live (live battle card): Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub vs McKinsey Lilli, B2B SaaS deal €500k · Win rate +11 pts · 6 key objections armed. Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`ourOffer`	Yes
`competitor`	Yes
`dealContext`	Yes
`knownWeaknesses`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating a safe read operation. The description adds that inputs are validated server-side and that the tool returns an audited deliverable, but it lacks details on auth needs, rate limits, or async behavior beyond the parameter itself.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is relatively short (two sentences plus reference) but mixes languages and includes jargon. Could be more structured and front-load the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters with nested objects and no output schema, the description lacks completeness. Does not describe output structure, async polling, or parameter constraints. Inadequate for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema coverage (async described), description does not compensate. Does not explain key parameters (competitor, dealContext, ourOffer) beyond 'send the documented case fields'. Adds no meaning to the schema.
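
The coverage figure quoted above can be reproduced with a short calculation. The sketch below assumes coverage means the share of top-level schema properties that carry a non-empty `description`; the scorer's exact formula is not stated on this page, and the property types are placeholders because only the parameter names are listed.

```python
from typing import Any, Dict


def schema_description_coverage(input_schema: Dict[str, Any]) -> float:
    """Share of top-level properties that carry a non-empty description."""
    props = input_schema.get("properties", {})
    if not props:
        return 0.0
    described = sum(
        1 for prop in props.values() if str(prop.get("description", "")).strip()
    )
    return described / len(props)


# Parameter names from the battle_cards_live table; the types are placeholders.
battle_cards_live_schema = {
    "type": "object",
    "properties": {
        "async": {"type": "boolean",
                  "description": "If true, returns a job_id immediately."},
        "ourOffer": {"type": "object"},
        "competitor": {"type": "object"},
        "dealContext": {"type": "object"},
        "knownWeaknesses": {"type": "array"},
    },
}

print(f"{schema_description_coverage(battle_cards_live_schema):.0%}")  # prints 20%
```

Under the same assumption, a tool with one described parameter out of eleven comes out at roughly 9%.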

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states it returns a structured, audited deliverable for competitive battle cards, with a reference case. Purpose is generally clear but muddled by jargon (e.g., 'Gapup agent-payable C-suite expertise') and French terms. Could be more direct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs. siblings like 'competitive_deep_dive' or 'battle_plan'. Does not specify prerequisites, when not to use, or compare with alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

battle_planC

Read-only

Plan de bataille marketing — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Q3 2026 · Budget €120k · Pipeline €800k · 5 chantiers prioritaires. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`quarter`	Yes
`teamSize`	Yes
`arrTarget`	Yes
`budgetEur`	Yes
`arrCurrent`	Yes
`companyName`	Yes
`topChannels`	Yes
`icpDescription`	Yes
`currentBlockers`	Yes
`primaryObjective`	Yes

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already include readOnlyHint=true. The description adds 'audited deliverable' and 'inputs validated server-side', which are useful and do not contradict the annotations. Details on auth requirements, rate limits, and side effects are missing, so transparency is adequate but not enhanced much beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, mixed French and English, with a reference case that may be specific to a single scenario. Could be more concise by removing the example and focusing on general use. Acceptable but not optimized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 11 parameters, no output schema, and no description of return structure, the description is insufficient. It does not explain what the deliverable contains, how to interpret results, or the significance of the reference case. Annotations add minimal context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 9% (1 of 11 parameters), and the description explains no parameters; the reference case merely implies 'companyName'. No parameter meaning or constraints are clarified, so the description fails to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses 'marketing battle plan' and 'returns a structured, audited deliverable', which indicates a strategic marketing plan generator. However, it's vague on the specific outputs and doesn't differentiate from siblings like 'bp_narratif' or 'brand_builder'. The reference case adds context but is not a clear purpose statement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives. The description provides a specific reference case but no context about prerequisites, exclusions, or when not to use. Sibling tools are not mentioned or differentiated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bias_amplification_trackerA

Read-only, Idempotent

Tracks bias amplification in LLM outputs by analyzing fairness metrics from HuggingFace's model leaderboard. Designed for risk assessment personas to detect and quantify demographic, gender, or racial bias amplification in generated text. Accepts model identifiers or output samples, returns structured bias metrics and amplification trends.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`modelId`	No	HuggingFace model identifier (e.g., 'facebook/opt-1.3b')
`outputSamples`	No	Array of LLM output strings to analyze for bias amplification
`demographicGroups`	No	Specific demographic groups to monitor (e.g., ['gender', 'race'])

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`biasMetrics`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safe, idempotent operation. The description adds that it accepts model identifiers or output samples and returns structured bias metrics, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, highly efficient, with no redundant information. Core purpose, usage context, and output are front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present and full input schema coverage, the description adequately covers the tool's purpose and use cases. Could briefly mention async option but not critical.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description mentions accepting model identifiers and output samples, which maps to schema parameters, but does not add significant new meaning beyond the existing parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool tracks bias amplification in LLM outputs using HuggingFace fairness metrics, and specifies it is for risk assessment personas. The purpose is distinct from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates it is designed for risk assessment personas but does not explicitly state when not to use it or provide alternatives. Usage context is implied but not contrasted with other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bond_covenant_esg_compliance_checkerA

Read-only, Idempotent

As a CFO, quickly assess whether your bond covenants meet ESG compliance standards set by BIS and ECB. This tool analyzes covenant text against regulatory benchmarks, identifying potential ESG-related risks in carbon emissions, governance practices, and social impact clauses. Input bond covenant details and receive structured compliance insights with source references. Ideal for pre-issuance due diligence or ongoing monitoring of existing bond portfolios.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`couponType`	No	Type of bond coupon
`covenantText`	Yes	Full text of the bond covenant to analyze
`issuerSector`	No	Industry sector of the bond issuer (e.g., energy, finance)
`jurisdiction`	No	Legal jurisdiction governing the bond (e.g., EU, US)
`maturityDate`	No	Maturity date of the bond

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`riskAreas`	No
`complianceScore`	No
`recommendations`	No

Tool Definition Quality

A4.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds 'quickly assess' and 'structured compliance insights with source references' but lacks details on rate limits, authorization needs, or behavior under invalid inputs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose, no superfluous words. Each sentence earns its place: the first states the main action, the second explains the analysis, the third covers inputs and outputs, and the fourth names the use cases.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters with full schema coverage and an output schema, the description is complete enough. It covers when to use and what to expect without missing critical context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, baseline is 3. The description adds value by mentioning specific ESG dimensions (carbon, governance, social impact) that help prioritize parameters like issuerSector and jurisdiction, providing contextual meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: assessing bond covenant ESG compliance against BIS and ECB standards, analyzing covenant text for risks in carbon emissions, governance, and social impact. It distinguishes from sibling 'bond_covenant_monitor' by focusing on ESG compliance and regulatory benchmarks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions use cases: pre-issuance due diligence and ongoing monitoring. It provides clear context but does not explicitly exclude alternative use cases or name competing tools like 'bond_covenant_monitor' or other ESG checkers.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bond_covenant_monitorA

Read-only, Idempotent

As a CFO, monitor bond covenant compliance by analyzing leverage ratios (debt-to-equity, debt-to-EBITDA) and interest coverage ratios using real-time financial data. Input a company's ticker symbol and optional covenant thresholds to receive compliance status, key financial metrics, and SEC filing references. Ideal for proactive debt management and regulatory compliance tracking. Keywords: bond covenants, leverage ratio, interest coverage, debt compliance, SEC filings, financial health.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`ticker`	Yes	Company ticker symbol (e.g., 'AAPL')
`covenantThresholds`	No
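
The ratios named in the description can be illustrated with a short calculation. The sketch below is not the tool's implementation: the threshold names are hypothetical (the structure of `covenantThresholds` is not shown here), interest coverage is taken as EBIT divided by interest expense (one common definition), and the figures are made up.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Financials:
    """Illustrative inputs, all in the same currency unit."""
    total_debt: float
    total_equity: float
    ebitda: float
    ebit: float
    interest_expense: float


def covenant_check(f: Financials, max_debt_to_ebitda: float,
                   min_interest_coverage: float) -> Dict[str, Union[float, bool]]:
    """Compute the three ratios and test them against two hypothetical thresholds."""
    debt_to_equity = f.total_debt / f.total_equity
    debt_to_ebitda = f.total_debt / f.ebitda
    interest_coverage = f.ebit / f.interest_expense
    return {
        "debt_to_equity": debt_to_equity,
        "debt_to_ebitda": debt_to_ebitda,
        "interest_coverage": interest_coverage,
        # Compliant only if leverage is at or below its cap
        # and coverage is at or above its floor.
        "compliant": debt_to_ebitda <= max_debt_to_ebitda
        and interest_coverage >= min_interest_coverage,
    }


# Made-up figures, not data for any real company.
sample = Financials(total_debt=400.0, total_equity=200.0, ebitda=100.0,
                    ebit=80.0, interest_expense=20.0)
report = covenant_check(sample, max_debt_to_ebitda=3.5, min_interest_coverage=3.0)
print(report)
```

With these inputs the leverage ratio is 4.0 against a cap of 3.5, so the check reports a breach even though interest coverage (4.0) clears its floor.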

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`debtToEquity`	No
`leverageRatio`	No
`lastFilingDate`	No
`complianceStatus`	Yes
`interestCoverage`	No
`nextFilingDeadline`	No

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint as true. The description adds useful context: it uses real-time financial data and outputs compliance status, key metrics, and SEC filing references. This goes beyond annotations by clarifying the output structure and data source. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured paragraph with four sentences. It front-loads the core purpose and includes actionable keywords. Though slightly verbose, every sentence contributes meaning and there is no redundant content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (3 parameters, one nested object, an output schema), the description sufficiently covers inputs, outputs (compliance status, metrics, SEC references), and purpose. It does not need to detail return values because an output schema exists, which makes it contextually complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 67%: two of the three top-level parameters are described, and the nested thresholds carry descriptions of their own. The description adds meaning by stating 'optional covenant thresholds' and explaining their role in compliance analysis. However, the schema already documents parameters well, so the description adds modest value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool's purpose as monitoring bond covenant compliance by analyzing leverage and interest coverage ratios. It specifies the verb (monitor/analyze) and resource (bond covenants). However, it does not differentiate from sibling tools like 'bond_covenant_esg_compliance_checker' or 'syndicated_loan_covenant_breach_alert', which overlap in functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description suggests use cases (proactive debt management, regulatory compliance tracking) but lacks explicit guidance on when not to use this tool or how it differs from alternatives. It does not provide exclusions or prerequisites, which limits its utility for decision-making.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bp_narratifB

Read-only

Business Plan narratif — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Stripe Series A 2012. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`raise`	Yes
`company`	Yes
`keyMetrics`	Yes

Tool Definition Quality

B3.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds context about CFO expertise and a reference case, which is consistent and adds value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loading the purpose. It is reasonably concise, though it could be shorter by omitting the 'Inputs are validated' sentence.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has complex nested parameters and no output schema. The description does not explain the output format ('structured, audited deliverable') or provide enough context to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only 25% of parameters have descriptions in the schema (async). The description does not explain the nested objects (company, raise, keyMetrics) or how to structure them, failing to compensate for low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a 'structured, audited deliverable' for a business plan narrative, targeting CFO-level expertise. However, it does not differentiate from similar sibling tools like 'ftg_business_plan' or 'pitch_deck_storyline'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a reference case (Stripe Series A 2012) and mentions inputs are validated server-side, implying how to use it. But it lacks explicit guidance on when to use this tool vs alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

brand_builderC

Read-only

Architecte de marque — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Pennylane — brand identity SaaS fintech B2B FR/EU (2023). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`target`	Yes
`founder`	Yes
`existingAssets`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. Description adds that inputs are validated server-side and returns a structured deliverable, but lacks details on auth, rate limits, or what 'audited' means.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short but uses jargon ('Gapup agent-payable C-suite expertise') that may confuse. Front-loading is decent but lacks clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complex nested parameters and no output schema, the description is incomplete. It doesn't explain the deliverable's content, output format, or behavior beyond returning a result.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description provides no semantics for the 5 parameters (3 required). Schema coverage is 20%, and the description only says 'send the documented case fields', offering no additional meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it's a 'Brand Architect' tool that returns a structured, audited deliverable, with a reference case for context. However, it doesn't explicitly differentiate from sibling tools like positioning_strategist or brand_equity_voice_share_calculator.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description implies C-suite/CMO use but doesn't specify when to choose it over siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

brand_equity_voice_share_calculatorA

Read-only, Idempotent

Calculates brand equity voice share for CMOs by analyzing mentions across 500K+ news articles and forums from Common Crawl and Wayback Machine. Inputs include brand name, competitors, and time range. Outputs voice share percentage, sentiment distribution, and top sources. Ideal for competitive benchmarking and brand visibility tracking. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`time_range`	Yes
`competitors`	No
`include_forums`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`top_sources`	No
`total_mentions`	No
`brand_voice_share`	No
`sentiment_distribution`	No
`competitor_voice_shares`	No
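
The voice share percentages in the output can be read as each brand's portion of all counted mentions. The sketch below assumes that conventional share-of-voice definition; the tool's actual formula is not documented on this page, and the brand names and counts are invented.

```python
from typing import Dict


def voice_shares(mention_counts: Dict[str, int]) -> Dict[str, float]:
    """Return each brand's percentage of the total mention count."""
    total = sum(mention_counts.values())
    if total == 0:
        # No mentions at all: report zero share for every brand.
        return {name: 0.0 for name in mention_counts}
    return {name: 100.0 * count / total for name, count in mention_counts.items()}


# Invented counts for one brand and two competitors.
counts = {"OurBrand": 300, "RivalA": 500, "RivalB": 200}
shares = voice_shares(counts)
print(shares)  # prints {'OurBrand': 30.0, 'RivalA': 50.0, 'RivalB': 20.0}
```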

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (readOnlyHint, idempotentHint, openWorldHint), the description adds details about data sources, output types, and timeout behavior with async support. This enriches the agent's understanding of tool behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is just five sentences, front-loading the primary action, then covering inputs, outputs, use case, and a practical tip. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, key inputs, outputs, ideal use case, and async behavior. Given the presence of an output schema, the output summary is sufficient. It could mention the async result polling mechanism, but overall it is contextually complete for a tool of moderate complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema description coverage, the description partially compensates by listing 'brand name, competitors, and time range' as inputs. It mentions forums, hinting at the include_forums parameter, but does not explain time_range format or other details. The async parameter description is covered in schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates 'brand equity voice share' using specific data sources (Common Crawl, Wayback Machine). It targets CMOs and mentions competitive benchmarking, which gives a precise purpose. However, it does not explicitly differentiate it from sibling tools like brand_builder or competitive_deep_dive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description notes it is 'ideal for competitive benchmarking and brand visibility tracking' and advises using async:true for timeouts. This provides context but lacks explicit guidance on when to use this tool versus alternatives, such as sentiment analysis or other brand tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

budget_variance_aiB

Read-only

Analyse d'écart budgétaire — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Explain the key drivers of the budget vs actual variance for in — what are the top 10 narrative explanations? · Which cost categories drove the budget overrun for in , and what corrective actions should management take? · Revise the Q4 forecast based on observed Q3 variances for — give me 3 scenarios (base, optimistic, conservative). · Prepare a board-ready budget variance memo for — , budget €M vs actual €M, with management actions. · What are the quick wins to reduce budget overspend for by end of quarter without impacting growth targets? Reference case: Doctolib Q3 2026 — budget €38.5M vs actual €41.2M (+7.0%) — cloud + headcount + deals timing. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`entity`	Yes
`budgetContext`	Yes
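
The variance quoted in the reference case (budget €38.5M vs actual €41.2M, +7.0%) can be checked with a short calculation. The helper below is an illustrative sketch, not part of the tool; only the two amounts come from the description.

```python
from typing import Tuple


def budget_variance(budget: float, actual: float) -> Tuple[float, float]:
    """Return (absolute variance, variance as a percentage of budget)."""
    absolute = actual - budget
    percent = 100.0 * absolute / budget
    return absolute, percent


# Amounts in millions of euros, taken from the reference case.
abs_m, pct = budget_variance(38.5, 41.2)
print(f"{abs_m:+.1f}M ({pct:+.1f}%)")  # prints +2.7M (+7.0%)
```

A positive result means actual spend exceeded budget, which matches the overrun described in the reference case.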

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true, which the description aligns with by noting it returns a 'structured, audited deliverable'. It adds that inputs are validated server-side, but no additional behavioral traits (e.g., rate limits, destructive behavior) are disclosed beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose with a long list of example questions and a reference case, making it harder to scan. Key details (purpose, input requirements) are buried; it could be significantly shortened while retaining clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with nested objects, low schema coverage, and no output schema, the description lacks comprehensiveness. It provides usage examples but does not explain the required input structure or the format of the returned deliverable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 25% (low). The description provides no explanation of any parameters (e.g., 'entity', 'budgetContext'), only stating to 'send the documented case fields'. This fails to compensate for the lack of schema documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
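
The coverage percentages quoted in these reviews are consistent with a simple ratio of described parameters to total parameters (1 of 4 here gives 25%). A minimal sketch of that reading follows; the function name is hypothetical and the listing does not state how the figure is actually computed.

```python
from typing import Dict, Optional


def description_coverage(params: Dict[str, Optional[str]]) -> float:
    """Fraction of parameters that carry a non-empty description."""
    if not params:
        return 0.0
    described = sum(1 for text in params.values() if text and text.strip())
    return described / len(params)


# Parameter table of the tool reviewed above: only `async` is described.
params = {
    "async": "If true, returns a job_id immediately ...",
    "focus": None,
    "entity": None,
    "budgetContext": None,
}
print(f"{description_coverage(params):.0%}")  # 25%
```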

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs budget variance analysis ('Analyse d'écart budgétaire') and provides specific example queries (e.g., top 10 narrative explanations, corrective actions, forecast scenarios). This differentiates it from siblings by focusing on a financial-analysis deliverable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage scenarios through example queries (e.g., 'Explain the key drivers') but does not explicitly state when to use or avoid this tool versus alternatives. No exclusions or competitor tool mentions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

candidate_screening_ranking (A)

Read-only · Idempotent

AI-powered candidate screening and ranking for recruiters, hiring managers, ATS providers and recruitment AI agents. Ingests a job description and 1-50 candidate resumes, returning a ranked shortlist with score breakdowns across five weighted criteria: skills_match (tech stack and soft skills extracted from JD vs resume), experience_match (years vs seniority level inferred from JD), education_match (degree level + top-school detection), role_progression (Junior to Senior to Lead patterns), culture_fit_estimate (remote/hybrid, startup vs enterprise). Per candidate: overall_score 0-100, matched/missing skills, red_flags (job hopping, employment gaps, seniority mismatch), green_flags (long tenure, promotions), 3-5 interview questions, fit_summary. Diversity signals are first-name proxies ONLY with mandatory ethical WARNING. All processing is local -- no external API calls, instant response, privacy-preserving.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`candidates`	Yes	Array of candidate objects. Maximum 50.
`role_country`	No	Optional ISO 2-letter country code for regional context (informational).
`job_description`	Yes	Full text or summary of the job description and role requirements.
`criteria_weights`	No	Optional weighting per criterion. Default: skills=0.4, experience=0.2, education=0.1, progression=0.15, culture=0.15.
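
To make the `criteria_weights` defaults concrete, here is a minimal sketch of how a weighted overall score could be formed from per-criterion scores. It assumes the overall score is a weighted average on a 0-100 scale and that the breakdown uses the same short keys as the weights; neither is confirmed by the listing, and the server's real formula may differ.

```python
from typing import Dict, Optional

# Defaults copied from the `criteria_weights` parameter description.
DEFAULT_WEIGHTS = {
    "skills": 0.4,
    "experience": 0.2,
    "education": 0.1,
    "progression": 0.15,
    "culture": 0.15,
}


def overall_score(breakdown: Dict[str, float],
                  weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted average of per-criterion scores, normalised by total weight."""
    merged = {**DEFAULT_WEIGHTS, **(weights or {})}
    total = sum(merged.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(breakdown[name] * w for name, w in merged.items()) / total


# Illustrative scores, not real tool output.
example = {"skills": 80, "experience": 70, "education": 60,
           "progression": 50, "culture": 90}
print(round(overall_score(example), 1))  # 73.0
```

Normalising by the total weight means a caller who overrides only one weight still gets a score on the same 0-100 scale.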

Output Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`nice_to_have`	Yes
`quality_score`	Yes
`required_skills`	Yes
`candidates_ranked`	Yes
`diversity_signals`	No
`shortlist_recommended`	Yes
`job_description_summary`	Yes

Tool Definition Quality

A (4.6/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds critical behavioral context: all processing is local, no external API calls, instant response, privacy-preserving. It also details the five weighted criteria and output elements (scores, flags, interview questions), significantly expanding on the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single but dense paragraph that conveys all necessary information. It front-loads the main purpose and then provides details on criteria, output, and privacy. While structured, it could benefit from bullet points for readability, but each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, output schema), the description covers inputs, outputs, criteria, flags, and ethical warnings (diversity signals). It is complete enough for an AI agent to understand and invoke correctly, even without seeing the output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with each parameter described. However, the description adds meaning by explaining the five criteria (skills_match, experience_match, etc.) and how they are computed, which goes beyond the schema's descriptions. The description also clarifies the optional nature and defaults for criteria_weights.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies the verb 'screening and ranking', the resource 'candidates for a job description', and provides details on inputs (JD, 1-50 resumes) and outputs (ranked shortlist with score breakdowns). It clearly distinguishes this tool from siblings by focusing on AI-powered recruitment screening, which is unique among the listed tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states it is for recruiters, hiring managers, ATS providers, and recruitment AI agents, and indicates when to use it (ingesting a job description and candidate resumes). It does not explicitly exclude scenarios or mention alternatives, but the context is sufficient for correct selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

capacity_planning (C)

Read-only

Capacity planning (Planification capacitaire): Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Gapup Hub, 22→48 FTE in 12 months · ARR €480k→€1.7M · hiring plan by department. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`benchmarks`	No
`financials`	Yes
`constraints`	No
`currentTeam`	Yes
`hiringBudgetEur`	No

Tool Definition Quality

C (2.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns a structured deliverable, but does not disclose additional behavioral traits like cost or latency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short and front-loaded, but includes a specific reference case that may not be universally relevant. Every sentence adds some value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, nested objects, no output schema), the description is insufficient. It does not explain the output format, required data, or how the tool fits into broader workflows.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is very low (14%), and the description provides no parameter specifics beyond 'send the documented case fields'. It does not compensate for the lack of schema documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates it's for capacity planning, targeting CHRO/HR roles, with a reference case. However, the use of French and 'Gapup agent-payable' jargon may obscure purpose for some agents.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description implies it's for HR capacity planning but does not exclude other scenarios or specify prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

capital_strategy (C)

Read-only

Financing strategy (Stratégie de financement): Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Alan, health-insurance SaaS, Seed→A→B→C sequence (2016-2022). Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`growthPlan`	Yes
`financialPosition`	Yes
`founderConstraints`	Yes

Tool Definition Quality

C (2.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, which implies a safe read operation. The description notes server-side validation but adds little beyond that. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences, but includes a reference case that may not be essential for an AI agent. Information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has complex nested inputs and no output schema. The description fails to clarify the return format or how the deliverable is structured, making it incomplete for an agent to fully grasp the tool's usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' parameter described). The description does not explain the meaning of any other parameters or nested fields, leaving the agent to rely solely on property names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: a financing strategy tool that returns a structured audited deliverable, with a reference case to illustrate. However, it does not distinguish from sibling tools like 'funding_hunter' or 'cap_table_strategist'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description only mentions that inputs are validated server-side and to send documented case fields, but does not provide context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cap_table_strategist (C)

Read-only

Cap table strategist (Stratège du cap table): Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Aleph AI Series B, multi-round dilution model + secondary simulations + equity hygiene · 5 scenarios. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`plannedRounds`	Yes
`currentCapTable`	Yes
`founderObjectives`	Yes

Tool Definition Quality

C (2.7/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the tool is read-only and may interact with external entities. The description adds that it returns an 'audited deliverable' and that inputs are validated server-side. This provides some context but does not fully disclose behavioral traits like rate limits, authentication, or specific output structure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short (two sentences) and front-loaded with the purpose. The reference case adds useful context but could be omitted or shortened. Overall, it is reasonably concise, though the mix of French and English may reduce clarity for some agents.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 6 parameters, 4 required, no output schema) and low schema coverage, the description is insufficient. It does not explain the output format, how to structure the nested inputs, or what 'audited deliverable' means. The reference case provides some context but leaves many gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only 'async' parameter described). The description does not explain any parameters, nor does it compensate for the low coverage. It says 'send the documented case fields' but does not list or clarify them, leaving the agent with insufficient guidance on how to populate required fields like company, currentCapTable, etc.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates the tool is a cap table strategist for fundraising, returning a structured deliverable. It mentions a reference case (Aleph AI Series B) giving concrete examples of functionality. However, it is in French and does not explicitly differentiate from sibling tools like financial_model_3statement or deal_structurer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternatives. It does not specify prerequisites, contexts, or exclusions. The only hint is the 'FUNDRAISING' tag, which implies usage, but no direct comparison with sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_footprint_calculator (A)

Read-only · Idempotent

Calculate a company's greenhouse-gas footprint under the GHG Protocol (Scope 1 + 2 + 3, in tCO2eq, tier-2 accuracy ±20%). Returns the emissions breakdown, hotspot identification, 5-8 reduction levers each with capex and payback, an SBTi-aligned reduction trajectory over 5-25 years, the 15 Scope-3 categories in detail, and CSRD/ESRS reporting readiness. When to use this tool: the user needs a carbon assessment for CSRD compliance pre-audit, green-finance access, or supplier ESG scorecards. Inputs: the company profile and its activity data. Delivered by Émilie, the AI Sustainability lead of the Gapup portfolio.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`perimeter`	Yes
`scope1Sources`	No
`scope2Sources`	Yes
`reductionTargets`	No
`scope3Activities`	No
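
Because most of these parameters are undocumented, a client may want to check the required top-level keys from the table above before paying for a call. The sketch below does only that; the helper name is hypothetical, and the nested structure of each object is not specified in the listing, so it is deliberately not validated.

```python
from typing import Any, Dict, List, Sequence

# Required column of the parameter table above.
REQUIRED = ("company", "perimeter", "scope2Sources")


def missing_required(arguments: Dict[str, Any],
                     required: Sequence[str] = REQUIRED) -> List[str]:
    """Return required parameter names that are absent or None, in table order."""
    return [name for name in required if arguments.get(name) is None]


# Illustrative draft payload with placeholder values only.
draft = {"company": {}, "scope1Sources": []}
print(missing_required(draft))  # ['perimeter', 'scope2Sources']
```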

Output Schema

Name	Required	Description
`kpis`	No	3-5 headline ESG KPI bubbles
`hotspots`	Yes	Top emission sources ranked by contribution
`breakdown`	Yes	Emissions breakdown by scope
`csrdReadiness`	Yes	CSRD/ESRS reporting readiness assessment
`sbtiTrajectory`	No	SBTi-aligned annual reduction trajectory
`reductionLevers`	Yes	5-8 actionable reduction levers with financial analysis
`executiveSummary`	Yes	Board-ready GHG assessment prose
`scope3Categories`	No	GHG Protocol 15 Scope-3 categories detail
`totalEmissionsTco2eq`	Yes	Total GHG footprint in tCO2eq (Scope 1+2+3 combined, ±20% tier-2 accuracy)
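
The ±20% tier-2 accuracy on `totalEmissionsTco2eq` can be read as a band around the combined total. The sketch below assumes a symmetric relative band and uses illustrative input values; the listing does not say how the tool itself reports or applies the uncertainty.

```python
from typing import Tuple


def footprint_band(scope1: float, scope2: float, scope3: float,
                   relative_error: float = 0.20) -> Tuple[float, float, float]:
    """Return (total, low, high) in tCO2eq for a symmetric relative error band."""
    total = scope1 + scope2 + scope3
    return total, total * (1 - relative_error), total * (1 + relative_error)


# Illustrative figures in tCO2eq, not taken from any real assessment.
total, low, high = footprint_band(120.0, 80.0, 800.0)
print(round(total), round(low), round(high))  # 1000 800 1200
```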

Tool Definition Quality

A (3.7/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds useful details: tier-2 accuracy ±20%, output specifics (breakdown, levers, trajectory), and mentions it's delivered by a named persona. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five sentences, front-loaded with the core function and outputs. It is efficient but could omit the persona line, and it has no repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool is complex with many parameters and output schema. Description provides high-level purpose and output but lacks guidance on parameter usage and async support. Output schema exists, so return values are covered, but parameter semantics gap makes it incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, but description only vaguely mentions 'Inputs: the company profile and its activity data', failing to explain the 8 parameters including async, focus, perimeter, and nested objects. Almost no value added beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool calculates GHG footprint under GHG Protocol, specifying scopes, unit, accuracy, and detailed outputs. It distinguishes from siblings by mentioning specific use cases like CSRD compliance and supplier ESG scorecards.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: for CSRD compliance pre-audit, green-finance access, or supplier ESG scorecards. Does not provide exclusions or alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_roadmap (C)

Read-only

Carbon roadmap (Roadmap carbone): Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: demo case, carbon roadmap. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`perimeter`	Yes
`scope1Sources`	No
`scope2Sources`	Yes
`reductionTargets`	No
`scope3Activities`	No

Tool Definition Quality

C (2.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the description's mention of 'returns a structured, audited deliverable' adds minimal behavioral context. It notes server-side validation but omits error handling or output format details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (3 sentences) and front-loaded with the name, but includes confusing jargon ('Gapup agent-payable') and lacks clarity. It could be more concise and structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, no output schema), the description is severely incomplete. It does not explain how to structure inputs, the return value, or handle the async option.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, yet the description adds no parameter-specific information beyond 'send the documented case fields.' It fails to compensate for the lack of schema descriptions for most parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a 'structured, audited deliverable' for carbon roadmap, indicating a report output. However, it does not clearly differentiate from siblings like carbon_footprint_calculator and uses jargon ('Gapup agent-payable C-suite expertise') that may obscure purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It only mentions a reference case but no context for selection or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

champion_mapping (B)

Read-only

Champion mapping (Cartographie du champion): Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk × Decathlon (deal €120k/year), champion identified: Group CFO · 6-week multi-touch plan. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`knownContacts`	Yes
`sellerContext`	Yes

Tool Definition Quality

B (3.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true. The description adds that the tool returns a 'structured, audited deliverable' and inputs are validated server-side. No behavioral contradictions; it clearly indicates a non-destructive operation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: purpose, reference case, instruction. Front-loaded and efficient, but the reference case example could be shortened or separated for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 4 parameters, no output schema), the description is adequate but incomplete. It provides an example and states the deliverable is structured and audited, but lacks details on output format and field-level semantics.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is very low (25%), only the 'async' parameter has a description. The description does not explain the other parameters (deal, knownContacts, sellerContext) beyond a vague reference to 'send the documented case fields', failing to add meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it's about 'Cartographie du champion' (champion mapping) for C-suite expertise (CRO) and returns a structured deliverable. The reference case clarifies the output, but the scope is not fully differentiated from similar tools like deal_coach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions a reference case and instructs to 'send the documented case fields', implying use when you have deal data, but does not explicitly state when to use this tool instead of siblings like battle_cards_live or win_loss_decoder.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

change_failure_root_cause_classifier (A)

Read-only · Idempotent

Classifies root causes of change failures for CTO-level incident analysis. Uses GitHub PR metadata and Snyk vulnerability data to identify patterns like dependency vulnerabilities, configuration drift, or deployment process gaps. Inputs include GitHub PR URL or incident ID, and outputs structured root cause categories with confidence scores. Ideal for post-mortem analysis and change risk assessment.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`pr_url`	Yes
`incident_id`	No
`snyk_org_id`	No
`time_range_days`	No

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`root_causes`	No

Tool Definition Quality

A (4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint, openWorldHint, and idempotentHint, which the description complements by explaining data sources (GitHub, Snyk) and output format (confidence scores). There is no contradiction; the description adds behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loading purpose, then data sources, then inputs and outputs, then the use case. No unnecessary words; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description does not need to detail return values. It covers purpose, inputs, data sources, and ideal use case. However, it omits prerequisites (e.g., GitHub/Snyk access) and does not explain the async pattern or all parameters fully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (async has a description). The description adds meaning for pr_url and incident_id by stating they are inputs, but does not explain snyk_org_id or time_range_days. It partially compensates for the low coverage but is not comprehensive.
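
The 20% figure can be reproduced with a short calculation. The sketch below assumes coverage means the share of parameters that carry a non-empty description; the listing does not spell out the scorer's exact definition, and `description_coverage` is a hypothetical helper name.

```python
from typing import Dict


def description_coverage(params: Dict[str, str]) -> float:
    """Return the fraction of parameters with a non-empty description."""
    if not params:
        raise ValueError("no parameters to score")
    described = sum(1 for text in params.values() if text and text.strip())
    return described / len(params)


# Parameters of the tool reviewed above; only `async` has a description.
params = {
    "async": "If true, returns a job_id immediately instead of waiting for the result.",
    "pr_url": "",
    "incident_id": "",
    "snyk_org_id": "",
    "time_range_days": "",
}
coverage_pct = round(description_coverage(params) * 100)
```

With one described parameter out of five, `coverage_pct` comes out at 20.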

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool classifies root causes of change failures for CTO-level incident analysis. It specifies inputs (GitHub PR URL or incident ID), data sources (GitHub PR metadata and Snyk vulnerability data), and outputs (structured root cause categories with confidence scores). This distinguishes it from siblings like dependency_vulnerability_scan by focusing on root cause categorization for post-mortem analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'Ideal for post-mortem analysis and change risk assessment' but does not explicitly state when not to use this tool or suggest alternatives. No comparison to sibling tools is provided, so the agent lacks clear boundaries for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

china_ecommerce_intelA

Read-only

Chinese e-commerce intelligence for the ZH diaspora (50M+), import-export teams, brand IP enforcement, MENA/Africa entrepreneurs sourcing from China, and brand monitoring. Covers Taobao, Tmall, JD.com, Pinduoduo, 1688.com (B2B) and AliExpress (cross-border).

Five modes:
• product_search — search products by keyword across CN platforms. Returns title ZH/EN, price CNY + USD estimate, sales 30d, rating, seller info, product URL.
• seller_profile — full seller/supplier dossier: factory vs reseller detection, certifications (ISO, BSCI, CE), rating, years in business, main categories.
• price_history — 12-month price trend for a product (live current price + seasonal model for CN shopping festivals: 11.11, 6.18, CNY).
• brand_monitoring — detect counterfeits and grey market listings: price anomaly detection (>50% below MSRP = suspicious), counterfeit keyword scan, risk score 0-100.
• market_intel — category overview: top 5 sellers by market share, avg/median price, volume estimate, price range.
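
The brand_monitoring price rule (>50% below MSRP = suspicious) can be written as a one-line check. This is a sketch of the stated rule only, assuming both prices are in the same currency; `is_price_suspicious` is a hypothetical name, and the server's actual implementation and risk scoring are not documented here.

```python
def is_price_suspicious(listing_price: float, msrp: float, threshold: float = 0.5) -> bool:
    """Flag a listing priced more than `threshold` (default 50%) below MSRP."""
    if msrp <= 0:
        raise ValueError("msrp must be positive")
    discount = (msrp - listing_price) / msrp  # fraction below MSRP
    return discount > threshold               # strictly more than 50% below
```

A listing at exactly half of MSRP is not flagged, because the rule as written uses a strict "more than 50%" comparison.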

Data quality note: LIVE data from Taobao/Tmall/JD/Pinduoduo REQUIRES AICI_RESEARCH_PROXY_URL with CN residential routing (Bright Data -country-cn). Without proxy: AliExpress (cross-border) + curated category fallback available.

Input formats for seller_profile: 'platform:id' e.g. 'aliexpress:123456', '1688:87654321', 'tmall:apple-store-official'. Input formats for price_history: AliExpress product URL or numeric product ID.
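
The 'platform:id' format for seller_profile queries can be validated on the client before a call is made. The sketch below is an illustration under that assumption; `SellerRef` and `parse_seller_ref` are hypothetical names, and the server may accept or reject inputs differently.

```python
from typing import NamedTuple


class SellerRef(NamedTuple):
    platform: str
    seller_id: str


def parse_seller_ref(query: str) -> SellerRef:
    """Split a 'platform:id' string such as 'aliexpress:123456'."""
    platform, sep, seller_id = query.partition(":")
    platform = platform.strip().lower()
    seller_id = seller_id.strip()
    if not (sep and platform and seller_id):
        raise ValueError(f"expected 'platform:id', got {query!r}")
    return SellerRef(platform, seller_id)
```

For example, `parse_seller_ref("tmall:apple-store-official")` yields the platform `tmall` and the id `apple-store-official`.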

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode. product_search=find products, seller_profile=supplier dossier, price_history=price trend, brand_monitoring=counterfeit detection, market_intel=category overview.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keyword, product name, product_id, seller_id (platform:id), brand name, or category. Accepts Chinese characters (ZH) or English.
`region`	No	Market region. CN-domestic=full platform coverage, cross-border=AliExpress+1688 focus. Default: CN-domestic.
`platform`	No	Target platform. Default: all. Note: taobao/tmall/jd/pinduoduo require CN proxy.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`products`	No
`market_intel`	No
`platform_used`	Yes
`price_history`	No
`quality_score`	Yes
`seller_profile`	No
`brand_monitoring`	No

Tool Definition Quality

A4.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it notes that data is live, explains proxy dependencies, describes what each mode returns, and mentions data quality considerations. No contradiction with annotations (readOnlyHint=true, destructiveHint=false).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear header, bulleted mode list, data quality note, and input format examples. It is comprehensive but not verbose, with every sentence serving a purpose. The most important information (purpose, modes) is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 modes, 5 parameters, proxy requirements, multiple platforms) and the presence of an output schema (mentioned in context), the description is complete. It covers all modes, input formats, data quality, and preconditions, leaving no major gaps for an agent to understand correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions, but the description greatly expands on parameter meaning. It explains the five modes in detail, provides input format examples for query (e.g., 'platform:id' for seller_profile), and clarifies platform and region behavior. This adds substantial value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as Chinese e-commerce intelligence, lists supported platforms and five distinct modes (product_search, seller_profile, price_history, brand_monitoring, market_intel). It specifies the target audience and differentiates from sibling tools by focusing on specific Chinese platforms.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies intended users (ZH diaspora, import-export teams, etc.) and provides context on when to use each mode. It includes a data quality note about proxy requirements for certain platforms and fallback behavior. However, it does not explicitly state when not to use this tool or name alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

china_market_dataA

Read-only

Chinese capital market intelligence for the ZH diaspora (50M+) and institutional investors. Covers A-Shares (SSE/SZSE), H-Shares (HKEX), and ADRs across four modes:

• company — full company profile: name ZH/EN, USCC (18-digit social credit code), exchange, industry (CSRC classification), chairperson, registered capital, SOE flag
• market_quote — real-time quote: price (CNY or HKD), change%, volume, market cap, P/E ratio, dividend yield, last update timestamp
• sector_overview — sector snapshot: top 5 companies by market cap, avg P/E, 30-day sector index change. Supported sectors: semiconductor, ev, battery, technology, finance, energy, realestate, consumer, pharma, telecom
• regulatory_filing — recent regulatory disclosures (HKEX filings: annual, quarterly, announcements, mergers, IPOs) with title, date, document URL

Input formats accepted:
• 6-digit A-Share ticker (e.g. '600519' for Moutai SSE)
• HKEX ticker (e.g. '0700.HK' or '700' for Tencent)
• Company name in EN or ZH (e.g. '腾讯', 'Kweichow Moutai')
• Sector keyword (e.g. 'semiconductor', '半导体')
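
The accepted input formats can be told apart with two regular expressions. The sketch below is a client-side illustration only: `classify_query` is a hypothetical helper, and normalising HKEX tickers to the zero-padded '0700.HK' form is an assumption based on the examples above, not documented server behaviour. Anything that is not a 6-digit or 1-to-4-digit ticker is passed through as a name or sector keyword.

```python
import re
from typing import Tuple


def classify_query(query: str) -> Tuple[str, str]:
    """Return (kind, normalised_query) for a china_market_data query string."""
    q = query.strip()
    # 6-digit A-Share ticker, e.g. '600519'
    if re.fullmatch(r"\d{6}", q):
        return ("a_share", q)
    # HKEX ticker with or without suffix, e.g. '0700.HK' or '700'
    match = re.fullmatch(r"(\d{1,4})(?:\.HK)?", q, flags=re.IGNORECASE)
    if match:
        return ("hkex", match.group(1).zfill(4) + ".HK")
    # Company name (EN or ZH) or sector keyword
    return ("name_or_sector", q)
```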

Data sources: Yahoo Finance (primary, always accessible), Eastmoney push2 + CompanySurvey (via Bright Data proxy when AICI_RESEARCH_PROXY_URL is set), HKEX filing API. Note: Eastmoney/CSRC/SSE are blocked from datacenter IPs without proxy — set AICI_RESEARCH_PROXY_URL to unlock full coverage.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode. company=full profile, market_quote=price data, sector_overview=top 5 by sector, regulatory_filing=recent filings.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Ticker (6-digit A-share, 4-digit HK, Yahoo format), company name (ZH or EN), or sector keyword.
`exchange`	No	Exchange filter. Default: all. Affects sector_overview ticker selection.
`period_days`	No	Lookback period in days for regulatory filings. Default: 30.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`company`	No
`sources`	Yes
`market_quote`	No
`quality_score`	Yes
`sector_overview`	No
`regulatory_filings`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. Description adds context on data sources (Yahoo Finance, Eastmoney, HKEX) and critical access constraint that Eastmoney/CSRC/SSE are blocked from datacenter IPs without proxy. This goes beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured with bullet lists for modes, input formats, and data sources. Front-loaded with core purpose. Every sentence is relevant, but the text is slightly long and some phrasing could be tightened without losing information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complexity (5 parameters, 4 modes, multiple data sources) and presence of output schema, description covers modes, inputs, sources, and access restrictions comprehensively. No major gaps for agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all 5 parameters with 100% coverage. Description elaborates on mode parameter with detailed outputs (e.g., company profile includes USCC, registered capital) and input format examples (e.g., '600519' for Moutai). Adds meaningful context beyond enum labels.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states tool provides Chinese capital market intelligence across four specific modes (company, market_quote, sector_overview, regulatory_filing) covering A-Shares, H-Shares, and ADRs. Distinct from sibling tools like india_market_data by explicit geographic focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description details four modes and accepted input formats, implicitly guiding usage. Does not explicitly state when to use alternatives or exclude certain cases. Lacks explicit when-to-use vs. when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

churn_defenderC

Read-only

Bouclier anti-churn (anti-churn shield) — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk — portfolio of 400 SME/mid-cap clients, churn detection Q2 2025 (€8M ARR). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`accounts`	Yes
`csrContext`	No
`analysisWindowDays`	Yes

Tool Definition Quality

C2.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true, indicating no side effects. The description adds that the tool returns an audited deliverable, but does not detail what the deliverable contains or any processing behavior beyond input validation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but uses jargon ('Bouclier', 'CRO', 'Gapup agent-payable') and includes a reference case that may not be helpful to the agent. The key purpose is not front-loaded clearly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of nested objects and 5 parameters, the description fails to explain the return value, the structure of the deliverable, or how to interpret results. It is incomplete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, yet the description provides no explanation of parameters beyond 'send the documented case fields'. No parameter details are given, leaving the agent to rely entirely on the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description mentions 'anti-churn' and 'returns a structured, audited deliverable', but it lacks a clear verb indicating the exact action (e.g., 'analyze churn risk'). It does not differentiate from sibling tools like upsell_hunter or renewal_optimizer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool over alternatives. The description only says to send documented case fields, without providing context for proper invocation or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

climate_scenario_rcpA

Read-only

Long-term climate projections by IPCC scenario (RCP AR5 + SSP AR6) for any location. Scenarios: RCP_4_5, RCP_8_5 (AR5), SSP1_2_6, SSP2_4_5, SSP3_7_0, SSP5_8_5 (AR6), or 'all' (compares all). Horizons: 2030–2100. Metrics: temperature (delta vs. 1990-2010 baseline, days >35°C, warm nights), precipitation (delta %, extreme events, droughts), sea level rise (cm vs. 2000), extreme events (hurricanes, P100 floods, droughts), fire index. Outputs: multi-scenario comparison, IPCC likelihood, business impact signals by sector. Sources: Open-Meteo CMIP6 (keyless), IPCC AR6 Atlas lookup, NOAA SLR projections. Use cases: TCFD/CSRD physical risk, long-term asset due diligence, catastrophe insurance, infrastructure planning. Cache: 7 days. SLA ≤20s.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`metrics`	No	Metrics to include. Default: all.
`location`	Yes	Location: {city, country?} or {lat, lon}
`scenario`	Yes	IPCC scenario. 'all' generates a multi-scenario comparison.
`horizon_year`	Yes	Horizon year of the projection (2030–2100)
`compare_baseline`	No	Compare against the 1990-2010 baseline (default true)
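
The three required parameters carry constraints that can be checked before calling the tool. The sketch below builds an argument payload from the scenario list, the 2030–2100 horizon range, and the two location shapes named in this listing. `build_climate_request` is a hypothetical helper, and the server's own validation may be stricter or looser.

```python
from typing import Any, Dict

# Scenario identifiers as listed in the tool description.
SCENARIOS = {
    "RCP_4_5", "RCP_8_5",
    "SSP1_2_6", "SSP2_4_5", "SSP3_7_0", "SSP5_8_5",
    "all",
}


def build_climate_request(location: Dict[str, Any], scenario: str, horizon_year: int) -> Dict[str, Any]:
    """Validate and assemble the required arguments for climate_scenario_rcp."""
    if scenario not in SCENARIOS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    if not 2030 <= horizon_year <= 2100:
        raise ValueError("horizon_year must be between 2030 and 2100")
    has_city = "city" in location
    has_coords = "lat" in location and "lon" in location
    if not (has_city or has_coords):
        raise ValueError("location needs {city, country?} or {lat, lon}")
    return {"location": location, "scenario": scenario, "horizon_year": horizon_year}
```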

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`location`	Yes
`scenario`	Yes
`projections`	Yes
`horizon_year`	Yes
`quality_score`	Yes
`baseline_period`	No
`ipcc_likelihood_label`	Yes
`business_impact_signals`	Yes
`multi_scenario_comparison`	No

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses read-only nature (projections), cache behavior (7 days), SLA (≤20s), async option, and data sources. It adds detail beyond the annotations (readOnlyHint, destructiveHint), which already indicate safety. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single dense paragraph that front-loads the core purpose. It is concise for the amount of information conveyed, though could be structured with bullet points for improved readability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, the description covers purpose, scenarios, metrics, outputs, sources, usage contexts, caching, SLA, and async behavior. With full schema coverage and output schema present, it is complete for effective agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description adds context by explaining scenarios and metrics in narrative form, but the schema already covers the semantics. The description enhances understanding but does not add critical missing info.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: long-term climate projections by IPCC scenario for any location. It specifies scenarios, horizons, metrics, outputs, and sources, distinguishing it from sibling tools which are unrelated to climate projections.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists explicit use cases (TCFD/CSRD physical risk, due diligence, insurance, infrastructure planning) and mentions caching and SLA. It does not explicitly state when not to use it or list alternatives, but the use cases are clear and sufficient for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

clinical_evidence_brieferC

Read-only

Clinical evidence brief (GRADE) — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: Review the clinical evidence for <drug/intervention> in <indication> — GRADE rating, key trials, safety signals. · Scan safety signals for <molecule> in <population> — adverse events, severity, frequency from FAERS and trial data. · Assess comparative effectiveness of versus for — what does the evidence show? · Is there evidence supporting drug repurposing of for — existing trials and GRADE quality? · What are the evidence gaps for in before formulary adoption? Reference case: Semaglutide 2.4mg · Chronic weight management in non-diabetic adults · GRADE high efficacy · studies found · nausea/GI signals · FDA approved · PubMed+ClinicalTrials+OpenFDA. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`topic`	Yes
`max_studies`	Yes
`intervention`	No
`evidence_focus`	Yes		all
`target_disease`	No
`date_range_years`	Yes
`intervention_type`	No

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that the tool returns a 'structured, audited deliverable' and mentions data sources like FAERS and trial data. This is consistent but adds only moderate behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is overly long and cluttered with multiple example questions and a reference case. It lacks clear structure; the information is packed into a single block of text. Not concise, and the front-loading is confusing (jargon like 'Gapup agent-payable C-suite expertise (RISK)').

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 8 parameters, no output schema, and no nested objects, the description should provide more guidance on input usage and return format. It gives examples but does not explain parameter semantics or expected output structure. Incomplete for an effective tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, yet the description does not explain any of the 8 parameters. It lists example questions but does not map them to parameters like topic, max_studies, evidence_focus, or date_range_years. The description fails to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool provides clinical evidence briefs with GRADE ratings, key trials, and safety signals. It gives specific verb-resource pairs like 'Review the clinical evidence for <drug/intervention> in <indication>' and 'Scan safety signals for <molecule> in <population>'. However, it does not differentiate from sibling tools like sci_literature_search or clinical_pharma_intel, which may have similar purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example questions but no explicit guidance on when to use this tool versus alternatives. It does not state conditions when not to use or mention any prerequisites. The examples are generic and do not help an agent decide between this and sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

clinical_pharma_intelA

Read-only

Clinical and pharmaceutical intelligence for biotech analysts, healthcare fund managers, pharma BD teams, catalyst-driven hedge funds and health journalists. Aggregates live data across five modes:
• trials — active/completed clinical trials (ClinicalTrials.gov v2 + EU CTR in parallel, 450k+ records)
• pipeline — full pipeline by sponsor: trial count by phase + top indications
• approvals — FDA drug label approvals + mechanism of action (OpenFDA)
• recalls — FDA enforcement recalls classified by severity (Class I/II/III)
• adverse_events — FAERS aggregated reactions: top 10 reactions + serious%

Signal detection (P0/P1/P2):
• P0 if Class I recall OR trial terminated for safety reason
• P1 if serious adverse events >30% OR ≥3 recalls in 12 months
• P2 otherwise (standard monitoring)
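
The P0/P1/P2 rules above translate directly into a small decision function. The sketch below encodes only the thresholds stated in the description; the function name and argument names are hypothetical, and how the server derives each input is not documented.

```python
def classify_signal(
    has_class_i_recall: bool,
    trial_terminated_for_safety: bool,
    serious_adverse_event_pct: float,
    recalls_last_12_months: int,
) -> str:
    """Map the stated signal rules to a priority label."""
    # P0: Class I recall OR a trial terminated for a safety reason
    if has_class_i_recall or trial_terminated_for_safety:
        return "P0"
    # P1: serious adverse events above 30% OR at least 3 recalls in 12 months
    if serious_adverse_event_pct > 30 or recalls_last_12_months >= 3:
        return "P1"
    # P2: everything else (standard monitoring)
    return "P2"
```

The P0 check runs first, so a product with both a Class I recall and a high serious-event share is still reported as P0.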

All sources are public and keyless. Optional env OPENFDA_API_KEY raises daily quota from 1,000 to 120,000 requests. SLA: ≤16s p95 (parallel fetch, 8s budget per source). Cache: 6h trials, 24h approvals, 12h recalls, 6h adverse events.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Analysis mode. Default "trials". trials=clinical trials, pipeline=sponsor overview, approvals=FDA approvals, recalls=enforcement, adverse_events=FAERS
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`phase`	No	Filter trials by phase (1/2/3/4/NA). Only applies to modes trials and pipeline.
`query`	Yes	Drug name, indication, sponsor or molecule (e.g. "atezolizumab", "metastatic NSCLC", "Roche", "semaglutide")
`country`	No	ISO 2-letter country code to filter trial sites (e.g. US, FR, DE).
`max_results`	No	Maximum number of results to return. Default 20.
`status_filter`	No	Filter trials by status. Only applies to modes trials and pipeline.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`trials`	No
`recalls`	No
`signals`	Yes
`sources`	Yes
`pipeline`	No
`approvals`	No
`quality_score`	Yes
`adverse_events`	No

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond annotations by disclosing important behavioral traits: all sources are public and keyless, an optional API key increases quota, SLA ≤16s p95, and cache durations per mode (6h trials, 24h approvals, etc.). No contradictions with annotations (readOnlyHint=true, openWorldHint=true).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured, starting with purpose, then listing modes with bullet points, then signal detection, then technical details. Every sentence adds value. It is appropriately sized for the tool's complexity (7 parameters, multiple modes). No redundant phrasing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high complexity (7 parameters, 100% schema coverage, output schema present), the description is complete. It covers target users, modes, signal detection, data sources, performance (SLA), caching, and optional API key. No gaps in context for effective tool invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds value by explaining the modes in detail, including signal detection logic (P0/P1/P2) and the scope of each mode (e.g., 'trials — active/completed clinical trials'). This enriches what the schema alone provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: providing clinical and pharmaceutical intelligence for specific user groups (biotech analysts, healthcare fund managers, etc.). It lists five distinct modes (trials, pipeline, approvals, recalls, adverse_events) and explicitly describes what each mode does. The scope is well-defined and distinguishes this tool from sibling tools which cover diverse domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies target users and contexts (e.g., biotech analysts, pharma BD teams, catalyst-driven hedge funds). It explains when to use the tool based on mode selection and mentions signal detection (P0/P1/P2) for urgency. However, it does not explicitly state when not to use this tool or list alternative tools among siblings, which would improve guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cloud_cost_ri_optimizerA

Read-only

Idempotent

Analyzes AWS and Azure cloud pricing data alongside RIPE regional demand trends to generate Reserved Instance purchase recommendations for CTOs. Inputs include target cloud provider, instance family, region, and desired commitment term. Outputs include cost savings percentage, optimal RI quantity, and regional demand insights. Ideal for reducing cloud spend with data-driven decisions. Keywords: cloud cost optimization, reserved instances, AWS pricing, Azure pricing, RIPE demand trends.

Parameters

Name	Required	Description
`term`	No
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes
`utilization`	No
`cloud_provider`	Yes
`instance_family`	Yes

Output Schema

Name	Required	Description
`status`	Yes
`ri_cost`	No
`sources`	No
`warnings`	No
`on_demand_cost`	No
`break_even_months`	No
`regional_demand_score`	No
`cost_savings_percentage`	No
`recommended_ri_quantity`	No
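
The output fields above hint at how a recommendation might be derived. The following is a minimal sketch, assuming `cost_savings_percentage` is the relative difference between on-demand and RI cost over the term, and `break_even_months` is the upfront payment divided by the monthly saving. The listing does not document the actual formulas, and the function name `ri_savings` and its `upfront` argument are hypothetical.

```python
from typing import Optional


def ri_savings(on_demand_monthly: float, ri_monthly: float,
               upfront: float, term_months: int) -> dict:
    """Illustrative RI economics. The formulas are assumptions, not the tool's documented logic."""
    if on_demand_monthly <= 0 or term_months <= 0:
        raise ValueError("on_demand_monthly and term_months must be positive")
    on_demand_cost = on_demand_monthly * term_months
    ri_cost = upfront + ri_monthly * term_months
    monthly_saving = on_demand_monthly - ri_monthly
    # A break-even point only exists when the RI is cheaper per month than on-demand.
    break_even: Optional[float] = upfront / monthly_saving if monthly_saving > 0 else None
    return {
        "on_demand_cost": on_demand_cost,
        "ri_cost": ri_cost,
        "cost_savings_percentage": (on_demand_cost - ri_cost) / on_demand_cost * 100,
        "break_even_months": break_even,
    }


example = ri_savings(on_demand_monthly=100.0, ri_monthly=50.0, upfront=300.0, term_months=12)
```

With these made-up inputs the RI costs 900 against 1200 on demand, so the sketch reports a 25% saving and a six-month break-even.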

Tool Definition Quality

A (3.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and openWorldHint=true, covering safety and idempotence. The description adds behavioral context by listing outputs and purpose, but does not disclose data freshness, latency, or limitations. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: a clear purpose statement followed by input/output lists, a use case, and keywords. It is front-loaded, but the keywords section is somewhat redundant. Overall efficient with minimal waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers main inputs and outputs, and the output schema exists to detail return values. However, it does not explain the 'utilization' parameter, the role of 'RIPE regional demand trends', or when to use the 'async' parameter. More context would improve completeness for a tool with 6 parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at 17%, the description partially compensates by explaining cloud_provider, instance_family, region, and term via natural language. However, it omits explanation of 'utilization' and 'async' parameters, which are not self-explanatory. The description adds meaning but is incomplete.
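
The 17% figure is consistent with one described parameter (`async`) out of six. A minimal sketch of that arithmetic, assuming coverage means the share of parameters with a non-empty schema description (the scorer's exact method is not stated here, and `description_coverage` is a hypothetical helper):

```python
from typing import Optional


def description_coverage(params: dict[str, Optional[str]]) -> int:
    """Percentage of parameters that carry a non-empty description, rounded to an integer."""
    if not params:
        return 0
    described = sum(1 for text in params.values() if text)
    return round(described / len(params) * 100)


# Parameter names taken from the table above; only `async` has a description.
ri_params = {
    "term": None,
    "async": "If true, returns a job_id immediately instead of waiting for the result.",
    "region": None,
    "utilization": None,
    "cloud_provider": None,
    "instance_family": None,
}
ri_coverage = description_coverage(ri_params)
```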

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool 'Analyzes AWS and Azure cloud pricing data... to generate Reserved Instance purchase recommendations for CTOs.' It specifies the verb (analyze, generate), resource (cloud pricing, RIPE data), and purpose (RI recommendations). This fully defines its purpose and sets it apart from sibling tools, none of which focus on RI optimization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage context ('Ideal for reducing cloud spend with data-driven decisions'), implying when to use it. However, it lacks explicit guidance on when not to use it or alternatives among siblings. No exclusions or comparison to similar tools are given, leaving the agent to infer usage scope.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

code_review_depth_optimizer (grade A)

Read-only · Idempotent

As a CTO, this tool analyzes your team's historical DORA metrics (deployment frequency, lead time, MTTR, change failure rate) and GitHub pull request data to recommend an optimal code review depth. Input your repository identifier and time range, and receive a structured recommendation on review rigor (light, standard, thorough) with supporting metrics and risk-adjusted rationale.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`teamSize`	No	Number of active developers in the team
`repository`	Yes	GitHub repository identifier in format owner/repo
`riskTolerance`	No	Organization's risk tolerance level
`timeRangeDays`	Yes	Number of days of historical data to analyze
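
A client might assemble the arguments for this tool as sketched below. The parameter names come from the table above; the `owner/repo` regular expression, the positive-days check, and the helper name `build_review_args` are assumptions, since the listing does not publish the schema's actual constraints or the allowed `riskTolerance` values.

```python
import re
from typing import Optional

# Assumed shape of a GitHub "owner/repo" identifier; the real schema may be stricter or looser.
REPO_PATTERN = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def build_review_args(repository: str, time_range_days: int,
                      team_size: Optional[int] = None,
                      risk_tolerance: Optional[str] = None,
                      run_async: bool = False) -> dict:
    """Build an arguments dict for code_review_depth_optimizer, omitting unset optional fields."""
    if not REPO_PATTERN.match(repository):
        raise ValueError("repository must look like owner/repo")
    if time_range_days <= 0:
        raise ValueError("timeRangeDays must be a positive number of days")
    args: dict = {"repository": repository, "timeRangeDays": time_range_days}
    if team_size is not None:
        args["teamSize"] = team_size
    if risk_tolerance is not None:
        # Passed through unchanged because the allowed values are not listed.
        args["riskTolerance"] = risk_tolerance
    if run_async:
        args["async"] = True
    return args


review_args = build_review_args("getgapup/gapup-mcp-public", 90, team_size=8)
```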

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`recommendation`	No

Tool Definition Quality

A (4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint. The description adds context on the data analyzed (DORA metrics, PR data) and output format. It does not mention potential slowness (despite the async parameter) or authentication requirements, but overall provides sufficient behavioral insight beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: the first explains the tool's role and function, the second lists required inputs and expected output. It is concise, front-loaded, and contains no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters and an output schema, the description provides a solid high-level overview. It lacks mention of the async parameter's behavior and how teamSize/riskTolerance affect recommendations. However, given the existence of the output schema, the description is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% coverage with descriptions for all 5 parameters. The description repeats that users should input repository and time range but adds no extra semantic meaning to parameters like teamSize, riskTolerance, or async. Thus, it adds minimal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: analyzing DORA metrics and GitHub PR data to recommend optimal code review depth. It specifies inputs (repository, time range) and output (structured recommendation on review rigor). No sibling tool has this exact function, so it is well-differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for CTOs seeking to optimize code review depth based on historical data. However, it does not explicitly mention when not to use this tool or suggest alternative tools (e.g., dora_metrics_deep_dive). No guidance on prerequisites or limitations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

comp_benchmark_geo_delta (grade A)

Read-only · Idempotent

Compares local compensation benchmarks against HQ standards for CHROs, adjusting for cost-of-living and tax differentials. Inputs include job role, local and HQ locations, and salary range. Outputs include adjusted benchmark delta, cost-of-living multiplier, and tax impact. Keywords: compensation benchmark, geographic pay equity, cost-of-living adjustment, tax differential analysis.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`jobRole`	Yes	Standardized job role (e.g., 'Software Engineer III')
`currency`	No	ISO 4217 currency code (e.g., 'USD')
`baseSalary`	No	Current base salary in local currency
`hqLocation`	Yes	HQ location (ISO 3166-2 code or city, country)
`localLocation`	Yes	Local work location (ISO 3166-2 code or city, country)

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`taxImpact`	No	Estimated tax differential percentage
`adjustedSalary`	No	Salary adjusted for cost-of-living and taxes
`benchmarkDelta`	No	Percentage difference between local and HQ benchmark
`confidenceScore`	No	0-1 confidence in data quality
`costOfLivingMultiplier`	No	Local cost-of-living index relative to HQ
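
Two of the output fields above are described as simple ratios. The sketch below shows one plausible reading, assuming `benchmarkDelta` is measured relative to the HQ benchmark and `costOfLivingMultiplier` is the local index divided by the HQ index. The sign convention and both helper names are assumptions; the tool's real computation is not documented in this listing.

```python
def benchmark_delta(local_benchmark: float, hq_benchmark: float) -> float:
    """Percentage difference of the local benchmark relative to HQ (negative means local pays less)."""
    if hq_benchmark <= 0:
        raise ValueError("hq_benchmark must be positive")
    return (local_benchmark - hq_benchmark) / hq_benchmark * 100


def cost_of_living_multiplier(local_index: float, hq_index: float) -> float:
    """Local cost-of-living index expressed relative to HQ (1.0 means parity)."""
    if hq_index <= 0:
        raise ValueError("hq_index must be positive")
    return local_index / hq_index


# Made-up figures for illustration only.
delta = benchmark_delta(local_benchmark=90_000, hq_benchmark=120_000)
multiplier = cost_of_living_multiplier(local_index=80, hq_index=100)
```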

Tool Definition Quality

A (4.1/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, confirming safe, idempotent operation. The description adds valuable behavioral context: target audience (CHROs), adjustment types (cost-of-living, tax), and specific outputs. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three sentences with front-loaded purpose, clear input/output listing, and relevant keywords. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description adequately covers purpose, inputs, and outputs. It mentions all key outputs (delta, multiplier, tax impact) and target audience, making it complete for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema covers 100% of parameters with clear descriptions. The description rephrases some parameters (job role, locations, salary range) but does not add new semantics beyond the schema, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it compares local compensation benchmarks to HQ standards for CHROs, adjusting for cost-of-living and tax differentials. This verb-resource combination is specific and distinguishes it from sibling tools like executive_comp_peer_benchmark or global_salary_inflation_adjuster.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for geographic pay equity adjustments via its list of inputs and output types, but it does not explicitly state when to use this tool versus alternatives or provide exclusion criteria. Keywords help, but guidance is not direct.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitive_deep_dive (grade A)

Read-only

Gold-standard competitive deep dive — STRUCTURED multi-source data (no LLM narrative). Pair tool: competitor_intel for LLM-narrated board briefing + slide script. Aggregates Wikipedia, Yahoo Finance, SEC EDGAR, Wayback Machine, DuckDuckGo, HackerNews, domain scraping — all keyless. Returns agent-shaped JSON: KPIs (funding, employees, revenue, market cap), P0/P1/P2 competitive signals, pricing radar, competitor comparison matrix, Wayback timeline, positioning (sector/industry/icp_hypothesis/moat_signals), quality score. Every field is sourced or marked unavailable — no hallucinated figures. SLA: p50 ~25s, p95 ~30s · score 80+ on listed targets (US/EU/foreign) · score ~40 on private companies (no EDGAR/Yahoo data). Use sync for batch agents (≤30s tolerance). Use competitive_deep_dive_async + competitive_deep_dive_result(job_id) for conversational agents. Inputs: company name or domain (required), optional competitor list (≤5), optional depth (easy/medium/hard).

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Research depth: 'easy' = Wikipedia + DDG (fast, ~15s); 'medium' = + Yahoo Finance + EDGAR + Wayback (default, ~45s); 'hard' = + HackerNews + domain surfaces + competitor deep dive (~120s)
`company`	Yes	Name or domain of the target company (e.g. 'Salesforce', 'notion.so', 'HubSpot CRM')
`competitors`	No	Optional list of competitor names or domains to include in the comparison matrix (max 5)
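
The constraints in the table above (three depth values, at most five competitors) and the sync/async guidance in the description can be checked client-side before calling. This is a minimal sketch: the helper `build_deep_dive_call` is hypothetical, and the 30-second cut-off is an assumption drawn from the description's batch-agent tolerance hint, not a documented rule.

```python
from typing import Sequence

VALID_DEPTHS = ("easy", "medium", "hard")
MAX_COMPETITORS = 5


def build_deep_dive_call(company: str, competitors: Sequence[str] = (),
                         depth: str = "medium",
                         max_wait_seconds: float = 30.0) -> tuple[str, dict]:
    """Return (tool_name, arguments) after validating the documented input limits."""
    if not company.strip():
        raise ValueError("company is required")
    if depth not in VALID_DEPTHS:
        raise ValueError(f"depth must be one of {VALID_DEPTHS}")
    if len(competitors) > MAX_COMPETITORS:
        raise ValueError(f"at most {MAX_COMPETITORS} competitors are allowed")
    # Assumed rule: callers that can wait 30 s use the sync tool, others use the async variant.
    tool = "competitive_deep_dive" if max_wait_seconds >= 30 else "competitive_deep_dive_async"
    arguments: dict = {"company": company, "depth": depth}
    if competitors:
        arguments["competitors"] = list(competitors)
    return tool, arguments


tool_name, tool_args = build_deep_dive_call("notion.so", ["ClickUp", "Asana"], max_wait_seconds=10)
```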

Output Schema

Name	Required	Description
`kpis`	Yes	Key Performance Indicators sourced from public data
`company`	Yes
`quality`	Yes
`signals`	Yes	Competitive intelligence signals, severity-ranked P0 (critical) to P2 (informational)
`sources`	Yes
`comparison`	Yes	Feature/dimension comparison between target and each competitor
`depth_used`	Yes
`positioning`	Yes	Positioning analysis derived from public data
`generated_at`	Yes
`pricing_radar`	Yes	Pricing tiers extracted from public sources
`domain_resolved`	Yes
`wayback_timeline`	Yes	Historical snapshots of the company website from Wayback Machine

Tool Definition Quality

A (4.6/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. Description adds behavioral details: multi-source aggregation, keyless operation, no hallucinated figures, data sourcing and quality scores. Adds transparency about performance (p50/p95) and data coverage by company type.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with clear sections: main purpose, sync/async guidance, parameter details, features, limitations. While dense, every sentence provides useful information. Could be slightly more concise but earns its length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description is comprehensive: covers tool purpose, usage context, input parameters, output format (structured JSON with listed fields), performance characteristics, limitations (private companies score ~40), and error behavior (sourced vs unavailable). Output schema is mentioned as 'agent-shaped JSON' with detailed field list.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all 4 parameters with descriptions. Description adds valuable context: what each depth level includes ('easy' = Wikipedia+DDG, etc.), that async returns job_id immediately, and that competitor list is optional with max 5. This goes beyond schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool performs a competitive deep dive using structured multi-source data, explicitly distinguishing it from the sibling 'competitor_intel', which generates LLM-narrated board briefings. Both the action (a deep dive) and the resource (competitive data on a named company) are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly mentions when to use sync vs async based on agent type (batch agents with ≤30s tolerance vs conversational agents). Names sibling tools `competitive_deep_dive_async` and `competitor_intel` as alternatives. Provides SLA times and quality scores.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitive_deep_dive_async (grade A)

Read-only

Async variant of competitive_deep_dive. Returns immediately (<200ms) with a job_id. The research runs in the background (p50≈25s, p95≈30s for depth=medium). Poll the result with competitive_deep_dive_result(job_id) after the eta_seconds hint. Use this instead of competitive_deep_dive when the agent cannot wait >15s for a response. Inputs: same as competitive_deep_dive — company (required), competitors (optional list, max 5), depth (easy/medium/hard, default medium). Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter.

Parameters

Name	Required	Description
`depth`	No	Research depth: 'easy'≈15s, 'medium'≈30s (default), 'hard'≈60s
`company`	Yes	Name or domain of the target company (e.g. 'Salesforce', 'notion.so')
`competitors`	No	Optional list of competitor names or domains to include in the comparison matrix (max 5)

Output Schema

Name	Required	Description
`job_id`	Yes	Unique job identifier — pass to competitive_deep_dive_result
`status`	Yes	Always 'queued' on submission
`eta_seconds`	Yes	Estimated seconds until result is ready
`submitted_at`	Yes	ISO-8601 submission timestamp
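
On the wire, an MCP client submits this job as a JSON-RPC 2.0 `tools/call` request. The sketch below only builds the request body; the endpoint URL, session handling, and x402 payment step are deliberately left out because the listing does not specify them, and `tools_call_request` is a hypothetical helper.

```python
import json
from itertools import count

_request_ids = count(1)


def tools_call_request(name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = tools_call_request(
    "competitive_deep_dive_async",
    {"company": "notion.so", "depth": "medium"},
)
body = json.dumps(request)
```

The response's `job_id` and `eta_seconds` fields from the table above then drive the polling step.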

Tool Definition Quality

A (4.1/5.0)

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explains timing and retrieval, but the annotations declare readOnlyHint=true, which conflicts with the creation of a job resource implied by 'Returns immediately (<200ms) with a job_id'. This inconsistency undermines trust.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured, front-loading key async properties and then usage. Slightly verbose but every sentence adds value; could be tightened.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers async behavior, timing, polling, webhooks, and input equivalence. Minor gaps remain, such as job expiry and error handling, though the output schema likely fills those. Adequate given the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3, but description adds value by specifying that inputs are identical to competitive_deep_dive and noting default depth. Slightly more helpful.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is the async variant of competitive_deep_dive, returns immediately with a job_id, and focuses on background research. It distinguishes from siblings by naming the synchronous counterpart and the result polling tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says to use when the agent cannot wait >15s, and provides two retrieval methods: polling via competitive_deep_dive_result or webhook via webhooks_manage. Clearly contrasts with the synchronous alternative.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitive_deep_dive_result (grade A)

Read-only · Idempotent

Poll the result of a competitive_deep_dive_async job. Returns status=pending while running, status=completed with the full report once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by competitive_deep_dive_async.
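
The status values described above map naturally onto a polling loop. The following is a minimal sketch, not the server's reference client: `fetch` is a hypothetical callable standing in for a `competitive_deep_dive_result` call, and the poll interval and retry cap are assumptions.

```python
import time
from typing import Callable

# Statuses after which polling should stop, per the description above.
TERMINAL_STATUSES = {"completed", "failed", "not_found"}


def wait_for_result(fetch: Callable[[str], dict], job_id: str, eta_seconds: float,
                    interval: float = 2.0, max_polls: int = 30,
                    sleep: Callable[[float], None] = time.sleep) -> dict:
    """Wait for the ETA hint, then poll until the job reaches a terminal status."""
    sleep(eta_seconds)
    for _ in range(max_polls):
        result = fetch(job_id)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        sleep(interval)  # status is still "pending"
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")


# Simulated responses so the sketch runs without a server; "report" is a made-up key.
_fake_responses = iter([{"status": "pending"}, {"status": "completed", "report": {}}])
final = wait_for_result(lambda _job_id: next(_fake_responses), "job-123",
                        eta_seconds=0, interval=0, sleep=lambda _seconds: None)
```

Injecting `fetch` and `sleep` keeps the loop testable and independent of any particular MCP client library.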

Parameters

Name	Required	Description
`job_id`	Yes	The job_id returned by competitive_deep_dive_async

Output Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A (4.5/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, non-destructive. Description adds concrete details about status transitions and TTL, going beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: the first states the action, the second lists the possible statuses, and the third gives timing guidance. No fluff, efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description adequately covers polling behavior, statuses, TTL, and invocation timing. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the schema description of job_id is identical to the description's mention. The description adds no new semantic meaning beyond restating the source of the ID.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool polls the result of an asynchronous job, listing possible statuses. It distinguishes from siblings by specifically referencing competitive_deep_dive_async, the paired async tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to call after the ETA hint from competitive_deep_dive_async, providing clear timing guidance. Lacks explicit when-not-to-use statements, but the context is strong.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_intel (grade A)

Read-only · Idempotent

LLM-narrated competitive-intelligence BRIEFING — for human consumption (board meeting, pitch prep). Pair tool: competitive_deep_dive for raw structured multi-source data (agent-shaped JSON). Returns: recent competitor moves with severity (critical/high/medium/low), prioritised signals, pricing-radar comparison, 3-6 quantified recommendations (impact in € or %, 7/30/90/180-day horizons), and an 8-12 slide presenter script. Use when the buyer wants a narrative briefing or a deck. Inputs: your company (name + one-paragraph pitch) + 1-10 competitors. Delivered by Manue, AI CMO of the Gapup portfolio.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No	Optional — what the buyer wants to track first (e.g. pricing moves, hiring patterns)
`competitors`	Yes	1-10 competitors to analyze
`selfCompany`	Yes	Your company info

Output Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles
`sources`	No	Cited sources
`pricingRadar`	No	Pricing comparison across competitors
`competitorMoves`	Yes	Recent moves per competitor with severity rating
`presenterScript`	Yes	8-12 slide board presenter script
`recommendations`	Yes	3-6 actionable strategic recommendations
`executiveSummary`	Yes	Board-ready prose summary (120-400 chars)
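
The ranges stated in the output table (120-400 characters, 3-6 recommendations, 8-12 slides) can be sanity-checked on the client. This sketch assumes `recommendations` and `presenterScript` arrive as lists with one entry per item or slide; the actual JSON shapes are not shown in this listing, and `check_briefing` is a hypothetical helper.

```python
REQUIRED_FIELDS = ("competitorMoves", "presenterScript", "recommendations", "executiveSummary")


def check_briefing(briefing: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the briefing looks in range."""
    problems = [f"missing required field: {key}" for key in REQUIRED_FIELDS if key not in briefing]
    summary = briefing.get("executiveSummary", "")
    if not 120 <= len(summary) <= 400:
        problems.append(f"executiveSummary has {len(summary)} chars, expected 120-400")
    recommendations = briefing.get("recommendations", [])
    if not 3 <= len(recommendations) <= 6:
        problems.append(f"{len(recommendations)} recommendations, expected 3-6")
    slides = briefing.get("presenterScript", [])  # assumed: one list entry per slide
    if not 8 <= len(slides) <= 12:
        problems.append(f"{len(slides)} slides, expected 8-12")
    return problems


# Placeholder content, not real tool output.
sample = {
    "competitorMoves": [],
    "executiveSummary": "x" * 150,
    "recommendations": ["r1", "r2", "r3"],
    "presenterScript": [f"slide {n}" for n in range(1, 9)],
}
sample_problems = check_briefing(sample)
```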

Tool Definition Quality

A (4.7/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint, and destructiveHint. The description adds context about the narrative output format and async behavior, but no additional critical behavioral disclosures are needed. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly concise and front-loaded with key purpose and outputs. Some redundancy ('for human consumption (board meeting, pitch prep)') could be trimmed, but overall it is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, output schema exists), the description covers the essential inputs, outputs, and use case. The presence of an output schema means return value details are not required. Complete for selection purposes.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All parameters have schema descriptions (100% coverage), but the description adds meaning by explaining that selfCompany requires a 'one-paragraph pitch' and that async controls synchronous vs. polling behavior. This adds value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an 'LLM-narrated competitive-intelligence BRIEFING — for human consumption' and lists specific outputs. It distinguishes from the sibling tool `competitive_deep_dive` by contrasting narrative vs. raw structured data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Use when the buyer wants a narrative briefing or a deck' and mentions pairing with `competitive_deep_dive` for raw data. Provides clear guidance on when to use this tool vs. alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_moves (grade B)

Read-only

Mouvements concurrents — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What have my named competitors done recently — releases, pricing changes, hires, funding? · Which competitor signals are the most urgent right now and what should I do about them? Reference case: Notion — moves de ClickUp, Asana, Coda. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

B (3.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns a 'structured, audited deliverable' and that inputs are validated server-side, which provides some behavioral context but is limited.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description mixes French and English, includes a reference case and bullet points, but could be more concise. It has some structure but is not optimally front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, no output schema), the description is incomplete. It fails to explain the deliverable format, how urgency is determined, or how to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 25%, yet the description does not explain the key parameters (selfCompany, competitors, focus). It vaguely says 'send the documented case fields' without adding meaning to the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
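
The 25% figure above is consistent with one described parameter out of four (`async`). The scorer's actual formula is not shown on this page; a minimal sketch, assuming coverage is simply the share of top-level properties with a non-empty `description`:

```python
from typing import Any


def schema_description_coverage(input_schema: dict[str, Any]) -> float:
    """Fraction of top-level properties that carry a non-empty description."""
    props = input_schema.get("properties", {})
    if not props:
        # Assumption: a schema with no parameters has nothing left undocumented.
        return 1.0
    described = sum(
        1
        for spec in props.values()
        if isinstance(spec, dict) and str(spec.get("description", "")).strip()
    )
    return described / len(props)


# Shape mirrors the parameter table above; the property types are illustrative.
schema = {
    "properties": {
        "async": {"type": "boolean", "description": "If true, returns a job_id immediately."},
        "focus": {"type": "string"},
        "competitors": {"type": "array"},
        "selfCompany": {"type": "object"},
    }
}
coverage_percent = round(schema_description_coverage(schema) * 100)
```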

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool's purpose: to return structured competitor movements (releases, pricing, hires, funding) and urgent signals. The description differentiates from sibling tools like competitor_intel or competitor_profiles by focusing on 'moves' and specific questions it answers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The sibling list includes many competitor-related tools, but the description does not help an agent decide which one to pick.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_pricing_radarB

Read-only

Radar pricing concurrents [Competitor pricing radar]: Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: How do my competitors' pricing plans and monthly prices compare to mine? · Which competitor plan undercuts or out-features my equivalent tier? Reference case: Notion (pricing vs ClickUp, Asana, Coda). Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

B3.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint and openWorldHint; description adds that it returns a structured, audited deliverable and mentions async behavior via the async parameter. This supplements the annotations well, though it doesn't detail all nuances like rate limits or data freshness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately concise but contains mixed language (French/English) and extraneous phrases like 'Gapup agent-payable C-suite expertise (CMO).' The bullet points and reference case add clarity but could be streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description states it returns a structured, audited deliverable and answers specific questions, but lacks detail on the output format or fields, given no output schema. For a complex tool, more completeness would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%, yet the description does not explain the purpose or constraints of the four parameters (async, focus, competitors, selfCompany). It merely says 'send the documented case fields' without elaboration, adding minimal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool compares competitor pricing plans and monthly prices, answering specific questions. The verb 'Radar' is unconventional but the purpose is evident. Distinguishes from sibling tools like competitor_pricing_scrape by emphasizing ongoing monitoring and structured deliverables.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a reference example (Notion vs ClickUp, Asana, Coda) and mentions server-side validation, but does not explicitly state when to use this tool versus alternatives or when not to use it. No exclusions or comparisons to sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_pricing_scrapeA

Read-only

Scrape and parse a competitor pricing page from a URL or domain. Fetches via proxy-aware timedFetch (tries /pricing, /plans, homepage fallback), then extracts: plan names, prices, billing cadence (monthly/annual/usage-based/one-time), key features, free tier presence, enterprise tier, estimated price range. Returns structured pricing tiers. If unfetchable or no pricing found (anti-bot, SPA, auth wall): returns a clear degraded result with warnings and signals — never fake success. ICP: founders, product managers, pricing strategists, competitive intel teams. Proxy-aware (AICI_RESEARCH_PROXY_URL). Cache TTL 6h.
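
The fallback order in the description (`/pricing`, `/plans`, then the homepage) can be pictured as a list of candidate URLs built from whatever the caller passes. This is only a sketch of that idea, not the server's code: the server's `timedFetch` is not shown on this page, and trying a caller-supplied path first is an assumption.

```python
from urllib.parse import urlsplit


def candidate_pricing_urls(url_or_domain: str) -> list[str]:
    """Build an ordered list of URLs to try for a competitor pricing page."""
    raw = url_or_domain.strip()
    if "://" not in raw:
        raw = "https://" + raw  # bare domain such as 'notion.so'
    parts = urlsplit(raw)
    base = f"{parts.scheme}://{parts.netloc}"
    candidates: list[str] = []
    # Assumption: a direct path given by the caller is tried before the fallbacks.
    if parts.path not in ("", "/"):
        candidates.append(base + parts.path)
    for path in ("/pricing", "/plans", "/"):
        url = base + path
        if url not in candidates:
            candidates.append(url)
    return candidates
```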

Parameters

Name	Required	Description
`url`	Yes	Competitor URL or domain (e.g. 'https://notion.so/pricing', 'notion.so', 'https://www.example.com'). For best results, provide the direct pricing page URL.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.

Output Schema

Name	Required	Description
`tiers`	Yes
`domain`	Yes
`status`	Yes
`warnings`	Yes
`url_fetched`	Yes
`has_free_tier`	Yes
`pricing_found`	Yes
`quality_score`	Yes
`raw_price_signals`	Yes
`has_enterprise_tier`	Yes
`plan_names_detected`	Yes
`billing_model_signals`	Yes
`estimated_price_range`	Yes
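
Because the tool returns a degraded result instead of failing when a page cannot be scraped, a caller should check `pricing_found` and `warnings` before trusting `tiers`. A minimal sketch, assuming `pricing_found` is a boolean and `tiers` and `warnings` are lists (the listing gives the field names but not their types):

```python
from typing import Any


def split_scrape_result(result: dict[str, Any]) -> tuple[list, list]:
    """Return (tiers, warnings); tiers is empty when no pricing was found."""
    warnings = list(result.get("warnings") or [])
    if not result.get("pricing_found"):
        # Degraded result: surface the warnings, never treat it as success.
        return [], warnings
    return list(result.get("tiers") or []), warnings
```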

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds valuable behavioral context: fetches via proxy-aware timedFetch, tries specific URL patterns, returns degraded results on failure (anti-bot, SPA, auth wall), and 'never fake success'. This exceeds annotation coverage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with the core action first, then details on behavior, output, and ICP. It is dense but each sentence adds value. Slightly verbose for a tool with full schema coverage, but still efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-step scraping, fallback behavior, error handling, and structured output), the description covers all necessary aspects: URL handling, extraction fields, failure modes, caching, and target users. The output schema (tiers, status, warnings, pricing_found, and related fields) complements this well.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with clear parameter descriptions. The description adds little beyond the schema; the advice to provide the direct pricing page URL already appears in the `url` parameter description. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Scrape and parse a competitor pricing page' and lists specific extracted data (plan names, prices, billing cadence, etc.). The verb is specific and the resource is well-defined, distinguishing it from siblings like competitor_pricing_radar.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides ICP (founders, product managers, etc.), recommends providing direct pricing page URL, and mentions proxy-awareness and cache TTL. However, it does not explicitly contrast with similar sibling tools like competitor_pricing_radar, leaving some ambiguity about when to choose this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_profilesB

Read-only

Profils concurrents [Competitor profiles]: Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What are the strengths, weaknesses and positioning of each of my competitors? · Give me a SWOT-style profile of a named competitor. Reference case: Notion (profiles of ClickUp, Asana, Coda). Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side, but does not disclose additional behavioral aspects like rate limits, async behavior consequences, or deliverable format. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description consists of 5 sentences, mixing French and English. While it conveys key points, the first sentence is redundant with the title, and the structure could be tightened by removing the French phrase and focusing on English. Acceptable but not optimal.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 4 parameters, nested objects, and no output schema, the description is incomplete. It does not clarify required vs optional nested fields, the format of the deliverable, or how to handle the async parameter (e.g., polling instructions). The reference case is helpful but insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only 'async' is described). The description mentions 'send the documented case fields' but does not explain the purpose or constraints of 'focus', 'competitors', or 'selfCompany' parameters beyond the schema. Nested object fields lack guidance on what constitutes a valid case.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a structured, audited deliverable about competitor strengths, weaknesses, positioning, and SWOT-style profiles, with a concrete reference case (Notion). The purpose is specific to competitor profiling, though the mixed French/English language slightly reduces clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Example questions are provided ('What are the strengths...', 'Give me a SWOT-style profile'), implying when to use the tool. However, no explicit guidance on when not to use it or how it differs from siblings like competitor_intel or competitive_deep_dive, which share similar functionality.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_recommendationsC

Read-only

Recommandations concurrentielles [Competitive recommendations]: Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: Given my competitors, what strategic actions should I take and in what order? · What should my 7/30/90/180-day competitive response plan look like? Reference case: Notion (actions in response to ClickUp, Asana, Coda). Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint: true and openWorldHint: true. The description adds that it returns an audited deliverable and inputs are validated server-side, but does not elaborate on pacing, cost, or other behavioral traits. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is reasonably concise but includes filler text and a mix of languages. The reference case could be seen as valuable but adds length. Overall, it is adequate but not optimized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 4 parameters, low schema coverage, no output schema, and many sibling tools, the description is incomplete. It fails to explain the return format, the async option, or the 'focus' parameter. The complexity of the tool is not fully addressed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (25%), only the async parameter is documented. The description does not add meaning for the other parameters (focus, competitors, selfCompany) beyond stating that inputs are validated server-side. This is insufficient for the agent to understand parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides competitive recommendations with strategic actions and timelines, and includes a reference case (Notion). However, it is somewhat verbose and mixes English and French, which slightly reduces clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines1/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is given on when to use this tool versus the many sibling tools (e.g., competitive_deep_dive, competitor_intel, competitor_moves). The description lacks any selection criteria or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

comp_plan_architectB

Read-only

Architecture plan de commissionnement [Commission plan architecture]: Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub (comp plan for 8 sales roles · OTE €65-280k · comp budget €2.1M · quota coverage 3.2×). Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`targets`	Yes
`geography`	No
`salesTeam`	Yes
`currentChallenges`	Yes
`preferredStructure`	No

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, and the description confirms it returns a deliverable without modifying data. It adds server-side validation context, but does not elaborate on other behaviors like rate limits or output structure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise with two sentences and a reference case. However, the first sentence is wordy, and the structure could be more streamlined. It earns its place but is not exceptionally tight.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity with 7 parameters, nested objects, and no output schema, the description is incomplete. It lacks information on output format, prerequisites, and interpretation of results. The reference case helps but does not fully compensate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14%, meaning most parameters lack descriptions in the schema. The description does not compensate by explaining parameters; it only vaguely refers to 'documented case fields.' The reference case provides an example but not explicit parameter mapping.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it produces a compensation plan architecture deliverable, with a specific reference case. However, the use of French and jargon ('Gapup agent-payable C-suite expertise') may reduce clarity for some agents.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies it is used for designing compensation plans via a reference case, but does not explicitly state when to use this tool versus alternatives like comp_benchmark_geo_delta or executive_comp_peer_benchmark. No exclusion criteria are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_audience_profileA

Read-only

Return the audience targeting profile of a content entity — its enrichment tags reframed as audience facets with confidence, corroboration and full provenance (verifiable, sourced). The response also carries an entity-level provenance block (average confidence, data freshness). When to use this tool: an ad-tech or marketing agent needs a machine-readable, verifiable audience descriptor for a franchise or work. Input: an entity_id and its type.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_id`	Yes	Entity id from content_catalog
`entity_type`	No

Output Schema

Name	Required	Description
`entity_id`	Yes
`provenance`	Yes	Entity-level trust & freshness summary.
`entity_type`	Yes
`audience_facets`	Yes	Map facet → array of { label, confidence, corroboration, source_count, sources }
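
Since each facet maps to entries that carry a `confidence` value, a consumer can keep only the labels it trusts. A minimal sketch, assuming `confidence` is a number between 0 and 1; the threshold is an arbitrary choice for illustration, not something the listing specifies:

```python
from typing import Any


def confident_labels(
    audience_facets: dict[str, list[dict[str, Any]]],
    min_confidence: float = 0.7,
) -> dict[str, list[str]]:
    """Keep, per facet, only the labels whose confidence meets the threshold."""
    kept: dict[str, list[str]] = {}
    for facet, entries in audience_facets.items():
        labels = [
            entry["label"]
            for entry in entries
            if entry.get("confidence", 0) >= min_confidence
        ]
        if labels:  # drop facets with no sufficiently confident label
            kept[facet] = labels
    return kept
```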

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as readOnlyHint=true. The description adds value by detailing that the response includes confidence, corroboration, full provenance, and an entity-level provenance block with average confidence and data freshness. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (four sentences) and front-loaded with the main purpose, followed by use case and input. Every sentence adds value without fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 3 parameters and an output schema, the description covers the key behavioral aspects including provenance and data freshness. It does not mention the async parameter explicitly, but the schema handles that. Overall, adequate for context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 67% (2 of 3 parameters have descriptions). The description mentions 'Input: an entity_id and its type' but adds no new parameter-level detail beyond the schema. Baseline for moderate coverage is 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns the audience targeting profile of a content entity, reframing enrichment tags as audience facets with confidence, corroboration, and provenance. It specifies the use case for ad-tech or marketing agents and distinguishes from sibling tools by its focus on machine-readable audience descriptors.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'When to use this tool: an ad-tech or marketing agent needs a machine-readable, verifiable audience descriptor for a franchise or work.' This provides clear context for usage, though it does not explicitly mention when not to use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_catalogA

Read-only

Browse the Gapup gold-standard content catalogue — video games, films, TV series and music. Returns franchises with their works (title, release year). When to use this tool: an agent needs structured, audited metadata for a cultural franchise, wants to resolve a title to a canonical entity, or browses a domain's catalogue before requesting enrichment. Inputs: a content domain and an optional case-insensitive name filter. Each franchise id can be passed to content_enrichment for its fine-grained tag profile.

Parameters

Name	Required	Description
`name`	No	Optional case-insensitive substring filter on franchise name
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum number of franchises to return (default 20)
`domain`	Yes	Content domain to browse
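
MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`, so a catalogue lookup with these parameters could be built as sketched below. The `domain` value `video-games` is borrowed from the domain list in the content_compare description and is assumed to be accepted here; the filter text is made up. Session initialization and the server's per-call x402 payment step are left out.

```python
import json
from typing import Any


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "content_catalog",
    {"domain": "video-games", "name": "souls", "limit": 5},  # illustrative values
)
print(json.dumps(request, indent=2))
```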

Output Schema

Name	Required	Description
`count`	Yes
`domain`	Yes
`franchises`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context beyond annotations (readOnlyHint, openWorldHint) by detailing the return structure (franchises with works) and the relationship to content_enrichment. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, well-structured, and front-loaded with the tool's purpose. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, simple return), the description fully covers what the tool does, its inputs, outputs, and integration path to content_enrichment. The presence of an output schema reduces the need for detailed return documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the description still adds value by explaining the name filter as case-insensitive substring and noting the default limit of 20. It clarifies the domain parameter's allowed values and the async option's behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool browses a content catalogue of video games, films, TV series, and music, returning franchises with works. It mentions resolving titles to canonical entities. However, it does not explicitly differentiate among sibling content tools like content_audience_profile or content_ranking.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool: needing structured metadata for a cultural franchise, resolving a title, or browsing before enrichment. It also mentions passing franchise id to content_enrichment. No when-not or alternatives are given, but the use cases are clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_compareA

Read-only

Compare the tag profiles of two content entities (franchises or works) and measure how similar they are. Returns a Jaccard similarity score, the list of shared tags, the tags unique to each entity, and a breakdown of shared tags by facet. When to use this tool: an agent needs to compare two franchises or works (e.g. 'how similar are Dark Souls and Elden Ring?', 'what do Street Fighter and Mortal Kombat have in common?', 'on which axes do these two games differ?'), find positioning overlap, identify cross-sell opportunities, or answer 'if you liked X you might like Y' questions backed by data. Works for any domain (video-games, music, film, tv).

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_a`	Yes	Id of the first entity from content_catalog (e.g. 'game-dark-souls', 'music-daft-punk').
`entity_b`	Yes	Id of the second entity from content_catalog (e.g. 'game-elden-ring', 'music-justice').
`entity_type`	No	Whether both ids are franchises or works (applies to both). Defaults to 'franchise'.

Output Schema

Name	Required	Description
`entity_a`	Yes
`entity_b`	Yes
`similarity`	Yes	Jaccard index = |shared| / |union|, rounded to 2 decimal places. 0 = no overlap, 1 = identical profiles.
`a_tag_count`	Yes
`b_tag_count`	Yes
`entity_type`	Yes
`shared_tags`	Yes	Tags present in both entities (up to 40).
`unique_to_a`	Yes	Tags present only in entity_a (up to 40).
`unique_to_b`	Yes	Tags present only in entity_b (up to 40).
`shared_count`	Yes
`shared_by_facet`	Yes	Count of shared tags per facet (e.g. { genre: 3, theme: 5 }). Shows which dimensions drive the similarity.
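
The output fields above follow directly from set operations on the two tag profiles. The sketch below reproduces them locally. It assumes each profile is a mapping from facet to a list of tags and that a tag counts as shared only within the same facet; the server's internal representation is not documented here.

```python
from typing import Any


def compare_tag_profiles(a: dict[str, list[str]], b: dict[str, list[str]]) -> dict[str, Any]:
    """Jaccard comparison of two facet -> tags profiles."""
    set_a = {(facet, tag) for facet, tags in a.items() for tag in tags}
    set_b = {(facet, tag) for facet, tags in b.items() for tag in tags}
    shared = set_a & set_b
    union = set_a | set_b
    shared_by_facet: dict[str, int] = {}
    for facet, _tag in shared:
        shared_by_facet[facet] = shared_by_facet.get(facet, 0) + 1
    return {
        # Jaccard index = |shared| / |union|; 0.0 when both profiles are empty.
        "similarity": round(len(shared) / len(union), 2) if union else 0.0,
        "shared_tags": sorted(tag for _f, tag in shared)[:40],
        "unique_to_a": sorted(tag for _f, tag in set_a - set_b)[:40],
        "unique_to_b": sorted(tag for _f, tag in set_b - set_a)[:40],
        "shared_count": len(shared),
        "shared_by_facet": shared_by_facet,
    }
```

With three shared tags out of five distinct tags overall, for instance, the similarity is 3 / 5 = 0.6.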

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. Description adds detail about return structure (Jaccard score, shared tags, etc.) but doesn't mention any behavioral traits beyond what annotations cover. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two compact paragraphs: first defines functionality and outputs, second provides usage guidance with examples. No redundant sentences; each adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists and schema coverage is 100%, the description covers purpose, usage, and domain. Lacks error handling or edge cases, but overall sufficient for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with each parameter described clearly (entity_a, entity_b as IDs from content_catalog; entity_type enum). Description does not add additional parameter-level detail beyond the schema, so baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states: 'Compare the tag profiles of two content entities...' with specific verb and resource, and lists return values. Differentiates from siblings like content_similar by specifying tag profile comparison with Jaccard similarity and facet breakdown.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit 'When to use this tool' section with concrete examples (e.g., 'how similar are Dark Souls and Elden Ring?') and use cases (cross-sell, recommendations). Clearly tells when to apply.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_discoveryA

Read-only

Discover content franchises within a domain. Two modes: pass tag for a precise taxonomy match (every game tagged 'co-op'), or pass query for free-text SEMANTIC search powered by pgvector embeddings — finding franchises by meaning ('dark atmospheric games about isolation') even when no literal tag matches. Results are verifiable: tag mode carries tag confidence/corroboration, semantic mode carries a similarity score; both carry entity freshness. When to use: an agent wants a domain-scoped shortlist by tag or by intent. Inputs: a domain plus either a tag or a free-text query.

Parameters

Name	Required	Description
`tag`	No	Tag label to match precisely (e.g. 'thriller', 'co-op'). Mutually exclusive with `query`.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum franchises to return (default 25)
`query`	No	Free-text intent for semantic search (e.g. 'melancholic synth-pop about heartbreak'). Mutually exclusive with `tag`.
`domain`	Yes	Content domain to search within
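
Because `tag` and `query` are mutually exclusive, a client may want to check the arguments before paying for a call. This is a hedged sketch: the helper name `build_discovery_args` is hypothetical, the domain value `"game"` is a guess at the enum, and rejecting a call with neither field is an assumption about server behavior.

```python
def build_discovery_args(domain, tag=None, query=None, limit=25):
    """Build content_discovery arguments, requiring exactly one of tag/query."""
    # Assumption: the server rejects both-set and neither-set requests.
    if (tag is None) == (query is None):
        raise ValueError("pass exactly one of `tag` or `query`")
    args = {"domain": domain, "limit": limit}
    if tag is not None:
        args["tag"] = tag      # precise taxonomy match
    else:
        args["query"] = query  # free-text semantic search
    return args


print(build_discovery_args("game", tag="co-op"))
print(build_discovery_args("game", query="dark atmospheric games about isolation"))
```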

Output Schema

Name	Required	Description
`tag`	No
`count`	Yes
`query`	No
`domain`	Yes
`method`	Yes
`franchises`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds verifiability details beyond annotations (tag confidence/corroboration, similarity score, freshness). However, it omits the async polling behavior, which is present in the input schema but not in the description.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (4 sentences), front-loads the purpose, then covers modes, results, and usage in a logical flow with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description sufficiently covers purpose, modes, and result verification. It lacks mention of async behavior, but overall it is complete for a read-only tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description adds semantic meaning: tag for precise match, query for free-text semantic search, and explicitly states mutual exclusivity, going beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool discovers content franchises within a domain, and distinctively describes two modes (tag for precise taxonomy match, query for semantic search), which differentiates it from sibling tools like content_catalog or content_ranking.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool ('agent wants a domain-scoped shortlist by tag or intent') and outlines the two input modes with mutual exclusivity, but does not explicitly state when not to use it or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_engineC

Read-only

Content engine — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Notion — content engine 2026 (productivity B2B). Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`months`	Yes
`cluster`	Yes
`maxArticlesPerMonth`	Yes

Tool Definition Quality

C2.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds that the deliverable is 'audited' and inputs are 'validated server-side', which provides some behavioral insight beyond the annotations (readOnlyHint, openWorldHint). However, it does not disclose auth requirements, rate limits, or what happens if inputs are invalid. Since annotations already provide readOnlyHint, the bar is lowered, and the description does not contradict them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is only a few short sentences and does not waste words, but it is too sparse to be effective. It lacks structure and fails to front-load critical information. No sentence is redundant, but the content is insufficient for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the nested parameters, lack of output schema, and many sibling tools, the description is critically incomplete. It does not explain the deliverable's structure, how parameters map to the output, or what the reference case implies. The 20% schema coverage exacerbates the gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, so the description should compensate, but it does not describe any parameter details. It only says 'send the documented case fields', which is vague. The schema contains nested objects with required fields (brand, cluster, etc.), but the description adds no meaning to these or how they affect the deliverable.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Returns a structured, audited deliverable' but does not specify the type of content (e.g., article, strategy, calendar) or a clear action (e.g., generate, analyze). The tool name 'content_engine' is generic, and the sibling tools include many content-related tools, yet no differentiation is provided. The reference to 'Notion — content engine 2026' gives context but is insufficient for clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No usage guidance is provided; there is no mention of when to use this tool compared to alternatives. The description only notes that inputs are validated server-side, which is a technical detail, not a usage condition. Exclusions, prerequisites, and when-not-to-use scenarios are absent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_enrichmentA

Read-only

Return the enriched tag profile of a content entity — the Gapup moat. Each tag carries a facet (genre, theme, play-mode, perspective…), a confidence score, a corroboration score and its full provenance (which sources corroborated it, when). The response also carries an entity-level provenance block (average confidence, data freshness). When to use this tool: an agent has a franchise or work id (from content_catalog) and needs a fine-grained, machine-readable, verifiable characterisation for matching, recommendation, contextual targeting or analysis. Inputs: an entity id and its type.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_id`	Yes	Entity id from content_catalog (e.g. 'music-daft-punk', 'film-the-dark-knight-collection:the-dark-knight')
`entity_type`	No	Whether the id is a franchise or a work (default franchise)
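
The `async` flag shared by these tools implies a start-then-poll pattern. The sketch below illustrates it under stated assumptions: `call_tool(name, arguments)` is a hypothetical client callable, and the `status`, `result` and `error` keys of the `job_result` payload are guesses, since the listing does not document that payload.

```python
import time


def run_async(call_tool, name, arguments, interval=0.5, timeout=60.0):
    """Start a tool with async=true, then poll job_result until it finishes."""
    job = call_tool(name, {**arguments, "async": True})
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        reply = call_tool("job_result", {"job_id": job["job_id"]})
        # Assumed payload shape: {"status": "pending" | "done" | "error", ...}
        if reply.get("status") == "done":
            return reply["result"]
        if reply.get("status") == "error":
            raise RuntimeError(reply.get("error", "job failed"))
        time.sleep(interval)
    raise TimeoutError(f"job {job['job_id']} did not finish within {timeout}s")


def make_stub_client():
    """Stand-in for a real MCP client: the job completes on the third poll."""
    polls = {"n": 0}

    def call_tool(name, arguments):
        if name != "job_result":
            return {"job_id": "job-1"}
        polls["n"] += 1
        if polls["n"] < 3:
            return {"status": "pending"}
        return {"status": "done", "result": {"entity_id": arguments["job_id"]}}

    return call_tool


print(run_async(make_stub_client(), "content_enrichment",
                {"entity_id": "music-daft-punk"}, interval=0))
```

A bounded deadline keeps a stuck job from blocking the agent indefinitely; the interval and timeout values here are arbitrary.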

Output Schema

Name	Required	Description
`tags`	Yes
`entity_id`	Yes
`tag_count`	Yes
`provenance`	Yes	Entity-level trust & freshness summary.
`entity_type`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and openWorldHint. The description adds value by detailing the response structure (facets, scores, provenance) and entity-level provenance block, but does not contradict annotations. It could further mention any rate limits or response time expectations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (four sentences) with no extraneous text. It is front-loaded with the core purpose, followed by output details, usage guidance, and inputs. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity, the presence of an output schema, and full schema coverage, the description sufficiently explains the tool's purpose, inputs, output structure, and appropriate use cases. It is complete without being redundant.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema covers all three parameters with full descriptions, so the description adds little beyond stating 'an entity id and its type.' The mention of 'from content_catalog' provides helpful context, but overall parameter semantics are adequately covered by the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Return the enriched tag profile of a content entity' with a specific verb and resource. It distinguishes itself from siblings (e.g., content_catalog, content_audience_profile) by highlighting fine-grained, machine-readable output with provenance, making its purpose clear and unique.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a clear 'When to use this tool' section detailing prerequisites (having a franchise or work id from content_catalog) and use cases (matching, recommendation, targeting). It lacks explicit exclusions or alternative tools, but provides strong contextual guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_evergreen_score_analyzerA

Read-only · Idempotent

Evaluates content evergreen potential for CMOs by analyzing historical traffic patterns and backlink authority. Takes a content URL and optional time range, returns an evergreen score (0-100), traffic trend analysis, and backlink profile. Ideal for content strategy planning, SEO optimization, and identifying high-value evergreen assets. Uses Wayback Machine and Common Crawl public APIs.

Parameters

Name	Required	Description
`url`	Yes	Content URL to analyze
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`toDate`	No	End date for historical analysis (YYYY-MM-DD)
`fromDate`	No	Start date for historical analysis (YYYY-MM-DD)
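
Both date parameters use the `YYYY-MM-DD` form, so a client can validate them locally before the call. A minimal sketch, assuming the helper name `build_evergreen_args` (hypothetical) and assuming the server expects `fromDate` to be no later than `toDate`, which the listing does not state:

```python
from datetime import datetime


def build_evergreen_args(url, from_date=None, to_date=None):
    """Validate optional YYYY-MM-DD bounds and build the tool arguments."""
    parsed = {}
    for key, value in (("fromDate", from_date), ("toDate", to_date)):
        if value is not None:
            # strptime raises ValueError for strings that are not real calendar dates.
            parsed[key] = datetime.strptime(value, "%Y-%m-%d").date()
    # Assumption: an inverted range is a caller error.
    if "fromDate" in parsed and "toDate" in parsed and parsed["fromDate"] > parsed["toDate"]:
        raise ValueError("fromDate must not be later than toDate")
    # isoformat() re-emits each date zero-padded as YYYY-MM-DD.
    return {"url": url, **{key: day.isoformat() for key, day in parsed.items()}}


print(build_evergreen_args("https://example.com/post", "2023-01-01", "2024-01-01"))
```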

Output Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`lastSeen`	No
`warnings`	Yes
`firstSeen`	No
`trafficTrend`	Yes
`backlinkCount`	No
`evergreenScore`	Yes
`backlinkDomains`	No

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, openWorldHint. The description adds useful behavioral context: uses Wayback Machine and Common Crawl public APIs, indicating external data sources. Does not mention rate limits or latency, but the async parameter addresses potential slowness.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: the first states the core function and audience, the second lists inputs and outputs, and the last two give use cases and data sources. No redundant or unnecessary information. Efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With annotations covering safety and idempotency, and an output schema present, the description covers purpose, targeted user, inputs, outputs, use cases, and data sources. No gaps for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptive parameter definitions. The description adds value by summarizing the output (evergreen score, traffic trend, backlink profile), which goes beyond the schema. Provides enough context for parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates 'content evergreen potential' for CMOs, with specific outputs (score, traffic trend, backlink profile). It distinguishes from siblings like 'content_ranking' by focusing on evergreen assets and using historical data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly identifies use cases: content strategy planning, SEO optimization, identifying high-value evergreen assets. Lacks explicit exclusions or alternatives, but the context is clear enough for an agent to decide when to invoke.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_provenanceA

Read-only

Audit the full data provenance of a content entity — all its enrichment tags with their extraction source, corroboration score, source list and last verification date, plus an entity-level freshness summary. Use this tool before citing or relying on enriched content data in a high-stakes context (ad targeting, editorial, analysis). Inputs: entity_id (required) and entity_type (franchise or work).

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_id`	Yes	Entity id from content_catalog (e.g. 'video-game-elden-ring')
`entity_type`	No	Whether the id is a franchise or a work (default: franchise)

Output Schema

Name	Required	Description
`lineage`	Yes	Full tag lineage from v_data_lineage — one entry per tag.
`entity_id`	Yes
`entity_type`	Yes
`freshness_summary`	Yes	Entity-level freshness & trust summary from v_entity_freshness.
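
One way an agent might act on the `lineage` entries before citing enriched data is to gate tags on corroboration and recency. This is only a sketch: the entry field names (`tag`, `corroboration`, `last_verified`), the 0 to 1 score scale and both thresholds are assumptions, because the listing does not document the shape of a lineage entry.

```python
from datetime import date


def citable_tags(lineage, today, min_corroboration=0.7, max_age_days=180):
    """Keep tags that are corroborated enough and were verified recently."""
    keep = []
    for entry in lineage:
        # Assumed: last_verified is an ISO date string (YYYY-MM-DD).
        age_days = (today - date.fromisoformat(entry["last_verified"])).days
        if entry["corroboration"] >= min_corroboration and age_days <= max_age_days:
            keep.append(entry["tag"])
    return keep


# Made-up entries for illustration only.
sample_lineage = [
    {"tag": "co-op", "corroboration": 0.9, "last_verified": "2026-06-01"},
    {"tag": "horror", "corroboration": 0.4, "last_verified": "2026-07-01"},
    {"tag": "retro", "corroboration": 0.95, "last_verified": "2025-01-01"},
]
print(citable_tags(sample_lineage, today=date(2026, 7, 18)))
```

Only the first entry passes both checks; the second is weakly corroborated and the third is stale.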

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations mark it as read-only and open-world. The description adds behavioral context by detailing the return fields (tags, source, score, freshness). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences plus a brief input list. Every sentence adds value, front-loaded with purpose. No extraneous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and the presence of an output schema, the description covers the essential aspects: what is audited, the use case, and inputs. It is fully adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description restates the parameters but adds little new meaning beyond 'entity_id from content_catalog' and enum values. The async parameter is not mentioned.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool audits data provenance, listing specific fields (enrichment tags, extraction source, corroboration score, etc.). This distinguishes it from sibling tools like content_catalog or content_enrichment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly advises using this tool before relying on enriched content in high-stakes contexts. Though it does not mention when not to use or alternatives, the guidance is clear and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_rankingA

Read-only

Return the TOP-ranked content entities in a category, by a chosen criterion — the direct answer to superlative / decision queries: 'best video games', 'top RPGs', 'cheapest games', 'best value RPGs', 'best FPS playable right now', 'most popular music artists'. Criteria: critic_score, popularity, price, value (critic score per unit price). direction flips it (asc = cheapest/lowest first). available_only restricts to entities currently buyable. Sliceable by genre and release-year window; every result carries its score, price and source. When to use: an agent must produce a ranked shortlist to support a recommendation, a purchase or a 'what is the best X' decision.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`genre`	No	Optional genre filter, e.g. 'RPG', 'FPS', 'thriller'
`limit`	No	Number of ranked results (default 20)
`domain`	Yes	Content domain to rank within
`year_to`	No	Optional latest release year
`criterion`	No	critic_score (0-100, default) · popularity · price · value (critic score per unit price)
`direction`	No	desc = best/highest first (default); asc = cheapest/lowest/least first. Defaults to asc for price.
`year_from`	No	Optional earliest release year
`available_only`	No	If true, restrict to entities currently available to buy/play (default false)
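
The interaction between `criterion`, `direction` and `available_only` can be sketched locally. This is an illustration of the documented defaults, not the server's ranking code: the entity keys (`critic_score`, `price`, `available`), the function name `rank` and the sample numbers are all assumptions.

```python
def rank(entities, criterion="critic_score", direction=None, available_only=False, limit=20):
    """Rank entity dicts by a criterion, applying the documented defaults."""
    if direction is None:
        # Documented default: desc, except asc when ranking by price.
        direction = "asc" if criterion == "price" else "desc"
    if available_only:
        entities = [e for e in entities if e.get("available")]
    if criterion == "value":
        # Skip free items to avoid dividing by zero (assumption, not documented).
        entities = [e for e in entities if e["price"] > 0]

    def sort_key(entity):
        if criterion == "value":
            return entity["critic_score"] / entity["price"]  # critic score per unit price
        return entity[criterion]

    return sorted(entities, key=sort_key, reverse=(direction == "desc"))[:limit]


# Made-up entities; scores and prices are not real data.
catalog = [
    {"id": "a", "critic_score": 90, "popularity": 50, "price": 60.0, "available": True},
    {"id": "b", "critic_score": 80, "popularity": 70, "price": 20.0, "available": True},
    {"id": "c", "critic_score": 95, "popularity": 10, "price": 40.0, "available": False},
]
print([e["id"] for e in rank(catalog)])
print([e["id"] for e in rank(catalog, criterion="price")])
print([e["id"] for e in rank(catalog, criterion="value", available_only=True)])
```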

Output Schema

Name	Required	Description
`count`	Yes
`genre`	No
`domain`	Yes
`ranking`	Yes
`year_to`	No
`criterion`	Yes
`direction`	No
`year_from`	No
`available_only`	No

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds context by explaining that results carry score, price, and source, and details the default behavior for direction (desc except asc for price). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise, front-loaded with purpose and examples, then details parameters efficiently. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description need not explain return values. It covers purpose, usage, key parameters, and behavioral traits, making it complete for an agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, providing baseline 3. The description adds meaning beyond the schema by explaining criteria like 'value (critic score per unit price)', default direction for price, and the effect of available_only. It also clarifies that results can be sliced by genre and release-year window.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states 'Return the TOP-ranked content entities in a category, by a chosen criterion' and provides concrete examples like 'best video games', 'top RPGs'. It clearly identifies the tool as the direct answer to superlative/decision queries, distinguishing it from sibling tools like candidate_screening_ranking which rank different entities.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a dedicated 'When to use' section stating the tool should be used when an agent must produce a ranked shortlist for recommendations or purchase decisions. While it does not explicitly mention alternatives, the context is clear and helps an agent decide applicability.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_similarA

Read-only

Find content entities similar to a given one. For embedded franchises this uses SEMANTIC vector similarity (pgvector) over the enrichment profile — surfacing entities that feel alike even when their tags differ literally. Falls back to shared enrichment-tag overlap for works or non-embedded entities. Each result carries a similarity score and its entity-level freshness/confidence (verifiable, sourced). When to use this tool: an agent wants recommendations or lookalikes for a franchise or work. Input: an entity_id and its type.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`entity_id`	Yes	Entity id from content_catalog
`entity_type`	No
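
The semantic-then-fallback behavior described for this tool can be sketched as follows. Several things here are assumptions rather than documented facts: cosine similarity stands in for whatever pgvector distance the server uses, Jaccard overlap stands in for "shared enrichment-tag overlap", and the entity shape and the `method` labels are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def tag_overlap(tags_a, tags_b):
    """Jaccard overlap of two tag sets (assumed fallback metric)."""
    union = tags_a | tags_b
    return len(tags_a & tags_b) / len(union) if union else 0.0


def similarity(source, candidate):
    """Return (score, method) for two entities with optional 'embedding' and a 'tags' set."""
    if source.get("embedding") and candidate.get("embedding"):
        return cosine(source["embedding"], candidate["embedding"]), "semantic"
    # Works and non-embedded entities fall back to tag overlap.
    return tag_overlap(source["tags"], candidate["tags"]), "tag_overlap"


embedded = {"embedding": [1.0, 0.0], "tags": {"a", "b"}}
plain = {"tags": {"b", "c"}}
print(similarity(embedded, embedded))
print(similarity(embedded, plain))
```

Returning the method alongside the score mirrors the `method` output field, so a caller can tell which path produced a given number.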

Output Schema

Name	Required	Description
`count`	Yes
`method`	Yes	How similarity was computed.
`similar`	Yes
`entity_id`	Yes
`source_provenance`	Yes	Provenance of the source entity used to compute similarity.

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and openWorldHint. The description adds algorithmic details (pgvector for embedded franchises, tag overlap fallback) and mentions result fields (similarity score, freshness/confidence). This goes beyond annotations, though some details like potential latency (async parameter) are not explained.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Every sentence adds value: purpose, algorithmic explanation, result contents, usage guidance. No redundant information. Front-loaded with verb and resource.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, algorithm, when to use, and result fields. Given the output schema exists, return values are covered. However, it omits discussion of edge cases (e.g., no similar entities) and async behavior, but overall sufficient for a retrieval tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50%, with async and entity_id described in schema. The description adds mapping for entity_type (franchise or work) and context that entity_id comes from content_catalog, but does not describe limit or async beyond schema. Minimal added value over schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool finds similar content entities using semantic vector similarity for franchises and tag overlap for others. It specifies the input (entity_id and type) and distinguishes from siblings like content_catalog or content_discovery by detailing the algorithmic approach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'when to use this tool: an agent wants recommendations or lookalikes for a franchise or work.' While it doesn't list exclusions or alternatives, the context is clear and useful for an agent deciding between this and other tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_taxonomyA

Read-only

Return the enrichment taxonomy of a content domain — every tag grouped by facet (genre, theme, mood, play-mode…). When to use this tool: an agent needs the controlled vocabulary to filter, classify or query content. Input: a domain.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domain`	Yes	Content domain

Output Schema

Name	Required	Description
`domain`	Yes
`taxonomy`	Yes	Map facet → array of tag labels
`tag_count`	Yes
`facet_count`	Yes
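
The output shape (a facet to tag-label map plus two counts) can be illustrated by grouping flat `(facet, label)` pairs. A minimal sketch with assumptions: the function name `build_taxonomy` and the sample pairs are invented, and `tag_count` is taken to be the total number of labels across all facets.

```python
from collections import defaultdict


def build_taxonomy(domain, tagged):
    """Group (facet, tag_label) pairs into a facet -> sorted labels map."""
    by_facet = defaultdict(set)
    for facet, label in tagged:
        by_facet[facet].add(label)  # a set drops duplicate labels within a facet
    taxonomy = {facet: sorted(labels) for facet, labels in sorted(by_facet.items())}
    return {
        "domain": domain,
        "taxonomy": taxonomy,
        "tag_count": sum(len(labels) for labels in taxonomy.values()),
        "facet_count": len(taxonomy),
    }


# Invented pairs for illustration; "game" is a guessed domain value.
pairs = [("genre", "rpg"), ("genre", "fps"), ("play-mode", "co-op"), ("genre", "rpg")]
print(build_taxonomy("game", pairs))
```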

Tool Definition Quality

A4.0/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, indicating a safe read-only operation. The description adds no further behavioral details beyond the return type, which is consistent but does not enhance transparency beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences plus a brief input note, with no unnecessary words. The first sentence establishes purpose, the second provides usage context. Highly efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the low complexity, the presence of an output schema, and thorough annotations, the description provides enough context for the agent to use the tool correctly. It could optionally mention the structure of the returned taxonomy, but this is not necessary.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description's mention of 'Input: a domain' adds no meaningful information beyond the schema's enum description. The async parameter is fully described in the schema, resulting in no additional semantic value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Return' and specifies the resource 'enrichment taxonomy of a content domain' with explicit details about grouping by facets like genre, theme, mood. It distinguishes itself from sibling tools by focusing on providing the controlled vocabulary for filtering and classification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly includes 'When to use this tool: an agent needs the controlled vocabulary to filter, classify or query content,' providing clear context for appropriate usage. It lacks explicit when-not-to-use guidance and does not name alternative tools, but it is sufficient given the simple retrieval nature.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_risk_scannerA

Read-only

Inspect

Contract risk scanner (French original: "Scanner de risques contractuels"): Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Salesforce MSA, review for a B2B SaaS client in EMEA (original: "revue d'un client SaaS B2B EMEA"). Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`contractText`	Yes
`contractContext`	Yes

Tool Definition Quality

A3.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds context about server-side validation and a structured, audited deliverable. This goes beyond the annotations by clarifying input handling and output nature. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (4 short sentences) and front-loaded with the purpose. However, the mixed language and vague reference case reduce clarity. It is efficient but could be more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 4 parameters, nested objects, and no output schema, the description provides minimal completeness. It mentions a 'structured, audited deliverable' but no specifics on return format. Sibling tools suggest a crowded space, but the description does not fully set context for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (25%). The description mentions 'documented case fields' but does not explain individual parameters beyond what the schema provides. It hints at the nested contractContext structure but adds no new semantics. For low coverage, a higher burden is expected; the description is minimal.
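
The 25% figure is consistent with one described property out of four. A minimal sketch of that arithmetic follows, assuming coverage is simply the share of top-level schema properties with a non-empty `description`. The scorer's actual method is not stated on this page, and the property bodies below are placeholders rather than the real schema.

```python
from typing import Any, Dict


def description_coverage(schema: Dict[str, Any]) -> float:
    """Percentage of top-level properties that carry a non-empty description."""
    props = schema.get("properties", {})
    if not props:
        return 0.0
    described = sum(
        1 for prop in props.values() if str(prop.get("description", "")).strip()
    )
    return 100.0 * described / len(props)


# Placeholder schema: only the property names and the single described
# parameter (`async`) are taken from the table above.
contract_risk_scanner_schema = {
    "type": "object",
    "properties": {
        "async": {"type": "boolean", "description": "If true, returns a job_id immediately."},
        "focus": {},
        "contractText": {},
        "contractContext": {},
    },
    "required": ["contractText", "contractContext"],
}

print(f"{description_coverage(contract_risk_scanner_schema):.0f}%")  # 25%
```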

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a contract risk scanner that returns a structured, audited deliverable. The verb 'scanner' and resource 'risques contractuels' are present, but the mixed French/English and lack of explicit distinction from siblings like 'legal_clause_extractor' or 'talent_contract_risk_mapper' lower the specificity. It is clear but not fully differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a reference case (Salesforce MSA) which implies a typical use scenario, but it does not explicitly state when to use this tool versus alternatives, nor does it give exclusions or prerequisites. Usage guidance is implied but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

corporate_registry_lookupA

Read-only

Idempotent

Inspect

Resolve legal information about a company from its national corporate registry. Returns a normalised, sourced company profile: legal status, registration number, directors, shareholders, recent filings, registered address, share capital, and a quality score (0–100). Coverage: France (INPI, keyless — full SIREN/SIRET with directors), 3M+ entities worldwide via GLEIF LEI (keyless, large companies), UK (Companies House, optional key), Netherlands (KvK, optional key), and OpenCorporates (token required since 2026). Sources are tried in cascade; quality_score increases with each source that succeeds. When to use: due-diligence, KYC screening, supplier verification, M&A research, or any workflow needing verified company identity and legal status. Optional env vars: COMPANIES_HOUSE_API_KEY (UK), KVK_API_KEY (NL), OPENCORPORATES_API_TOKEN (OpenCorporates token).
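
The source cascade described above can be pictured roughly as follows. This is an illustrative sketch only: the per-source score increment, the merge rule (earlier sources win on conflicting fields) and the cap at 100 are assumptions, and the three lookup functions are stubs with dummy data rather than real registry clients.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Source = Tuple[str, Callable[[str], Optional[Dict[str, Any]]]]


def cascade_lookup(company_name: str, sources: List[Source],
                   points_per_source: int = 25) -> Dict[str, Any]:
    """Try each source in order; every success adds fields and raises the score."""
    profile: Dict[str, Any] = {
        "company_name": company_name, "sources": [], "quality_score": 0,
    }
    for label, lookup in sources:
        try:
            found = lookup(company_name)
        except Exception:
            found = None  # a failing source must not abort the cascade
        if not found:
            continue
        for key, value in found.items():
            profile.setdefault(key, value)  # ASSUMPTION: earlier sources win
        profile["sources"].append(label)
        profile["quality_score"] = min(100, profile["quality_score"] + points_per_source)
    return profile


# Hypothetical stubs with dummy data.
def inpi_stub(name: str) -> Optional[Dict[str, Any]]:
    return {"identifier": "000000000", "jurisdiction": "FR"}


def gleif_stub(name: str) -> Optional[Dict[str, Any]]:
    return None  # no match


def companies_house_stub(name: str) -> Optional[Dict[str, Any]]:
    raise RuntimeError("no API key configured")


profile = cascade_lookup(
    "Example SAS",
    [("INPI", inpi_stub), ("GLEIF", gleif_stub), ("Companies House", companies_house_stub)],
)
print(profile["sources"], profile["quality_score"])
```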

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	No	ISO 3166-1 alpha-2 country code (e.g. 'FR', 'GB', 'NL', 'DE', 'SG', 'AU', 'US'). If omitted, inferred from legal suffix in company name, then falls back to global search.
`identifier`	No	Optional registry identifier for a fast direct lookup: SIREN (FR, 9 digits), Companies House number (GB, 8 chars), KvK number (NL, 8 digits), etc.
`company_name`	Yes	Company name or trading name to look up (e.g. 'Sanofi', 'Tesco PLC', 'Notion Labs Inc')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`registry`	Yes
`directors`	Yes
`freshness`	Yes	ISO timestamp
`identifier`	Yes
`legal_form`	No
`legal_name`	No
`company_name`	Yes
`jurisdiction`	Yes
`shareholders`	Yes
`quality_score`	Yes	0-100 confidence score
`share_capital`	No
`filings_recent`	Yes
`incorporation_date`	No
`registered_address`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, destructiveHint, openWorldHint. The description adds value by detailing the cascade of sources (France INPI, GLEIF LEI, UK Companies House, etc.) and the quality_score behavior. It also mentions optional environment variables for API keys. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (6 sentences) and well-structured, with a clear break for 'When to use'. Every sentence adds value: listing returns, coverage, cascade logic, and usage contexts. No redundancy or filler, and it is appropriately front-loaded with the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, multiple sources, optional keys, output schema), the description covers all essential aspects: purpose, coverage, behavior, optional configuration, and use cases. Combined with complete annotations and schema, the description is fully adequate for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All four parameters have descriptions in the input schema (100% coverage). The tool description does not add new semantic details beyond what the schema provides, but it contextualizes the parameters (e.g., country inference). Baseline 3 is appropriate as the schema already does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: resolving legal information from national corporate registries. It uses specific verbs like 'Resolve' and 'Returns', and specifies the resource (company profile). The list of returned data (legal status, registration number, etc.) further defines its scope, distinguishing it from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists when to use the tool: due-diligence, KYC screening, supplier verification, M&A research. It also covers coverage details and the source cascade. It does not state when not to use the tool or name alternative tools, but the given contexts are clear enough for appropriate selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

court_filings_multiA

Read-only

Inspect

Aggregate court filings, judgments and litigation records for a company or individual across five major legal jurisdictions: US (CourtListener / PACER), UK (National Archives — EWHC/EWCA/UKSC/UKUT), EU (ECHR HUDOC — European Court of Human Rights), France (Légifrance / Cour de cassation) and Germany (BGH / BVerfG). Returns structured case records with type classification (civil/criminal/antitrust/bankruptcy/administrative/unknown), status (filed/pending/decided/appealed/unknown), parties extracted from case titles, opinion URLs and verbatim snippets. Cross-case pattern recognition produces severity-ranked signals (P0–P2) for criminal, antitrust, bankruptcy, regulatory, data-breach and IP categories. Use when: due diligence on a counterparty, vendor risk assessment, competitive intelligence (litigation history), regulatory exposure mapping. All sources are public and keyless. Optional env var COURTLISTENER_API_KEY raises US rate limits beyond the default 5 req/s anonymous tier. SLA: ≤25s p95 (all jurisdictions fetched in parallel, 8s budget per source). Quality score: 20 pts per jurisdiction with ≥1 case retrieved, +10 if signals detected, +5–10 if ≥2–3 distinct sources contributed.
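
The quality-score rule at the end of the description can be worked through in a few lines. The sketch below is one reading of that sentence: it assumes "+5–10 if ≥2–3 distinct sources" means +5 for exactly two sources and +10 for three or more, and it assumes the total is capped at 100, since five jurisdictions alone already reach 100. Neither assumption is confirmed by the page.

```python
from typing import Dict


def court_quality_score(cases_by_jurisdiction: Dict[str, int],
                        signals_detected: bool,
                        distinct_sources: int) -> int:
    """20 pts per jurisdiction with at least one case, plus the stated bonuses."""
    score = 20 * sum(1 for count in cases_by_jurisdiction.values() if count >= 1)
    if signals_detected:
        score += 10
    if distinct_sources >= 3:
        score += 10  # ASSUMPTION: upper end of the "+5-10" range
    elif distinct_sources == 2:
        score += 5   # ASSUMPTION: lower end of the "+5-10" range
    return min(score, 100)  # ASSUMPTION: capped at 100


# Two jurisdictions with cases (40) + signals (10) + two sources (5) = 55
print(court_quality_score({"US": 4, "UK": 1, "EU": 0, "FR": 0, "DE": 0}, True, 2))  # 55
```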

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_to`	No	ISO date YYYY-MM-DD — latest filing or decision date to include
`date_from`	No	ISO date YYYY-MM-DD — earliest filing or decision date to include
`party_name`	Yes	Name of the company or individual to search (e.g. "Apple Inc", "TotalEnergies", "Volkswagen AG")
`jurisdiction`	No	Jurisdictions to search. Defaults to all ["US","UK","EU","FR","DE"].

Output Schema

ParametersJSON Schema

Name	Required	Description
`cases`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`party_name`	Yes
`quality_score`	Yes
`by_jurisdiction`	Yes
`jurisdictions_searched`	Yes

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint. The description adds rich behavioral context: SLA (≤25s p95, 8s per source), quality scoring rules, async support, and parallel execution. No contradictions with annotations.
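
The "parallel, 8s budget per source" behaviour can be sketched with `asyncio`. This is not the server's implementation; the fetcher functions are hypothetical stand-ins, and treating a timed-out or failing source as an empty result is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

Fetcher = Callable[[], Awaitable[List[dict]]]


async def fetch_all(fetchers: Dict[str, Fetcher],
                    budget_s: float = 8.0) -> Dict[str, List[dict]]:
    """Run every source concurrently, giving each its own time budget."""

    async def guarded(fetch: Fetcher) -> List[dict]:
        try:
            return await asyncio.wait_for(fetch(), timeout=budget_s)
        except Exception:
            return []  # ASSUMPTION: a slow or failing source contributes nothing

    names = list(fetchers)
    results = await asyncio.gather(*(guarded(fetchers[name]) for name in names))
    return dict(zip(names, results))


# Hypothetical sources: one answers at once, one exceeds the budget.
async def fast_source() -> List[dict]:
    return [{"title": "Example v. Example"}]


async def slow_source() -> List[dict]:
    await asyncio.sleep(1.0)
    return [{"title": "arrives too late"}]


by_jurisdiction = asyncio.run(
    fetch_all({"US": fast_source, "DE": slow_source}, budget_s=0.05)
)
print(by_jurisdiction)
```

Because each source has its own budget, one slow jurisdiction delays the overall call by at most that budget instead of blocking the other results.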

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense but well-structured: purpose first, then jurisdictions, output details, use cases, source notes, SLA, scoring. Every sentence adds distinct value. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex multi-jurisdiction tool with async, quality scoring, and an output schema, the description covers all critical aspects. It explains inputs, outputs, performance, and scoring, leaving minimal gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are well-defined. The description adds value by explaining default jurisdictions, that date_from/to refer to filing/decision dates, and that party_name expects company or individual names. This contextualizes the schema beyond its base descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool aggregates court filings across five specified jurisdictions, with detailed output including case types, status, and signals. It is specific and distinguishes from siblings; no other tool in the list serves this exact multi-jurisdiction litigation purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists use cases: due diligence, vendor risk, competitive intelligence, regulatory exposure. Also provides context on public sources, keyless access, and optional env var for rate limits. Though 'when not to use' is absent, the stated use cases are sufficiently clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

crm_connectorA

Inspect

Push, update, search and log activities in HubSpot, Salesforce or Pipedrive. 4 modes: push_lead (create contact/lead), update_opportunity (update deal stage/amount), search_contact (lookup by email), log_activity (call/email/meeting/note). Returns resource_id, direct CRM URL, signals and quality_score. If credentials are absent, returns a mock result with a warning signal. Auth: HubSpot via Bearer access_token; Salesforce via access_token + base_url; Pipedrive via api_key.

ParametersJSON Schema

Name	Required	Description
`data`	Yes	Payload depending on mode. push_lead: {email,first_name,last_name,company,phone,job_title}. update_opportunity: {deal_id/opportunity_id,stage,amount,close_date}. search_contact: {email}. log_activity: {type,body,contact_id/person_id,subject}.
`mode`	Yes	Action to perform in the CRM
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`provider`	Yes	CRM provider to target
`credentials`	No	Auth credentials. HubSpot: access_token. Salesforce: access_token + base_url. Pipedrive: api_key.
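
The per-mode payloads listed for `data` can be checked client-side before a call is sent. A minimal sketch follows, with field names copied from the table above. The provider string `"hubspot"` and the decision to reject unknown fields are assumptions; the page does not say which fields are mandatory, so none are enforced here.

```python
from typing import Any, Dict, Optional

# Field names per mode, taken from the `data` parameter description.
ALLOWED_FIELDS = {
    "push_lead": {"email", "first_name", "last_name", "company", "phone", "job_title"},
    "update_opportunity": {"deal_id", "opportunity_id", "stage", "amount", "close_date"},
    "search_contact": {"email"},
    "log_activity": {"type", "body", "contact_id", "person_id", "subject"},
}


def build_crm_arguments(provider: str, mode: str, data: Dict[str, Any],
                        credentials: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Assemble crm_connector arguments, rejecting fields the mode does not list."""
    if mode not in ALLOWED_FIELDS:
        raise ValueError(f"unknown mode: {mode}")
    unknown = set(data) - ALLOWED_FIELDS[mode]
    if unknown:
        raise ValueError(f"unexpected fields for {mode}: {sorted(unknown)}")
    arguments: Dict[str, Any] = {"provider": provider, "mode": mode, "data": dict(data)}
    if credentials:
        arguments["credentials"] = dict(credentials)
    return arguments


# Without credentials, the description says a mock result with a warning signal comes back.
print(build_crm_arguments("hubspot", "search_contact", {"email": "jane@example.com"}))
```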

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	No
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`success`	Yes
`provider`	Yes
`data_synced`	No
`resource_id`	No
`quality_score`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description discloses that if credentials are absent, a mock result with a warning signal is returned. It also details authentication methods for each provider. Annotations (readOnlyHint=false, openWorldHint=true) are consistent, so no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is front-loaded with the tool's main actions and modes, and each sentence adds essential information. It is appropriately sized for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's multi-mode, multi-provider complexity and presence of an output schema, the description covers all key aspects: modes, payloads, credentials, error handling, and return values. It is complete enough for agent selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds significant value by specifying payload structures per mode and credential requirements per provider, going beyond schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool pushes, updates, searches, and logs activities in HubSpot, Salesforce, or Pipedrive with four explicit modes. This specific verb+resource combination distinguishes it from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description lists four modes and their purposes, and explains credential requirements for each provider. However, it does not explicitly state when not to use this tool versus alternatives, missing some usage boundaries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cross_sell_recoC

Read-only

Inspect

Cross-sell recommendations (French original: "Recommandations cross-sell"): Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Alan × Gapup Hub, 3 recommended products · 'perfect' fit × 2 · potential ARR +€18k. Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`account`	Yes
`company`	Yes
`portfolio`	Yes

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, covering safety and non-determinism. The description adds that it returns an 'audited deliverable' and inputs are validated server-side, which provides minor context but does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description contains a reference example that may be distracting and uses overly promotional language. It could be more compact and front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and the openWorldHint, the description fails to specify the structure or format of the returned deliverable. The agent lacks guidance on what to expect from the result.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%, meaning most parameters lack descriptions. The tool description does not explain the parameters (account, company, portfolio) beyond a vague reference to 'documented case fields', offering no help for correct parameter construction.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description indicates the tool provides cross-sell recommendations and returns a structured deliverable, but it is phrased in a marketing-heavy way with mixed languages and unclear terms like 'Gapup agent-payable C-suite expertise'. The purpose is discernible but not crisp.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is given on when to use this tool versus alternatives like upsell_hunter or account_expansion_mapper. The description does not discuss prerequisites, context, or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

crypto_wallet_intelA

Read-only

Inspect

Multi-chain on-chain analytics for crypto trading agents, on-chain analysts, AML/compliance teams and DeFi BD. Covers Ethereum, Base, Polygon, BSC, Arbitrum, Optimism — EVM-compatible addresses only.

5 modes:
• wallet_profile: full wallet summary covering type (EOA/contract/CEX/protocol), inferred persona (whale/MEV-bot/DeFi-user/hodler…), age, tx count, native balance, ERC-20 count, NFT collections, OFAC sanctions flag
• token_flows: ERC-20 inflows/outflows per token on the selected period, priced in USD via CoinGecko
• pnl_estimate: FIFO realized + unrealized P&L on the period with confidence rating (high/medium/low)
• counterparties: top 20 counterparties ranked by USD volume with CEX/DEX/protocol labels
• defi_positions: active DeFi positions detected via Etherscan interaction history (Aave/Compound/Uniswap/Curve/Lido/Balancer/SushiSwap)

Signal detection (P0/P1/P2):
• P0 if OFAC SDN match OR direct Tornado Cash / sanctioned-protocol interaction
• P1 if >$1M volume on wallet <30 days old OR MEV-bot pattern OR >80% volume on single counterparty
• P2 informational (CEX wallet, new wallet, no anomaly)

Sources: Etherscan family (keyless free-tier, optional API key per chain), DefiLlama (keyless), public EVM RPC (keyless), CoinGecko free tier (keyless). Cache TTL: 5 min (wallet activity evolves fast). Budget: 8s per source.

Env vars (all optional, raise Etherscan rate-limit from 1 req/5s to 5 req/s): ETHERSCAN_API_KEY · BASESCAN_API_KEY · POLYGONSCAN_API_KEY · BSCSCAN_API_KEY · ARBISCAN_API_KEY · OPTIMISM_API_KEY
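
The P0/P1/P2 rules stated in the description translate directly into a small classifier. The sketch below follows the thresholds as written; the `WalletFacts` container and its field names are hypothetical, and how the server derives those facts is not covered here.

```python
from dataclasses import dataclass


@dataclass
class WalletFacts:
    """Hypothetical container for the inputs the signal rules mention."""
    ofac_sdn_match: bool = False
    sanctioned_protocol_interaction: bool = False  # e.g. direct Tornado Cash use
    volume_usd: float = 0.0
    wallet_age_days: int = 0
    mev_bot_pattern: bool = False
    top_counterparty_share: float = 0.0  # fraction of total volume, 0.0 to 1.0


def signal_level(wallet: WalletFacts) -> str:
    """Return the highest-severity level whose rule matches."""
    if wallet.ofac_sdn_match or wallet.sanctioned_protocol_interaction:
        return "P0"
    if ((wallet.volume_usd > 1_000_000 and wallet.wallet_age_days < 30)
            or wallet.mev_bot_pattern
            or wallet.top_counterparty_share > 0.80):
        return "P1"
    return "P2"  # informational


print(signal_level(WalletFacts(volume_usd=2_500_000, wallet_age_days=12)))  # P1
```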

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode. wallet_profile=full wallet summary + persona + sanctions flag. token_flows=ERC-20 inflows/outflows per token priced in USD. pnl_estimate=FIFO realized+unrealized P&L with confidence. counterparties=top 20 counterparties by volume. defi_positions=active positions on Aave/Compound/Uniswap/Curve/Lido/etc.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`chain`	No	Chain to analyze. Default "ethereum". Use "all" to scan all 6 chains (slower, ~30s).
`address`	Yes	EVM-compatible wallet address (0x... 40 hex chars). Works on all supported chains.
`period_days`	No	Lookback window in days for token_flows, pnl_estimate, counterparties, defi_positions. Default 30.
`min_value_usd`	No	Minimum USD value filter for token_flows and counterparties. Default $100.
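
For readers unfamiliar with the FIFO method named in `pnl_estimate`, here is a minimal worked sketch for a single token. It illustrates generic FIFO lot matching only; how the tool handles fees, transfers, missing prices or its confidence rating is not documented on this page, and the trade tuple format is an assumption.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple

Trade = Tuple[str, float, float]  # (side, quantity, unit price in USD)


def fifo_pnl(trades: Iterable[Trade], mark_price: float) -> Tuple[float, float]:
    """Return (realized, unrealized) P&L, matching sells to the oldest buys first."""
    lots: Deque[List[float]] = deque()  # open lots as [remaining_qty, unit_cost]
    realized = 0.0
    for side, qty, price in trades:
        if side == "buy":
            lots.append([qty, price])
            continue
        remaining = qty
        # Consume the oldest lots first; sells beyond known holdings are ignored.
        while remaining > 0 and lots:
            lot = lots[0]
            used = min(remaining, lot[0])
            realized += used * (price - lot[1])
            lot[0] -= used
            remaining -= used
            if lot[0] == 0:
                lots.popleft()
    unrealized = sum(qty * (mark_price - cost) for qty, cost in lots)
    return realized, unrealized


trades = [("buy", 10.0, 1.0), ("buy", 10.0, 2.0), ("sell", 15.0, 3.0)]
# Sell 15 at $3: 10 from the $1 lot (+20) and 5 from the $2 lot (+5) -> realized 25.
# 5 units remain at cost $2; marked at $4 -> unrealized 10.
print(fifo_pnl(trades, mark_price=4.0))  # (25.0, 10.0)
```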

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`address`	Yes
`signals`	Yes
`sources`	Yes
`token_flows`	No
`pnl_estimate`	No
`quality_score`	Yes
`counterparties`	No
`defi_positions`	No
`wallet_profile`	No
`chains_analyzed`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds significant behavioral context: 5 modes, signal detection levels (P0/P1/P2), sources, cache TTL, budget, and optional env vars. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections for modes, signals, sources, etc. It's front-loaded with purpose. Slightly long but every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 6 parameters, 5 modes, and multi-chain support, the description is thorough. It covers sources, env vars, signal levels, and performance characteristics. An output schema is present, so the description does not need to document return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline at 3. The description adds meaningful context beyond schema: explains each mode in detail, async flag use, chain options with speed implications, and default values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as multi-chain on-chain analytics for crypto trading, compliance, and DeFi. It enumerates 5 distinct modes with specific behaviors. The tool is unique among siblings, so no confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly indicates when to use the tool (for EVM address analysis) but lacks explicit guidance on when not to use it or on alternatives. However, given the unique functionality, the gap is minor.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

customer_marketingC

Read-only

Inspect

Customer and ambassador marketing (French original: "Marketing clients & ambassadeurs"): Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub, 12 customers analysed · 4 ambassadors identified · programme + 6 case studies + referral. Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`goals`	Yes
`company`	Yes
`product`	Yes
`customers`	Yes
`targetUseCases`	No
`contentBudgetEur`	No

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns an audited deliverable and inputs are validated server-side, which aligns with the read-only nature. However, it does not elaborate on other behavioral aspects like rate limits or output structure beyond the deliverable claim.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph with moderate length, but it includes a specific reference case that may not be universally helpful. The core message is somewhat buried in jargon. It is adequately concise but could be better structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, nested objects, async option), the description lacks completeness. It does not explain the async parameter, output format, or how the deliverable is structured. The absence of output schema information is a gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (14%), and the description does not compensate by explaining parameter meanings. It simply says 'send the documented case fields' without clarifying how each parameter contributes to the tool's function. The schema has many properties with poor or missing descriptions.
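
The coverage figure can be reproduced with simple arithmetic. The sketch below assumes coverage means the share of top-level parameters that carry a non-empty description; the listing does not state the exact formula, so treat this as one plausible reading.

```python
from typing import Dict


def schema_description_coverage(parameters: Dict[str, str]) -> float:
    """Return the percentage of parameters whose description is non-empty."""
    if not parameters:
        return 0.0
    described = sum(1 for text in parameters.values() if text and text.strip())
    return 100.0 * described / len(parameters)


# Parameter names taken from the customer_marketing table; only `async` is described.
customer_marketing = {
    "async": "If true, returns a job_id immediately instead of waiting for the result.",
    "goals": "",
    "company": "",
    "product": "",
    "customers": "",
    "targetUseCases": "",
    "contentBudgetEur": "",
}

print(round(schema_description_coverage(customer_marketing)))  # 1 of 7 -> 14
```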

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool is for 'Marketing clients & ambassadeurs' and mentions it returns a structured deliverable, but the inclusion of specific jargon ('Gapup agent-payable C-suite expertise') and a reference case makes the purpose somewhat unclear. It does not differentiate itself from marketing-related sibling tools like marketing_roi_dashboard or event_marketing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description only mentions input validation and case fields, but does not provide context for selection among sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

customer_voice_synthC

Read-only

Synthèse voix client — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Alan (assurance santé) — 3 personas · Top 5 douleurs · Repositionnement messagerie recommandé. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`dataSources`	Yes
`targetSegments`	Yes
`repositioningFocus`	No

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds 'Returns a structured, audited deliverable' which aligns with read-only, but goes no further on behavioral traits like auth requirements, rate limits, or side effects. It does not add significant value beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but mixes French and English, packs the reference case into middot-separated fragments, and lacks a clear structure. It could be more concise and better formatted for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 5 parameters (including nested objects), no output schema, and a complex domain, the description is insufficient. It does not explain the deliverable structure, how to fill parameters, or how the tool handles multiple data sources. The reference case offers minimal context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%: `async` is the only parameter the schema describes, and the tool description explains none of the others. The phrase 'send the documented case fields' is vague and does not map to specific schema properties. Low coverage means the description has to compensate, and it does not.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The name 'customer_voice_synth' and title 'Synthèse voix client' clearly indicate the tool's focus on customer voice synthesis. The description mentions it returns a structured, audited deliverable and provides a concrete reference case (Alan, insurance, personas, pains, messaging). This gives a specific purpose, though it could be more explicit about the output format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide any guidance on when to use this tool versus its many siblings. It mentions that inputs are validated server-side and to 'send the documented case fields', but lacks explicit context for usage or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cve_security_lookupA

Read-only

Look up CVE vulnerability data for enterprise security teams, DevSecOps and SOC analysts. Supports two modes: exact CVE ID lookup (e.g. 'CVE-2024-3094') or keyword search by product/vendor (e.g. 'openssl', 'Apache Tomcat'). Cross-references four authoritative keyless sources: NVD NIST (official CVE database, CVSS v3 scores, affected CPEs), CISA KEV (Known Exploited Vulnerabilities catalog — exploit_in_wild flag), EPSS FIRST (exploit probability 0-1), GitHub Security Advisories (ecosystem-specific: npm/pypi/maven). Returns structured vulnerability records with CVSS v3 scores, affected product version ranges, CWE weakness classification, references and exploitation status. Signals engine produces P0/P1/P2 alerts: P0=CVSS>=9 + active exploitation, P1=CVSS>=7 or EPSS>=70%, P2=CWE pattern clusters. Relevant for EU NIS2 and DORA supply chain risk obligations. Optional env: NVD_API_KEY (raises NVD rate-limit 5→50 req/30s), GITHUB_TOKEN (raises GHSA GraphQL rate-limit). Cache TTL 6h. SLA <=25s p95.
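
The P0/P1/P2 thresholds quoted in the description can be read as a small decision rule. The sketch below is an interpretation, not the server's signals engine: the function name is hypothetical, EPSS is taken on the 0-1 scale (so 70% is 0.70), and the "CWE pattern clusters" test is reduced to a caller-supplied boolean because the description does not define it.

```python
from typing import Optional


def alert_priority(
    cvss: float,
    epss: float,
    exploited_in_wild: bool,
    in_cwe_cluster: bool = False,
) -> Optional[str]:
    """Map one vulnerability record to P0, P1, P2, or None."""
    # P0: CVSS >= 9 combined with active exploitation (for example, a CISA KEV entry).
    if cvss >= 9.0 and exploited_in_wild:
        return "P0"
    # P1: CVSS >= 7 or EPSS >= 70%.
    if cvss >= 7.0 or epss >= 0.70:
        return "P1"
    # P2: membership in a CWE pattern cluster (assumed to be computed elsewhere).
    if in_cwe_cluster:
        return "P2"
    return None
```

Checking P0 first matters: a record with CVSS 9.8 that is not being exploited falls through to P1 rather than P0.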

Parameters

Name	Required	Description
`mode`	No	Override auto-detection: "lookup" for exact CVE ID, "search" for product/vendor keyword.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	CVE ID (e.g. "CVE-2024-3094") or product/vendor keyword (e.g. "openssl", "Apache Tomcat"). Mode is auto-detected from the CVE-YYYY-XXXXX pattern.
`max_results`	No	Maximum number of vulnerabilities to return (default 20, max 50).
`severity_min`	No	Minimum CVSS v3 severity to include in results (default: no filter).
`published_after`	No	ISO date YYYY-MM-DD — only include CVEs published after this date. Defaults to 365 days ago for search mode.
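
The `mode` and `published_after` rows describe two defaults that a client can mirror locally. This is a sketch under assumptions: the regular expression follows the general CVE ID shape (four-digit year, then four or more digits), which may differ from the server's own pattern, and both helper names are hypothetical.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Assumed pattern: CVE-<4-digit year>-<4 or more digits>, case-insensitive.
CVE_ID = re.compile(r"^CVE-\d{4}-\d{4,}$", re.IGNORECASE)


def detect_mode(query: str, mode: Optional[str] = None) -> str:
    """Return the explicit mode if given, else infer it from the query shape."""
    if mode in ("lookup", "search"):
        return mode
    return "lookup" if CVE_ID.match(query.strip()) else "search"


def default_published_after(mode: str, today: date) -> Optional[str]:
    """Search mode defaults to 365 days ago; lookup mode applies no date filter."""
    if mode == "search":
        return (today - timedelta(days=365)).isoformat()
    return None
```

For example, `detect_mode("CVE-2024-3094")` yields `"lookup"` and `detect_mode("openssl")` yields `"search"`.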

Output Schema

Parameters

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`quality_score`	Yes
`vulnerabilities`	Yes

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description significantly adds behavioral context beyond the annotations. It details the cross-referencing of four authoritative sources, the custom alerting engine (P0/P1/P2), optional API keys for rate limiting, a 6-hour cache TTL, and a performance SLA. This is consistent with the readOnlyHint and destructiveHint annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is information-dense, covering many facets (sources, alerts, env vars, SLA), and several sentences could be tightened without losing meaning. The purpose is front-loaded in the first sentence, which aids quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (six parameters, an output schema, and multiple behavioral nuances), the description is comprehensive. It explains the return format, caching, rate limits, and alerting. The annotations and output schema (present but not shown) reduce the burden on the description, making it sufficiently complete for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already provides 100% description coverage for all six parameters, including auto-detection of mode from the query pattern, the purpose of the async parameter, and default values for severity_min and published_after. The tool description adds value by explaining the two modes with example queries, which helps the agent understand parameter behavior beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to look up CVE vulnerability data with two explicit modes (exact CVE ID or keyword search). It specifies the target audience (enterprise security teams, DevSecOps, SOC analysts) and lists the data sources. The tool name itself is specific, and the description distinguishes it from the vast sibling list by focusing on CVE data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use the tool (for CVE lookups and searches) and explains the two modes. It mentions optional environment variables for rate limits and cache TTL, which aids usage. However, it does not explicitly state when not to use it or name alternative tools for other CVE-related tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cyber_risk_auditorC

Read-only

Auditeur de risque cyber — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Qonto — Audit cyber risque B2B FinTech · Score 58/100 → roadmap 90j · 8 findings critiques/high · économie prime -28%. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`techStack`	Yes
`currentPosture`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that the tool returns a deliverable, but does not elaborate on potential wait times (despite an async parameter) or result structure, providing only marginal additional behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise but mixes languages (French title, English body) and includes a specific case that may be irrelevant for general use. It front-loads the purpose but wastes space on a reference case.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, no output schema, many siblings), the description is incomplete. It lacks explanation of output structure (beyond 'structured deliverable'), when to use async, and the focus parameter's role.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20% (only 'async' has a description). The description does not explain any other parameters, failing to compensate for the low coverage. Agents are left to infer meaning from parameter names and nested structure.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function as a cyber risk auditor that returns a structured deliverable, citing a specific case. However, it does not differentiate from similar sibling tools like 'vendor_risk_assessor' or 'attack_surface_monitor', which reduces the score from 5.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. The only usage instruction is 'send the documented case fields', which is minimal and does not help with tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deal_coachB

Read-only

Coach de deal MEDDIC — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Datadog Enterprise deal Société Générale €1.2M ARR — coaching MEDDIC + escalation plays + 14 next actions. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`knownContext`	Yes
`buyingCommittee`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true (safe read) and openWorldHint=true (external/AI integration). The description states it returns a 'structured, audited deliverable' and references a case study, adding context beyond annotations. However, it does not disclose potential latency, the nature of the audit, or server-side validation behavior in detail.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise at four sentences. The case-study reference adds illustrative value, though that sentence is somewhat long. It front-loads the core purpose, ends with a usage note, and contains no extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the tool's complexity (nested objects, multiple parameters, no output schema, low schema coverage), the description does not provide sufficient context. It lacks details on the output format, how to interpret the deliverable, or which deal scenarios are appropriate. The case study reference is specific but not universally clarifying.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (20%), and the description does not explain individual parameters (deal, buyingCommittee, knownContext, focus, async). It only says 'send the documented case fields', which is vague. The parameters include nested objects and constraints (e.g., buyingCommittee roles enum) that go unmentioned, leaving the agent to rely solely on the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool coaches deals using MEDDIC methodology for C-suite/CRO level, and returns a structured deliverable. It references a specific case study, but does not explicitly differentiate from sibling tools like meddic_scoring or deal_structurer, though the focus on MEDDIC coaching and CRO expertise provides some distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions that inputs are validated server-side and to send 'documented case fields', which provides basic usage guidance. However, there is no explicit statement of when to use this tool versus alternatives (e.g., meddic_scoring, battle_plans), nor any prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deal_structurerC

Read-only

Structuration de deal — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Agicap × Kyriba — Partenariat API Banking · 5 structures comparées · Term sheet 7 clauses · Score 83/100 JV. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and it returns a deliverable, but lacks details on output format or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately concise but includes a specific reference case that may not be universally helpful, and the key information is not front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema and nested inputs, the description should explain the deliverable structure and interpretation, but it does not. The agent is left with incomplete context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 33%. The description says 'send the documented case fields' but adds no specific meaning for the nested parameters, leaving the agent to infer it from schema names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it structures deals and returns a structured, audited deliverable, with a reference case. However, it does not differentiate from sibling tools like deal_coach or term_sheet_negotiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives no guidance on when to use this tool versus alternatives, and no information on when not to use it or on prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dependency_vulnerability_scanA

Read-only

SCA (Software Composition Analysis) — scans a project dependency manifest and returns known vulnerabilities for each dependency. Supports: package.json (npm), requirements.txt (Python), go.mod (Go), Cargo.toml (Rust), composer.json (PHP), Gemfile.lock (Ruby), CycloneDX SBOM JSON. PRIMARY source: OSV.dev (keyless, free, covers npm/PyPI/Go/crates.io/Packagist/RubyGems + GHSA advisories federated). CVSS enrichment: NVD NIST (when OSV lacks score). Exploitation flag: CISA KEV (known-exploited-vulnerabilities catalog). Returns per-vuln CVE/GHSA IDs, severity, CVSS score, fixed version, and actionable upgrade recommendations. Relevant for EU NIS2 supply chain risk obligations, DORA, SOC 2 vendor assessments. Cache TTL 6h. Parallel OSV queries (concurrency=10). SLA <=30s p95.
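
The "Parallel OSV queries (concurrency=10)" note describes bounded fan-out. The sketch below shows one common way to get that behavior with a thread pool; it says nothing about how the server actually implements it. `query_one` is a hypothetical callable standing in for a single per-dependency vulnerability lookup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

Dependency = Tuple[str, str]  # (package name, version)


def query_all(
    dependencies: Sequence[Dependency],
    query_one: Callable[[Dependency], List[str]],
    concurrency: int = 10,
) -> Dict[Dependency, List[str]]:
    """Run query_one for every dependency with at most `concurrency` calls in flight."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map preserves input order, so results line up with dependencies.
        results = list(pool.map(query_one, dependencies))
    return dict(zip(dependencies, results))
```

A thread pool suits this case because each lookup is I/O-bound; the worker cap keeps the client from exceeding the upstream rate limit.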

Parameters

Name	Required	Description
`mode`	Yes	Manifest type: "package_json"=npm, "requirements_txt"=pip, "go_mod"=Go modules, "cargo_toml"=Rust, "composer_json"=PHP, "gem_lock"=Ruby, "sbom_cyclonedx"=CycloneDX SBOM JSON.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`severity_min`	No	Minimum severity to include in results (default: "medium").
`manifest_content`	Yes	Raw text content of the manifest file to scan (e.g. full contents of package.json, requirements.txt, etc.).
`include_transitive`	No	Include transitive/indirect dependencies in results (default: true).
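
The `mode` values in the table map naturally onto manifest file names. A minimal sketch of building the tool arguments from a file path follows; the helper name is hypothetical, and the file-name mapping is inferred from the formats the description lists. CycloneDX SBOM files have no fixed name, so that mode must be passed explicitly.

```python
import os
from typing import Any, Dict, Optional

# Mode values come from the parameter table; the file names are inferred.
MODE_BY_FILENAME = {
    "package.json": "package_json",
    "requirements.txt": "requirements_txt",
    "go.mod": "go_mod",
    "Cargo.toml": "cargo_toml",
    "composer.json": "composer_json",
    "Gemfile.lock": "gem_lock",
}


def build_scan_arguments(
    path: str,
    manifest_content: str,
    mode: Optional[str] = None,
    severity_min: str = "medium",
    include_transitive: bool = True,
) -> Dict[str, Any]:
    """Build the arguments dict for dependency_vulnerability_scan."""
    if mode is None:
        mode = MODE_BY_FILENAME.get(os.path.basename(path))
    if mode is None:
        raise ValueError(f"cannot infer mode from {path!r}; pass mode explicitly")
    return {
        "mode": mode,
        "manifest_content": manifest_content,
        "severity_min": severity_min,  # documented default: "medium"
        "include_transitive": include_transitive,  # documented default: true
    }
```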

Output Schema

Parameters

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`summary`	Yes
`ecosystem`	Yes
`quality_score`	Yes
`recommendations`	Yes
`vulnerabilities`	Yes
`dependencies_parsed`	Yes

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, and the description reinforces this by describing the tool as a scan. Additionally, the description discloses useful behavioral details such as cache TTL (6h), concurrency (10), SLA (<=30s p95), and data sources (OSV, NVD, CISA KEV), which add significant value beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense but structured with bullet-like punctuation, covering key aspects in a single paragraph. It is efficient and front-loaded with the core purpose, making it easy to scan. However, it could be slightly more organized (e.g., using bullet points) to improve readability without adding length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, output schema exists), the description covers all necessary aspects: supported formats, data sources, caching, concurrency, SLA, and relevant regulations. It leaves no obvious gaps for an AI agent to misunderstand the tool's capabilities or behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so each parameter already has a clear description. The tool description lists supported manifest types and data sources, but this is more about overall behavior than parameter-specific detail. Since the schema does the heavy lifting, the description adds minimal additional meaning to the parameters, earning a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that this tool scans a dependency manifest for known vulnerabilities, listing supported formats and data sources. It specifies the action (scanning), the resource (dependency manifest), and the outcome (known vulnerabilities), distinguishing it from sibling tools that serve different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for when to use this tool (e.g., EU NIS2, DORA, SOC 2) and mentions performance characteristics like SLA and caching. However, it does not explicitly state when not to use this tool or mention alternatives among sibling tools, though the purpose is clear enough to guide usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

discovery_prepC

Read-only

Préparation discovery — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Discovery Salesforce × Airbus — VP Digital Marc Legrand · Signaux achat confirmés · +28 pts conversion demo. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`contact`	Yes
`ourOffer`	Yes
`prospect`	Yes
`meetingGoal`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true, which the description does not contradict. The description adds that the tool returns an audited deliverable and that inputs are validated server-side, but it lacks details on performance, side effects, or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description has four sentences with some useful context, but it also includes a reference case that may be extraneous. It is moderately concise but could be streamlined to focus on the tool's core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given low schema coverage and no output schema, the description should compensate with more detail about the deliverable format, required fields, and success criteria. It currently lacks this, leaving the agent with incomplete context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20% (only 'async' has a description). The description does not explain parameter semantics beyond stating 'send the documented case fields,' which is too vague to guide parameter selection.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool prepares a discovery deliverable and includes a reference case. However, it uses jargon ('Gapup agent-payable C-suite expertise') that may confuse an AI agent, and it does not explicitly distinguish the tool from siblings like 'battle_plan' or 'meddic_scoring'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives no guidance on when to use this tool versus alternatives. It only mentions that inputs are validated server-side and does not specify prerequisites, context, or when not to use the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

diversity_inclusion_metricsC

Read-only

Métriques diversité & inclusion — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: Cas démo — Métriques diversité & inclusion. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`ambitions`	Yes
`currentState`	Yes
`regulatoryContext`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the description's claim 'Returns a structured, audited deliverable' adds minimal new behavioral context. It does not specify whether the deliverable is cached, real-time, or requires specific permissions. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (four sentences) and front-loads the purpose. However, the 'Reference case' sentence only restates the tool title as a demo case, so it adds limited value and could be removed for conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, 6 parameters with low schema coverage, and nested objects. The description only states it returns 'a structured, audited deliverable' without details on the output format, key fields, or how to interpret results. An ESG/D&I tool of this complexity demands more contextual completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% — most parameters lack descriptions. The description adds nothing about the meaning of 'company', 'currentState', 'ambitions', or 'regulatoryContext'. It merely says 'send the documented case fields', which is redundant. For a tool with nested objects and 6 parameters, this is insufficient.
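
The 17% figure matches a simple count over the parameter table for this tool: one of six parameters carries a description. The sketch below reproduces that arithmetic, assuming coverage means the share of parameters with a non-empty description. The scorer's exact formula is not stated in this listing, so `schema_coverage` is a hypothetical helper, not the scorer's implementation.

```python
# Parameter descriptions as listed for diversity_inclusion_metrics.
# An empty string means the schema provides no description.
params = {
    "async": "If true, returns a job_id immediately (<200ms) instead of waiting for the result.",
    "focus": "",
    "company": "",
    "ambitions": "",
    "currentState": "",
    "regulatoryContext": "",
}


def schema_coverage(descriptions: dict[str, str]) -> float:
    """Share of parameters with a non-empty description (assumed definition)."""
    if not descriptions:
        return 0.0
    described = sum(1 for text in descriptions.values() if text.strip())
    return described / len(descriptions)


coverage_pct = round(schema_coverage(params) * 100)
print(f"{coverage_pct}%")
```

The same count gives 25% for a four-parameter tool with one described parameter and 100% when every parameter is described.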

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a 'structured, audited deliverable' for diversity & inclusion metrics, but the verb is implied rather than explicit (e.g., 'generate' or 'calculate'). It references a demo case and mentions sustainability expertise, which helps distinguish it from generic tools, but sibling tools like 'sustainability_report' or 'action_plan_esg' have overlapping domains without clear differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives (e.g., sustainability_report, bias_amplification_tracker). The only usage hint is 'Inputs are validated server-side — send the documented case fields,' which is about input format, not strategic selection. Sibling list includes many D&I-relevant tools, making the lack of context a notable gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_tech_fingerprintB

Read-only

Empreinte tech d'un domaine — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What is the tech stack of — frontend, CMS, analytics, CRM, CDN, hosting? · What buying signals does 's technology footprint reveal for sales prospecting? · Analyze for supply-chain technology risk and third-party vendor exposure. · What is the best outreach angle for a sales rep targeting based on their detected stack? · Run a CISO-style technology fingerprint on — identify legacy tech, missing security headers, and vendor risk. · Has recently changed their marketing or analytics stack — any vendor adoption signals? Reference case: velora-payments.io · Next.js + Cloudflare + Stripe + GA4 + HubSpot · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	Yes		standard
`focus`	Yes		tech-buying
`target_domain`	Yes

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true, which convey non-destructive and dynamic output. The description adds that it returns a 'structured, audited deliverable'; async behavior is documented only in the parameter schema. It does not contradict annotations. However, it lacks disclosure on rate limits, authentication, or error handling beyond server-side validation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose, mixing French and English, with an example case and multiple questions. It is not front-loaded with essential information for an AI agent. The structure is more suited to a human salesperson than a concise tool description.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

There is no output schema, and the description only vaguely mentions a 'structured, audited deliverable'. It lists questions but not the specific format or fields returned. The async mechanism is described in the parameter schema, but overall, the description does not provide sufficient detail for an agent to fully understand the output structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only async has a description). The tool description does not explain the meaning of focus, depth, or target_domain beyond listing questions that imply their use. The job_id mention for async comes from the schema itself, so the description does not compensate for the sparse schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: it provides a technology fingerprint of a domain, answering specific questions about tech stack, buying signals, supply-chain risk, etc. The verb is implied by 'empreinte tech' (tech fingerprint) and the name 'domain_tech_fingerprint' is highly descriptive. It distinguishes from siblings by focusing solely on domain technology analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists several use cases and questions the tool answers, giving context for usage. However, it does not explicitly state when not to use this tool versus alternatives, nor does it compare to sibling tools like competitive_deep_dive or competitor_intel. The lack of exclusion criteria limits its guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dora_metrics_deep_diveA

Read-only · Idempotent

Analyzes DORA metrics (Deployment Frequency, Mean Time to Recovery, Change Failure Rate) with deep correlation to code review patterns. Designed for CTOs to identify bottlenecks in software delivery pipelines. Inputs include GitHub repository identifiers and optional time ranges. Outputs structured metrics with trend analysis and code review depth insights.

ParametersJSON Schema

Name	Required	Description
`repo`	Yes	GitHub repository in format 'owner/repo'
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`since`	No	Start date for analysis (ISO 8601)
`until`	No	End date for analysis (ISO 8601)
`branch`	No	Branch name to analyze (default: main)
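
MCP clients invoke a tool with a JSON-RPC 2.0 `tools/call` request. The sketch below builds such a request for `dora_metrics_deep_dive` from the parameters in the table above. The repository name `octocat/hello-world` is a placeholder, `build_dora_call` is a hypothetical helper, and the client-side checks (calendar-date parsing, `since` not later than `until`) are assumptions layered on the schema descriptions, not documented server rules.

```python
import json
import re
from datetime import date

# 'owner/repo': two non-empty segments separated by a single slash.
REPO_PATTERN = re.compile(r"^[^/\s]+/[^/\s]+$")


def build_dora_call(repo, since=None, until=None, branch=None,
                    use_async=False, request_id=1):
    """Build a JSON-RPC 2.0 tools/call request (hypothetical helper)."""
    if not REPO_PATTERN.match(repo):
        raise ValueError("repo must look like 'owner/repo'")
    arguments = {"repo": repo}
    parsed = {}
    for key, value in (("since", since), ("until", until)):
        if value is not None:
            # Assumption: plain calendar dates (YYYY-MM-DD) are sent.
            parsed[key] = date.fromisoformat(value)
            arguments[key] = value
    # Assumption: an inverted range is a caller mistake worth rejecting early.
    if "since" in parsed and "until" in parsed and parsed["since"] > parsed["until"]:
        raise ValueError("since must not be later than until")
    if branch is not None:
        arguments["branch"] = branch
    if use_async:
        arguments["async"] = True
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "dora_metrics_deep_dive", "arguments": arguments},
    }


request = build_dora_call("octocat/hello-world",
                          since="2026-01-01", until="2026-03-31")
print(json.dumps(request, indent=2))
```

Optional parameters are left out of `arguments` when unset, so the server-side default for `branch` (main, per the schema) still applies.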

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`metrics`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint. The description adds context about deep correlation to code review patterns and trend analysis, which goes beyond the annotations without contradicting them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences: the first states the action, the second the audience and purpose, the third the inputs, and the fourth the outputs. No wasted words, highly efficient structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of output schema and annotations, the description covers purpose, audience, inputs, and output style. It is nearly complete, though it could briefly mention the async pattern for long-running queries.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds minimal context by grouping parameters (GitHub repo identifiers, time ranges) but does not detail the async or branch parameters, which the schema already covers.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Analyzes') and resource ('DORA metrics'), and adds the unique aspect of correlation to code review patterns, clearly distinguishing it from siblings like change_failure_root_cause_classifier and mttr_breakdown_analyzer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It states the target audience (CTOs) and purpose (identify bottlenecks), but does not provide explicit when-not-to-use guidance or mention alternative sibling tools for more specific analyses.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dora_operational_resilience_stress_tesA

Read-only · Idempotent

Assess DORA operational resilience by simulating ICT failure scenarios for financial entities. Designed for legal/compliance teams to evaluate ICT risk management under DORA Article 25. Inputs include failure scenario parameters (e.g., ICT service type, duration, impact radius) and entity profile. Outputs structured resilience scores, regulatory gaps, and mitigation recommendations with EUR-Lex/FTC enforcement references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entityType`	Yes
`impactRadius`	Yes
`ictServiceType`	Yes
`existingMitigations`	No
`failureDurationHours`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	No
`regulatoryGaps`	Yes
`resilienceScore`	Yes
`simulationTimestamp`	No
`recommendedMitigations`	Yes

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations: it explains that the tool simulates scenarios (no real impact), outputs structured scores and recommendations, and references EUR-Lex/FTC. Given annotations already provide safety hints, this additional context is valuable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (four sentences) and front-loaded with purpose. It efficiently conveys key information but lacks structural elements like bullet points for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 parameters, output schema exists, annotations present), the description covers purpose, target users, regulatory context, and output types. It does not detail all parameters or the async mechanism, but the output schema handles return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains 4 of 6 parameters (ictServiceType, failureDurationHours, impactRadius, entityType) as 'failure scenario parameters', adding meaning beyond schema enums. However, it omits the 'async' flag and 'existingMitigations' parameter. With only 17% schema coverage, the description partially compensates.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to assess DORA operational resilience by simulating ICT failure scenarios. It specifies the verb 'Assess', the resource 'operational resilience', and the method. The target audience (legal/compliance teams) and regulatory context (DORA Article 25) differentiate it from siblings like 'dora_metrics_deep_dive'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for DORA compliance evaluation but does not explicitly state when to use this tool versus alternatives, nor does it provide when-not-to-use guidance. There is no mention of prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dpdp_consent_artifact_generatorB

Read-only · Idempotent

Generates structured consent artifacts compliant with India's Digital Personal Data Protection Act (DPDP). Designed for legal teams to verify or create consent records with timestamped logs, purpose limitation, and data subject rights. Accepts data subject details, processing purpose, and legal basis as inputs. Returns a signed artifact with audit trail and validation status.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`legalBasis`	Yes	Legal basis for processing under DPDP
`dataSubjectId`	Yes	Unique identifier for the data subject
`dataCategories`	No	Categories of personal data being processed
`processingPurpose`	Yes	Specific purpose for data processing
`retentionPeriodDays`	No	Retention period in days

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`artifact`	No
`warnings`	No

Tool Definition Quality

B3.2/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states 'Generates structured consent artifacts' implying a write/mutation operation, but annotations declare readOnlyHint: true, which denotes a read-only operation. This is a direct contradiction. Additionally, idempotentHint: true conflicts with generating timestamped artifacts. No other behavioral traits are disclosed beyond the misleading contradiction.
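
A check like the one described here can be approximated with a small lint over a tool definition. The sketch below flags a description whose leading verb suggests a write while `readOnlyHint` is true. The verb list and the helper `find_annotation_conflicts` are illustrative assumptions, not the scorer's actual method, and a flagged tool may still be read-only in practice if the artifact is only returned and never stored.

```python
# Leading verbs treated as write-like. This list is an assumption for
# illustration; a real check would need a broader, reviewed vocabulary.
MUTATING_VERBS = {"generates", "creates", "writes", "updates", "deletes", "sends"}


def find_annotation_conflicts(description: str, annotations: dict) -> list[str]:
    """Return human-readable conflicts between a description and its hints."""
    words = description.split()
    first_word = words[0].lower().strip(".,:;") if words else ""
    conflicts = []
    if annotations.get("readOnlyHint") and first_word in MUTATING_VERBS:
        conflicts.append(
            f"description starts with '{first_word}' but readOnlyHint is true"
        )
    return conflicts


dpdp_conflicts = find_annotation_conflicts(
    "Generates structured consent artifacts compliant with India's "
    "Digital Personal Data Protection Act (DPDP).",
    {"readOnlyHint": True, "idempotentHint": True},
)
print(dpdp_conflicts)
```

Run against a description that opens with "Analyzes", the same function returns an empty list.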

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences long, front-loads the core purpose, and avoids redundant or extraneous information. Every element serves a clear function, making it highly efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While the description mentions the return value ('signed artifact with audit trail and validation status'), the annotation contradiction fundamentally undermines its completeness. With readOnlyHint: true, the described generative behavior is false, making the description misleading rather than complete. Given the complexity and existing output schema, the description fails to provide accurate contextual completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description reiterates that it accepts data subject details, processing purpose, and legal basis, but adds no additional semantic clarity beyond what the schema provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it generates structured consent artifacts compliant with India's DPDP Act, specifies the target users (legal teams), and mentions key features like timestamped logs and data subject rights. This verb+resource+regulation scope makes it distinct from sibling tools, which address different domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates the tool is for legal teams to verify or create consent records, providing some context on when to use it. However, it lacks explicit guidance on when not to use it or mention of alternative tools, leaving the agent to infer usage from domain specificity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dual_use_export_risk_mapperA

Read-only · Idempotent

As a COO, quickly assess export compliance risks for components in your supply chain. This tool analyzes bills of materials (BOMs) against EU dual-use export control lists and ICAO/IMO restricted items data. Input a list of part numbers, descriptions, or HS codes to receive a risk assessment with actionable insights. Output includes risk levels, applicable regulations, and source references for audit trails.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`bomItems`	Yes
`includeSources`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`results`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, openWorld, idempotent. Description adds context on data sources and output, aligning with safe behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four focused sentences: purpose first, then data sources, then input, then output. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

An output schema exists and the description covers input and output adequately, but it could mention async usage for large BOMs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 33% coverage; description clarifies bomItems format and includeSources purpose, adding value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it assesses export compliance risks for BOM components against EU dual-use and ICAO/IMO lists, distinguishing it from sibling 'dual_use_tech_diversion_monitor'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description targets COOs and supply chain contexts, but lacks explicit when-not or alternative tools guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dual_use_tech_diversion_monitorA

Read-only · Idempotent

Asynchronous T5-level tool for COO persona to detect unauthorized diversion of dual-use technologies. Cross-references shipment manifests, EU sanctions lists, and ICAO/IMO transport data to identify suspicious transfers. Inputs: shipment IDs, company identifiers, or geographic routes. Outputs structured diversion risk assessment with source provenance. Requires async:true to avoid 402 timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`route`	No
`companyId`	No	Company registration number or tax identifier
`shipmentId`	No	Unique shipment identifier (e.g., bill of lading number)
`techCategory`	No	Dual-use technology category

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`matches`	No
`sources`	No
`warnings`	No
`diversionRisk`	No	Calculated diversion risk score (0-100)

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, openWorld, and idempotent hints. The description adds the critical async behavior detail ('Requires async:true to avoid 402 timeout') and mentions structured output with source provenance, which goes beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is five sentences long, front-loaded with purpose, and contains no filler. It efficiently conveys key points, though the async mention could be integrated more naturally.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with multiple parameters, nested objects, and async behavior, the description covers purpose, inputs, outputs, and async requirement. Output schema exists separately, so lack of return details is acceptable. Slightly lacking in usage context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 80%, with parameters well-documented (e.g., country codes, identifiers). The description summarizes inputs ('shipment IDs, company identifiers, or geographic routes') but does not add new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool detects unauthorized diversion of dual-use technologies, specifies the COO persona, and lists cross-referencing of shipment manifests, sanctions lists, and transport data. It distinguishes itself from the sibling 'dual_use_export_risk_mapper' by focusing on diversion monitoring.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for monitoring shipments and diversion risk but does not explicitly state when to use this tool versus alternatives like 'dual_use_export_risk_mapper'. It mentions async requirement but lacks 'when-not' guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

earnings_reviewerC

Read-only

Earnings Reviewer — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Salesforce Q3 FY2026 — call transcript + 10-Q + guidance → analyst note. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`quarter`	Yes
`analystFocus`	No
`secFilingContext`	No
`transcriptExcerpt`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly and openWorld hints. The description adds that inputs are validated server-side; the async option appears only in the parameter schema. Neither explains the deliverable format or latency. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Short at four sentences, but the first sentence contains jargon and the reference case could be shortened. Some fluff reduces conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema, complex nested input, and no description of return format. The async option and its job_result polling are documented only in the parameter schema, not in the description. Incomplete for seamless agent invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema coverage, description adds little: 'send the documented case fields' does not clarify analystFocus or secFilingContext. Parameters are partially self-explanatory but need more elaboration.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states tool reviews earnings and returns a structured deliverable, with a concrete reference case. However, the phrase 'Gapup agent-payable C-suite expertise' is opaque and may confuse.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs siblings like earnings_transcript_signals or sec_filing_decoder. Fails to distinguish use cases or provide exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

earnings_transcript_signalsA

Read-only

Earnings call transcript signal extractor for equity research analysts, catalyst-driven hedge funds, and BD teams. Parses earnings transcripts (fetched or provided) to surface:

• signals (P0/P1/P2): guidance raise/cut, miss/beat vs consensus, buyback, dividend change, new product, executive change, capex shift, M&A intent, regulatory risk, competitive threat, supply chain, hiring • kpis_mentioned: Revenue, EBITDA, EPS, FCF, Gross Margin, Operating Margin with YoY/QoQ % • guidance: raised / maintained / cut / new_initiated items extracted • q_and_a_topics: top Q&A themes detected (AI strategy, China exposure, M&A pipeline, macro, etc.) • overall_tone: bullish / neutral / bearish

Sources fetched automatically: SEC EDGAR 8-K filings, Yahoo Finance earnings news, Motley Fool transcripts. If no transcript can be retrieved from any source, returns status:'failed' with an explicit warning and empty signals — never fabricated data. Accepts transcript_text override for direct analysis. Supports multilingual transcripts (de/fr/es/zh). European tickers (SAP.DE, BMW.DE) mapped to EDGAR-compatible equivalents automatically.

Parameters

Name	Required	Description
`lang`	No	Language hint for the transcript. Affects mock transcript language when fetch fails.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`quarter`	No	Fiscal quarter in format Q1-2026. Defaults to the most recent past quarter.
`transcript_text`	No	If provided, skips all external fetches and analyses this text directly. Minimum 100 characters.
`company_or_ticker`	Yes	Company name or ticker symbol (e.g. 'Tesla', 'TSLA', 'SAP', 'SAP.DE', 'Sanofi', 'SNY'). European tickers (SAP.DE, BMW.DE) are mapped to their ADR equivalents for EDGAR lookup.
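
As a rough illustration of the constraints in the table above, the sketch below assembles an arguments object on the client side. The helper name `build_earnings_args` is hypothetical, and the validation the server actually performs is not documented in this listing; the checks only mirror the stated `Q1-2026` quarter format and the 100-character minimum for `transcript_text`.

```python
import re
from typing import Optional

# Quarter format as stated in the parameter table, e.g. "Q1-2026".
QUARTER_RE = re.compile(r"Q[1-4]-\d{4}")


def build_earnings_args(
    company_or_ticker: str,
    quarter: Optional[str] = None,
    transcript_text: Optional[str] = None,
    lang: Optional[str] = None,
) -> dict:
    """Assemble arguments for earnings_transcript_signals (hypothetical helper)."""
    name = company_or_ticker.strip()
    if not name:
        raise ValueError("company_or_ticker is required")
    args = {"company_or_ticker": name}
    if quarter is not None:
        if not QUARTER_RE.fullmatch(quarter):
            raise ValueError("quarter must look like 'Q1-2026'")
        args["quarter"] = quarter
    if transcript_text is not None:
        # The table states a 100-character minimum for direct analysis.
        if len(transcript_text) < 100:
            raise ValueError("transcript_text must be at least 100 characters")
        args["transcript_text"] = transcript_text
    if lang is not None:
        args["lang"] = lang
    return args


print(build_earnings_args("SAP.DE", quarter="Q1-2026"))
```

Omitting `quarter` leaves the server default (the most recent past quarter) in effect, which is why the helper only adds keys the caller supplied.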

Tool Definition Quality

A 4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false, establishing a safe read profile. The description adds substantial behavioral context: automatic source fetching from SEC EDGAR, Yahoo Finance, Motley Fool; failure behavior (returns status:'failed' with warning); multilingual support; European ticker mapping. This enriches understanding beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with purpose and uses bullet points for outputs, making it scannable. While detailed (multiple sentences), every sentence adds value for a complex tool. It could be slightly more concise but remains well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the tool and no output schema, the description thoroughly covers outputs (signals, KPIs, guidance, Q&A topics, tone), sources, failure mode, multilingual support, and ticker mapping. It provides complete context for an AI agent to understand what the tool returns and when it succeeds or fails.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for each parameter. The description does not add significant meaning beyond what the schema already provides for individual parameters; it instead focuses on overall tool behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an earnings call transcript signal extractor, specifying its target users (equity research analysts, hedge funds, BD teams) and detailing the types of signals, KPIs, guidance, Q&A topics, and tone it surfaces. This distinguishes it from siblings like 'earnings_reviewer' by focusing on structured signal extraction rather than a broader review.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions target users and provides context on tool behavior (e.g., sources, failure mode), but it does not explicitly guide when to use this tool over alternatives like 'earnings_reviewer' or 'sec_filing_decoder'. Usage guidance is implied but not direct.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

economic_indicator A

Read-only

Return a precise macroeconomic indicator for a country — the exact figure for a market-sizing, finance or strategy workflow. Indicators: gdp_usd, gdp_per_capita, gdp_growth, inflation, unemployment, population. Source: World Bank. When to use: an agent's analysis needs an authoritative country-level economic figure. Inputs: country (ISO-2 or ISO-3 code) and indicator name.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	Yes	Country code, ISO-2 or ISO-3 (e.g. FR, USA)
`indicator`	Yes	Macroeconomic indicator name

Output Schema

Parameters

Name	Required	Description
`year`	Yes
`value`	Yes
`source`	Yes
`country`	Yes
`indicator`	Yes
`source_url`	No
`indicator_code`	No
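
The input and output tables above can be mirrored in a small client-side sketch. The names `build_indicator_args` and `IndicatorResult` are hypothetical, the field types are assumptions (the output schema table lists names and required flags only), and the country check validates shape only, not membership in the ISO 3166 list.

```python
from dataclasses import dataclass
from typing import Optional

# Indicator names as listed in the tool description.
INDICATORS = {
    "gdp_usd", "gdp_per_capita", "gdp_growth",
    "inflation", "unemployment", "population",
}


def build_indicator_args(country: str, indicator: str) -> dict:
    """Validate inputs for economic_indicator (hypothetical helper)."""
    code = country.strip().upper()  # upper-casing is an assumption
    # Shape check only: two or three ASCII letters (ISO-2 or ISO-3 style).
    if not (code.isascii() and code.isalpha() and len(code) in (2, 3)):
        raise ValueError(f"country must be an ISO-2 or ISO-3 code, got {country!r}")
    if indicator not in INDICATORS:
        raise ValueError(f"unknown indicator {indicator!r}")
    return {"country": code, "indicator": indicator}


@dataclass
class IndicatorResult:
    # Required fields per the output schema table; types are assumed.
    year: int
    value: float
    source: str
    country: str
    indicator: str
    # Optional fields per the output schema table.
    source_url: Optional[str] = None
    indicator_code: Optional[str] = None


print(build_indicator_args("fr", "gdp_usd"))
```

A response payload could then be loaded with `IndicatorResult(**payload)`, which raises a `TypeError` if a required field is missing.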

Tool Definition Quality

A 4.1/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds minimal extra behavioral context (source and 'exact figure'). No contradictions, but does not discuss rate limits, data freshness, or response format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, front-loaded with purpose, and contains no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description does not mention the async parameter or how to poll with job_result, and it does not describe the output format even though an output schema exists. These omissions leave the agent without a complete picture.
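
To make the async gap concrete, here is a minimal polling sketch. It assumes that a call with `async: true` returns a payload containing `job_id`, and that `job_result` reports `{"status": "pending"}` until the job finishes; neither payload shape is documented in this listing. `call_tool` stands in for whatever MCP client function is available, and `make_fake_caller` is an in-memory stub used only to exercise the loop.

```python
import time
from typing import Any, Callable, Dict

ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def call_async(call_tool: ToolCaller, name: str, args: Dict[str, Any],
               interval_s: float = 1.0, timeout_s: float = 60.0) -> Dict[str, Any]:
    """Start a tool with async=true, then poll job_result until it is done.

    ASSUMPTIONS: the first response carries "job_id", and job_result returns
    {"status": "pending"} while the job is still running.
    """
    started = call_tool(name, {**args, "async": True})
    job_id = started["job_id"]
    deadline = time.monotonic() + timeout_s
    while True:
        result = call_tool("job_result", {"job_id": job_id})
        if result.get("status") != "pending":
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still pending after {timeout_s}s")
        time.sleep(interval_s)


def make_fake_caller(polls_before_done: int) -> ToolCaller:
    """In-memory stand-in for an MCP client (hypothetical, for demonstration)."""
    state = {"polls": 0}

    def fake(name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if name != "job_result":
            return {"job_id": "job-1"}
        state["polls"] += 1
        if state["polls"] <= polls_before_done:
            return {"status": "pending"}
        return {"status": "ok", "polls": state["polls"]}

    return fake


result = call_async(make_fake_caller(2), "economic_indicator",
                    {"country": "FR", "indicator": "inflation"}, interval_s=0.0)
print(result["status"])
```

Passing the caller in as a parameter keeps the loop independent of any particular client library, and the timeout prevents an agent from polling indefinitely.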

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds value by explicitly listing all indicator names in text, making it easier for the agent to see without parsing the enum. Also clarifies country code format.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a precise macroeconomic indicator for a country with specific indicator names and source (World Bank). It distinguishes from siblings by targeting market-sizing, finance, or strategy workflows.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a clear 'when to use' statement for authoritative country-level economic figures. Does not explicitly state when not to use or list alternatives, but given the context of many sibling tools, it is adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

email_domain_health_check A

Read-only, Idempotent

Comprehensive email domain health check: MX routing, SPF authentication, DKIM signing, DMARC policy enforcement, DNSBL blacklist status (Spamhaus/SpamCop/Barracuda), TLS certificate validity, and WHOIS registration age. Aggregates a reputation score 0-100 and generates P0/P1/P2 deliverability signals. Accepts a domain (stripe.com) or email address (info@stripe.com). Detects role-based addresses (info@, support@, admin@, noreply@) that have higher bounce rates. Detects email provider (Google Workspace, Microsoft 365, Amazon SES, etc.). P0 signals: blacklisted / no MX / TLS expired / no SPF + DMARC none. P1 signals: SPF soft-fail / no DKIM selector / DMARC no reporting. P2 signals: role-based address / TLS expires <30d / domain age <90 days. All checks are keyless (no API keys required). Cache TTL 1h. SLA <=10s p95.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`email`	No	Full email address for additional checks: format validity, role-based detection (e.g. "ceo@stripe.com").
`checks`	No	Subset of checks to run. Defaults to all 8: ["mx","spf","dkim","dmarc","blacklist","whois","tls","reputation"]. Use a subset for faster responses (e.g. ["mx","spf","dmarc","reputation"] for quick scoring).
`domain`	Yes	Domain to check (e.g. "stripe.com" or "@stripe.com"). If an email address is provided here, the domain is extracted automatically.
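
The input handling described above (a bare domain, an `@`-prefixed domain, or a full email address, plus role-based detection) might look roughly like this on the client side. Both helper names are hypothetical, and the role list contains only the four prefixes named in the description; the server's actual list may be longer.

```python
# Role-based local parts named in the tool description.
ROLE_LOCAL_PARTS = {"info", "support", "admin", "noreply"}


def normalize_domain(value: str) -> str:
    """Accept 'stripe.com', '@stripe.com' or 'info@stripe.com'; return the domain."""
    value = value.strip().lower()
    # Everything after the last '@' is the domain; with no '@' the input is
    # treated as a domain already.
    domain = value.rsplit("@", 1)[-1]
    if not domain or "." not in domain:
        raise ValueError(f"not a usable domain: {value!r}")
    return domain


def is_role_based(email: str) -> bool:
    """True when the local part is one of the role prefixes listed above."""
    local, sep, _ = email.strip().lower().partition("@")
    return bool(sep) and local in ROLE_LOCAL_PARTS


for candidate in ("stripe.com", "@stripe.com", "info@stripe.com"):
    print(candidate, "->", normalize_domain(candidate))
print(is_role_based("info@stripe.com"), is_role_based("ceo@stripe.com"))
```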

Output Schema

Parameters

Name	Required	Description
`mx`	Yes
`spf`	Yes
`tls`	No
`dkim`	Yes
`dmarc`	Yes
`whois`	No
`domain`	Yes
`status`	Yes
`sources`	Yes
`blacklist`	Yes
`email_valid`	No
`quality_score`	Yes
`reputation_score`	Yes
`email_is_role_based`	No
`deliverability_signals`	Yes

Tool Definition Quality

A 4.6/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint true, idempotentHint true, destructiveHint false. Description adds that it is keyless, has cache TTL 1h, SLA <=10s p95, and generates reputation score/signals. No contradiction; adds useful behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single dense paragraph with all essential info: checks, signals, inputs, performance, caching, keyless. Every sentence adds value, no redundancy. Well structured and concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given tool complexity (many checks, signals, optional async, performance details), description covers input, parameters, output types, performance guarantees, and operational details (keyless, cache TTL). Output schema exists but description still explains outputs. Very complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds significant value: explains domain accepts email, email for additional checks, checks parameter with defaults and subset recommendation, role-based detection, provider detection. Much more than schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs a 'comprehensive email domain health check' and lists specific checks (MX, SPF, DKIM, DMARC, etc.), outputs like reputation score and deliverability signals. It is distinct from sibling tools, which are business/strategy related.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains acceptable inputs (domain or email), async option, and default checks. It does not explicitly state when not to use, but the specificity makes usage clear. No directly competing sibling is present.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

enps_auto C

Read-only

Automated eNPS: Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case (BlaBlaCar): monthly eNPS pulse · 700 FTE across 8 countries · segments × tenure × manager · targeted corrective plays. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`context`	Yes
`toolStack`	Yes
`segmentation`	Yes
`presenterScript`	No

Tool Definition Quality

C 2.8/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description confirms it returns a deliverable and mentions server-side validation, adding marginal context. No contradictions, but no deep behavioral disclosure beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short (3 sentences) and front-loaded with purpose, but contains marketing fluff ('Gapup agent-payable C-suite expertise'). Could be more concise and directly describe inputs/outputs.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, 7 parameters with 4 required, and 14% schema coverage, the description is insufficient. It does not specify input structure or expected result, leaving the agent underinformed for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 14% (only async has description). The description says 'send the documented case fields' but does not explain any required parameters (company, segmentation, context, toolStack). This fails to compensate for the low schema coverage.
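
The coverage figure quoted above can be reproduced with a short sketch, assuming that "schema coverage" means the share of top-level properties carrying a non-empty `description`. The reviewer's exact formula is not stated on this page, so treat this definition as an assumption. The undocumented properties are represented as empty objects because their types are not shown in the listing.

```python
def schema_description_coverage(schema: dict) -> int:
    """Percentage of top-level properties with a non-empty description."""
    props = schema.get("properties", {})
    if not props:
        return 0  # assumption: an empty schema counts as 0% covered
    described = sum(
        1 for prop in props.values() if prop.get("description", "").strip()
    )
    return round(100 * described / len(props))


# enps_auto as listed: seven parameters, only `async` is described.
enps_auto_schema = {
    "properties": {
        "async": {"description": "If true, returns a job_id immediately."},
        "focus": {},
        "company": {},
        "context": {},
        "toolStack": {},
        "segmentation": {},
        "presenterScript": {},
    }
}

print(schema_description_coverage(enps_auto_schema))  # 14
```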

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable for eNPS automation. It includes a reference case, making the purpose clear and distinct from generic reporting tools. However, it could be more specific about the deliverable content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like qbr_auto or knowledge_base_auto. The description only mentions server-side validation and the reference case, lacking context on exclusions or preferred scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

esg_audit_multi A

Read-only

Multi-mode ESG intelligence for ESG analysts, sustainability officers and impact investing fund managers. Aggregates live data from CDP, SBTi, Wikipedia, Yahoo Finance and web search across five modes:
• company_score: ESG score 0-100 with E/S/G breakdown + heuristic rating (AAA-CCC), from CDP grade + SBTi + sector profile
• controversy_check: controversies detected via web search, classified P0/P1/P2 by type (greenwashing, emissions fraud, labour, governance)
• emissions: GHG Scope 1/2/3 estimates, SBTi validation flag, net-zero target year, carbon intensity per M€ revenue
• esrs_readiness: CSRD gap across 12 standards (E1-E5, S1-S4, G1-G3): readiness % + gap list + CSRD deadline + effort man-days
• sfdr_classification: suggested SFDR Article 6/8/9 with rationale and sustainability indicators met

Signals: P0=critical (controversy/score<40), P1=significant (score<55/SBTi missing/ESRS<50%), P2=watch. Cache 24h.
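
The thresholds in the signals line can be read as a simple priority rule. The sketch below is one possible interpretation, not the server's implementation: it collapses everything into a single highest priority, treats P2 as the fallback, and uses hypothetical parameter names. A value of `None` means the input is unknown and is skipped.

```python
from typing import Optional


def esg_priority(
    score: Optional[float] = None,
    controversy: bool = False,
    has_sbti_target: Optional[bool] = None,
    esrs_readiness_pct: Optional[float] = None,
) -> str:
    """Map ESG findings to P0/P1/P2 using the thresholds in the description."""
    # P0 = critical: a detected controversy or a score below 40.
    if controversy or (score is not None and score < 40):
        return "P0"
    # P1 = significant: score below 55, SBTi missing, or ESRS readiness below 50%.
    if (
        (score is not None and score < 55)
        or has_sbti_target is False
        or (esrs_readiness_pct is not None and esrs_readiness_pct < 50)
    ):
        return "P1"
    # P2 = watch (assumed here to be the fallback level).
    return "P2"


print(esg_priority(score=39))
print(esg_priority(score=54))
print(esg_priority(score=70, has_sbti_target=True, esrs_readiness_pct=80))
```

Checking P0 before P1 matters because a score below 40 also satisfies the P1 condition of being below 55.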

Parameters

Name	Required	Description
`mode`	Yes	Analysis mode.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Company name, ticker, ISIN or LEI (e.g. "Microsoft", "Sanofi", "Volkswagen").
`pillar`	No	ESG pillar filter (optional, default: all).
`framework`	No	ESG framework filter (optional, default: all).

Output Schema

Parameters

Name	Required	Description
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`emissions`	No
`company_score`	No
`controversies`	No
`quality_score`	Yes
`esrs_readiness`	No
`sfdr_classification`	No

Tool Definition Quality

A 4.4/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly and openWorld. Description adds behavioral details: live data from CDP, SBTi, Yahoo Finance, web search; caching for 24h; signal levels P0/P1/P2. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is information-dense with structured list of modes, but could be slightly trimmed. Front-loads purpose effectively. Every sentence contributes, though total length is high for a tool description.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 modes, 5 parameters, output schema exists), the description covers sources, mode outputs, signals, and caching. No major gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% but description enriches mode parameter by explaining each mode's output and signals, adding value beyond the schema's 'Analysis mode' default. Other parameters are adequately described.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly identifies the tool as a multi-mode ESG intelligence aggregator with five specific analysis modes (company_score, controversy_check, etc.). Distinguishes from siblings by covering multiple ESG dimensions and data sources in one call.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies usage through mode list but does not explicitly state when to use this tool versus other ESG tools like supplier_esg_audit, carbon_footprint_calculator, or sustainability_report. Missing when-not and alternative tool guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

esrs_narrative_builder C

Read-only

ESRS / CSRD narrative architect (original title: Architecte du narratif ESRS / CSRD): Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case (L'Oréal France): ESRS narrative covering E1+E5 + S1 + G1 · CSRD reporting 2025-2026 · quantified double materiality. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`scope`	Yes
`company`	Yes
`context`	Yes
`presenterScript`	No

Tool Definition Quality

C 2.7/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true, establishing the tool as read-only and possibly broad in scope. The description adds that it returns an audited deliverable and performs server-side validation, providing some behavioral context beyond annotations. However, it does not describe potential side effects or limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (3 sentences) and includes a reference case, but its opening sentence merely restates the title. The reference case adds specificity but may be unnecessary. Overall it is fairly concise, though dropping the parenthetical 'SUSTAINABILITY' would improve clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 nested parameters, no output schema, low schema description coverage), the description is severely incomplete. It does not explain what the deliverable contains, how to structure the input, or what to expect in return, making it insufficient for an agent to use effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, and the description does not explain any parameters beyond the vague 'send the documented case fields.' The description fails to compensate for the low schema coverage, adding no meaning to the complex nested properties.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description identifies the tool as an 'Architecte du narratif ESRS/CSRD' that returns a structured, audited deliverable. The reference case provides a concrete example, distinguishing it from generic sustainability tools. However, the purpose is somewhat vague due to French jargon and lack of explicit differentiation from closely related siblings like sustainability_report.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states 'Inputs are validated server-side — send the documented case fields,' but gives no guidance on when to use this tool versus alternatives such as sustainability_report or rse_policy_builder. No exclusions or context are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

event_marketing C

Read-only

Event marketing: Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case (Pennylane, €120k/year events budget): 7 events selected · cost per MQL -38% vs. the previous year. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`teamSize`	Yes
`geography`	Yes
`objectives`	Yes
`currentEvents`	Yes
`targetAudience`	Yes
`annualBudgetEur`	Yes

Tool Definition Quality

C 2.7/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, which the description does not contradict. The description adds that inputs are validated server-side, but does not disclose any other behavioral traits (e.g., mutation, auth needs, rate limits). Given what the annotations already cover, the description adds minimal value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short (two sentences) but contains jargon ('Gapup agent-payable C-suite expertise') and a reference case that may not be universally helpful. It could be clearer and more relevant.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, no output schema), the description is severely lacking. It does not explain the deliverable's structure, return format, or prerequisites, leaving the agent with insufficient information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 13%, and the description does not explain any parameters besides 'send the documented case fields'. It fails to add meaning beyond the schema, especially for the many undocumented fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable for event marketing, targeting C-suite (CMO). This provides a clear verb and resource, but does not differentiate it from siblings like marketing_roi_dashboard, so it's specific but not sufficiently distinctive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. A reference case is given, but no explicit conditions, prerequisites, or when-not-to-use instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

executive_comp_peer_benchmark A

Read-only, Idempotent

As a Chief Human Resources Officer (CHRO), benchmark executive compensation packages against peer companies using public SEC filings and private compensation data from Equilar and Bloomberg. Inputs include executive name, title, company ticker, and peer group criteria. Outputs structured compensation metrics (base salary, bonus, equity, total compensation) with source attribution and confidence scores.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`peerGroup`	No
`fiscalYear`	No
`companyTicker`	Yes
`executiveName`	Yes
`executiveTitle`	Yes

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`compensation`	No

Tool Definition Quality

A 3.7/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, idempotentHint, and openWorldHint. Description adds behavioral clarity by specifying data sources (SEC filings, Equilar, Bloomberg) and output attributes (structured metrics, attribution, confidence scores), going beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise (3 sentences), front-loads the user role and purpose, and avoids redundancy. Every sentence adds value, though could be slightly more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested parameters, output schema), the description covers source, input types, and output format. Missing details like typical output size or pagination are acceptable as output schema exists. Overall adequate for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only async documented). Description lists parameter categories (executive name, title, ticker, peer group) but lacks semantic detail like format, constraints, or relationships. Baseline is 3 due to low coverage, and description marginally compensates.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it benchmarks executive compensation against peers, specifying the role (CHRO) and data sources. While it distinguishes itself from general compensation tools, it does not explicitly differentiate from sibling tools like comp_benchmark_geo_delta or comp_plan_architect.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context (CHRO role) implying use case, but lacks explicit guidance on when not to use, prerequisites, or alternatives among sibling tools. Agent must infer suitability from the description.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

financial_model_3statementA

Read-only

Pure-compute 3-statement financial model builder (Income Statement + Balance Sheet + Cash Flow). Feed assumptions (revenue growth, COGS%, OpEx, CapEx, working capital, tax rate, depreciation, debt schedule) → receive a full 3-5 year projection with integrated DCF valuation. Supports IFRS / US_GAAP / PRC_GAAP (中国会计准则) norms with bilingual ZH+EN labels for PRC. Modes: build (full 3-statement model) | scenario_analysis (base/bull/bear ±20% growth) | sensitivity (1 KPI × 1 input, 5-point grid). No external data needed — all computed from assumptions. ICP: VC due diligence, M&A analysts, CFO SMB, startup founders pitching investors, biotech/SaaS modeling. Returns balance_check_ok per year, DCF enterprise/equity value, and coherence warnings.
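
Two of the behaviors named in this description, the base/bull/bear scenarios at ±20% growth and the per-year `balance_check_ok` flag, can be illustrated with a short sketch. This is not the server's implementation: it assumes the ±20% is applied as a relative scaling of each growth rate, and that the balance check is the standard accounting identity (assets = liabilities + equity) within a small tolerance. The function names and the tolerance are hypothetical.

```python
from typing import Dict, List


def scenario_growth_rates(base_rates_pct: List[float],
                          spread: float = 0.20) -> Dict[str, List[float]]:
    """Derive bear/base/bull growth paths from the base assumptions.

    Assumption: the +/-20% is a relative scaling of each yearly growth
    rate (10% becomes 8% in bear and 12% in bull), not an absolute shift.
    """
    return {
        "bear": [g * (1 - spread) for g in base_rates_pct],
        "base": list(base_rates_pct),
        "bull": [g * (1 + spread) for g in base_rates_pct],
    }


def balance_check_ok(assets: float, liabilities: float, equity: float,
                     tol: float = 0.01) -> bool:
    """Check the accounting identity assets = liabilities + equity.

    The tolerance is a hypothetical choice to absorb rounding.
    """
    return abs(assets - (liabilities + equity)) <= tol


if __name__ == "__main__":
    print(scenario_growth_rates([10.0, 25.0]))
    print(balance_check_ok(100.0, 60.0, 40.0))
```

A failing balance check in any projected year usually points to an assumption that is not flowing through all three statements, which is the kind of issue the tool's coherence warnings are described as reporting.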

Parameters

Name	Required	Description
`mode`	Yes	build = full 3-statement model | scenario_analysis = base/bull/bear | sensitivity = 1 KPI × 1 input
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`assumptions`	Yes	Financial assumptions for the model
`sensitivity_kpi`	No	KPI to observe in sensitivity mode.
`sensitivity_input`	No	Assumption param to vary in sensitivity mode. E.g. 'growth_rates_pct[0]' or 'cogs_pct_of_revenue'.
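
The `async` parameter in the table above implies a submit-then-poll pattern: send the call with `async` set to true, read the returned `job_id`, then call `job_result` until the job finishes. A minimal sketch follows, assuming a hypothetical `call_tool(name, arguments)` helper supplied by your MCP client. The listing does not document the shape of the `job_result` response, so the completion test is passed in as a caller-supplied predicate rather than hard-coded.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client helper type: (tool name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_async(call_tool: ToolCaller,
              name: str,
              arguments: Dict[str, Any],
              is_done: Callable[[Dict[str, Any]], bool],
              interval_s: float = 1.0,
              max_polls: int = 30) -> Dict[str, Any]:
    """Submit a tool call with async=True, then poll job_result(job_id)."""
    submitted = call_tool(name, {**arguments, "async": True})
    job_id = submitted["job_id"]  # field name taken from the parameter description
    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        if is_done(result):  # caller decides what "finished" looks like
            return result
        time.sleep(interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Bounding the number of polls keeps an agent from waiting indefinitely on a job that never completes.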

Output Schema

Parameters

Name	Required	Description
`mode`	Yes
`norms`	Yes
`status`	Yes
`sources`	No
`warnings`	Yes
`cash_flow`	No
`scenarios`	No
`sensitivity`	No
`balance_sheet`	No
`quality_score`	Yes
`valuation_dcf`	No
`income_statement`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, and the description adds 'Pure-compute' and 'No external data needed', reinforcing the non-destructive nature. However, it does not disclose the async parameter behavior or potential execution time, which would be helpful.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the purpose. It covers modes, accounting standards, ICP, and outputs. It could be slightly shorter but remains effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the tool, the description covers the main aspects: modes, assumptions, accounting standards, and return values. However, it omits information about the async option and does not fully explain the relationship between assumptions and outputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description provides an overview of what assumptions are needed but does not add significant per-parameter detail beyond what the schema already offers.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a 'pure-compute 3-statement financial model builder' that takes assumptions and returns projections with DCF valuation. It distinguishes itself by listing modes and intended users, and the verb-resource pairing is specific and unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context on when to use (feed assumptions, no external data needed) and lists modes. However, it does not explicitly contrast with sibling tools like working_capital or budget_variance_ai, missing an opportunity for clear differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fraud_detectorC

Read-only

Détecteur de fraude [Fraud detector] — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: TechManu SAS — Industriel FR €32M CA [French industrial company, €32M revenue], 148 FTE · 30j [30 days] · 21 anomalies · €487k à risque [at risk]. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`analysisPeriodDays`	Yes
`transactionVolumes`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns a deliverable and mentions server-side validation, but does not clarify costs (agent-payable implies potential charges) or side effects beyond annotations. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is relatively concise (three sentences), but includes a reference case that may be extraneous. The French language could hinder English-speaking agents. Structure is adequate but could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex input schema with nested objects and no output schema, the description omits details on return structure, error handling, or execution time. The async parameter is explained in schema but not reinforced in description. The tool is moderately complex but the description does not fill gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (20%), with only the 'async' parameter documented. The description vaguely says 'send the documented case fields' but does not explain the purpose or format of 'focus', 'company', 'analysisPeriodDays', or 'transactionVolumes' beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies it as a fraud detector that returns a structured, audited deliverable, with a concrete reference case. However, it does not explicitly differentiate from sibling fraud detection tools (e.g., affiliate_fraud_clickstream_detector, x402_payment_fraud_analyzer), so score is 4.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives, nor any when-not-to-use conditions. It only mentions inputs are validated server-side and to send documented case fields, which is not enough context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_business_ideasA

Read-only

Return vetted, automation-scored business ideas from the FTG idea bank — each with an autonomy score, monetization model and conservative/median/optimistic MRR projections. When to use this tool: an agent or founder wants ranked, buildable business ideas. Input: optional category and limit.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`category`	No	Optional category filter

Output Schema

Parameters

Name	Required	Description
`count`	Yes
`ideas`	Yes

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so description's role is additive. It details output contents (autonomy score, monetization model, MRR projections), which goes beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with a clear, front-loaded structure: the first describes the output, the second gives usage guidance, and the third names the inputs. No redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description covers purpose, usage, and key inputs. It could detail the effect of the limit parameter, but overall it is adequate for a read-only tool with well-documented annotations and output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 67% (2 of 3 params have descriptions). The description mentions 'optional category and limit' but adds minimal meaning beyond the schema's param descriptions and constraints. The async parameter is well-documented in the schema.
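
The coverage figure quoted here (2 of 3 parameters described, about 67%) can be reproduced with a short sketch. It assumes coverage is simply the share of top-level schema properties that carry a non-empty `description`; the evaluator's actual method is not documented in this listing, and the property types below are illustrative guesses.

```python
from typing import Any, Dict


def description_coverage(schema: Dict[str, Any]) -> float:
    """Share of top-level properties that have a non-empty description."""
    props = schema.get("properties", {})
    if not props:
        return 0.0
    documented = sum(
        1 for p in props.values() if str(p.get("description", "")).strip()
    )
    return documented / len(props)


# Input schema for ftg_business_ideas as listed: async and category are
# described, limit is not. The "type" values are assumptions.
ideas_schema = {
    "type": "object",
    "properties": {
        "async": {"type": "boolean",
                  "description": "If true, returns a job_id immediately"},
        "limit": {"type": "integer"},
        "category": {"type": "string",
                     "description": "Optional category filter"},
    },
}

if __name__ == "__main__":
    print(f"{description_coverage(ideas_schema):.0%}")  # prints 67%
```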

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns 'vetted, automation-scored business ideas' with specific metrics (autonomy score, monetization model, MRR projections). It distinguishes from siblings like ftg_business_plan (which generates plans) by specifying 'from the FTG idea bank' and 'ranked, buildable ideas,' though it does not explicitly contrast with all related tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit usage guidance: 'When to use this tool: an agent or founder wants ranked, buildable business ideas.' It sets clear context but does not mention when not to use or list alternatives among the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_business_planA

Read-only

Return the business plan for a market-gap opportunity — direct-trade or local-production, with CAPEX, OPEX, ROI, payback period, automation level and the full plan. Cache-first: returns the stored plan when available, otherwise reports that generation is required (the FTG platform produces plans on demand). When to use this tool: an agent has an opportunity_id (from ftg_market_gap) and needs the investable plan. Input: an opportunity_id.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`opportunity_id`	Yes	Opportunity id obtained from ftg_market_gap

Output Schema

Parameters

Name	Required	Description
`plans`	No
`status`	Yes
`message`	No
`plan_count`	No
`opportunity_id`	Yes
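
The dependency described for this tool (take an `opportunity_id` from `ftg_market_gap`, then request its plan) and the cache-first behavior can be sketched as a two-step call. This is a sketch under stated assumptions: `call_tool` is a hypothetical MCP client helper, and the key that holds the id inside each opportunity record is not documented in this listing, so it is a parameter (`id_key`) rather than a known field. The values of `status` are also undocumented, so the sketch checks only whether `plans` is populated.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical client helper type: (tool name, arguments) -> result dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def plans_for_top_gap(call_tool: ToolCaller,
                      country: str,
                      id_key: str = "opportunity_id"  # assumed field name
                      ) -> Optional[List[Dict[str, Any]]]:
    """Fetch the first market-gap opportunity, then its stored plans."""
    gaps = call_tool("ftg_market_gap", {"country": country, "limit": 1})
    opportunities = gaps.get("opportunities", [])
    if not opportunities:
        return None
    opportunity_id = opportunities[0][id_key]

    result = call_tool("ftg_business_plan", {"opportunity_id": opportunity_id})
    plans = result.get("plans") or []
    if plans:
        return plans  # cache hit: a stored plan was returned
    # Cache miss: the tool reports that generation is required.
    print(result.get("message", "no stored plan; generation required"))
    return None
```

Returning `None` on a cache miss lets the calling agent decide whether to trigger plan generation on the FTG platform or move on to another opportunity.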

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint. Description adds cache-first behavior and generation-on-demand explanation, which is valuable beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Compact description with logical flow: purpose, behavior, usage, input. Slightly redundant but effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers key behavioral aspects (cache-first), dependencies, and input. Output schema exists so return details are covered elsewhere.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good param descriptions. Description only reiterates opportunity_id input; does not add new semantic value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Return the business plan for a market-gap opportunity' with specific content (CAPEX, OPEX, ROI, etc.). Does not explicitly differentiate from siblings like ftg_production_economics but notes dependency on ftg_market_gap.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use advice: 'an agent has an opportunity_id (from ftg_market_gap) and needs the investable plan.' Also mentions cache-first behavior as a usage consideration.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_country_regulationsA

Read-only

Return import, trade and production regulations for a country — category, title, summary and source. When to use this tool: an agent checks regulatory or compliance requirements before trading or producing in a market. Input: a country, with an optional category.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	Yes	Country ISO code or name
`category`	No	Optional regulation category filter

Output Schema

Parameters

Name	Required	Description
`count`	Yes
`regulations`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds the output structure (category, title, summary, source), providing context beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: output, when to use, input. No wasted words. Front-loaded with the main purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers what, when, and input adequately. Has output schema and annotations. Lacks mention of async behavior, but that is in the schema. Overall complete for a lookup tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is high (75%), so baseline is 3. The description mentions 'country' and optional 'category' but does not address 'async' or 'limit'. It adds modest value by noting category is optional.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns import, trade, and production regulations with specific fields (category, title, summary, source). It distinguishes itself from sibling tools like ftg_country_study by focusing on regulations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use: when an agent checks regulatory or compliance requirements before trading or producing. It does not explicitly mention when not to use or alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_country_studyA

Read-only

Return the in-depth FTG country study — multi-part structured analysis of a country's trade and production landscape. When to use this tool: an agent needs deep country context before a sourcing, export or investment decision. Input: a country.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	Yes	Country ISO code or name

Output Schema

Parameters

Name	Required	Description
`note`	No
`parts`	Yes
`country`	Yes
`part_count`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds that it returns a multi-part structured analysis, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with three short sentences that front-load the core purpose. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and full parameter documentation, the description adequately covers purpose, usage context, and input. No missing critical information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters documented. The description adds minimal extra meaning beyond the schema, merely restating 'Input: a country.' Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns an in-depth FTG country study, a multi-part structured analysis, which is specific and distinct from sibling tools like ftg_country_regulations or ftg_production_economics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool: when deep country context is needed for sourcing, export, or investment decisions. It does not mention alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_investor_directoryA

Read-only

Return investors from the FTG directory — VC, PE and impact funds with type, firm, website, ticket-size range, sectors and stages of interest. When to use this tool: an agent builds a fundraising shortlist. Input: optional country and limit.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	No	Optional country ISO code or name

Output Schema

Parameters

Name	Required	Description
`count`	Yes
`investors`	Yes

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, indicating safe read operation. The description adds details about returned fields but doesn't contradict or significantly extend behavioral disclosure beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences plus a short input line, all front-loaded with key information. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema (though not shown), the description adequately outlines the tool's purpose and inputs. It mentions output fields implicitly but not pagination or response format, which is acceptable for a simple lookup tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema describes two of three parameters (async and country) with descriptions; limit only has type constraints. The description adds 'Input: optional country and limit' which marginally reinforces parameter usage but doesn't add new meaning. Schema coverage is high (67%), so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns investors from the FTG directory with specific attributes like type, firm, website, etc. It distinguishes from siblings like 'investor_list' and 'investor_shortlist' by specifying the FTG source and the detailed fields returned.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'When to use this tool: an agent builds a fundraising shortlist,' providing clear context for usage. However, it does not mention when not to use it or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_market_gapA

Read-only

Return the import/production market-gap opportunities for a country — commodities where local demand outpaces local supply. Each opportunity carries the gap value (USD/year), the gap volume (tonnes/year), a 0-100 opportunity score and the potential margin. When to use this tool: an agent needs to know what a country structurally under-produces or over-imports, for trade sourcing, import/export or local-production investment decisions. Input: a country (ISO-2 code or name).

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum opportunities to return (default 20)
`country`	Yes	Country ISO-2 code (e.g. 'SN', 'KE') or name (e.g. 'Senegal')

Output Schema

Parameters

Name	Required	Description
`count`	Yes
`country`	Yes
`opportunities`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds value by detailing output fields (gap value, volume, score, margin), giving agents a clear picture of what to expect.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: one paragraph with clear sentences for purpose, output, usage, and input. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown) and annotations, the description covers purpose, usage, and output fields. Minor gaps like pagination or error handling are acceptable for this simple lookup tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions. The description repeats the country input format but adds no significant extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'import/production market-gap opportunities for a country' and specifies the data fields. It distinguishes itself from sibling tools by focusing on market gaps for trade sourcing decisions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use scenarios (trade sourcing, import/export, local-production investment) and input format. Lacks when-not-to or alternative tool mentions, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_opportunity_scoutA

Read-only

Rank the best countries for a given commodity — where the market gap, opportunity score and potential margin are highest. Cross-country scouting. When to use this tool: an agent has a commodity and needs to know WHERE to sell, export to or set up local production. Input: a commodity name.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum countries to return (default 20)
`commodity`	Yes	Commodity name (e.g. 'rice', 'soybean', 'poultry')

Output Schema

Parameters

Name	Required	Description
`note`	No
`count`	Yes
`commodity`	Yes
`countries`	Yes

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and open-world behavior. The description adds that the tool returns a ranking based on specific metrics, which provides useful behavioral context beyond structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with two sentences plus a usage guideline. No extraneous information, and it is front-loaded with the primary action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and the tool's relative simplicity, the description is sufficiently complete. It covers the tool's purpose, input, and output characteristics. Minor omission of handling edge cases, but overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining that the output will include market gap, opportunity score, and potential margin, which are not detailed in the parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action (rank countries) and purpose (identify best countries based on market gap, opportunity score, and margin). It uses specific verbs and resource, and distinguishes from sibling tools by specifying cross-country scouting with a commodity input.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: when an agent has a commodity and needs to know where to sell, export, or set up production. It does not exclude alternatives but provides a clear decision rule.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_production_economics A

Read-only

Return production cost benchmarks (CAPEX/OPEX per unit, value ranges, scenarios, quality tiers) and agronomic yields (t/ha, cycles per year) for a commodity. When to use this tool: an agent sizes the economics of producing a commodity. Input: a commodity, with an optional country.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	No	Optional country ISO code or name
`commodity`	Yes	Commodity name or slug
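
As a rough illustration of how the parameter table above maps onto a call, the sketch below builds a JSON-RPC 2.0 `tools/call` request for `ftg_production_economics`. The helper names (`build_tool_call`, `production_economics_args`) and the sample values (`"rice"`, `"VN"`) are assumptions made for this example; the listing does not specify value types beyond the descriptions shown, and no transport or payment handling is included.

```python
import json
from itertools import count

_ids = count(1)  # simple request-id generator for this sketch


def build_tool_call(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def production_economics_args(commodity, country=None, limit=None, run_async=False):
    """Assemble arguments from the parameter table; only `commodity` is required."""
    if not commodity or not commodity.strip():
        raise ValueError("commodity is required")
    args = {"commodity": commodity.strip()}
    # Optional parameters are only sent when the caller sets them.
    if country is not None:
        args["country"] = country
    if limit is not None:
        args["limit"] = limit
    if run_async:
        args["async"] = True
    return args


# Sample values are illustrative, not taken from the listing.
request = build_tool_call(
    "ftg_production_economics",
    production_economics_args("rice", country="VN"),
)
print(json.dumps(request, indent=2))
```

Leaving unset optional parameters out of the payload, rather than sending nulls, avoids depending on how the server validates them.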

Output Schema

Parameters

Name	Required	Description
`yields`	Yes
`commodity`	Yes
`cost_benchmarks`	Yes

Tool Definition Quality

A 4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true, which already inform the agent that the tool is safe and side-effect free. The description adds context about the output (cost benchmarks, yields) but does not conflict with annotations. It does not address behavior like pagination or async, but annotations reduce the burden.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences: first detailing outputs, second providing usage context and inputs. It is front-loaded with the most important information and contains no redundant text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With annotations (readOnlyHint, openWorldHint) and an output schema available, the description is fairly complete. It explains the output, when to use, and required/optional parameters. It lacks details on pagination or async, but overall it is sufficient for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 75% (limit missing description). The description highlights commodity and country as key inputs, tying them to the tool's purpose. It does not explain async or limit, leaving some burden on the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns production cost benchmarks (CAPEX/OPEX per unit, value ranges, scenarios, quality tiers) and agronomic yields (t/ha, cycles per year) for a commodity. It specifies the verb 'Return' and the resource 'production cost benchmarks and agronomic yields', distinguishing it from sibling tools like ftg_market_gap or ftg_production_methods.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a specific use case: 'When to use this tool: an agent sizes the economics of producing a commodity.' It mentions the required input (commodity) and optional country. However, it does not explicitly state when not to use or suggest alternative tools, though the sibling context implies differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_production_methods A

Read-only

Return the production methods for a commodity — each with a description, ordered process steps, pros/cons and a popularity rank. Methods are commodity-canonical: one curated set per commodity, reusable across every country. When to use this tool: an agent evaluates HOW a commodity is produced or processed, compares techniques, or builds a production plan. Input: a commodity slug or name.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`commodity`	Yes	Commodity slug or name (e.g. 'rice', 'tomato', 'cashew')

Output Schema

Parameters

Name	Required	Description
`note`	No
`methods`	Yes
`commodity`	Yes
`method_count`	Yes

Tool Definition Quality

A 4.2/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. Description adds context that methods are curated and reusable across countries, reinforcing the read-only, open-world nature without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three focused sentences: output, usage, input. No fluff, front-loaded with purpose, every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool has output schema and two parameters. Description covers output content, usage context, and input format. For a read-only tool, this is comprehensive; missing error handling details are acceptable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline applies. Description mentions 'commodity slug or name' which echoes the schema description, adding minimal new meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it returns production methods for a commodity, detailing the content (description, steps, pros/cons, rank) and scope (commodity-canonical). This distinguishes it from sibling tools like ftg_production_economics or ftg_country_study.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit 'When to use' section specifies contexts (evaluate how a commodity is produced, compare techniques, build production plan). Does not mention when not to use, but provides clear context for appropriate invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_seller_catalog A

Read-only

Return seller catalogues registered on FTG — exporters and producers with their commodity, monthly capacity, certifications and target export markets. When to use this tool: an agent builds a supplier or sourcing shortlist. Input: optional seller country and commodity.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	No	Optional seller country ISO code or name
`commodity`	No	Optional commodity filter

Output Schema

Parameters

Name	Required	Description
`count`	Yes
`sellers`	Yes

Tool Definition Quality

A 3.8/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. Description adds detail about returned data fields but does not cover behavior like pagination, rate limits, or result format. Adequately supplements annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences only: first states purpose and data content, second gives usage context and input hints. No redundant words, perfectly front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The schema documents most parameters and an output schema exists, so the description adequately explains what the tool returns and when to use it. It does not mention the limit parameter or async behavior; async is covered by its schema description, but limit has no schema description either.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is high (75%) and descriptions for async, country, commodity are clear. Description only reiterates optional country and commodity, adding no new semantic value beyond schema. No mention of limit or async.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns seller catalogues with specific data fields (commodity, capacity, etc.) and provides a use case. However, it does not explicitly differentiate from sibling FTG tools like ftg_sourcing_buyers.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Includes explicit when-to-use guidance ('an agent builds a supplier or sourcing shortlist'). Lacks when-not-to-use or alternative tool references, but the usage context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_sourcing_buyers A

Read-only

Return verified local buyers in a country — companies sourcing a given commodity, with buyer type, city, website, annual volume range and certification requirements. When to use this tool: an agent builds a sourcing or export shortlist, or needs real B2B demand contacts in a market. Input: a country and an optional commodity filter.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum buyers to return (default 20)
`country`	Yes	Country ISO-2 code or name
`commodity`	No	Optional commodity slug to filter buyers by
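
The `async` flag described above implies a start-then-poll pattern: request a `job_id`, then call `job_result(job_id)` until the job finishes. The sketch below shows one way a client might wrap that loop. The response fields `job_id` and `status`, and the `"pending"` value, are assumptions for illustration, since the listing does not document the shape of the `job_result` response; `call_tool` stands in for whatever function actually performs the MCP call.

```python
import time


def call_with_polling(call_tool, name, arguments, interval_s=1.0, max_polls=30,
                      sleep=time.sleep):
    """Start a tool in async mode, then poll job_result until it finishes.

    `call_tool(name, arguments)` is a caller-supplied function that performs
    the MCP call and returns the decoded result as a dict.
    ASSUMPTION: the async response carries "job_id", and job_result returns a
    "status" field equal to "pending" while the job is still running.
    """
    started = call_tool(name, {**arguments, "async": True})
    job_id = started["job_id"]
    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        if result.get("status") != "pending":
            return result
        sleep(interval_s)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")


def make_fake_server():
    """In-memory stand-in for a server: pending twice, then done."""
    polls = {"n": 0}

    def call_tool(name, arguments):
        if name != "job_result":
            return {"job_id": "job-1"}
        polls["n"] += 1
        if polls["n"] < 3:
            return {"status": "pending"}
        return {"status": "done", "country": "KE", "buyers": [], "buyer_count": 0}

    return call_tool


# Demo against the fake server; sleeping is disabled so it returns at once.
demo = call_with_polling(
    make_fake_server(), "ftg_sourcing_buyers", {"country": "KE"},
    sleep=lambda seconds: None,
)
print(demo["status"], demo["buyer_count"])
```

Bounding the loop with `max_polls` keeps a stuck job from blocking the agent indefinitely.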

Output Schema

Parameters

Name	Required	Description
`buyers`	Yes
`country`	Yes
`commodity`	No
`buyer_count`	Yes

Tool Definition Quality

A 4.2/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint and openWorldHint. The description adds that the tool returns 'verified' buyers and lists specific output fields (buyer type, city, website, etc.), providing useful context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is highly concise: two sentences plus a short usage guideline and input line. Every sentence adds value, and the main purpose is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema and 100% schema coverage, the description adequately covers the return fields and usage context. It mentions verification and specific fields, but could add more about rate limits or data freshness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents all four parameters. The description reinforces the main parameters (country and optional commodity) but adds no additional semantic detail beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns verified local buyers in a country, specifying the verb (return) and resource (verified local buyers). It distinguishes itself from sibling ftg_ tools by focusing on buyer contacts, not business ideas or investor directories.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: building a sourcing or export shortlist, or needing real B2B demand contacts. It provides clear context but does not explicitly mention alternatives or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

funding_hunter C

Read-only

Chasseur de financements (funding hunter): Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: PME deeptech cleantech FR €8M CA (a French deeptech cleantech SME with €8M revenue), top 30 dispositifs (funding schemes) across BPI+France2030+EU+VC. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`project`	Yes
`financials`	Yes
`preferences`	Yes

Tool Definition Quality

C 2.7/5.0

Behavior 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds minimal behavioral context beyond 'returns a structured, audited deliverable' and 'inputs validated server-side'. It does not disclose what external data sources are accessed, any authentication requirements, or performance characteristics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (3 sentences) and front-loaded with the tool's name and purpose. It is efficient but could be better structured with explicit sections or bullet points. The mix of French and English may confuse some agents.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, including nested objects such as company and financials, and no output schema), the description is incomplete. It does not explain the deliverable format, pagination, or error handling. The reference case helps but is insufficient for an agent to fully understand the tool's capabilities and limitations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description provides no additional meaning for the input schema parameters beyond the generic 'send the documented case fields'. With low schema description coverage (20%), the description fails to compensate by explaining the purpose or format of key nested objects like 'company' or 'financials'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's verb ('hunts for funding') and resource ('structured, audited deliverable'), and provides a specific reference case (PME deeptech cleantech). However, it does not explicitly distinguish from sibling tools like 'capital_strategy' or 'investor_list', which share some domain overlap.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks explicit guidance on when to use or not use this tool versus alternatives. The reference case hints at a target profile (French deeptech cleantech SME), but no exclusion criteria or alternative tool recommendations are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fx_rate A

Read-only

Get the current or historical foreign-exchange rate for any currency pair — the exact exchange rate, FX rate or conversion rate an agent needs to convert a currency amount or feed a finance, trading, invoicing or pricing workflow. Covers EUR/USD, USD/JPY, GBP/EUR and every ISO-4217 currency pair. Returns the latest spot rate, or a historical rate by date. Use when a workflow needs a precise live or past currency exchange rate, or to convert money between two currencies. Source: European Central Bank reference rates via Frankfurter. Inputs: from/to ISO-4217 currency codes, optional date (YYYY-MM-DD).

Parameters

Name	Required	Description
`to`	Yes	Quote currency, ISO-4217 (e.g. USD)
`date`	No	Optional YYYY-MM-DD for a historical rate
`from`	Yes	Base currency, ISO-4217 (e.g. EUR)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.

Output Schema

Parameters

Name	Required	Description
`to`	Yes
`from`	Yes
`rate`	Yes
`as_of`	Yes
`source`	Yes
`source_url`	No
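
To make the input and output tables above concrete, here is a minimal sketch that assembles `fx_rate` arguments and converts an amount with the returned `rate`. The helper names (`fx_rate_args`, `convert`) are hypothetical, and the sample response is invented for illustration; only its field names follow the output schema. The regular expression checks the three-letter shape of a code, not whether the currency actually exists.

```python
import re
from datetime import date
from decimal import Decimal

ISO_4217_SHAPE = re.compile(r"^[A-Z]{3}$")  # shape check only, not a registry lookup


def fx_rate_args(base, quote, on=None):
    """Build fx_rate arguments: from/to codes plus an optional historical date."""
    for code in (base, quote):
        if not ISO_4217_SHAPE.match(code):
            raise ValueError(f"not an ISO-4217 style code: {code!r}")
    args = {"from": base, "to": quote}
    if on is not None:
        args["date"] = on.isoformat()  # date.isoformat() yields YYYY-MM-DD
    return args


def convert(amount, fx_result):
    """Convert `amount` of the base currency into the quote currency.

    Rounds to two decimals, which suits most currencies but not all
    (some have zero or three minor-unit digits).
    """
    rate = Decimal(str(fx_result["rate"]))
    return (Decimal(str(amount)) * rate).quantize(Decimal("0.01"))


# Hypothetical response shaped like the output schema; the rate is made up.
sample = {
    "from": "EUR",
    "to": "USD",
    "rate": 1.1,
    "as_of": "2024-01-02",
    "source": "ECB via Frankfurter",
}

print(fx_rate_args("EUR", "USD", date(2024, 1, 2)))
print(convert(250, sample))
```

Using `Decimal` rather than floats for the multiplication avoids binary rounding noise in invoicing or pricing workflows.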

Tool Definition Quality

A 4.1/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds valuable behavioral context by naming the data source (European Central Bank via Frankfurter) and stating it returns a spot or historical rate, which goes beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description runs to six sentences, front-loading the purpose and ending with the inputs. It is well-structured but repeats near-synonyms (exchange rate, FX rate, conversion rate) and could be more compact without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, rich schema, and existing output schema, the description adequately covers purpose, usage, data source, and inputs. It does not need to explain return values as the output schema handles that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds minimal value by mentioning ISO-4217 codes and date format, but does not elaborate on the 'async' parameter which is already described in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Get the current or historical foreign-exchange rate for any currency pair', using a specific verb and resource. It covers all ISO-4217 pairs, distinguishing it from sibling tools which are mostly unrelated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises using the tool when a workflow needs a precise live or past currency exchange rate, which is clear context. However, it does not explicitly state when not to use it or provide alternatives, though no direct competitor exists.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geographic_expansion B

Read-only

Expansion géographique (geographic expansion): Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Gapup Hub, expansion into 4 markets (DE/UK/ES/NL), €1.8M budget, target ARR of €3.2M in year 2. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`product`	Yes
`financials`	No
`constraints`	No
`targetMarkets`	Yes
`preferredEntryMode`	No
`expansionHorizonMonths`	No

Tool Definition Quality

B 3.1/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that the tool returns an 'audited deliverable' and that inputs are validated server-side, which aligns with the read-only nature and external data access implied by openWorldHint. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief and mostly to the point. The reference case adds context but could be considered extraneous. Overall, it is not verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, no output schema), the description lacks detail on return value format, parameter relationships, and usage constraints. The phrase 'structured, audited deliverable' is vague.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (13%), and the description does not elaborate on the meaning of parameters beyond 'send the documented case fields.' For a tool with nested required objects (company, product, targetMarkets), this is insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for geographic expansion, returns a structured deliverable, and provides a concrete reference case. However, it does not explicitly differentiate from sibling tools like market_entry_strategist or market_sizing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description offers no guidance on when to use this tool versus alternatives, nor does it mention when not to use it. It only instructs to 'send the documented case fields.'

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_logistics_intel A

Read-only

Geospatial logistics intelligence for supply chain, maritime and transport agents. Four modes: (1) geocode_batch — resolve up to 50 addresses to lat/lon with confidence scores (OSM Nominatim + Open-Meteo fallback, 1 req/s rate-limit respected); (2) routing — road/cycling/walking route with distance_km, duration_seconds and ETA ISO timestamp between two addresses or lat/lon points (OSRM public, keyless, global); (3) port_congestion — congestion status for any UN/LOCODE port (e.g. NLRTM, SGSIN, CNSHA) with waiting vessel count, severity (low/medium/high/extreme) and average wait hours; (4) ship_tracking — AIS position, speed, course, destination and ETA for a vessel by its 9-digit MMSI. No API key required for geocode/routing/port. Optional env: AIS_STREAM_API_KEY for live ship data (otherwise MarineTraffic scrape best-effort). SLA: <=25s p95. Cache: 24h geocoding / 1h routing / 30min port / 5min ship. Quality score 0-100. Status: final/partial/failed.

Parameters

Name	Required	Description
`to`	No	routing only: destination address or 'lat,lon'
`from`	No	routing only: origin address or 'lat,lon'
`mode`	Yes	'geocode_batch': address -> lat/lon. 'routing': route + ETA. 'port_congestion': UN/LOCODE port state. 'ship_tracking': vessel by MMSI
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Primary input: address for geocode/routing, UN/LOCODE (e.g. NLRTM) for port_congestion, 9-digit MMSI for ship_tracking
`addresses`	No	geocode_batch only: up to 50 addresses (overrides query if provided)
`mode_transport`	No	routing only: transport mode. Default: driving
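
Because each of the four modes reads a different subset of the parameters above, a client may want to validate arguments per mode before calling. The sketch below is one possible arrangement; the function name `geo_args` and its keyword names are hypothetical. Two choices are assumptions rather than documented behavior: reusing the origin (or the first address) as `query` when none is given, since `query` is marked required, and the exact UN/LOCODE pattern used for the shape check.

```python
import re

# UN/LOCODE shape: 2-letter country code followed by a 3-character location code.
UN_LOCODE = re.compile(r"^[A-Z]{2}[A-Z0-9]{3}$")
MMSI = re.compile(r"^\d{9}$")  # 9-digit vessel identifier


def geo_args(mode, query=None, *, origin=None, destination=None,
             addresses=None, mode_transport=None):
    """Build geo_logistics_intel arguments, validating what each mode needs."""
    if mode == "geocode_batch":
        batch = addresses or ([query] if query else [])
        if not 1 <= len(batch) <= 50:
            raise ValueError("geocode_batch takes 1 to 50 addresses")
        # ASSUMPTION: fall back to the first address when no query is given.
        args = {"mode": mode, "query": query or batch[0]}
        if addresses:
            args["addresses"] = list(addresses)
        return args
    if mode == "routing":
        if not origin or not destination:
            raise ValueError("routing needs both an origin and a destination")
        # ASSUMPTION: reuse the origin as `query`, which the schema requires.
        args = {"mode": mode, "query": query or origin,
                "from": origin, "to": destination}
        if mode_transport:
            args["mode_transport"] = mode_transport
        return args
    if mode == "port_congestion":
        if not query or not UN_LOCODE.match(query):
            raise ValueError("port_congestion needs a UN/LOCODE such as NLRTM")
        return {"mode": mode, "query": query}
    if mode == "ship_tracking":
        if query is None or not MMSI.match(str(query)):
            raise ValueError("ship_tracking needs a 9-digit MMSI")
        return {"mode": mode, "query": str(query)}
    raise ValueError(f"unknown mode: {mode!r}")


print(geo_args("port_congestion", "NLRTM"))
print(geo_args("routing", origin="52.37,4.90", destination="51.92,4.48"))
```

Rejecting malformed input on the client side matters more than usual on a pay-per-call server, where a failed call may still be billed.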

Output Schema

Parameters

Name	Required	Description
`mode`	Yes
`status`	Yes
`routing`	No
`sources`	Yes
`geocode_batch`	No
`quality_score`	Yes
`ship_tracking`	No
`port_congestion`	No

Tool Definition Quality

A 4.6/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, etc.), the description discloses data sources (OSM, OSRM, MarineTraffic), rate limits (1 req/s), caching duration per mode, SLA (≤25s p95), quality score range, and status fields. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but well-organized with clear mode labels. Each sentence adds value, covering modes, sources, usage, and constraints. It could benefit from bullet points for easier scanning, but it remains concise given the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers all four modes with inputs, outputs, data sources, rate limits, caching, SLA, and key requirements. With an output schema present, it appropriately omits detailed return field documentation. It is complete for understanding the tool's capabilities.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds meaning by explaining the mode enum choices, how the query parameter maps to different modes, and the role of optional parameters like to, from, addresses, mode_transport, and async. It also clarifies fallback behavior and key requirements.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Geospatial logistics intelligence for supply chain, maritime and transport agents' and enumerates four distinct modes with specific verbs (geocode_batch, routing, port_congestion, ship_tracking). Each mode's function is explicitly defined, and the tool is self-contained with no ambiguity versus siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use each mode by specifying the input and output for each. It also clarifies API key requirements (none for three modes, optional for ship_tracking) and mentions rate limits. It lacks explicit when-not-to-use guidance but is otherwise clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

global_salary_inflation_adjusterA

Read-onlyIdempotent

Inspect

Adjusts salary benchmarks for local inflation using OECD, IMF, and World Bank data. Designed for CHROs to normalize compensation across regions with accurate inflation adjustments. Inputs include country codes, base salary, and reference year. Outputs inflation-adjusted salary with data sources and warnings.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`baseSalary`	Yes
`targetYear`	No
`countryCode`	Yes
`referenceYear`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`targetYear`	No
`countryCode`	No
`inflationRate`	No
`referenceYear`	No
`adjustedSalary`	No
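
As a rough illustration of what adjusting a salary between `referenceYear` and `targetYear` can involve, the sketch below compounds annual inflation rates. The rates are made-up sample values, and both the `adjust_salary` helper and the compounding formula are assumptions; the listing does not state how the tool actually combines OECD, IMF, and World Bank data.

```python
def adjust_salary(base_salary, reference_year, target_year, annual_rates):
    """Compound annual inflation from reference_year up to target_year.

    annual_rates maps a year to that year's inflation rate (0.03 means 3%).
    Both the mapping and the formula are illustrative assumptions,
    not the tool's documented method.
    """
    factor = 1.0
    warnings = []
    for year in range(reference_year + 1, target_year + 1):
        rate = annual_rates.get(year)
        if rate is None:
            # Mirror the tool's `warnings` output field for missing data.
            warnings.append(f"no inflation rate for {year}; treated as 0%")
            rate = 0.0
        factor *= 1.0 + rate
    return {"adjustedSalary": round(base_salary * factor, 2), "warnings": warnings}


# Hypothetical sample rates, not real OECD/IMF/World Bank figures.
sample_rates = {2022: 0.05, 2023: 0.04}
result = adjust_salary(50_000, 2021, 2023, sample_rates)
print(result)
```

Returning warnings alongside the number keeps gaps in the rate data visible instead of silently producing an unadjusted figure.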

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint) indicate safe, repeatable behavior. The description adds context: inputs include country code, base salary, reference year; outputs include inflation-adjusted salary with data sources and warnings. No contradiction with annotations. The description enriches understanding of the tool's behavior beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each contributing unique information. The main action is front-loaded. Could be slightly more concise by omitting the explicit target audience ('Designed for CHROs') but still efficient overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description adequately covers return value (inflation-adjusted salary with data sources and warnings). It addresses inputs and purpose. However, it does not mention the openWorldHint (external data may vary) or any limitations, which would improve completeness for a tool relying on external data sources.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' described). The description lists three key inputs (countryCode, baseSalary, referenceYear) but misses the optional 'targetYear' parameter. It provides basic meaning but not full compensation for low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
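
The coverage percentage quoted above can be reproduced with a simple ratio of described parameters to total parameters. This is a minimal sketch that assumes coverage is computed this way; the scoring method itself is not documented in the listing, and `description_coverage` is a hypothetical helper.

```python
def description_coverage(params):
    """Percentage of parameters that carry a non-empty schema description."""
    if not params:
        return 0.0
    described = sum(1 for desc in params.values() if desc and desc.strip())
    return 100.0 * described / len(params)


# Parameter table of global_salary_inflation_adjuster: only `async` is described.
salary_params = {
    "async": "If true, returns a job_id immediately (<200ms) instead of waiting for the result.",
    "baseSalary": "",
    "targetYear": "",
    "countryCode": "",
    "referenceYear": "",
}
coverage = description_coverage(salary_params)
print(coverage)  # 20.0
```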

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: 'Adjusts salary benchmarks for local inflation' using specific data sources (OECD, IMF, World Bank). It identifies the target user (CHROs) and distinguishes it from sibling tools, none of which match this specialized inflation adjustment purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the target audience (CHROs) and use case (normalize compensation across regions), implying when to use it. However, it does not provide explicit comparisons to sibling tools like 'comp_benchmark_geo_delta' or 'executive_comp_peer_benchmark', nor does it specify when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gl_reconcilerC

Read-only

Inspect

GL Reconciler — General ledger reconciliation — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Identify the root causes of the GL breaks in 's ledger for — cluster them and rank by materiality. · For Q close: which accounts have unreconciled items over €? Provide a sign-off routing and resolution plan. · Run an automated GL reconciliation for — AR/AP/intercompany entries — flag open items, suggest journal entries. · What are the top 5 systemic control weaknesses causing recurring GL breaks at ? Recommend preventive controls. · Generate a month-end close reconciliation report for — breaks by account type, aging analysis, sign-off assignments. Reference case: Acme SaaS Q4 2026 — 47 GL breaks, €1.4M unposted variance. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`entity`	Yes
`ledgerContext`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true, so the tool is read-only and may use external knowledge. The description adds that inputs are validated server-side and that it returns an audited deliverable. It does not mention async behavior, rate limits, or other side effects, but the core behavioral traits are covered by annotations and description combined.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose, including French text, a specific case reference ('Acme SaaS Q4 2026'), and multiple example queries. While it front-loads the purpose, it contains unnecessary details that could be streamlined. Not every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 4 parameters, low schema coverage, and no output schema, the description should provide more context. It fails to explain the async parameter (beyond schema), the focus parameter, and the nested entity and ledgerContext objects fully. The examples are helpful but not comprehensive enough to make the tool self-contained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%: of the four parameters, only async is described in the schema, while entity, ledgerContext, and focus have no description. The tool description does not explain these parameters; it only loosely refers to 'send the documented case fields'. This does not compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs GL reconciliation and returns a structured, audited deliverable. It lists specific questions it answers, making the purpose explicit. However, it does not differentiate from sibling tools, lowering the score from 5 to 4.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides multiple example queries (e.g., 'Identify root causes of GL breaks', 'Run automated GL reconciliation') that imply usage scenarios. However, it lacks explicit guidance on when not to use the tool or how it compares to alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gov_procurement_multiA

Read-only

Inspect

Aggregate public procurement tenders (calls for tender / appels d'offres) from multiple government sources simultaneously: TED Europa v3 (27 EU countries, keyless API), BOAMP France (opendatasoft, keyless), UK Contracts Finder (OCDS standard, keyless), SAM.gov United States (requires SAM_GOV_API_KEY env var), and bund.de Germany (HTML scraping, partial). Returns structured tender records with buyer authority, EU CPV sector code, estimated contract value converted to EUR via live FX rates, submission deadlines, and direct notice URLs. Use when: a B2G agent needs to find government contract opportunities matching keywords across multiple jurisdictions; building a pipeline of public tenders for bid/no-bid qualification; monitoring a domain by CPV code; market sizing public sector spend. Key inputs: query (keywords), countries (ISO-2 array), cpv_codes (EU standard codes, e.g. 72000000=IT services, 45000000=construction, 79000000=business services), min_value_eur (filter), published_after (ISO date, defaults to 30 days ago). SLA: <=25s p95 (all sources fetched in parallel, 8s budget per source). Optional env var SAM_GOV_API_KEY enables US federal tenders (free key at api.sam.gov). Quality score: 25 pts if TED EU retrieved, 15 pts per other source retrieved (max 60), 10 pts if >= 10 tenders returned, 5 pts if aggregates computed. Status: failed < 30 / partial 30-59 / final >= 60.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keywords to search for tenders (e.g. "cybersecurity audit", "construction", "consulting AI")
`countries`	No	Countries to search. Defaults to ["EU","US","FR","UK","DE"]. Use "EU" for all 27 EU member states via TED Europa.
`cpv_codes`	No	EU Common Procurement Vocabulary codes (e.g. ['72000000'] for IT services, ['45000000'] for construction). Optional.
`min_value_eur`	No	Minimum contract value in EUR. Tenders below this are excluded. Optional.
`published_after`	No	ISO date YYYY-MM-DD. Only return tenders published after this date. Defaults to 30 days ago.

Output Schema

ParametersJSON Schema

Name	Required	Description
`query`	Yes
`status`	Yes
`sources`	Yes
`tenders`	Yes
`by_source`	Yes
`by_country`	Yes
`quality_score`	Yes
`countries_searched`	Yes
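
The quality score and status rules stated in the description can be sketched as follows. The source labels (`"TED"`, `"BOAMP"`, and so on) and both function names are hypothetical, and the reading of "(max 60)" as a cap on the points from non-TED sources is an assumption, so treat this as an interpretation rather than the server's implementation.

```python
def quality_score(sources_retrieved, tender_count, aggregates_computed):
    """Score a response using the point rules quoted in the tool description."""
    ted_points = 25 if "TED" in sources_retrieved else 0
    other_sources = [s for s in sources_retrieved if s != "TED"]
    # Assumed reading: 15 points per non-TED source, capped at 60 in total.
    other_points = min(15 * len(other_sources), 60)
    volume_points = 10 if tender_count >= 10 else 0
    aggregate_points = 5 if aggregates_computed else 0
    return ted_points + other_points + volume_points + aggregate_points


def status_for(score):
    """Map a score to the status bands: failed < 30, partial 30-59, final >= 60."""
    if score < 30:
        return "failed"
    if score < 60:
        return "partial"
    return "final"


score = quality_score({"TED", "BOAMP"}, tender_count=12, aggregates_computed=True)
print(score, status_for(score))  # 55 partial
```

Under this reading, a response with all five sources, at least 10 tenders, and computed aggregates reaches 100 points.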

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and destructiveHint. Description adds SLA (<=25s p95), quality scoring system, partial results status criteria, and optional env var for US tenders, revealing key behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is lengthy (multiple paragraphs) but well-structured with clear sections: sources, use cases, inputs, SLA, quality scoring. Every sentence adds value, but could be more concise while retaining information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex multi-source tool with 6 parameters and async option, the description covers usage, behavior, quality scoring, and return values comprehensively. Output schema existence further supports completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but description adds context like ISO-2 for countries, CPV code examples, default for published_after, and explains the async parameter. This adds meaningful interpretation beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it aggregates public procurement tenders from multiple government sources, lists specific sources, and details return fields (buyer authority, CPV code, value in EUR, deadlines, URLs). This distinguishes it from sibling tools like procurement_spend_optim.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: B2G agent needs contract opportunities, pipeline building, CPV monitoring, market sizing. Provides key inputs and examples. Does not explicitly mention when not to use or alternatives, but context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

growth_path_architectC

Read-only

Inspect

Growth architect — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Pennylane (€30M ARR) — 3 growth paths · Recommended mix: Organic + Geo EU · Target ARR €120M in 36 months. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`constraints`	Yes
`growthTarget`	Yes
`currentDrivers`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, which the description complements by stating 'Returns a structured, audited deliverable' and that inputs are validated server-side. No contradictions, but no additional behavioral details beyond what annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (two lines plus a reference case) but front-loaded. However, it could be more structured and include key details without being verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 params, nested objects, no output schema) and numerous siblings, the description is insufficient. It lacks explanation of the deliverable's contents, how to interpret results, and how it relates to similar tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, and the description adds no parameter-specific details. It merely says 'send the documented case fields', failing to explain semantics or constraints not already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable for growth architecture, and provides a concrete reference case (Pennylane). However, it does not clearly distinguish itself from similar strategic siblings like 'market_entry_strategist' or 'strategic_options_analyzer', limiting clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The reference case hints at a scenario but does not specify conditions or exclude cases where other tools are better.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hallucination_confidence_meterA

Read-onlyIdempotent

Inspect

Evaluates the likelihood of hallucination in LLM responses by comparing against HuggingFace model confidence scores. Designed for risk assessment personas to quantify response reliability. Accepts text snippets or model outputs, returns confidence metrics and potential hallucination warnings. Cross-references with top-performing models from the HuggingFace leaderboard.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	The LLM-generated text to evaluate for hallucination risk
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`model_id`	No	Optional specific HuggingFace model ID to use for evaluation
`threshold`	No	Confidence threshold below which hallucination warnings are triggered

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`confidence_scores`	No
`hallucination_likelihood`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint, openWorldHint, and idempotentHint. The description adds that it cross-references with HuggingFace leaderboard models and returns confidence metrics and warnings, providing behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at four sentences, front-loaded with the main purpose, and free of fluff. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present and annotations like idempotentHint and readOnlyHint, the description provides sufficient context about using HuggingFace models and generating warnings. No major gaps are evident.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description mentions 'text snippets or model outputs' but this is already clear from the schema. It does not add significant new meaning to parameters beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates hallucination likelihood in LLM responses using HuggingFace model confidence scores, with a specific verb and resource. It distinguishes well from the diverse sibling tools by focusing on AI response reliability.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it is designed for risk assessment personas to quantify response reliability, but does not explicitly state when to use or not use this tool versus alternatives. Usage context is implied rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

historical_price_seriesA

Read-onlyIdempotent

Inspect

Fetch historical OHLCV price series for any ticker: stocks (AAPL, SAP.DE, 7203.T), ETFs, indices, commodities (GC=F for gold) or cryptocurrencies (BTC-USD). Returns a full date-indexed series of open/high/low/close/volume plus pre-computed statistics: total return, annualised return (CAGR), annualised volatility, max drawdown and Sharpe estimate (rf=4%). Automatically detects crypto tickers (→ CoinGecko) vs traditional assets (→ Yahoo Finance primary, Stooq fallback). Adjusts for dividends and splits when adjusted=true (default). Use cases: backtesting, factor analysis, performance attribution, charting, financial modelling. Sources: Yahoo Finance, CoinGecko, Stooq. All keyless. Optional env: AICI_RESEARCH_PROXY_URL for Bright Data routing (lifts Yahoo 429), TWELVE_DATA_API_KEY for higher Twelve Data quota.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`period`	No	Look-back period. Default: 1y.
`ticker`	Yes	Yahoo Finance ticker symbol. Examples: AAPL (US stock), SAP.DE (Frankfurt), 7203.T (Tokyo), BTC-USD (Bitcoin), GC=F (gold futures), ^GSPC (S&P 500).
`metrics`	No	Subset of fields to include (informational — all fields always returned).
`adjusted`	No	Adjust close prices for dividends and splits. Default: true.
`interval`	No	Bar interval. Default: 1d (daily).

Output Schema

ParametersJSON Schema

Name	Required	Description
`stats`	Yes
`period`	Yes
`series`	Yes
`status`	Yes
`ticker`	Yes
`sources`	Yes
`currency`	Yes
`interval`	Yes
`data_points`	Yes
`quality_score`	Yes
`splits_detected`	No
`resolved_exchange`	No
`dividends_detected`	No
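
For readers unfamiliar with the pre-computed statistics named in the description, the sketch below derives them from a list of closing prices using common textbook conventions. The formulas, the 252-periods-per-year default, and the `series_stats` helper are assumptions; the listing only states that the Sharpe estimate uses rf=4%, not how each figure is calculated.

```python
import math
import statistics


def series_stats(closes, periods_per_year=252, risk_free=0.04):
    """Compute summary statistics from an ordered list of closing prices."""
    if len(closes) < 3:
        raise ValueError("need at least three closing prices")
    # Simple period-over-period returns.
    returns = [closes[i] / closes[i - 1] - 1 for i in range(1, len(closes))]
    total_return = closes[-1] / closes[0] - 1
    years = len(returns) / periods_per_year
    cagr = (closes[-1] / closes[0]) ** (1 / years) - 1
    volatility = statistics.stdev(returns) * math.sqrt(periods_per_year)
    # Max drawdown: worst decline from a running peak (a negative number or 0).
    peak = closes[0]
    max_drawdown = 0.0
    for price in closes:
        peak = max(peak, price)
        max_drawdown = min(max_drawdown, price / peak - 1)
    # One common Sharpe estimate: excess annualised return over annualised volatility.
    sharpe = (cagr - risk_free) / volatility if volatility > 0 else None
    return {
        "total_return": total_return,
        "cagr": cagr,
        "volatility": volatility,
        "max_drawdown": max_drawdown,
        "sharpe": sharpe,
    }


# Tiny made-up series; periods_per_year=3 makes the sample span exactly one year.
stats = series_stats([100.0, 110.0, 99.0, 120.0], periods_per_year=3)
print(stats)
```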

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, idempotentHint, etc.), the description adds crucial behavioral details: automatic ticker detection, dividend/split adjustment, data source fallback (Yahoo Finance primary, Stooq fallback), and optional proxy configuration. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a dense single paragraph that front-loads the main purpose. Every sentence contributes information, though it could be slightly broken into sections for readability. No waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, supported assets, data sources, parameters, use cases, and optional configuration. Given the presence of an output schema and 100% parameter schema coverage, it is fully complete for effective tool invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema has 100% coverage, baseline 3. The description adds value by explaining the 'adjusted' parameter's effect, clarifying that 'metrics' is always returned in full, and providing ticker examples that enhance understanding beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it fetches historical OHLCV price series for any ticker, with examples across stocks, ETFs, indices, commodities, and cryptocurrencies. It distinguishes itself from siblings like 'fx_rate' and country-specific market data tools by being general-purpose and returning a full historical series.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists explicit use cases such as backtesting, factor analysis, and charting. It also mentions automatic detection of crypto vs traditional assets. However, it does not explicitly state when not to use or provide direct alternatives from the sibling list, which would strengthen the guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hr_benefits_esg_alignerA

Read-onlyIdempotent

Inspect

Asynchronous tool for Chief Human Resources Officers (CHROs) to align employee benefits packages with ESG (Environmental, Social, Governance) goals. Uses Eurostat HR data, MSCI ESG ratings, and Sustainalytics metrics to generate actionable recommendations. Inputs include company location, industry, and current benefits structure. Outputs ESG-aligned benefits adjustments with sustainability impact scores. Requires async:true to avoid timeout errors.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`esgFocus`	No	Primary ESG pillars to prioritize
`industryCode`	Yes	NACE or ISIC industry classification code
`companyLocation`	Yes	ISO 2-letter country code of company headquarters
`currentBenefits`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`recommendations`	No
`overallESGAlignmentScore`	No
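
The `async` flow described in the parameter table (submit with `async: true`, receive a `job_id`, then poll `job_result`) might look like the sketch below. `call_tool` stands in for whatever MCP client function you use and is hypothetical, as are the `"pending"` in-progress status, the shape of `currentBenefits`, and the fake client used to make the example runnable.

```python
import time


def run_async_tool(call_tool, name, arguments, poll_interval=1.0, max_polls=30):
    """Submit a tool call with async=True, then poll job_result until it settles.

    call_tool(name, arguments) -> dict is a hypothetical client function.
    """
    job = call_tool(name, {**arguments, "async": True})
    job_id = job["job_id"]
    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        # "pending" is an assumed in-progress status; the listing does not name one.
        if result.get("status") != "pending":
            return result
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


def fake_call_tool(name, arguments):
    """Stand-in client: reports 'pending' twice, then a finished result."""
    if name == "job_result":
        fake_call_tool.polls += 1
        if fake_call_tool.polls < 3:
            return {"status": "pending"}
        return {"status": "final", "recommendations": []}
    return {"job_id": "job-123"}


fake_call_tool.polls = 0

result = run_async_tool(
    fake_call_tool,
    "hr_benefits_esg_aligner",
    # Placeholder values; the structure of currentBenefits is not documented here.
    {"industryCode": "62.01", "companyLocation": "FR", "currentBenefits": {}},
    poll_interval=0,
)
print(result["status"])
```

Bounding the number of polls keeps a stuck job from blocking the agent indefinitely.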

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint and idempotentHint, indicating read-only, idempotent behavior. The description adds that the tool is asynchronous and requires async:true to avoid timeouts. This supplements the annotations well, with no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (five sentences) and front-loaded with purpose and audience. Every sentence adds value: audience, purpose, data sources, inputs, outputs, and async requirement. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters, nested objects, and async behavior, the description covers key aspects: audience, purpose, data sources, inputs, outputs, and async requirement. It omits details on polling (job_result) but the output schema likely covers that. With annotations and output schema, it is fairly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 80%, so the baseline is 3. The description adds context about data sources (Eurostat, MSCI, Sustainalytics) but does not elaborate on each parameter beyond what the schema provides. The mention of 'inputs include company location, industry, and current benefits structure' helps slightly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: aligning employee benefits with ESG goals for CHROs. It specifies inputs (location, industry, benefits), data sources (Eurostat, MSCI, Sustainalytics), and outputs (adjustments, impact scores). This distinguishes it from siblings like esg_audit_multi or supplier_esg_audit, which focus on broader ESG audits.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the async requirement to avoid timeouts, but provides no explicit guidance on when to use this tool vs other ESG tools (e.g., when benefits-specific alignment is needed). No exclusions or alternatives are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

incident_response_evidence_collectorA

Read-only · Idempotent

As a CTO, gather forensic evidence (logs, network flows, MITRE TTPs) from public breach reports and threat intelligence sources to support incident response post-mortems. Inputs include incident identifiers, date ranges, or MITRE technique IDs. Outputs structured evidence with attack patterns, indicators of compromise, and source references. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_range`	No
`incident_id`	Yes	Unique identifier for the incident (e.g., CVE, GitHub Advisory ID)
`mitre_technique_ids`	No	List of MITRE ATT&CK technique IDs (e.g., T1059)
`include_network_flows`	No
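
As a rough illustration of the parameter table above, the following Python sketch assembles a JSON-RPC `tools/call` request for this tool. The `date_range` shape (`start`/`end` keys), the boolean type of `include_network_flows`, and the example incident identifier are assumptions; the schema shown here does not document them.

```python
import json
from typing import Any


def build_tool_call(name: str, arguments: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Wrap tool arguments in an MCP JSON-RPC 2.0 `tools/call` request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "incident_response_evidence_collector",
    {
        # Required. Example value picked for illustration only.
        "incident_id": "CVE-2021-44228",
        # Optional list of MITRE ATT&CK technique IDs.
        "mitre_technique_ids": ["T1059"],
        # Optional. Boolean type is an assumption.
        "include_network_flows": True,
        # Optional. The start/end key names are an assumption.
        "date_range": {"start": "2021-12-01", "end": "2021-12-31"},
        # The description asks for this to avoid an x402 timeout.
        "async": True,
    },
)
print(json.dumps(request, indent=2))
```

With `async` set, the table says the call returns a `job_id` that is then passed to `job_result`.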

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`timeline`	No
`warnings`	No
`indicators`	No
`incident_id`	No
`network_flows`	No
`attack_patterns`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, openWorldHint. Description adds crucial behavioral detail: the async requirement and the risk of x402 timeout without it. Also describes output structure. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences covering purpose, inputs, and outputs, followed by a critical usage note. Front-loaded with key info. Could be slightly more structured but overall efficient and concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested date_range, output schema exists), the description provides sufficient high-level guidance. Covers inputs, outputs, and a special requirement (async). Agent can reasonably infer how to invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description mentions three of the five parameters (incident_id, date_range, mitre_technique_ids) and references network flows and MITRE TTPs which map to include_network_flows and mitre_technique_ids. Schema coverage is 60%, and description adds context by linking parameters to evidence types. Async parameter is explained in usage guidelines.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
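
The 60% figure quoted above is consistent with reading "schema description coverage" as the share of parameters that carry a non-empty description. That definition is an assumption inferred from the numbers on this page, not a documented formula. A minimal Python sketch under that assumption:

```python
def description_coverage(params: dict[str, str]) -> float:
    """Return the percentage of parameters with a non-empty description."""
    if not params:
        return 0.0
    described = sum(1 for text in params.values() if text.strip())
    return 100.0 * described / len(params)


# Parameter names and descriptions as listed for
# incident_response_evidence_collector (descriptions abbreviated).
incident_params = {
    "async": "If true, returns a job_id immediately...",
    "date_range": "",
    "incident_id": "Unique identifier for the incident",
    "mitre_technique_ids": "List of MITRE ATT&CK technique IDs",
    "include_network_flows": "",
}

print(round(description_coverage(incident_params)))  # 60
```

Three of the five parameters are described, giving 3/5 = 60%.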

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool gathers forensic evidence from public breach reports and threat intel sources for incident response post-mortems. It specifies inputs (incident IDs, date ranges, MITRE technique IDs) and outputs (structured evidence with attack patterns, IOCs, source references). Differentiates from siblings like ai_act_incident_response by focusing on evidence collection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to pass async:true to avoid x402 timeout. Provides clear usage context but does not explicitly mention when not to use or compare to alternatives, though the description is specific enough for an agent to infer appropriate use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
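
The async flow described for this tool (submit with `async: true`, then poll `job_result` with the returned `job_id`) could be sketched in Python as below. `call_tool` is a hypothetical callable standing in for whatever MCP client is in use. The `job_id` response key and the `"pending"` status value are assumptions, since the page documents only that a job id is returned and polled.

```python
import time
from typing import Any, Callable

# Hypothetical client signature: (tool_name, arguments) -> result dict.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def run_async_tool(
    call_tool: ToolCaller,
    name: str,
    arguments: dict[str, Any],
    poll_interval: float = 1.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Submit a tool call with async=True, then poll job_result until done."""
    submitted = call_tool(name, {**arguments, "async": True})
    job_id = submitted["job_id"]  # assumed response key
    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        if result.get("status") != "pending":  # assumed in-progress marker
            return result
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")


def make_stub_caller() -> ToolCaller:
    """In-memory stand-in for a real client: finishes on the third poll."""
    polls = {"count": 0}

    def call_tool(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        if name == "job_result":
            polls["count"] += 1
            return {"status": "pending"} if polls["count"] < 3 else {"status": "ok"}
        return {"job_id": "job-123"}

    return call_tool


result = run_async_tool(
    make_stub_caller(),
    "incident_response_evidence_collector",
    {"incident_id": "CVE-2021-44228"},
    sleep=lambda _seconds: None,  # skip real waiting in the demo
)
print(result)  # {'status': 'ok'}
```

Injecting `call_tool` and `sleep` keeps the loop independent of any particular client library and makes it easy to exercise without network access.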

india_market_dataA

Read-only

Indian capital market intelligence for the IN diaspora (30M+) and investors. Covers NSE, BSE, and MCA corporate registry across four modes:

• company — full company profile: name, CIN, exchange, NSE/BSE tickers, industry, incorporation date, paid-up capital, registered office, status, directors • market_quote — real-time quote: price (INR), change%, volume, market cap, P/E ratio. Sources: Yahoo Finance (primary), BSE API, NSE API (proxy-gated) • sector_overview — Nifty/Sensex sector snapshot: top 5 companies by market cap. Supported sectors: it, banking, pharma, energy, auto, fmcg, realestate, metals, telecom, consumer • mca_filing — Ministry of Corporate Affairs filings. Requires CIN for direct lookup.

Input formats accepted: • NSE ticker (e.g. 'RELIANCE', 'TCS.NS') • BSE 6-digit code (e.g. '500325' for Reliance) • CIN 21-char (e.g. 'L17110MH1973PLC019786') • Company name EN (e.g. 'Reliance Industries', 'Tata Consultancy') • Sector keyword (e.g. 'IT services', 'banking', 'pharma')

ENV: AICI_RESEARCH_PROXY_URL with country-in routing unlocks NSE direct API and MCA.
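
To make the accepted input formats listed above concrete, here is a small Python sketch that guesses which format a `query` string uses. The patterns are assumptions derived only from the examples on this page (6-digit BSE code, 21-character CIN, upper-case ticker with optional `.NS` suffix); the server's actual validation rules are not documented here.

```python
import re


def classify_query(query: str) -> str:
    """Guess the input format of an india_market_data query string."""
    q = query.strip()
    if re.fullmatch(r"\d{6}", q):
        return "bse_code"  # e.g. '500325'
    if re.fullmatch(r"[A-Z0-9]{21}", q):
        return "cin"  # e.g. 'L17110MH1973PLC019786'
    if re.fullmatch(r"[A-Z][A-Z0-9&-]*(\.NS)?", q):
        return "nse_ticker"  # e.g. 'RELIANCE', 'TCS.NS'
    # Anything else is treated as a company name or sector keyword.
    return "name_or_sector"


for example in ["RELIANCE", "TCS.NS", "500325",
                "L17110MH1973PLC019786", "Reliance Industries", "banking"]:
    print(example, "->", classify_query(example))
```

Short upper-case words are ambiguous (an all-caps sector keyword would be classified as a ticker), so a real client would probably pass the string through unchanged and use a guess like this only for logging or for choosing a `mode`.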

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	NSE/BSE ticker, CIN (21 chars), company name (EN), or sector keyword.
`exchange`	No	Exchange filter. Default: all.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`company`	No
`sources`	Yes
`mca_filings`	No
`market_quote`	No
`quality_score`	Yes
`sector_overview`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark it as readOnlyHint=true, and the description adds valuable behavioral context: data sources (Yahoo Finance primary, BSE API, NSE proxy-gated), the need for CIN in mca_filing, and environment variable requirements. This goes beyond annotations without contradicting them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bullet points and a clear flow, front-loading the main purpose. However, it is somewhat lengthy due to detailed mode explanations; minor redundancy could be trimmed without losing meaning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with four parameters and moderate complexity, the description covers modes, input formats, and environment needs thoroughly. The presence of an output schema reduces the need to detail return values. Minor gaps: no mention of error handling or rate limits.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, the baseline is 3. The description adds significant value beyond the schema by explaining each mode's purpose, providing example inputs (e.g., 'RELIANCE', '500325'), and listing the supported sector keywords. The async parameter is explained only in the schema. This enhances the agent's understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides 'Indian capital market intelligence' and lists four specific modes (company, market_quote, sector_overview, mca_filing) with detailed outputs for each. This differentiates it from siblings like 'china_market_data' and 'corporate_registry_lookup' by focusing exclusively on Indian markets.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for Indian diaspora and investors, but lacks explicit guidance on when to use this tool over alternatives. It does not list exclusions or suggest other tools for non-Indian data, leaving the agent to infer usage from the detailed mode descriptions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

industry_classifier_naics_sicB

Read-only

NAICS/SIC/NACE industry classifier: Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What is the NAICS code for a company that does ? · Give me NAICS + SIC + NACE classification for this company description. · Which industry sector (GICS) does this company belong to for equity analysis? · What HS code applies to products manufactured by this company? · For EU procurement compliance, what NACE Rev. 2 code applies to this company? · Classify this business into NAICS + SIC + ISIC + GICS + NACE + HS with hierarchy and confidence. · I need to segment my ICP list by NAICS 4-digit subsector; classify these company descriptions. Reference case: Helios Cold Chain EU, refrigerated maritime freight forwarding · . Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company_url`	No
`company_name`	No
`company_description`	Yes
`focus_classifications`	No
`primary_revenue_source`	No

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and it returns an audited deliverable. It does not contradict annotations but adds only modest behavioral context beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description includes a long list of example questions and a reference case, which could be trimmed for brevity. Core purpose is stated early, but the verbosity reduces conciseness without adding critical value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 6 parameters, low schema coverage, and no output schema, the description should cover parameter usage and output structure but does not. It mentions a 'structured, audited deliverable' without specifics, leaving gaps for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (17%), with only the 'async' parameter described. The description does not explain the meaning or usage of key parameters like company_description, focus_classifications, or primary_revenue_source, failing to compensate for the schema gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a classifier for industry codes (NAICS/SIC/NACE) and returns a structured, audited deliverable. It provides example questions that cover various classification needs, making the purpose specific and distinct from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description offers example questions that implicitly guide usage, and includes a reference case. However, it does not explicitly state when to use this tool versus alternatives or provide exclusions, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

infra_blueprint_designerB

Read-only

Cloud infrastructure architect: Gapup agent-payable C-suite expertise (CTO). Returns a structured, audited deliverable. Answers: Design a cloud infrastructure blueprint for a app with expected traffic and requirements. · What is the recommended AWS vs GCP vs Azure architecture for a SaaS multi-tenant app with EU data residency and SOC2? · How should I architect my cloud infra to stay under €5k/month with GDPR compliance and a junior DevOps team? · What cloud services do I need for a with load: compute, DB, cache, CDN, observability? · Give me an end-to-end cloud architecture with scaling plan, security baseline, and IaC tool recommendation. Reference case: Spinora fintech B2B SaaS, saas-multi-tenant · medium load (1k-100k req/d) · eu-west · . Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`team_size`	No
`expected_load`	Yes
`workload_type`	Yes
`business_context`	No
`cloud_preference`	No
`region_preference`	Yes
`budget_monthly_eur`	No
`compliance_required`	No

Tool Definition Quality

B3.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and openWorldHint=true. The description adds that it returns an 'audited deliverable' but does not elaborate on behavioral traits like side effects or rate limits. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy: the tool's published text mixes French and English and adds a reference case and multiple example queries. While informative, it could be more concise and front-loaded. The structure is reasonable but contains redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 9 parameters and no output schema, the description should explain the return format and cover all inputs. It only vaguely states 'structured, audited deliverable' and lacks details on output schema or comprehensive parameter guidance. This is insufficient for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 11%, meaning most parameters lack descriptions in the schema. The tool description indirectly covers some parameters (e.g., workload_type, expected_load) through examples but does not systematically explain all 9 parameters, especially team_size, business_context, and budget_monthly_eur. This is insufficient for accurate invocation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose ('design a cloud infrastructure blueprint'), provides specific verb-object pairing, and includes multiple example queries that illustrate the scope. It effectively distinguishes from the large set of sibling tools by focusing on architectural design.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use the tool (when needing a cloud architecture blueprint) through its examples but does not explicitly state when not to use it or suggest alternative tools. Given the many siblings, explicit exclusions would improve the score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

insurance_coverage_analyzerC

Read-only

Insurance coverage analyzer: Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Gapup Hub, 3 policies · €24k premium · Score 58/100 · 3 critical gaps · RFP template. Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`arrEur`	Yes
`sector`	Yes
`objectives`	Yes
`companyName`	Yes
`riskProfile`	Yes
`jurisdiction`	Yes
`employeeCount`	Yes
`currentPolicies`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and openWorldHint. Description adds that inputs are validated server-side and returns a structured deliverable, which is marginally helpful. No side effects, auth needs, or rate limits are disclosed, but no contradiction exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences plus a reference line, but includes marketing fluff ('Gapup agent-payable C-suite expertise (RISK)'). The core purpose is front-loaded, but extra words reduce conciseness without adding value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 9 parameters, nested objects, no output schema, and high complexity, the description lacks detail on input structure, validation rules, or return format beyond 'structured deliverable.' More guidance is essential for correct usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 11%, with only 'async' described. The description does not explain any parameter meaning or relationships, saying only to 'send the documented case fields.' This fails to compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states that the tool analyzes insurance coverage and returns a structured deliverable, with a reference case illustrating typical output. The verb 'analyze' is implied rather than stated. The purpose is specific, but the description does not differentiate the tool from its siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description implies use for insurance coverage analysis but lacks when-not-to-use or comparison to siblings. No usage context is provided beyond the tool's purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

interest_rateA

Read-only

Return a precise reference interest rate — the exact figure an agent injects into a treasury, lending, valuation or trading model. Available rates: fed_funds, sofr, us_10y, us_2y, us_3m, ecb_main, euribor_3m. Source: FRED (Federal Reserve Bank of St. Louis). When to use: an agent's computation needs a current benchmark rate as a precise input.

ParametersJSON Schema

Name	Required	Description	Default
`rate`	Yes	Reference rate name
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
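
A client might check the `rate` argument against the names listed in the description before calling the tool. The sketch below does that in Python. The helper name `interest_rate_arguments` is hypothetical, and the list of rates is copied from the description above rather than read from the schema's enum.

```python
# Rate names as listed in the tool description.
AVAILABLE_RATES = (
    "fed_funds", "sofr", "us_10y", "us_2y", "us_3m", "ecb_main", "euribor_3m",
)


def interest_rate_arguments(rate: str, use_async: bool = False) -> dict[str, object]:
    """Build the arguments object for interest_rate, rejecting unknown rates."""
    if rate not in AVAILABLE_RATES:
        raise ValueError(f"unknown rate {rate!r}; expected one of {AVAILABLE_RATES}")
    arguments: dict[str, object] = {"rate": rate}
    if use_async:
        arguments["async"] = True  # optional flag described in the table
    return arguments


print(interest_rate_arguments("sofr"))  # {'rate': 'sofr'}
```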

Output Schema

ParametersJSON Schema

Name	Required	Description
`rate`	Yes
`unit`	Yes
`as_of`	Yes
`value`	Yes
`source`	Yes
`series_id`	No
`source_url`	No
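
The output schema above lists five required and two optional fields without types. The Python sketch below parses such a payload into a typed record. The field types (string dates, float `value`) are assumptions, and the sample payload uses placeholder values, not real market data.

```python
from dataclasses import dataclass
from typing import Any, Optional

REQUIRED_FIELDS = ("rate", "unit", "as_of", "value", "source")


@dataclass(frozen=True)
class RateQuote:
    rate: str
    unit: str
    as_of: str
    value: float
    source: str
    series_id: Optional[str] = None
    source_url: Optional[str] = None

    @classmethod
    def from_payload(cls, payload: dict[str, Any]) -> "RateQuote":
        """Validate required fields, then build a RateQuote."""
        missing = [key for key in REQUIRED_FIELDS if key not in payload]
        if missing:
            raise ValueError(f"missing required fields: {missing}")
        return cls(
            rate=str(payload["rate"]),
            unit=str(payload["unit"]),
            as_of=str(payload["as_of"]),
            value=float(payload["value"]),
            source=str(payload["source"]),
            series_id=payload.get("series_id"),
            source_url=payload.get("source_url"),
        )


# Placeholder payload for illustration only (values are invented).
sample = {
    "rate": "sofr",
    "unit": "percent",
    "as_of": "2026-01-01",
    "value": 5.0,
    "source": "FRED",
}
quote = RateQuote.from_payload(sample)
print(quote.rate, quote.value, quote.unit)  # sofr 5.0 percent
```

Keeping `unit` next to `value` matters because the description does not say whether the figure is a percentage or a decimal fraction.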

Tool Definition Quality

A4.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description's job is to add context beyond that. It adds that the tool sources data from FRED and returns a current rate, but does not disclose details like data freshness, caching, or error handling. This is adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four short statements with no wasted words. It front-loads the core purpose and then lists rates, source, and usage guidance efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (2 parameters, output schema present), the description covers the essential aspects: what it returns, available options, source, and when to use. It does not explain the return format (e.g., decimal vs percent), but that is likely handled by the output schema. Overall, it is sufficiently complete for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description enumerates the available rates in a readable format ('fed_funds, sofr, us_10y, ...'), adding value over the schema's enum list. It also mentions the source, helping agents select the correct rate. The async parameter is not mentioned in the description, but the schema already explains it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Return a precise reference interest rate' with a specific verb and resource, lists available rates, and explicitly mentions the use case for treasury, lending, valuation, or trading models. This distinguishes it from other financial tools like economic_indicator or fx_rate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'When to use' statement specifying that it should be used when an agent's computation needs a current benchmark rate as a precise input. However, it does not explicitly mention when not to use it or provide direct alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

internal_communicationC

Read-only

Internal communication: Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: demo case, internal communication. Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`context`	Yes
`audienceSegments`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds that the deliverable is 'audited' but provides limited behavioral context beyond what annotations offer.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Short, but it includes an unnecessary reference to a demo case. Dropping that reference would streamline it further.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the nested parameters and no output schema, the description is vague. It does not clearly explain what constitutes 'documented case fields' or the full scope of inputs and outputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%. The description only says 'send the documented case fields' without explaining any parameter, failing to compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it's about internal communication for CHRO/C-suite and returns a structured deliverable, but the jargon 'Gapup agent-payable' is unclear and it fails to differentiate from many HR-related sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The only instruction is about input validation, which does not help with usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

investor_listC

Read-only

Investor list + warm intros: Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Agicap Series D, 25 matched VCs · Tier A: Balderton/Accel/Partech · warm intro path for each investor. Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`round`	Yes
`company`	Yes
`existingInvestors`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that the tool returns a 'structured, audited deliverable' and mentions async behavior via the 'async' parameter, but it does not elaborate on other behavioral aspects like rate limits or authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise, starting with the purpose and including a reference case. The tool's published description is partly in French, which may reduce clarity for non-French-speaking agents. Still, the structure is front-loaded and each sentence adds information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, nested objects, no output schema) and low schema coverage, the description is insufficient. It does not specify the output format, the warm intro path mechanism, or how to interpret results, leaving significant gaps for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (25%), but the description does not add meaning for the nested parameters (company, round) beyond what the schema provides. The phrase 'send the documented case fields' is vague and does not clarify parameter usage or constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides an investor list with warm intros for fundraising, and it references a specific case (Agicap Series D) to illustrate the output. This differentiates it from siblings like 'investor_shortlist' by emphasizing warm intro paths and a structured, audited deliverable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs. alternatives is provided. The description only shows a use case example (Agicap) but does not state conditions, prerequisites, or when to avoid it, leaving the agent to infer usage from context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

investor_shortlistC

Read-only

Targeted investor shortlist: Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Aleph AI, Series B €30M · 60 EU/US investors matched by stage/thesis · fit score + warm intro path + first message angle. Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`round`	Yes
`company`	Yes
`preferences`	Yes

Tool Definition Quality

C2.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true, which the description reinforces by mentioning a structured, audited deliverable. It discloses server-side validation, adding value beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description includes a reference case which adds context but also length. It is front-loaded with the purpose but could be more concise by removing the example.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 params, nested objects, no output schema), the description is incomplete. It lacks details on return format, generation process, timing, or how the shortlist is structured.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% and the description does not explain any parameters. It merely says to send documented case fields, providing no additional meaning to the complex nested schema.
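
The coverage figure quoted above can be reproduced with a short calculation. This is a minimal sketch that assumes "schema description coverage" means the share of input parameters carrying a non-empty description; the reviewer's exact formula is not stated on this page, and the function name is hypothetical.

```python
def schema_description_coverage(params: dict[str, str]) -> int:
    """Return the percentage of parameters with a non-empty description.

    Assumption: coverage = described parameters / all parameters, rounded
    to the nearest whole percent.
    """
    if not params:
        return 0
    described = sum(1 for text in params.values() if text.strip())
    return round(100 * described / len(params))


# Parameter table of investor_shortlist as listed above: only `async`
# has a description, the other four cells are empty.
investor_shortlist_params = {
    "async": "If true, returns a job_id immediately instead of waiting.",
    "focus": "",
    "round": "",
    "company": "",
    "preferences": "",
}

coverage = schema_description_coverage(investor_shortlist_params)
print(f"coverage: {coverage}%")
```

Under the same assumption, a six-parameter tool with one described parameter comes out at 17%, which matches the figure reported for ip_protection_pilot.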

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool creates a targeted investor shortlist for fundraising, with a reference case. However, it does not explicitly distinguish from sibling tools like 'investor_list' or 'funding_hunter', leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives. The description lacks context about prerequisites, exclusions, or typical use cases beyond the reference case.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ip_contract_clause_extractorB

Read-only · Idempotent

For CHRO use: analyzes employment contract text to identify and extract IP-related clauses such as invention assignment, confidentiality, non-compete, and patent rights. Returns structured data with clause types, risk levels, and relevant legal context. Ideal for contract review workflows, compliance checks, and IP protection strategy. Sources: USPTO PatFT and EPO Espacenet public datasets. Keywords: employment contract, IP clause, invention assignment, confidentiality agreement, non-compete, patent rights, CHRO tool.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`contractText`	Yes	Full text of the employment contract to analyze
`jurisdiction`	No	Country/state jurisdiction for legal context (e.g., 'US-CA', 'DE')
`includeContext`	No	Whether to include legal context for each clause

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`clauses`	Yes
`sources`	No
`summary`	Yes
`warnings`	Yes

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the read-only nature is clear. The description adds context about sources (USPTO PatFT, EPO Espacenet) but this may confuse agents expecting the tool to consult external databases rather than analyze the provided text. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph with some redundancy (e.g., 'CHRO' twice, keywords list). It is front-loaded with the main action but includes promotional language that could be trimmed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and the existence of an output schema, the description covers the main functionality and return types (clause types, risk levels). However, it omits guidance on how jurisdiction and includeContext affect results, and the async parameter behavior is not addressed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not add meaningful detail beyond the schema for parameters like jurisdiction or includeContext, though it reinforces the contractText purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes employment contract text to extract IP-related clauses (invention assignment, confidentiality, etc.) and returns structured data with clause types and risk levels. However, it does not distinguish from sibling tools like 'legal_clause_extractor' or 'contract_risk_scanner', which have overlapping purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description targets CHRO use and mentions ideal workflows (contract review, compliance checks), providing clear context. However, it offers no exclusion criteria or alternative tools, leaving the agent to infer when not to use this tool versus similar ones.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ip_employee_invention_trackerA

Read-only · Idempotent

For CHROs: tracks employee patent filings and flags unassigned inventions. Input employee name or ID to retrieve their patent applications from USPTO and WIPO databases. Returns list of inventions with assignment status, filing dates, and potential ownership gaps. Useful for IP audits, inventor onboarding, and compliance checks. Keywords: patents, IP ownership, employee inventions, USPTO, WIPO.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	No	Filter patents filed before this date (YYYY-MM-DD)
`startDate`	No	Filter patents filed after this date (YYYY-MM-DD)
`employeeId`	No	Internal employee ID (optional if name provided)
`companyName`	Yes	Exact legal name of company for assignment check
`employeeName`	Yes	Full name of employee to track (e.g., 'John Doe')
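
A caller might assemble the arguments for this tool as sketched below. The field names come from the parameter table above; the helper name, the client-side date check, and the example company are assumptions for illustration, and the server may validate differently.

```python
import re
from datetime import date
from typing import Optional

_DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")


def _check_date(value: str) -> date:
    """Parse a YYYY-MM-DD string, raising ValueError on any other shape."""
    if not _DATE_PATTERN.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return date.fromisoformat(value)


def build_tracker_arguments(
    employee_name: str,
    company_name: str,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    employee_id: Optional[str] = None,
) -> dict:
    """Build an argument dict for ip_employee_invention_tracker (hypothetical helper)."""
    # employeeName and companyName are the two required fields.
    args = {"employeeName": employee_name, "companyName": company_name}
    if employee_id is not None:
        args["employeeId"] = employee_id
    start = _check_date(start_date) if start_date else None
    end = _check_date(end_date) if end_date else None
    # Assumption: an inverted window is a caller error worth catching early.
    if start and end and start > end:
        raise ValueError("startDate must not be later than endDate")
    if start_date:
        args["startDate"] = start_date
    if end_date:
        args["endDate"] = end_date
    return args


tracker_args = build_tracker_arguments(
    "John Doe", "Acme Corp", start_date="2020-01-01", end_date="2023-12-31"
)
```

Checking the date window locally avoids spending a paid call on a request that can only return an empty or rejected result.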

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`patents`	Yes
`sources`	No
`warnings`	Yes
`employeeId`	No
`companyName`	Yes
`employeeName`	Yes
`totalPatents`	Yes
`unassignedPatents`	Yes

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint, idempotentHint) indicate safe, non-modifying behavior. Description adds value by explaining data sources (USPTO, WIPO), output fields, and the 'flags unassigned inventions' feature, providing behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences with no waste: target audience, action, data sources, output summary, use cases, keywords. Front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, input, output, and use cases. Output schema documents return values. Missing guidance on what happens when both name and ID are provided, but overall adequate for a 6-param tool with output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description mentions 'employee name or ID' matching schema but does not add significant extra meaning beyond what is already in parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb 'tracks' and resource 'employee patent filings', specifying data sources (USPTO, WIPO) and output (assignment status, filing dates, ownership gaps). Distinct from siblings like patent_landscape and patent_ownership_audit by focusing on individual employees.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Targets 'CHROs' and lists use cases (IP audits, inventor onboarding, compliance checks), providing clear context. However, does not explicitly state when to avoid this tool or mention alternatives like patent_ownership_audit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ip_protection_pilotB

Read-only

IP protection pilot: Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Carbios SA, French deeptech in enzymatic PET recycling · 14 EP/US/FR patents · 5 competitors · €2-8M licensing potential. Inputs are validated server-side; send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`competitors`	Yes
`targetMarkets`	Yes
`patentPortfolioSummary`	Yes

Tool Definition Quality

B3.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true (safe read) and openWorldHint=true. The description adds that inputs are validated server-side and it returns an audited deliverable, which aligns with and extends the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, starting with purpose. The reference case adds useful context but could be moved to an example. Overall, it is relatively concise and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 params, nested objects, no output schema), the description is incomplete. It does not explain the deliverable's structure or contents, nor how it differs from similar tools. The annotations help but leave gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description should compensate but does not. It only mentions 'documented case fields' without describing any parameters. No parameter details are provided beyond what is in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an IP protection pilot that returns a structured deliverable, and provides a reference case. However, it does not differentiate from sibling tools like patent_landscape or ip_contract_clause_extractor, and the deliverable content is vaguely described.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description lacks context about when this pilot is appropriate or when to choose other IP-related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

jailbreak_attempt_detectorA

Read-only · Idempotent

Detects potential LLM jailbreak attempts by analyzing user input against NIST AI Risk Management Framework adversarial patterns. Designed for persona risk assessment, this tool evaluates text for common jailbreak techniques such as prompt injection, role-playing, or obfuscation. Inputs include the user message and optional context, returning a risk assessment with confidence scores and pattern matches. Ideal for real-time moderation in chat applications or API gateways.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`context`	No	Optional conversation context for better pattern matching
`message`	Yes	User input text to analyze for jailbreak attempts
`threshold`	No	Confidence threshold for flagging attempts

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No	Confidence score of jailbreak attempt
`patternsMatched`	No	List of detected adversarial patterns
`isJailbreakAttempt`	No	Whether the input exceeds the risk threshold
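
Because `riskScore`, `patternsMatched`, and `isJailbreakAttempt` are all marked as not required in the output schema, a client has to decide what to do when they are absent. The sketch below shows one possible gating policy. The 0 to 1 score scale, the 0.8 fallback threshold, and the fail-closed choice are assumptions, not documented behavior.

```python
def should_block(result: dict, fallback_threshold: float = 0.8) -> bool:
    """Decide whether to block a message based on a detector result.

    Preference order (an assumed client-side policy):
      1. Trust `isJailbreakAttempt` when the server returned it.
      2. Otherwise compare `riskScore` with a local fallback threshold.
      3. If neither field is present, fail closed and block.
    """
    flag = result.get("isJailbreakAttempt")
    if flag is not None:
        return bool(flag)
    score = result.get("riskScore")
    if score is None:
        return True  # no usable signal: block rather than let it through
    return score > fallback_threshold


decision = should_block({"riskScore": 0.93, "patternsMatched": ["role-playing"]})
```

Failing closed suits a moderation gateway; a lower-stakes integration might prefer to let the message through and log the incomplete response instead.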

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral details beyond annotations: it analyzes against NIST patterns and returns a risk assessment with confidence scores and pattern matches. Annotations indicate read-only, open-world, idempotent, which are consistent. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each serving a purpose: core function and framework reference, intended scope, inputs and outputs, and ideal use case. No redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description sufficiently covers inputs (message, context, threshold) and outputs (risk assessment, scores, patterns). It is complete for a detection tool with moderate complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description clarifies the threshold parameter's role (confidence threshold) and context usage, adding value beyond schema definitions for two of four params. Output description hints at return structure.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool detects LLM jailbreak attempts using NIST adversarial patterns, specifying the verb, resource, and method. It distinguishes itself from sibling tools by focusing specifically on jailbreak detection in real-time moderation contexts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly recommends use in real-time moderation for chat applications or API gateways, providing clear usage context. While it does not list alternatives, the unique purpose makes exclusions unnecessary.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_postings_intelligenceA

Read-only

Aggregates public job postings to infer hiring trends. Three modes: (1) company_hiring, which analyzes a company's postings: volume, functions (engineering/sales/marketing/ops/finance/hr), seniority, geography, growth vs. the previous period, inferred strategic signals; (2) role_market, the overall market volume for a role (open positions estimate, top employers, in-demand skills, median seniority); (3) competitor_hiring_comparison, a multi-company comparison (total postings, growth%, focus areas). Sources: Adzuna (ADZUNA_APP_ID/KEY env), RemoteOK (keyless), Himalayas (keyless), static baseline of 40 top employers. Uses: VC due diligence, competitive intelligence, HR benchmarks, strategic pivot signals. Cache 6h. SLA ≤15s.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode: 'company_hiring' \| 'role_market' \| 'competitor_hiring_comparison'
`role`	No	Job title to analyze (for role_market, e.g. 'data scientist', 'compliance officer')
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	No	Company name (for company_hiring, or as the first competitor)
`location`	No	Country or city (e.g. 'France', 'United States', 'London')
`competitors`	No	List of companies to compare (for competitor_hiring_comparison, min 2)
`period_days`	No	Analysis window in days (default 30)
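
Only `mode` is marked as required, yet each mode depends on a different optional parameter. The sketch below makes that dependency explicit on the client side. The helper name is hypothetical, and applying the "min 2" rule to the `competitors` list alone (without counting `company`) is an assumption, since the table does not say which is intended.

```python
from typing import Optional, Sequence


def build_job_postings_arguments(
    mode: str,
    *,
    company: Optional[str] = None,
    role: Optional[str] = None,
    competitors: Optional[Sequence[str]] = None,
    location: Optional[str] = None,
    period_days: int = 30,
) -> dict:
    """Build arguments for job_postings_intelligence, checking per-mode inputs."""
    if mode == "company_hiring":
        if not company:
            raise ValueError("company_hiring needs `company`")
        args = {"mode": mode, "company": company}
    elif mode == "role_market":
        if not role:
            raise ValueError("role_market needs `role`")
        args = {"mode": mode, "role": role}
    elif mode == "competitor_hiring_comparison":
        names = list(competitors or [])
        if len(names) < 2:
            raise ValueError("competitor_hiring_comparison needs at least 2 competitors")
        args = {"mode": mode, "competitors": names}
        # `company` may also be sent here; the table says it is then
        # treated as the first competitor.
        if company:
            args["company"] = company
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if location:
        args["location"] = location
    args["period_days"] = period_days  # the table lists 30 as the default
    return args


role_args = build_job_postings_arguments(
    "role_market", role="data scientist", location="France"
)
```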

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`role_market`	No
`quality_score`	Yes
`company_hiring`	No
`competitor_comparison`	No

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, openWorldHint), the description adds key behavioral traits: 6-hour cache, ≤15s SLA, async mode support, data sources (Adzuna, RemoteOK, Himalayas), and a static baseline of 40 top employers. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description efficiently covers modes, sources, use cases, and constraints in a well-organized paragraph. A bulleted structure would be a minor improvement, but there is no redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (three modes, seven parameters, output schema present), the description is exceptionally thorough: it explains sources, async option, caching, SLA, and strategic applications. No critical gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with good descriptions. The description adds value by linking parameters to modes (e.g., company used in company_hiring and as first competitor, competitors for comparison). It clarifies role for role_market and location scope.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it aggregates public job postings for recruitment trend inference, with three distinct modes (company_hiring, role_market, competitor_hiring_comparison) each detailed. This specificity distinguishes it from siblings like competitor_intel or talent_intelligence.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases (VC due diligence, competitive intelligence, HR benchmarks, strategic pivot signals) and outlines mode-specific analysis. However, it does not explicitly state when to avoid this tool or compare directly to alternatives in the sibling list.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_resultA

Read-only · Idempotent

Poll the result of any tool called with async:true. Returns status=pending while running, status=completed with the full result once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h).

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by an async tool call
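
The four statuses described above suggest a simple polling loop. The following is a minimal sketch, not the server's reference client: `call_tool` stands in for whatever function your MCP client exposes to invoke a tool by name, and the polling interval and timeout are arbitrary assumptions. The whole response is returned on completion because the page does not document where the result sits inside it.

```python
import time
from typing import Any, Callable


def wait_for_job(
    call_tool: Callable[[str, dict], dict],
    job_id: str,
    *,
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> dict:
    """Poll job_result until the job leaves the pending state.

    `call_tool(name, arguments)` is a hypothetical client hook that returns
    the tool response as a dict containing a `status` key.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        response = call_tool("job_result", {"job_id": job_id})
        status = response.get("status")
        if status == "completed":
            return response
        if status in ("failed", "not_found"):
            # not_found covers both unknown and expired job ids (TTL 24h).
            raise RuntimeError(f"job {job_id} ended with status={status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)
```

A caller would first invoke the slow tool with `async: true`, read the returned `job_id`, and pass it to `wait_for_job`. Injecting `sleep` keeps the loop easy to exercise in tests without real delays.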

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark it as read-only and idempotent. The description adds important behavioral details: TTL of 24h, status progression (pending, completed, failed, not_found). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, directly front-loaded with the purpose. No redundant information, every word earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description fully explains the possible return statuses and TTL. No additional information is needed for an agent to use this tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter (job_id) with 100% schema coverage. The description does not add semantics beyond what the schema already provides. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it polls the result of any tool called with async:true, listing all possible statuses. This distinguishes it from sibling async result tools that are tool-specific (e.g., competitive_deep_dive_result).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly describes when to use: after an async tool call. Covers all return statuses. Lacks explicit when-not-to-use or alternatives, but the context is clear enough for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

knowledge_base_autoB

Read-only

Base de connaissance automatique — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Klarna — knowledge base auto · Slack+Notion+Drive · 12 articles seed + structure 8 catégories. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`sources`	Yes
`topPainPoints`	Yes

Tool Definition Quality

B3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, indicating no side effects. The description adds that inputs are validated server-side and it returns a deliverable, which aligns with readOnly. No contradictions. Could still benefit from output format details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences and gets to the point, but mixes French and English, which may confuse some agents. It is not overly verbose, but could be more structured by separating purpose, inputs, and output clearly. Front-loads the purpose adequately.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, nested objects, no output schema), the description is insufficient. It fails to explain the output format or structure, the meaning of 'focus', or how the deliverable is delivered (e.g., synchronous vs async). The reference case helps but does not compensate for missing behavioral details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (20%, only 'async' has a description). The tool description does not elaborate on parameters like 'focus', 'company', 'sources', or 'topPainPoints'. It merely says to send the documented case fields, adding no semantic value beyond the schema. For a tool with 5 parameters, the description should compensate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns a structured, audited deliverable for building an automated knowledge base for C-suite expertise (COO). It provides a reference case (Klarna) but does not explicitly differentiate from sibling tools like content_engine or content_taxonomy. The purpose is specific and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only mentions that inputs are validated server-side and to send the case fields. It provides no guidance on when to use this tool versus alternatives, no when-not-to-use conditions, and no prerequisites. This lacks explicit usage direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyc_screenerC

Read-only

Screening KYC / AML / Sanctions — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Q4 2026 onboarding — 8 entités (UBO chain LLC + SPV offshore), sanctions/PEP/adverse media. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entities`	Yes
`riskAppetite`	Yes		standard
`screeningScope`	Yes
`onboardingPacket`	Yes

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint: true and openWorldHint: true, which already signal safety and unpredictability. The description adds minimal behavioral context: 'Returns a structured, audited deliverable' aligns with read-only, and 'Inputs are validated server-side' hints at server processing. However, the async parameter behavior is not mentioned in the description, and the reference case is overly specific. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short (four sentences), but the first sentence contains opaque marketing jargon ('Gapup agent-payable C-suite expertise (RISK)') that wastes space. The reference case is specific but may not be universally useful. Overall, it is moderately concise but could be clearer and more to the point.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, no output schema, low schema coverage), the description is insufficient. It does not explain return values, the effect of the async parameter, or how results are delivered (e.g., structured audit report format). The reference case offers a concrete scenario but doesn't cover general use. Sibling tools like kyc_screener_batch suggest batch processing, but the description does not clarify the scope (single vs batch).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, so the description must compensate for missing parameter details. The description only says 'send the documented case fields', without explaining which fields are important or how to structure the input (e.g., entities array format, nested objects). The reference case provides a high-level example but no parameter-level guidance. The schema itself has some field descriptions, but the description adds little value.
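As a rough illustration of how a coverage figure like the 20% above can be derived, the sketch below counts the top-level properties that carry a non-empty description. The function name `description_coverage` and the schema dictionary are assumptions made for illustration; only the parameter names and the single described field (`async`) come from the listing, and the scorer's actual method is not documented here.

```python
from typing import Any


def description_coverage(input_schema: dict[str, Any]) -> float:
    """Fraction of top-level properties that carry a non-empty description."""
    properties = input_schema.get("properties", {})
    if not properties:
        return 0.0
    described = sum(
        1 for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# Hypothetical reconstruction of the kyc_screener input schema: only the
# parameter names and the one described field are taken from the listing.
kyc_screener_schema = {
    "type": "object",
    "properties": {
        "async": {"type": "boolean", "description": "If true, returns a job_id immediately."},
        "entities": {},
        "riskAppetite": {"default": "standard"},
        "screeningScope": {},
        "onboardingPacket": {},
    },
}

print(f"{description_coverage(kyc_screener_schema):.0%}")  # prints 20%
```

One described parameter out of five gives 20%, which matches the figure quoted in the review.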

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description opens with 'Screening KYC / AML / Sanctions', clearly stating the verb ('screening') and resource (KYC/AML/sanctions). It mentions 'Returns a structured, audited deliverable', reinforcing the output. However, the phrase 'Gapup agent-payable C-suite expertise (RISK)' is jargon that obscures meaning. It does not explicitly differentiate from sibling tools like kyc_screener_batch or sanctions_screener_multi, but the core purpose is discernible.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., kyc_screener_batch for batch processing). It states 'Inputs are validated server-side — send the documented case fields', which is a procedural note but lacks context about when this tool is appropriate. No when-not-to-use or alternative references are present.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyc_screener_batchA

Read-only

Async batch variant of kyc_screener. Accepts 1-100 names and returns immediately (<300ms) with a job_id. The screening runs in the background (up to 10 parallel KYC calls). Poll the result with kyc_screener_batch_result(job_id) after the eta_seconds hint. Each entry can specify name, type (person/company/any), and an optional birthdate hint. Use for bulk client onboarding, UBO list screening, or periodic AML refresh batches. Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter.

Parameters

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`names`	Yes	List of entities to screen (1-100). Each entry requires at minimum a name.

Output Schema

Parameters

Name	Required	Description
`job_id`	Yes	Unique job identifier — pass to kyc_screener_batch_result
`status`	Yes
`batch_size`	Yes	Number of names queued for screening
`eta_seconds`	Yes	Estimated seconds until result is ready
`submitted_at`	Yes	ISO-8601 submission timestamp
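
A minimal sketch of how a client might assemble a submission for this tool, assuming the standard MCP JSON-RPC `tools/call` envelope. The helper `build_batch_request` is hypothetical, and the per-entry keys `name` and `type` are assumed from the description's wording rather than taken from a published schema.

```python
import json
from typing import Any

MAX_BATCH = 100  # the listing states 1-100 names per batch


def build_batch_request(names: list[dict[str, Any]], request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for kyc_screener_batch."""
    if not 1 <= len(names) <= MAX_BATCH:
        raise ValueError(f"expected 1-{MAX_BATCH} entries, got {len(names)}")
    for index, entry in enumerate(names):
        # Each entry requires at minimum a name, per the parameter table.
        if not str(entry.get("name", "")).strip():
            raise ValueError(f"entry {index} has no name")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "kyc_screener_batch", "arguments": {"names": names}},
    }


request = build_batch_request([
    {"name": "Jane Example", "type": "person"},  # "type" key is an assumption
    {"name": "Example Holdings LLC", "type": "company"},
])
print(json.dumps(request, indent=2))
```

Validating the batch size client-side avoids spending a paid call on a request the server would reject.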

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses async behavior, background processing, parallel KYC calls, and how to get results. No contradiction with annotations (readOnlyHint=true is acceptable for a submission tool).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise yet comprehensive. Front-loaded with key purpose, each sentence adds value. Well-structured with clear ordering.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all essential aspects: purpose, parameters, async behavior, result retrieval options. References sibling tools for webhook registration. Complete for a batch submission tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds value by explaining usage of birthdate for disambiguation and default type 'any', going beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it is an async batch variant of kyc_screener, accepts 1-100 names, returns job_id immediately. Distinguishes from sibling kyc_screener by specifying batch and async nature.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit use cases (bulk onboarding, UBO screening, AML refresh), and alternatives for result retrieval (polling via kyc_screener_batch_result or webhook via webhooks_manage). No missing guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyc_screener_batch_resultA

Read-only, Idempotent

Poll the result of a kyc_screener_batch job. Returns status=pending while running, status=completed with the full array of KYC results once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by kyc_screener_batch.

Parameters

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by kyc_screener_batch (prefix: kycb_)

Output Schema

Parameters

Name	Required	Description
No output parameters
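
The polling contract described above (wait for the `eta_seconds` hint, then branch on `pending`, `completed`, `failed`, or `not_found`) could be sketched as follows. `call_tool` is a hypothetical callable standing in for whatever MCP client is in use, and the poll interval and retry limit are arbitrary assumptions, not values from the listing.

```python
import time
from typing import Any, Callable

# Hypothetical client hook: (tool_name, arguments) -> decoded tool result.
ToolCaller = Callable[[str, dict[str, Any]], dict[str, Any]]


def wait_for_batch(
    call_tool: ToolCaller,
    job_id: str,
    eta_seconds: float,
    poll_interval: float = 5.0,   # assumed value
    max_polls: int = 20,          # assumed value
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Any]:
    """Poll kyc_screener_batch_result until the job leaves the pending state."""
    sleep(eta_seconds)  # honour the hint returned by kyc_screener_batch
    for _ in range(max_polls):
        result = call_tool("kyc_screener_batch_result", {"job_id": job_id})
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if status == "not_found":
            # Unknown job_id, or the 24h TTL has elapsed.
            raise LookupError(f"job {job_id} is unknown or expired")
        if status != "pending":
            raise ValueError(f"unexpected status: {status!r}")
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Because the job expires after 24 hours, a `not_found` result is treated as terminal rather than retried.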

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate safe, idempotent read. Description adds value by detailing statuses (pending, completed, failed, not_found), TTL of 24h, and job_id prefix. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, no redundant information. Every sentence serves a clear function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple polling tool with one parameter and an output schema, the description covers all behavior, links to parent, and mentions TTL. No gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with a description for job_id. Description adds the prefix hint 'kycb_', providing extra context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it polls the result of a KYC screener batch job, listing all possible statuses. It distinguishes from sibling tools by explicitly referencing kyc_screener_batch and noting to call after the eta_seconds hint.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit when-to-use: 'Poll the result of a kyc_screener_batch job' and 'Call this after the eta_seconds hint returned by kyc_screener_batch.' Does not mention alternatives like job_result for other async jobs, but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

labor_law_alert_geoA

Read-only, Idempotent

Provides CHROs with daily alerts on new labor law changes by jurisdiction (state/country). Inputs include jurisdiction (ISO country/state code) and optional date range. Outputs structured legislative updates with summaries, effective dates, and source links. Useful for compliance monitoring, risk assessment, and policy adjustments. Keywords: labor law, compliance, legislation, jurisdiction, CHRO, HR policy.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`since`	No	Optional start date for changes (YYYY-MM-DD). Defaults to 7 days ago.
`until`	No	Optional end date for changes (YYYY-MM-DD). Defaults to today.
`jurisdiction`	Yes	ISO 3166-1 alpha-2 country code or ISO 3166-2 state/province code (e.g., 'US-CA', 'FR')
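
To make the parameter constraints above concrete, here is a small sketch that builds the arguments for this tool. The helper `build_labor_law_args` is hypothetical, and the regular expression only checks the general shape of ISO 3166-1 alpha-2 and ISO 3166-2 codes; it does not confirm that a code is actually assigned.

```python
import re
from datetime import date
from typing import Any, Optional

# Two letters, optionally followed by a hyphen and a 1-3 character subdivision.
_JURISDICTION = re.compile(r"^[A-Z]{2}(-[A-Z0-9]{1,3})?$")


def build_labor_law_args(
    jurisdiction: str,
    since: Optional[date] = None,
    until: Optional[date] = None,
) -> dict[str, Any]:
    """Build arguments for labor_law_alert_geo, omitting unset optional dates."""
    code = jurisdiction.strip().upper()
    if not _JURISDICTION.fullmatch(code):
        raise ValueError(f"not an ISO 3166 style code: {jurisdiction!r}")
    if since is not None and until is not None and since > until:
        raise ValueError("since must not be later than until")
    args: dict[str, Any] = {"jurisdiction": code}
    # Unset dates are left out so the documented defaults apply
    # (7 days ago for since, today for until).
    if since is not None:
        args["since"] = since.isoformat()  # YYYY-MM-DD
    if until is not None:
        args["until"] = until.isoformat()
    return args


print(build_labor_law_args("us-ca", since=date(2026, 7, 1)))
```

Leaving the optional dates out, rather than computing them client-side, avoids any mismatch with how the server defines "today".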

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`changes`	Yes
`sources`	Yes
`warnings`	Yes
`last_updated`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, idempotentHint. The description adds behavioral context: outputs structured legislative updates with summaries, effective dates, and source links, and mentions daily alerts. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences plus a keyword line. Front-loaded with the main purpose. No wasted words. Efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists, so return values are documented. Description covers input, output structure, and typical use case. Async parameter is not mentioned but schema handles it. Overall complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are already well-documented. The description mentions jurisdiction and date range but adds no new meaning beyond the schema descriptions. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides daily alerts on new labor law changes by jurisdiction, with specific verb 'provides' and resource 'daily alerts on new labor law changes'. It distinguishes from siblings as no other sibling tool explicitly targets labor law alerts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description suggests use cases: 'compliance monitoring, risk assessment, and policy adjustments'. It does not explicitly mention alternatives or when not to use, but the context is clear and adequate for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ld_architectC

Read-only

Architecte formation & développement — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Pennylane (180 FTE) — Catalogue 8 formations · 3 parcours individuels · ROI €480k · Payback 7 mois. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`team`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`budget`	Yes
`company`	Yes
`learningNeeds`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context by stating that inputs are validated server-side and that the tool returns a structured, audited deliverable. Annotations indicate readOnlyHint=true, which aligns with the non-mutating nature of generating a report. However, it does not disclose potential side effects like billing or data usage, and the term 'agent-payable' is ambiguous.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise but contains jargon and a mix of French and English, which may reduce clarity. It front-loads the title but the metrics in the reference case could be distracting. Every sentence serves a purpose, but the structure could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of 5 parameters with nested objects and no output schema, the description is insufficient. It lacks details on the deliverable's content, required prerequisites, or how to interpret results. Sibling tools exist but no differentiation is provided, leaving the agent without enough context to correctly invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema description coverage (only the 'async' parameter has a description), the description fails to compensate. It mentions 'send the documented case fields' but does not explain the meaning or constraints of parameters like company, team, learningNeeds, or budget. The reference case provides some context but no direct parameter elaboration.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for learning and development architecture targeting C-suite HR executives. It specifies that it returns a structured, audited deliverable and provides a concrete reference case. However, it does not explicitly differentiate itself from sibling architect tools like abm_architect or recruiting_architect.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks explicit guidance on when to use this tool versus alternatives. It mentions sending 'the documented case fields' but does not clarify when this tool is appropriate or when other tools might be better. There is no exclusion of cases or mention of alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lead_magnetsC

Read-only

Aimants à leads — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Spendesk — Guide trésorerie startup SaaS B2B FR/EU (2024). Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`icp`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`leadMagnet`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and openWorldHint=true. The description adds that the tool returns a structured, audited deliverable and validates inputs server-side. However, it does not disclose other behavioral aspects like authentication needs, rate limits, or what 'audited' entails. The description provides some added value but not substantial.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but mixes languages and includes jargon ('Gapup agent-payable C-suite expertise'). It is not fully streamlined and could be more front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 parameters, nested objects, no output schema) and limited annotations, the description is incomplete. It does not specify the output format, how to structure the input fields, or provide any success/failure scenarios. The reference case helps but is insufficient for full completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only the 'async' parameter is described). The description does not explain the purpose or usage of the nested objects (icp, brand, leadMagnet) or their fields. With low coverage, the description should compensate but fails to do so, relying on the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates the tool creates lead magnets ('Aimants à leads') with C-suite expertise and returns a structured deliverable. A reference case is provided. However, the description lacks a simple verb+resource format and does not explicitly differentiate from sibling marketing tools like 'brand_builder' or 'content_engine'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states that inputs are validated server-side and to send documented case fields, but gives no guidance on when to use this tool vs alternatives, no exclusions, and no context on prerequisites. The reference case is the only usage hint.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

legal_clause_extractorA

Read-only, Idempotent

Structured extraction of clauses, obligations and deadlines from legal documents (SaaS contracts, NDAs, employment agreements, loan agreements, leases, M&A deals, IP licences). Complements contract_risk_scanner with granular per-clause output.

ICP: legal ops, M&A lawyers, paralegals, contract managers, compliance officers.

Capabilities: • Auto-detects document type (7 types) and language (EN/FR/DE/ES/PT) • Extracts parties with roles (buyer, seller, licensor, employee, etc.) • Splits document into sections and classifies 16+ clause types • Per-clause: 20 obligation patterns (EN/FR/DE), 10 deadline patterns, 18 risk detectors • Document-level: red flags (liability cap, auto-renewal, IP overreach, etc.), missing clauses per doc type • Global deadline calendar with P0/P1/P2 severity • Cross-reference map between sections • Cache: 7 days (legal docs stable once provided)

100% pure compute — no external fetch required. Accepts 10k–100k char documents.

Parameters

Name	Required	Description
`lang`	No	Optional. Language hint (e.g. 'en', 'fr', 'de'). Defaults to auto-detection.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`document_text`	Yes	Full text of the legal document (10k–100k chars typical). Plain text or lightly HTML-formatted. EN/FR/DE/ES/PT supported.
`document_type`	No	Optional. Document type hint. Defaults to auto-detection. Use "auto" or omit to let the tool detect from content.
`target_clauses`	No	Optional. Filter extraction to specific clause types. E.g. ["term", "termination", "liability", "ip", "confidentiality", "governing_law", "indemnification"]. If omitted or empty, all clauses are extracted.
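
The sketch below shows one way to prepare arguments for this tool under the constraints in the table above. `build_extractor_args` is a hypothetical helper. Because the listing describes 10k–100k characters as typical rather than as a hard limit, out-of-range documents produce a warning instead of an error.

```python
from typing import Any, Optional

TYPICAL_MIN_CHARS = 10_000   # "10k-100k chars typical" per the listing
TYPICAL_MAX_CHARS = 100_000


def build_extractor_args(
    document_text: str,
    target_clauses: Optional[list[str]] = None,
    lang: Optional[str] = None,
) -> tuple[dict[str, Any], list[str]]:
    """Return (arguments, warnings) for a legal_clause_extractor call."""
    if not document_text.strip():
        raise ValueError("document_text is required")
    warnings: list[str] = []
    length = len(document_text)
    if not TYPICAL_MIN_CHARS <= length <= TYPICAL_MAX_CHARS:
        warnings.append(
            f"document is {length} chars; the listing describes 10k-100k as typical"
        )
    args: dict[str, Any] = {"document_text": document_text}
    # Omitting target_clauses (or passing an empty list) extracts all clauses.
    if target_clauses:
        args["target_clauses"] = list(target_clauses)
    # Omitting lang leaves language detection to the tool.
    if lang:
        args["lang"] = lang
    return args, warnings


args, warnings = build_extractor_args("x" * 20_000, ["termination", "liability"])
print(sorted(args), warnings)
```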

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	Yes
`red_flags`	Yes
`word_count`	Yes
`jurisdiction`	No
`governing_law`	No
`lang_detected`	Yes
`quality_score`	Yes
`effective_date`	No
`cross_references`	Yes
`parties_detected`	Yes
`clauses_extracted`	Yes
`key_deadlines_global`	Yes
`document_type_detected`	Yes
`missing_clauses_expected`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description extensively details behavioral traits beyond annotations: auto-detection of document type and language, 16+ clause types, 20 obligation patterns, deadline patterns, risk detectors, red flags, missing clauses, deadline calendar, cross-reference map, cache duration, and character limits. Annotations (readOnlyHint, idempotentHint, destructiveHint) are consistent, and no contradictions exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long but well-structured: it starts with the core purpose, lists ICP, then uses bullet points for capabilities. Each bullet adds distinct value, and the length is justified by the tool's complexity. It is front-loaded with the key action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, output schema present), the description is comprehensive. It covers document type detection, language support, clause classification, risk detection, caching behavior, and complements the sibling tool. No critical gaps are evident.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are already well-documented. The description adds value by explaining typical character range (10k-100k), supported languages, auto-detection behavior, and how to filter by target clauses. This context enhances understanding beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs 'structured extraction of clauses, obligations and deadlines from legal documents' and lists specific document types. It distinguishes itself from the sibling tool 'contract_risk_scanner' by noting it provides granular per-clause output, making its purpose specific and well-differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies its use case for legal document extraction and identifies the ICP (legal ops, M&A lawyers, etc.). It mentions complementing contract_risk_scanner, implying when to use this tool over alternatives. However, it does not explicitly state when not to use it, though the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lgpd_data_subject_rights_automatorB

Read-only, Idempotent

Automates LGPD Data Subject Access Requests (DSARs) for legal teams, handling Brazil-specific data retention, erasure, and access workflows. Accepts user identifiers, request type (access/rectification/deletion), and optional scope filters. Returns structured response with compliance status, warnings, and source references to Brazilian LGPD and CNIL decisions.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`scope`	No	Optional list of data categories to limit the request
`urgency`	No	Priority level for processing
`requestType`	Yes	Type of LGPD request
`userIdentifier`	Yes	CPF, email, or other unique identifier for the data subject

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`dataCategories`	No
`erasureDeadline`	No
`complianceStatus`	No
`retentionPeriodDays`	No

Tool Definition Quality

B3.4/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description contradicts the `readOnlyHint` annotation by stating it handles 'erasure and access workflows', which imply mutation, while the annotation asserts read-only behavior. This is a clear annotation contradiction. The description does not clarify this discrepancy, undermining the agent's ability to understand the tool's side effects.
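
A contradiction of this kind can be surfaced with a simple keyword scan. The sketch below is a heuristic under stated assumptions, not the scorer's actual method: the term list and the function `read_only_conflicts` are invented for illustration, and a match is only a prompt for human review, since a tool may plan an erasure workflow without performing it.

```python
from typing import Any

# Assumed list of words that suggest state mutation.
MUTATION_TERMS = ("erasure", "deletion", "rectification")


def read_only_conflicts(description: str, annotations: dict[str, Any]) -> list[str]:
    """Return mutation-suggesting terms found in a tool marked read-only."""
    if not annotations.get("readOnlyHint", False):
        return []
    lowered = description.lower()
    return [term for term in MUTATION_TERMS if term in lowered]


lgpd_description = (
    "Automates LGPD Data Subject Access Requests (DSARs) for legal teams, "
    "handling Brazil-specific data retention, erasure, and access workflows. "
    "Accepts user identifiers, request type (access/rectification/deletion), "
    "and optional scope filters."
)

print(read_only_conflicts(lgpd_description, {"readOnlyHint": True}))
# prints ['erasure', 'deletion', 'rectification']
```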

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences that front-load the primary purpose, then detail inputs and outputs. No filler or redundancy. Every sentence adds value, making it efficiently informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool complexity (5 parameters, 2 required, output schema exists), the description covers the main workflow: accepting user identifiers and request types, returning compliance status. The async behavior is documented in the schema rather than in the description. However, the omission of the urgency parameter and the readOnly contradiction slightly reduce completeness, though the output schema mitigates the need for return value explanation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description briefly enumerates the parameters (user identifiers, request type, scope) without adding meaning beyond the schema. No additional semantics about formats, constraints, or relationships are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool automates LGPD DSARs for legal teams, handling Brazil-specific data retention, erasure, and access workflows. It specifies accepted inputs (user identifiers, request type, optional scope filters) and the structured output. This distinguishes it from sibling tools like 'dpdp_consent_artifact_generator' which focus on consent artifacts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for legal teams handling DSARs under LGPD but does not provide explicit guidance on when to use this tool versus alternatives. No exclusions or when-not-to-use criteria are mentioned, leaving the agent to infer context from the purpose.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lnd_ai_skill_forecast (A)

Read-only · Idempotent

Forecasts AI skill demand trends for CHROs by analyzing patent filings (USPTO PatFT) and job postings (BLS API). Returns 12-month skill demand projections with confidence scores, helping HR leaders prioritize workforce upskilling. Inputs: target AI skills (e.g., 'machine learning', 'NLP'), geographic focus (US state/country), and forecast horizon. Outputs include skill growth rates, patent filing trends, and job posting volumes. Keywords: AI workforce planning, skill gap analysis, talent strategy, patent trends, labor market data.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes	Geographic focus (US state code or 'US' for national, e.g., 'CA', 'US')
`skills`	Yes	List of AI-related skills to forecast (e.g., ['machine learning', 'computer vision'])
`horizon_months`	No	Forecast horizon in months (3-24, default 12)
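As an illustration of how the parameters above fit together, here is a minimal sketch of an MCP `tools/call` request for this tool. The helper name `build_skill_forecast_call` and the client-side range check are assumptions made for this example; the server performs its own validation, and the payload still has to be sent over whatever transport your client uses.

```python
import json


def build_skill_forecast_call(skills, region, horizon_months=12,
                              use_async=False, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` payload for lnd_ai_skill_forecast."""
    if not skills:
        raise ValueError("skills must contain at least one entry")
    # The parameter table documents a 3-24 month range with a default of 12.
    if not 3 <= horizon_months <= 24:
        raise ValueError("horizon_months must be between 3 and 24")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "lnd_ai_skill_forecast",
            "arguments": {
                "skills": list(skills),
                "region": region,
                "horizon_months": horizon_months,
                "async": use_async,
            },
        },
    }


payload = build_skill_forecast_call(["machine learning", "computer vision"], "CA")
print(json.dumps(payload, indent=2))
```

Checking the horizon locally avoids paying for a call that the server would reject anyway.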

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`forecast`	No
`metadata`	No
`warnings`	No

Tool Definition Quality

A 4.1/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. Description adds value by specifying data sources (USPTO PatFT, BLS API), output components (growth rates, patent trends, job volumes), and confidence scores. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise 4-sentence description with front-loaded purpose, clear structure (purpose, inputs, outputs, keywords). No unnecessary words; every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters and an output schema (not shown), description provides sufficient overview: inputs, outputs, data sources, and use case. Complete for a forecasting tool with good structured metadata.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline 3. Description lists inputs with examples (e.g., 'machine learning', 'CA') adding marginal value beyond the schema. Does not significantly enhance understanding beyond structured fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it forecasts AI skill demand trends for CHROs using patent filings and job postings, with specific outputs. Distinguishes from siblings like 'job_postings_intelligence' and 'patent_landscape' by combining both data sources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Mentions target users (CHROs) and context (workforce upskilling prioritization), but does not explicitly state when not to use or compare to alternatives. Implied usage, but no clear exclusions or sibling differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lnd_skill_taxonomy_builder (A)

Read-only · Idempotent

Generates a dynamic skill taxonomy for CHROs by cross-referencing patent filings (USPTO), job postings (BLS), and learning & development data (OECD). Inputs include industry codes, job roles, or skill clusters; outputs structured skill hierarchies with demand trends and competency gaps. Essential for workforce transformation, talent pipeline optimization, and future-proofing organizational capabilities. — pass async:true REQUIRED to avoid x402 timeout.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`jobRole`	No	Target job role or occupation (e.g., 'Data Scientist')
`industry`	Yes	NAICS industry code or sector name (e.g., '541511' for IT services)
`timeRange`	No	Time range for trend analysis
`skillCluster`	No	Optional skill cluster to focus taxonomy (e.g., 'AI/ML')
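Because this tool asks callers to pass `async: true` and then poll `job_result`, a small polling loop is usually needed on the client. The sketch below shows one way to structure it under stated assumptions: `call_tool` stands in for your MCP client's call function, the `job_id` argument name and the `"pending"` status value are guesses rather than documented behavior, and `make_fake_transport` exists only so the example runs without a network.

```python
import time
from typing import Callable


def run_async_tool(call_tool: Callable[[str, dict], dict], name: str,
                   arguments: dict, poll_interval: float = 1.0,
                   max_polls: int = 30,
                   sleep: Callable[[float], None] = time.sleep) -> dict:
    """Submit a tool call with async=True, then poll job_result until done."""
    submitted = call_tool(name, {**arguments, "async": True})
    job_id = submitted["job_id"]
    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        # Assumption: an unfinished job reports status "pending".
        if result.get("status") != "pending":
            return result
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")


def make_fake_transport(polls_until_done: int):
    """Hypothetical in-memory stand-in for a real MCP client."""
    state = {"polls": 0}

    def call_tool(name: str, arguments: dict) -> dict:
        if name != "job_result":
            return {"job_id": "job-123"}
        state["polls"] += 1
        if state["polls"] < polls_until_done:
            return {"status": "pending"}
        return {"status": "ok", "skillTaxonomy": []}

    return call_tool


result = run_async_tool(make_fake_transport(3), "lnd_skill_taxonomy_builder",
                        {"industry": "541511"}, sleep=lambda _: None)
print(result["status"])
```

Capping the number of polls keeps a stuck job from blocking the agent indefinitely.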

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`skillTaxonomy`	No
`industryTrends`	No

Tool Definition Quality

A 4.1/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. The description adds value by noting the async requirement to avoid timeouts, which is a behavioral constraint beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded and informative, but the third sentence ('Essential for...') adds marketing fluff. The async note is separate and clear. Could be slightly more concise, but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists (not shown but indicated), the description sufficiently covers the tool's purpose, inputs, outputs, and a critical usage constraint. It is complete enough for an agent to understand when and how to invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description mentions inputs that map to parameters (industry, jobRole, skillCluster) but does not add new meaning beyond the existing schema descriptions. The async requirement is already documented in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies a concrete verb ('Generates') and resource ('dynamic skill taxonomy'), cites data sources (USPTO, BLS, OECD), and clearly defines inputs and outputs. It distinguishes the tool from generic taxonomy builders, though not explicitly from its sibling 'lnd_ai_skill_forecast'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear use cases ('workforce transformation, talent pipeline optimization') and a critical usage requirement ('pass async:true REQUIRED'). However, it lacks explicit guidance on when to avoid the tool or alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

logistics_esg_incident_tracker (A)

Read-only · Idempotent

Tracks real-time ESG incidents in logistics networks for COOs, including supply chain disruptions, regulatory violations, and sustainability risks. Inputs: geographic region, incident type (e.g., emissions, labor, deforestation), and time range. Outputs: structured incident data with severity, location, and source verification. Uses CDP open data and UNCTAD STAT for comprehensive coverage. Keywords: ESG, logistics, supply chain, sustainability, compliance, risk management.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes	Geographic region filter (e.g., 'Europe', 'Asia', 'Global')
`endDate`	No	End date for incident search (ISO 8601)
`severity`	No	Minimum severity level to include
`startDate`	No	Start date for incident search (ISO 8601)
`incidentType`	Yes	Type of ESG incident to track
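The two date parameters expect ISO 8601 strings, which Python's `datetime.date.isoformat()` produces directly. The following sketch builds the arguments for a trailing time window. The helper `build_incident_query` is hypothetical, and `"emissions"` is taken from the examples in the description, so the exact accepted `incidentType` values remain an assumption.

```python
from datetime import date, timedelta
from typing import Optional


def build_incident_query(region: str, incident_type: str,
                         days_back: int = 30,
                         today: Optional[date] = None) -> dict:
    """Build arguments covering the last `days_back` days up to `today`."""
    end = today or date.today()
    start = end - timedelta(days=days_back)
    return {
        "region": region,
        "incidentType": incident_type,
        "startDate": start.isoformat(),  # ISO 8601 calendar date, YYYY-MM-DD
        "endDate": end.isoformat(),
    }


args = build_incident_query("Europe", "emissions", days_back=30,
                            today=date(2026, 7, 18))
print(args)
```

Passing `today` explicitly keeps the window reproducible in tests and scheduled jobs.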

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`summary`	No
`warnings`	No
`incidents`	No

Tool Definition Quality

A 3.5/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint, so the description does not need to cover safety. It adds context on data sources (CDP, UNCTAD STAT) and output structure, but does not mention rate limits, error handling, or specific behavioral traits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loading purpose then inputs, outputs, and data sources. The keyword list at the end is unnecessary but does not significantly bloat the text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, full parameter descriptions, and annotations covering safety, the description is fairly complete. It adds context on data sources and target users (COOs). Missing are performance characteristics or error cases, but these are not critical for completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with each parameter described in the input schema. The description reiterates some parameters (region, incidentType, time range) but adds no new semantic information beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool tracks real-time ESG incidents in logistics networks, with specific examples and data sources. However, it does not differentiate itself from sibling tools like 'esg_audit_multi' or 'supplier_esg_audit', which may have overlapping purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for COOs concerned with logistics ESG incidents, but it provides no explicit guidance on when to use this tool versus alternatives, nor does it mention when not to use it. The context is clear but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ma_arbitrage_hunter (A)

Read-only · Idempotent

As a CFO, identify cross-border M&A arbitrage opportunities by comparing target company valuations across different jurisdictions. Inputs include target company ticker, primary and secondary jurisdictions, and valuation metrics. Outputs include valuation gaps, FX-adjusted multiples, and jurisdiction-specific premiums/discounts. Uses real-time ECB FX rates, Yahoo Finance market data, and SEC EDGAR filings for public companies. Ideal for quick assessment of potential arbitrage in M&A scenarios.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`sector`	No	Industry sector for peer comparison (e.g., 'Technology')
`targetTicker`	Yes	Target company ticker symbol (e.g., 'AAPL')
`valuationMetric`	No	Valuation multiple to use for comparison
`primaryJurisdiction`	Yes	Primary jurisdiction for valuation comparison (e.g., 'US')
`secondaryJurisdiction`	No	Secondary jurisdiction for valuation comparison (e.g., 'DE')

Output Schema

Name	Required	Description
`fxRate`	No
`status`	Yes
`sources`	No
`warnings`	No
`valuationGap`	No
`peerMultiples`	No
`targetCompany`	No
`primaryValuation`	No
`secondaryValuation`	No
`jurisdictionPremium`	No

Tool Definition Quality

A 4.7/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint. The description adds valuable context beyond these, detailing data sources (ECB FX rates, Yahoo Finance, SEC EDGAR) and the async behavior via the 'async' parameter, fully disclosing how the tool operates.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: five sentences that front-load the purpose, list inputs and outputs, then name the data sources and the use case. No redundant words; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-jurisdictional M&A arbitrage, multiple data sources, async option, output schema present), the description covers purpose, inputs, outputs, data sources, and ideal use case comprehensively. The presence of an output schema covers return values, so the description is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All parameters have descriptions in the schema (100% coverage), so baseline is 3. The description adds meaning beyond the schema by explaining outputs and the overall purpose, but does not elaborate on individual parameter usage. Thus, a score of 4 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it identifies cross-border M&A arbitrage opportunities by comparing valuations across jurisdictions, specifying inputs (ticker, jurisdictions, valuation metrics) and outputs (gaps, multiples, premiums/discounts). This is a specific verb-resource pair that distinguishes it from sibling tools like ma_deal_screener or ma_tax_efficiency_mapper.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'Ideal for quick assessment of potential arbitrage in M&A scenarios,' providing clear context. However, it does not explicitly state when not to use the tool or suggest alternative tools for different scenarios, which would improve guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ma_deal_screener (C)

Read-only

M&A Deal Screener: Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Salesforce M&A targets, 12 targets screened · fit score + valuation + integration risk. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`acquirer`	Yes
`criteria`	Yes

Tool Definition Quality

C 2.8/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and openWorldHint. The description adds that the deliverable is audited and inputs are validated server-side, but does not disclose other behaviors like response structure or potential errors.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences with some jargon ('Gapup agent-payable C-suite expertise'). Could be more concise and front-loaded. The reference case adds context but could be shortened.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the nested input schema and no output schema, the description lacks detail on what the deliverable contains, how results are presented, or any error handling. The reference case helps but is insufficient for a tool with 4 parameters and complex objects.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (25%, only async described). Description mentions 'documented case fields' but does not explain the required nested properties (acquirer, criteria) or their semantics, leaving the agent without sufficient guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
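The 25% figure cited in this review is simply the share of input properties that carry a description. A minimal sketch of that arithmetic follows. The function `description_coverage` is hypothetical, the treatment of an empty schema is an assumption, and the undescribed properties are left as empty placeholders because their real types are not shown on this page.

```python
def description_coverage(input_schema: dict) -> float:
    """Fraction of top-level properties that have a non-empty description."""
    props = input_schema.get("properties", {})
    if not props:
        return 1.0  # Assumption: nothing to document counts as full coverage.
    described = sum(
        1 for spec in props.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(props)


# Mirrors the ma_deal_screener parameter table: only `async` is described.
ma_deal_screener_schema = {
    "type": "object",
    "properties": {
        "async": {
            "type": "boolean",
            "description": "If true, returns a job_id immediately.",
        },
        "focus": {},      # placeholder, real type not shown
        "acquirer": {},   # placeholder, real type not shown
        "criteria": {},   # placeholder, real type not shown
    },
    "required": ["acquirer", "criteria"],
}

coverage = description_coverage(ma_deal_screener_schema)
print(f"{coverage:.0%}")
```

Running the same check in CI before publishing a server would flag undocumented parameters early.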

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool screens M&A deals and returns a structured deliverable with fit score, valuation, and integration risk. It provides a reference case. However, it does not explicitly differentiate from siblings like ma_arbitrage_hunter.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives (e.g., ma_arbitrage_hunter). The description only mentions server-side validation but no context on prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

manufacturing_esg_compliance_mapper (A)

Read-only · Idempotent

As a COO, quickly identify ESG compliance gaps across manufacturing facilities using EPA TRI emissions data and GRI sustainability standards. Input facility identifiers or geographic regions to receive a prioritized remediation roadmap with risk scores, regulatory violations, and suggested corrective actions. Ideal for sustainability reporting, regulatory risk assessment, and operational improvement planning. Keywords: ESG compliance, manufacturing facilities, EPA TRI, GRI standards, sustainability reporting, regulatory risk.

Parameters

Name	Required	Description
`year`	No	Reporting year (default: current year - 1)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	No	Geographic region (state, county, or ZIP code) for facility search
`includeGri`	No	Include GRI standards analysis (default: true)
`facilityIds`	Yes	List of EPA facility identifiers (e.g., TRIFID)

Output Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`summary`	No
`warnings`	Yes
`facilities`	Yes

Tool Definition Quality

A 4.0/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, openWorld, and idempotent. The description adds value by disclosing the output format ('prioritized remediation roadmap with risk scores, regulatory violations, and suggested corrective actions') and the async behavior (returning a job_id). No contradictions with annotations. Could mention any side effects (likely none) but overall strong.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact (3 sentences plus keyword list), front-loaded with the key action and value proposition. Every sentence carries weight: the first defines the tool, the second specifies input and output, the third names the intended use cases, and the keywords aid discoverability. No redundant or filler content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description explains what the tool does, its inputs (facility IDs or regions), and its output (roadmap with risk scores, violations, corrective actions). Given the presence of an output schema, the description is sufficiently complete for an agent to decide when to invoke it. Minor gaps include lack of detail on regional vs facility-specific behavior, but overall strong.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the schema already documents all 5 parameters. The description restates the concepts ('facility identifiers or geographic regions') but adds no new syntax, format, or constraints beyond what the schema provides. Baseline 3 is appropriate as description does not expand meaning significantly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('identify ESG compliance gaps'), the specific resource ('manufacturing facilities'), and the data sources ('EPA TRI emissions data and GRI sustainability standards'). It differentiates from sibling tools like 'esg_audit_multi' or 'supplier_esg_audit' by focusing on manufacturing facilities and specific standards. The keywords further reinforce the scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use cases ('sustainability reporting, regulatory risk assessment, operational improvement planning') but does not explicitly state when to use this tool over similar siblings such as 'esg_audit_multi' or 'carbon_footprint_calculator'. No exclusions or alternative tool names are provided. The guidance is adequate but lacks comparative context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

manufacturing_waste_heatmap (A)

Read-only · Idempotent

Generates manufacturing waste heatmaps for COOs using EPA TRI and FAOSTAT data. Input manufacturing site identifiers or geographic regions to analyze waste streams, emissions, and resource inefficiencies. Outputs include waste intensity maps, circular economy opportunity rankings, and cost-saving potential. Ideal for sustainability strategy and operational efficiency improvements. Pass async:true to avoid timeout.

Parameters

Name	Required	Description
`year`	Yes	Analysis year (2010-2023)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	No	Geographic region (country code or sub-national region) for aggregated analysis
`site_ids`	No	List of manufacturing site identifiers (EPA TRI IDs or FAO facility codes)
`waste_types`	No	Specific waste types to analyze (e.g., ['metals', 'chemicals', 'energy'])

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`heatmap_data`	No
`opportunities`	No
`benchmark_data`	No

Tool Definition Quality

A 3.8/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds that it uses specific data sources, can be slow (prompting async usage), and outputs specific artifacts. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at 5 sentences, front-loaded with the primary purpose, and includes a practical tip. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema and detailed input schema, the description covers the main use case, data sources, and async handling. It lacks mention of prerequisites or error scenarios but is sufficient for a tool with good annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description mentions 'site identifiers or geographic regions' but adds no additional meaning beyond the schema descriptions for parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates manufacturing waste heatmaps using EPA TRI and FAOSTAT data for COOs. It specifies input types (site identifiers or regions) and outputs. However, it does not explicitly differentiate from sibling tools like manufacturing_esg_compliance_mapper.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says it's 'Ideal for sustainability strategy and operational efficiency improvements' and provides an async tip. But it does not specify when to use this tool over alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

margin_doctor (C)

Read-only

Margin per deal: Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub, 8-deal pipeline · €28k ARR under-margin detected · Recovery of €4.2k/year · 4-scenario playbook. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deals`	Yes
`company`	Yes
`product`	Yes
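
The `async` row describes a submit-then-poll flow. A minimal sketch of that flow follows, assuming a caller-supplied `call_tool(name, arguments)` function (hypothetical, standing in for whatever MCP client is in use), a `job_id` key in the immediate response, and a `status` field that reads `"pending"` until the job finishes. None of those response details are confirmed by this listing.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical signature for whatever function sends a tool call to the server.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def call_async_and_poll(
    call_tool: ToolCaller,
    name: str,
    arguments: Dict[str, Any],
    interval_s: float = 1.0,
    max_attempts: int = 30,
) -> Dict[str, Any]:
    """Submit a tool call with async=True, then poll job_result until it settles."""
    # Ask for a job handle instead of waiting for the full result.
    job = call_tool(name, {**arguments, "async": True})
    job_id = job["job_id"]  # assumed key name, taken from the parameter description

    for _ in range(max_attempts):
        result = call_tool("job_result", {"job_id": job_id})
        # Assumption: unfinished jobs report status == "pending".
        if result.get("status") != "pending":
            return result
        time.sleep(interval_s)

    raise TimeoutError(f"job {job_id} still pending after {max_attempts} polls")
```

Bounding the loop with `max_attempts` keeps a stuck job from blocking the agent indefinitely; the interval and attempt count here are placeholders, not server recommendations.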

Tool Definition Quality

C 2.9/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true, indicating a safe, read-only operation. The description adds that inputs are validated server-side and returns a deliverable, which is consistent with readOnlyHint. No further negative behaviors are disclosed, but the annotations already cover the key behavioral aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short and to the point, though it includes a reference case that may not be essential. It could be more structured but is not overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of nested objects and no output schema, the description is insufficient. It omits details on what the 'structured, audited deliverable' contains, how results are returned, or how to interpret the output. The reference case provides some context but does not fully specify the tool's behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only async described). The description does not explain any of the main parameters (company, product, deals). It only vaguely mentions 'send the documented case fields', which fails to compensate for the lack of schema descriptions or clarify parameter meaning.
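
The coverage figures quoted in these reviews appear consistent with a simple ratio: parameters carrying a non-empty schema description divided by all parameters. A sketch under that assumption follows (the scorer's actual method is not documented here), using the four `margin_doctor` parameters with empty placeholders for the undescribed ones.

```python
from typing import Any, Dict


def schema_description_coverage(properties: Dict[str, Dict[str, Any]]) -> float:
    """Fraction of parameters whose schema entry has a non-empty description."""
    if not properties:
        return 0.0
    described = sum(
        1 for spec in properties.values()
        if str(spec.get("description", "")).strip()
    )
    return described / len(properties)


# margin_doctor as listed: only `async` has a description.
margin_doctor_properties = {
    "async": {"description": "If true, returns a job_id immediately (<200ms) "
                             "instead of waiting for the result."},
    "deals": {},     # no description in the listing
    "company": {},   # no description in the listing
    "product": {},   # no description in the listing
}

coverage = schema_description_coverage(margin_doctor_properties)
print(f"{coverage:.0%}")  # 1 of 4 parameters described
```

The same ratio gives 1 of 6 (about 17%) for a tool with five undescribed parameters plus `async`, which matches the figure reported for several sibling tools.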

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description mentions 'Marge par deal' and 'Returns a structured, audited deliverable', with a reference case indicating margin gap detection and recovery. This gives a fairly clear purpose, but it could be more explicit about the analysis action. The title from annotations 'Marge par deal' helps clarify.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No usage guidelines are provided. The description does not indicate when to use this tool versus alternatives like margin_doctor_finance, nor does it state any prerequisites or contexts where it is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

margin_doctor_finance C

Read-only

Médecin des Marges — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Alan — ARR €60M · marge brute 68% → 79% · €3,2M fuites identifiées · Rule of 40 : 14→38. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`costBreakdown`	Yes
`marginTargets`	Yes
`unitEconomics`	Yes
`incomeStatement`	Yes

Tool Definition Quality

C 2.9/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and the output is an audited deliverable, which is consistent. It does not contradict annotations but adds only minor behavioral context beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise at about 70 words and front-loaded with a title. However, the inclusion of a detailed reference case adds some verbosity that could be streamlined. Overall, it's efficient but not maximally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple nested objects, no output schema), the description lacks crucial context such as the structure of the deliverable, how inputs map to outputs, or the meaning of the reference case metrics. It leaves gaps for an agent attempting to invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, and the description does not explain any of the required parameters (company, incomeStatement, etc.). It merely says 'send the documented case fields', which provides no additional meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns a structured, audited deliverable for CFO-level margin expertise, with a reference case illustrating its purpose. However, it does not distinguish itself from the sibling tool 'margin_doctor', leaving ambiguity about when to choose this version.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage by stating 'send the documented case fields' and provides a reference case, but it offers no explicit guidance on when to use this tool versus alternatives (e.g., margin_doctor, financial_model_3statement). There is no mention of exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_entry_strategist B

Read-only

Stratégie d'entrée marché — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: OpenAI Inde 2026 — entrée marché 1.4Md utilisateurs · 5 forces Porter + 4 entry modes + 18-month roadmap + risk register. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`preferences`	Yes
`targetMarket`	Yes

Tool Definition Quality

B 3/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns an 'audited deliverable', which is consistent but does not significantly enhance transparency beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, reasonably concise. However, the reference case example is somewhat lengthy and may not be necessary for understanding the tool's core function, slightly reducing efficiency.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex input schema (nested objects, 5 parameters) and no output schema, the description is incomplete. It does not explain the deliverable's structure, how parameters relate, or expected output format, leaving significant gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, with only the 'async' parameter having a description. The description does not explain the other four parameters or their roles, and its mention of 'documented case fields' is unhelpful without explicit details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as a market entry strategy tool that returns a structured deliverable. However, the French language may cause ambiguity for English-centric agents, and the description lacks a specific verb like 'analyzes' or 'generates', slightly reducing clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a reference case and mentions server-side validation, implying usage context. However, it does not specify when to use this tool over siblings like 'market_sizing' or 'geographic_expansion', nor does it exclude scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

marketing_roi_dashboard C

Read-only

Dashboard ROI marketing — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — H1 2026 · 5 canaux · ROI 3.2× · Attribution W-shaped · Budget €60k. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`arpuEur`	Yes
`channelData`	Yes
`companyName`	Yes
`periodLabel`	Yes
`totalRevenueAttribEur`	Yes
`targetAttributionModel`	Yes
`currentAttributionModel`	Yes
`totalMarketingBudgetEur`	Yes

Tool Definition Quality

C 2.9/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description aligns with annotations: readOnlyHint (true) and openWorldHint (true) indicate no destructive actions, and the tool returns a deliverable. It adds that inputs are validated server-side, which is useful context. However, it does not disclose any other behavioral traits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (3 sentences) and front-loaded with the tool's purpose. It includes a concrete example. However, the structure could be improved by separating the core functionality from the reference case.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (9 parameters, 8 required, nested objects) and no output schema, the description is insufficient. It does not describe the output format, the meaning of the deliverable, or how the inputs map to results. The reference case provides some context but is not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 11% schema description coverage (only async parameter documented), the description does little to clarify the 9 parameters. It provides a reference case hinting at fields like companyName and periodLabel, but does not systematically explain each parameter's meaning or constraints. The schema's enum values for channels and attribution models are not explicated.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it is a dashboard for marketing ROI and returns a structured audited deliverable. It includes a reference case that clarifies its purpose. However, it could be more explicit about the specific computations (e.g., attribution modeling) and does not differentiate from sibling tools like programmatic_attribution_calibrator.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description mentions inputs are validated server-side and to send 'documented case fields,' but it does not specify prerequisites or compare to related sibling tools, leaving the agent without clear selection criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_research_brief A

Read-only

Generate a structured, sourced market research brief on any market, sector or industry. Returns a machine-readable note with six sections: an executive overview, a market-size estimate (with assumptions and sources — no invented figures), key players, demand & technology trends, risk factors, and a traceable source list. When to use this tool: an agent needs to assess a new market, validate a business opportunity, prepare a pitch, or benchmark a sector before a strategic decision. Data is assembled live from keyless public sources: Wikipedia (sector context), World Bank (macro GDP/population for market sizing), REST Countries (geo context). Fields that cannot be sourced are marked 'unavailable' rather than estimated. Inputs: topic (required), geo and sector (optional refinements).

Parameters

Name	Required	Description
`geo`	No	Optional geography to scope the brief (country name, region, or continent — e.g. 'France', 'Southeast Asia')
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`topic`	Yes	Market or sector to research (e.g. 'electric vehicle batteries', 'B2B SaaS CRM Europe', 'telemedicine Africa')
`sector`	No	Optional parent sector to disambiguate the topic (e.g. 'healthcare', 'energy', 'software')

Output Schema

Name	Required	Description
`geo`	Yes
`risks`	Yes
`topic`	Yes
`sector`	Yes
`trends`	Yes
`sources`	Yes	All sources consulted, with URL and retrieval status
`overview`	Yes	Executive summary of the market
`key_players`	Yes
`generated_at`	Yes	ISO-8601 timestamp of generation
`market_size_estimate`	Yes	Market size estimate with hypotheses. All figures sourced or marked unavailable.
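
Because every output field above is marked required, a caller can check a response before relying on it. The sketch below does that, and also lists top-level fields carrying the `'unavailable'` marker mentioned in the description. It assumes the response arrives as a plain dict and that the marker appears as the literal string `"unavailable"` at the top level; the listing does not confirm where the marker is placed, and the helper names are illustrative.

```python
from typing import Any, Dict, List

# Required output fields, copied from the output schema table above.
REQUIRED_FIELDS = (
    "geo", "risks", "topic", "sector", "trends", "sources",
    "overview", "key_players", "generated_at", "market_size_estimate",
)


def missing_fields(response: Dict[str, Any]) -> List[str]:
    """Return required fields absent from the response, in schema order."""
    return [name for name in REQUIRED_FIELDS if name not in response]


def unavailable_fields(response: Dict[str, Any]) -> List[str]:
    """Return fields whose value is the literal marker 'unavailable' (assumed placement)."""
    return [name for name in REQUIRED_FIELDS if response.get(name) == "unavailable"]
```

An agent might refuse to quote a market size when `market_size_estimate` shows up in `unavailable_fields`, rather than substituting its own estimate.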

Tool Definition Quality

A 4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds valuable context: live data sources (Wikipedia, World Bank, REST Countries) and the policy of marking unavailable fields as 'unavailable' rather than estimating. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of 4-5 sentences, front-loading the purpose and structure. Every sentence adds value, and it is appropriately sized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description does not need to detail return values. It covers six sections, sources, and data honesty policies, providing a complete picture for a research tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents parameters. The description lists inputs but does not add new semantic information beyond the schema, earning a baseline 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates a structured market research brief with specific sections. It is a specific verb+resource combination, but does not explicitly differentiate from sibling tools like market_sizing or competitive_deep_dive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit 'When to use this tool' section provides clear context for assessment, validation, pitch prep, or benchmarking. It does not include when-not-to-use or alternative tools, but the guidance is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_sizing C

Read-only

Dimensionnement marché TAM/SAM/SOM — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — TAM/SAM/SOM IA décisionnelle C-suite Europe · TAM €48Md · SOM €280M Year-3. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`target`	Yes
`horizon`	No
`product`	Yes
`approach`	No
`competitorComps`	No

Tool Definition Quality

C 2.4/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations mark the tool as read-only, and the description adds that inputs are validated server-side, which is consistent. Beyond that, it adds little to readOnlyHint=true. There are no contradictions, but the openWorldHint annotation is not elaborated, leaving ambiguity about what inputs are accepted.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (two sentences plus a case reference) and relatively front-loaded with the tool's purpose. However, the first sentence is in French, which may require translation. Some text (e.g., the specific case details) might be more appropriate for documentation than the tool description.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema, rich annotations), the description is insufficient. It does not explain the return structure, the meaning of the output (e.g., what 'audited deliverable' entails), or how to interpret the async flag. The reference case is helpful but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 17%, meaning most parameters lack descriptions. The description does not compensate; it merely says 'send the documented case fields,' providing no meaning for the parameters (product, target, horizon, etc.) or their constraints. The AI agent gets minimal guidance on how to fill these fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable for TAM/SAM/SOM market sizing, and provides a reference case. However, the mixed French and English (e.g., 'Dimensionnement marché TAM/SAM/SOM') reduces clarity for an AI agent. The purpose is discernible but not straightforward.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 1/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no guidance on when to use this tool versus alternatives like market_entry_strategist or market_research_brief. The only usage hint is 'send the documented case fields,' which presupposes familiarity with a shared workflow. No exclusions or when-not-to-use are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ma_tax_efficiency_mapper A

Read-only, Idempotent

For CFOs evaluating cross-border M&A deals: analyzes tax efficiency by mapping withholding tax rates, transfer pricing regulations, and permanent establishment risks across specified jurisdictions. Inputs include acquirer/target jurisdictions, deal structure, and transaction value. Outputs jurisdiction-specific tax exposure, efficiency scores, and risk flags. Uses World Bank Tax Rates API, IMF SDR data, and SEC EDGAR filings for corporate tax disclosures. — pass async:true REQUIRED to avoid x402 timeout.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deal_structure`	No	Type of M&A transaction structure
`transaction_value`	No	Deal value in USD millions
`target_jurisdiction`	Yes	ISO 3166-1 alpha-3 country code of the target entity
`acquirer_jurisdiction`	Yes	ISO 3166-1 alpha-3 country code of the acquiring entity
`include_transfer_pricing`	No	Whether to analyze transfer pricing risks
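
MCP tool invocations are JSON-RPC 2.0 requests with method `tools/call`. The sketch below assembles such a request for this tool from the parameters in the table, sets `async` to true as the description asks, and rejects jurisdiction codes that are not three uppercase letters. The helper name `build_tools_call` is illustrative, the check is shape-only (it does not confirm that a code is an assigned ISO 3166-1 alpha-3 value), and the example deal value is made up.

```python
import json
import re
from typing import Any, Dict

# Shape check only: three uppercase ASCII letters, not a lookup against the ISO list.
ALPHA3 = re.compile(r"[A-Z]{3}")


def build_tools_call(
    acquirer_jurisdiction: str,
    target_jurisdiction: str,
    request_id: int = 1,
    **optional: Any,
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call request for ma_tax_efficiency_mapper."""
    for label, code in (
        ("acquirer_jurisdiction", acquirer_jurisdiction),
        ("target_jurisdiction", target_jurisdiction),
    ):
        if not ALPHA3.fullmatch(code):
            raise ValueError(f"{label} must be a 3-letter uppercase code, got {code!r}")

    arguments = {
        "acquirer_jurisdiction": acquirer_jurisdiction,
        "target_jurisdiction": target_jurisdiction,
        "async": True,  # the tool description asks callers to pass async:true
        **optional,     # e.g. transaction_value (USD millions), include_transfer_pricing
    }
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "ma_tax_efficiency_mapper", "arguments": arguments},
    }


# Illustrative values only: a USD 250 million deal between two example jurisdictions.
request = build_tools_call("USA", "FRA", transaction_value=250)
print(json.dumps(request, indent=2))
```

`deal_structure` is left out of the example because the listing does not show its allowed values.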

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`tax_treaties`	No
`efficiency_score`	No
`target_tax_rates`	No
`acquirer_tax_rates`	No
`transfer_pricing_risk`	No
`permanent_establishment_risk`	No

Tool Definition Quality

A 4.2/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate idempotent, read-only, open-world behavior. The description adds context about data sources and the async requirement to avoid timeouts, which is behavioral. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise yet covers purpose, inputs, outputs, data sources, and a usage tip. It is front-loaded with the primary purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 6 parameters, high schema coverage, existing output schema, and annotations, the description provides sufficient context: purpose, inputs, outputs, data sources, and a critical usage note.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description mentions key inputs but does not add significant semantics beyond the schema. The note about async is already a parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes tax efficiency for cross-border M&A deals, specifying inputs and outputs. It differentiates from siblings like ma_arbitrage_hunter and ma_deal_screener by focusing on tax mapping.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The target audience (CFOs) and context are clear. The note about passing async:true to avoid a timeout provides practical guidance. However, the description does not explicitly contrast the tool with alternatives or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

meddic_scoring C

Read-only

Scoring MEDDIC du pipeline — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Pipeline 8 deals · €2.1M · MEDDIC score moyen 62/100 · 3 deals at-risk. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deals`	Yes
`company`	Yes
`product`	Yes
`salesCycle`	No
`targetWinRate`	No

Tool Definition Quality

C 2.9/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that it returns a structured, audited deliverable and that inputs are validated server-side, but does not disclose additional behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is mostly concise with two sentences and a reference case. It front-loads the purpose, but the reference case could be considered extraneous.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested schema and lack of output schema, the description is insufficient. It does not explain the return format, scoring methodology, or how to use the async parameter, leaving significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description fails to explain most parameters. It mentions 'documented case fields' but does not clarify what they are, leaving the agent without guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool scores MEDDIC for a pipeline and returns a structured deliverable. It provides a reference case, but does not differentiate from siblings like deal_coach or sales_pipeline_forecast.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no explicit guidance on when to use this tool or when to avoid it. The description only implies use for MEDDIC scoring and names no alternatives or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

model_behavior_drift_monitor A

Read-only, Idempotent

Monitors AI model output drift by comparing current model responses against MLCommons safety benchmarks. Designed for risk and compliance personas to detect behavioral deviations that may indicate safety or alignment issues. Accepts model outputs or identifiers and returns structured drift metrics with statistical significance. Sources data from MLCommons public benchmark APIs.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`threshold`	No	Drift threshold for alerting
`currentOutputs`	No	Recent model outputs to analyze for drift
`baselineMetrics`	No
`modelIdentifier`	Yes	Unique identifier for the model being monitored

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`driftMetrics`	No

Tool Definition Quality

A 4.2/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, idempotentHint. The description adds that the tool sources its data from MLCommons public benchmark APIs, which is useful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, each serving a purpose: what it does, who it is for, what it accepts, and the data source. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists, the description covers the core use, data source, and output type. It could mention when to provide baselineMetrics versus using defaults, but it is adequate overall.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 80% (4 of 5 params described). The description adds that it 'accepts model outputs or identifiers', providing a bit more context, but does not detail parameter semantics beyond the schema.
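
The coverage figure quoted above can be reproduced with a few lines. This is a minimal sketch that assumes coverage means "share of parameters with a non-empty description"; the listing does not spell out how the reviewer computes it, and the function name is hypothetical.

```python
from typing import Dict, Optional


def schema_description_coverage(params: Dict[str, Optional[str]]) -> float:
    """Fraction of parameters whose description is non-empty."""
    if not params:
        return 0.0
    described = sum(1 for text in params.values() if text and text.strip())
    return described / len(params)


# Parameters of model_behavior_drift_monitor as listed in its parameter table.
drift_monitor_params = {
    "async": "If true, returns a job_id immediately (shortened here)",
    "threshold": "Drift threshold for alerting",
    "currentOutputs": "Recent model outputs to analyze for drift",
    "baselineMetrics": None,  # no description in the listing
    "modelIdentifier": "Unique identifier for the model being monitored",
}

coverage = schema_description_coverage(drift_monitor_params)
print(f"{coverage:.0%}")  # prints "80%"
```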

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'monitors' and the specific resource 'AI model output drift' by comparing against MLCommons safety benchmarks, distinguishing it from sibling monitoring tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies the target personas (risk and compliance) and the purpose (detect safety/alignment issues), but does not explicitly state when not to use or compare to alternatives like bias_amplification_tracker or safety_guardrail_breach_analyzer.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

model_safety_certification_checkerA

Read-onlyIdempotent

Verifies AI model safety certifications against MLCommons and IEEE 7000 standards. Designed for risk management personas to assess model compliance with established safety benchmarks. Accepts model identifiers or certification IDs and returns structured verification results with source references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`model_id`	Yes	Unique identifier for the AI model
`standard`	No	Safety standard to check against
`certification_id`	No	Specific certification ID to verify

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`compliance`	No
`last_verified`	No

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds that it returns structured verification results with source references, providing some behavioral context but not deeply. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: first on purpose, second on persona, third on inputs/outputs. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description adequately covers the tool's function. It mentions structured results with references but omits mention of the async parameter (a common pattern). Overall complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the description mentions accepting model identifiers or certification IDs, aligning with the schema. It does not add significant new meaning beyond the schema, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies AI model safety certifications against specific standards (MLCommons and IEEE 7000). This distinguishes it from sibling tools, none of which directly address safety certification verification. The verb 'verifies' is specific and the scope is well-defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies the target persona (risk management) and purpose (assess compliance), providing clear context for when to use. However, it does not explicitly state when not to use or mention alternative tools for similar tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

monte_carlo_portfolioA

Read-only

Pure-compute Monte Carlo portfolio simulation using Geometric Brownian Motion (GBM). Models a multi-asset portfolio across time with contributions, withdrawals, and annual rebalancing. Returns full probability distribution of terminal wealth, percentile paths, drawdown stats, and Sharpe ratio. Modes: simulate (full Monte Carlo) | glide_path (lifecycle 110-age target-date allocation) | stress_test (4 historical crises: 2008 GFC / 2000 dotcom / 1970s stagflation / 2020 COVID). No external data needed — all computed from asset assumptions. Ticker defaults built-in: SPY/VOO/VTI 7%/15%, QQQ 9%/20%, TLT/BND 3%/6%, GLD 5%/18%, BTC 30%/70%. ICP: asset managers, family offices, retail wealth advisors, robo-advisor agents, retirement planners. 10k simulations × 30 years runs in <3s on V8 JIT.
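
To make the GBM approach concrete, here is a minimal single-asset sketch. It is not the server's implementation: it ignores contributions, withdrawals, and rebalancing, steps once per year, and assumes the quoted 7% / 15% figures can be used directly as the annual drift and volatility. The function names are hypothetical.

```python
import math
import random
from typing import List


def simulate_terminal_wealth(initial: float, mu: float, sigma: float,
                             years: int, n_paths: int, seed: int = 0) -> List[float]:
    """Terminal values of n_paths GBM paths, stepped once per year."""
    rng = random.Random(seed)
    drift = mu - 0.5 * sigma ** 2  # log-return drift per year under GBM
    results = []
    for _ in range(n_paths):
        log_growth = 0.0
        for _ in range(years):
            log_growth += drift + sigma * rng.gauss(0.0, 1.0)
        results.append(initial * math.exp(log_growth))
    return results


def percentile(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted list (p in 0..100)."""
    rank = max(1, math.ceil(p / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


if __name__ == "__main__":
    # SPY-style assumptions quoted in the description: 7% return, 15% volatility.
    wealth = sorted(simulate_terminal_wealth(100_000, 0.07, 0.15, years=30, n_paths=2_000))
    for p in (5, 25, 50, 75, 95):
        print(p, round(percentile(wealth, p)))
```

With `sigma` set to 0 every path collapses to `initial * exp(mu * years)`, which is a quick sanity check for the drift term.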

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	simulate = full Monte Carlo GBM \| glide_path = lifecycle target-date allocation \| stress_test = 4 historical crisis scenarios
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`assets`	Yes	Portfolio assets. Weights must sum to 1.0 (auto-normalized if not).
`simulations`	No	Number of Monte Carlo simulations (1000-100000). Default 10000.
`horizon_years`	Yes	Investment horizon in years (1-50).
`target_value_eur`	No	Target terminal portfolio value in EUR. Used to compute probability_target_achieved.
`confidence_intervals`	No	Percentiles to compute in the output distribution. Default [5, 25, 50, 75, 95].
`initial_investment_eur`	Yes	Initial capital in EUR (e.g. 100000 for €100k).
`withdrawals_annual_eur`	No	Annual withdrawal amount in EUR for decumulation phase (e.g. 50000 for €50k/yr).
`contributions_annual_eur`	No	Annual contribution in EUR (e.g. 12000 for €1000/month).
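
Two of the behaviors mentioned for this tool lend themselves to a short illustration: the auto-normalization of `assets` weights and the "110-age" glide path. The sketch below is an assumption about what those phrases mean (rescaling weights by their sum, and the common "110 minus age" equity rule of thumb); the server's actual logic is not shown in the listing, and both function names are hypothetical.

```python
from typing import Dict


def normalize_weights(weights: Dict[str, float]) -> Dict[str, float]:
    """Rescale weights so they sum to 1.0."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {ticker: w / total for ticker, w in weights.items()}


def equity_share_110_minus_age(age: int) -> float:
    """'110 minus age' rule of thumb: fraction of the portfolio held in equities."""
    return min(1.0, max(0.0, (110 - age) / 100))


print(normalize_weights({"SPY": 60, "BND": 40}))  # {'SPY': 0.6, 'BND': 0.4}
print(equity_share_110_minus_age(40))             # 0.7
```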

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds substantial behavioral context beyond annotations: it explains performance (<3s for 10k×30yrs), built-in ticker defaults, and the stochastic nature of the simulation. Weight auto-normalization is documented in the schema's `assets` parameter rather than in the description. Of the annotations (readOnlyHint true, openWorldHint true), openWorldHint is at odds with the description's 'No external data needed' claim.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured, starting with the core function, then outputs, modes, features, and audience. Every sentence provides useful information, though it is slightly verbose (e.g., listing all ticker defaults) and could be tightened.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with no output schema, the description covers key outputs (distribution, paths, drawdown stats, Sharpe ratio) and performance. It lacks explanation of the async mode and some parameters, but overall is quite complete given the tool's sophistication.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema documents all 10 parameters well. The description adds value by naming the concrete crises behind stress_test and listing the built-in ticker defaults; weight auto-normalization is stated in the schema's `assets` entry, not in the description. The description does not elaborate on async or target_value_eur beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs Monte Carlo portfolio simulation using GBM, and lists what it models and returns. It distinguishes itself from a large set of unrelated siblings by specifying a unique financial simulation function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use the tool (pure-compute simulation, no external data, for various financial professionals) and details three modes (simulate, glide_path, stress_test). However, it does not explicitly mention when not to use it or suggest alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mttr_breakdown_analyzerA

Read-onlyIdempotent

As a CTO, analyze your team's incident response efficiency by breaking down Mean Time To Recovery (MTTR) into root causes: code defects, infrastructure failures, or process bottlenecks. This tool ingests GitHub issue and pull request data alongside Snyk vulnerability reports to provide a detailed breakdown of MTTR components, helping you identify systemic weaknesses in your incident resolution pipeline. Input your GitHub repository details and time range to receive a structured analysis of MTTR contributors with actionable insights.

ParametersJSON Schema

Name	Required	Description
`repo`	Yes	Full GitHub repository name (owner/repo)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`since`	Yes	Start date for analysis (ISO 8601)
`until`	Yes	End date for analysis (ISO 8601)
`snykToken`	No	Snyk API token for vulnerability data (optional)
`githubToken`	Yes	GitHub personal access token for API access
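
The idea of breaking MTTR down by root cause can be illustrated with a small grouping computation. This is only a sketch: the cause labels, the sample incidents, and the function name are made up for illustration, and the listing does not say how the tool derives causes from GitHub or Snyk data.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# (cause, opened, resolved) with ISO 8601 timestamps; the records are made up.
Incident = Tuple[str, str, str]


def mttr_hours_by_cause(incidents: List[Incident]) -> Dict[str, float]:
    """Mean hours from open to resolution, grouped by root-cause label."""
    durations = defaultdict(list)
    for cause, opened, resolved in incidents:
        delta = datetime.fromisoformat(resolved) - datetime.fromisoformat(opened)
        durations[cause].append(delta.total_seconds() / 3600)
    return {cause: sum(hours) / len(hours) for cause, hours in durations.items()}


sample = [
    ("code_defect", "2026-01-05T09:00:00", "2026-01-05T13:00:00"),
    ("code_defect", "2026-01-09T10:00:00", "2026-01-09T12:00:00"),
    ("infrastructure", "2026-01-12T22:00:00", "2026-01-13T04:00:00"),
]
print(mttr_hours_by_cause(sample))  # {'code_defect': 3.0, 'infrastructure': 6.0}
```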

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`breakdown`	No
`topContributors`	No

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and openWorldHint=true, indicating this is a safe, read-only analysis tool. The description adds that it 'ingests GitHub issue and pull request data alongside Snyk vulnerability reports' and provides 'actionable insights', but does not elaborate on behavioral traits beyond what annotations convey. The description is consistent with annotations, adding minor context about output format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loading the purpose: 'As a CTO, analyze your team's incident response efficiency by breaking down Mean Time To Recovery (MTTR) into root causes'. Each sentence adds value: purpose, data sources and output, inputs. No redundant or verbose phrasing. Ideal conciseness for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple data sources, root cause analysis) and the presence of an output schema (not shown but indicated), the description covers the main aspects: purpose, data sources, and required inputs. It does not explain the output schema but that is handled externally. It is sufficiently complete for an AI agent to decide when to use this tool, though mentioning the output schema's role would further enhance completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with clear parameter descriptions (e.g., 'Full GitHub repository name (owner/repo)'). The description reiterates that inputs include 'GitHub repository details and time range' and mentions 'Snyk vulnerability reports' for the optional snykToken parameter. This adds context but does not provide new meaning beyond the schema. Baseline 3 is appropriate as schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'analyze your team's incident response efficiency by breaking down Mean Time To Recovery (MTTR) into root causes'. It specifies the verb 'analyze', the resource 'MTTR breakdown', and the scope 'root causes: code defects, infrastructure failures, or process bottlenecks'. This distinguishes it from siblings like dora_metrics_deep_dive which focuses on broader DORA metrics, and change_failure_root_cause_classifier which may analyze change failures rather than MTTR.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use: 'As a CTO, analyze your team's incident response efficiency... identify systemic weaknesses in your incident resolution pipeline'. It specifies the inputs: 'GitHub repository details and time range'. However, it does not explicitly mention when not to use or compare to alternatives such as sre_slo_breach_predictor or incident_response_evidence_collector. The context is sufficient but lacks explicit exclusion guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

nis2_supply_chain_dependency_mapA

Read-onlyIdempotent

Generates a visual dependency map of supply chain relationships under the NIS2 Directive, scoring criticality based on regulatory sources like EUR-Lex and CNIL decisions. Designed for legal and compliance teams to identify high-risk third-party dependencies. Inputs include organization identifiers and optional scope filters. Outputs structured dependency data with criticality scores and regulatory references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Dependency chain depth to analyze
`scope`	No	Analysis scope: full supply chain or critical dependencies only
`sector`	No	NIS2 sector classification (e.g., 'energy', 'transport')
`organizationId`	Yes	Unique identifier for the organization (e.g., VAT number or LEI)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`dependencies`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. The description adds context about generating visual maps, scoring criticality from regulatory sources, and outputting structured data with references. No contradictions, and it enriches the behavioral profile beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four concise sentences, front-loaded with the primary function, followed by target audience, inputs, and outputs. No wasted words, efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description need not detail return values. It covers the tool's purpose, regulatory sources, and expected output format. It does not mention the 'async' parameter, but that is a common cross-tool parameter. Overall complete for its target audience.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters described. The description groups 'organization identifiers' and 'optional scope filters' but does not add significant meaning beyond the schema. Baseline 3 is appropriate as the schema already does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates a visual dependency map of supply chain relationships under NIS2, with criticality scoring based on regulatory sources. It targets legal and compliance teams, distinguishing it from sibling tools like supplier_esg_audit or supply_chain_fx_exposure_dashboard.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for NIS2 supply chain analysis by legal/compliance teams, but it does not explicitly state when to use this tool versus alternatives or provide exclusion criteria. Usage is implied but not fully guided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

observability_log_pattern_minerA

Read-onlyIdempotent

As a CTO, extract anomalous log patterns from public breach reports (e.g., Verizon DBIR) and MITRE ATT&CK techniques to optimize SIEM rules and observability pipelines. Inputs include threat actor groups, MITRE tactics (e.g., 'TA0005'), or log sources (e.g., 'AWS CloudTrail'). Outputs structured patterns with MITRE mappings, prevalence scores, and detection recommendations. Ideal for reducing false positives and improving breach detection coverage. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`tactic`	Yes	MITRE ATT&CK tactic ID (e.g., 'TA0005')
`technique`	No	MITRE ATT&CK technique ID (e.g., 'T1059')
`log_source`	No	Log source type (e.g., 'AWS CloudTrail', 'Windows Event Log')
`max_results`	No
`threat_actor`	No	Threat actor group name (e.g., 'APT29')
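
Because `tactic` and `technique` expect MITRE ATT&CK IDs, an agent may want to check their shape before calling the tool. The sketch below validates the ID formats used in the examples (`TA0005`, `T1059`) plus the sub-technique form `T1059.001`. Whether the server enforces these patterns is not stated in the listing, and the helper names are hypothetical.

```python
import re

# Tactic IDs look like TA0005; technique IDs like T1059, sub-techniques like T1059.001.
TACTIC_RE = re.compile(r"TA\d{4}")
TECHNIQUE_RE = re.compile(r"T\d{4}(\.\d{3})?")


def is_tactic_id(value: str) -> bool:
    """True if value has the shape of a MITRE ATT&CK tactic ID."""
    return TACTIC_RE.fullmatch(value) is not None


def is_technique_id(value: str) -> bool:
    """True if value has the shape of a technique or sub-technique ID."""
    return TECHNIQUE_RE.fullmatch(value) is not None


print(is_tactic_id("TA0005"), is_technique_id("T1059"), is_tactic_id("T1059"))
# True True False
```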

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`metadata`	No
`patterns`	Yes
`warnings`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds behavioral context beyond annotations: mentions async to prevent timeout, and describes outputs. Annotations already declare readOnly, idempotent, openWorld, so description complements without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise paragraph with front-loaded purpose, no redundant sentences. Efficiently covers purpose, inputs, outputs, and usage hint.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema is present (implied), and the description mentions structured outputs with MITRE mappings, prevalence scores, detection recommendations. Adequate for a tool with 6 parameters and clear annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 83%, and the description adds useful context about async usage to avoid timeout. While not detailing every parameter, it connects inputs to the overall purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool extracts anomalous log patterns from public breach reports and MITRE ATT&CK techniques, with specific inputs and outputs. This distinguishes it from sibling tools which cover diverse domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides guidance on when to use (ideal for reducing false positives) and suggests async:true to avoid timeout. However, does not explicitly contrast with alternatives or specify when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

observability_metric_anomaly_detectorA

Read-onlyIdempotent

As a CTO, quickly identify anomalous cloud metrics (CPU, latency, memory) by comparing your infrastructure against AWS public benchmarks and CVE-linked hardware risks. Input your observed metrics (e.g., CPU utilization, request latency) and receive a risk assessment with potential root causes. Ideal for performance troubleshooting, security hardening, and capacity planning. Keywords: cloud observability, anomaly detection, CVE hardware risks, AWS benchmark comparison.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	No
`metricType`	Yes
`instanceType`	No
`observedValue`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`cveRisks`	No
`warnings`	No
`anomalyScore`	No
`benchmarkValue`	No
`deviationPercent`	No
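
The relationship between `observedValue`, `benchmarkValue`, and `deviationPercent` is not documented in the listing. One plausible reading, shown here purely as an assumption, is a signed percentage difference relative to the benchmark; the function name is hypothetical.

```python
def deviation_percent(observed: float, benchmark: float) -> float:
    """Signed percentage by which observed differs from benchmark."""
    if benchmark == 0:
        raise ValueError("benchmark must be non-zero")
    return (observed - benchmark) / benchmark * 100


print(deviation_percent(90.0, 60.0))  # 50.0
```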

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true. The description adds behavioral context by mentioning risk assessment with root causes and comparison against benchmarks/CVE risks. It does not contradict annotations and provides useful insight into what the tool does.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3 sentences + keywords) and front-loaded with the main purpose. The 'As a CTO' opening is slightly unnecessary but does not harm clarity. Overall efficient with minimal waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description need not detail return values. It covers inputs and use cases well. However, it omits guidance on using the async parameter and polling, which is relevant given the tool has an async mode.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (20%), so the description must compensate. It explains metricType and observedValue with examples (CPU utilization, request latency), but does not clarify region, instanceType, or async parameter. This is adequate but leaves room for improvement.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool identifies anomalous cloud metrics (CPU, latency, memory) by comparing to AWS benchmarks and CVE-linked risks. It uses specific verbs and resources, and distinguishes itself from sibling tools like observability_log_pattern_miner by focusing on metrics rather than logs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear use cases: performance troubleshooting, security hardening, and capacity planning. However, it does not explicitly state when not to use it or mention alternatives like log pattern mining for log-related anomalies.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onboarding_salariesC

Read-only

Onboarding opérationnel des salariés — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Pennylane (FR fintech SaaS, ~250 FTE) — 5 parcours 30/60/90 jours · Engineering / Sales / CS / Design / People Ops. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`roles`	Yes
`company`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and openWorldHint=true. Description adds 'Inputs are validated server-side' which offers minimal extra behavioral context, but doesn't detail return format or processing time.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short (two sentences plus example) but mixes French and English, and lacks clear structure. Could be more front-loaded with purpose and key constraints.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complex nested parameters, no output schema, and many sibling tools, the description fails to cover what the deliverable contains, how to interpret results, or any additional required context beyond inputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 25% and description does not explain any parameters. Reference case is provided but no detail on how 'async', 'focus', 'roles', or 'company' should be populated.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states verb 'Returns a structured, audited deliverable' and mentions onboarding with specific departments and timelines. It distinguishes itself from HR siblings by focusing on employee onboarding plans ("salariés" means employees, not salaries), but could be more explicit about output type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No when-to-use or when-not-to-use guidance provided. Does not mention alternative tools or contexts where this tool is appropriate versus others like comp_benchmark_geo_delta.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

operational_dashboardsC

Read-only

Dashboards opérationnels — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Qonto (5 départements · 12 KPIs) — 4 dashboards live en 3 semaines · time-to-décision -55%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`techStack`	Yes
`departments`	Yes
`kpiRequests`	Yes
`primaryDashboardTool`	No
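
The `async` flag documented in the table above implies a submit-then-poll flow. A minimal sketch of that flow follows, assuming a hypothetical `call_tool(name, arguments)` client hook and assuming that unfinished jobs report a `status` of "pending" or "running"; the listing documents neither the client API nor the shape of the `job_result` response.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client hook: call_tool(name, arguments) -> parsed tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_async_tool(
    call_tool: ToolCaller,
    name: str,
    arguments: Dict[str, Any],
    poll_interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a tool call with async=True, then poll job_result until it settles."""
    submitted = call_tool(name, {**arguments, "async": True})
    job_id = submitted["job_id"]

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = call_tool("job_result", {"job_id": job_id})
        # ASSUMPTION: unfinished jobs report status "pending" or "running".
        if result.get("status") not in ("pending", "running"):
            return result
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

The `sleep` argument is injectable only so the loop can be exercised without real delays; the interval and timeout defaults are illustrative, not values taken from the server.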

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds that inputs are validated server-side and returns a structured deliverable, which aligns with the readOnlyHint annotation. It does not contradict annotations, but it adds little beyond what annotations already provide (e.g., no details on rate limits, auth needs, or what happens after input).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (four sentences) but includes an unnecessary marketing reference case. It is poorly structured and mixes French and English. Although brief, it could be more focused and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema), the description is very incomplete. It does not explain the return format, error handling, or how the output relates to inputs. Annotations provide some context but not sufficient for proper invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only async has a description). The description does not compensate by explaining any of the other parameters (company, departments, kpiRequests, etc.). It only says 'send the documented case fields', which is unhelpful. The description fails to add meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
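
The coverage figure quoted in this review (17%, with only `async` described) is consistent with counting the share of top-level properties that carry a non-empty `description`. A rough sketch of that calculation follows; the exact formula the scorer uses is not given in the listing, so this is an assumption, and property types are left out because the listing does not show them.

```python
from typing import Any, Dict


def description_coverage(input_schema: Dict[str, Any]) -> float:
    """Fraction of top-level properties that have a non-empty description."""
    props = input_schema.get("properties", {})
    if not props:
        # ASSUMPTION: a schema with no properties has nothing left to document.
        return 1.0
    described = sum(
        1 for spec in props.values() if str(spec.get("description", "")).strip()
    )
    return described / len(props)


# Parameter names taken from the operational_dashboards table; types omitted.
operational_dashboards_schema = {
    "type": "object",
    "properties": {
        "async": {"description": "If true, returns a job_id immediately."},
        "company": {},
        "techStack": {},
        "departments": {},
        "kpiRequests": {},
        "primaryDashboardTool": {},
    },
}

coverage_pct = round(description_coverage(operational_dashboards_schema) * 100)
```

With one described property out of six, `coverage_pct` comes out at 17, matching the figure in the review.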

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is about 'Dashboards opérationnels' and returns a 'structured, audited deliverable', with a reference case. However, it lacks an active verb (e.g., 'generates', 'creates') to precisely define the action, and does not differentiate from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions a reference case but does not specify prerequisites, exclusions, or scenarios where another tool would be more appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

oss_dependency_velocity_trackerA

Read-only · Idempotent

As a CTO, track the update velocity of your project's open-source dependencies to assess their impact on DORA metrics like deployment frequency and lead time. This tool fetches release history and version adoption data from npm registry and libraries.io, providing insights into dependency freshness, update frequency, and potential risks. Input a list of package names and optional version ranges to analyze. Outputs structured dependency velocity metrics and warnings about stale or rapidly changing packages.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`packages`	Yes
`lookbackDays`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`metrics`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint. Description adds that it fetches from npm and libraries.io and outputs structured metrics and warnings, which provides moderate additional context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with audience and purpose. Each sentence adds distinct value: audience and purpose, data sources, input format, output type. No fluff or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (3 params, output schema exists), the description covers purpose, input, data sources, and output type. It omits lookbackDays and async. The schema documents async but lists lookbackDays without a description, so that parameter remains unexplained. The output schema further reduces the need to detail return values. Adequately complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 33% (only async described). Description explains packages as a list of names with optional version ranges, adding value. However, lookbackDays is not mentioned, leaving it partially uncovered. Baseline is 3 for low coverage, and description provides some compensation but not full.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool tracks update velocity of open-source dependencies, fetches release history and version adoption data from specific registries, and provides insights on freshness and risks. It differentiates from siblings like dependency_vulnerability_scan or ossf_scorecard_trend_analyzer by focusing on velocity metrics rather than security or scorecards.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context (CTO assessing DORA metrics) and data sources, but does not explicitly exclude alternatives or state when not to use it. It gives clear context for use, but lacks direct comparison to sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ossf_scorecard_trend_analyzerA

Read-only · Idempotent

As a CTO, analyze OSSF Scorecard trends for your top 10-50 dependencies to identify security regressions or deteriorating project health. Input GitHub repository names (owner/repo), get structured trend data including score deltas, check failures, and risk flags. Uses OSSF Scorecard API and GitHub Archive for historical context. Ideal for proactive dependency management and risk assessment.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`lookbackDays`	No	Number of days to analyze trends for
`repositories`	Yes	List of GitHub repositories in owner/repo format
`minScoreThreshold`	No	Minimum acceptable score to flag as risky
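
To make the parameters above concrete, here is a small client-side sketch of how a score series might be reduced to the kind of delta and risk flags the description mentions. It is an illustration only: the function name, the returned field names, the regular expression, and the default threshold are hypothetical, and the tool's real output shape is not shown in the listing.

```python
import re
from typing import Dict, List, Union

# Loose check for the owner/repo format required by `repositories`.
REPO_RE = re.compile(r"^[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def summarize_trend(
    repo: str,
    scores: List[float],
    min_score_threshold: float = 5.0,  # hypothetical default, not from the schema
) -> Dict[str, Union[str, float, bool]]:
    """Summarize a chronological list of Scorecard scores for one repository."""
    if not REPO_RE.match(repo):
        raise ValueError(f"expected owner/repo, got {repo!r}")
    if not scores:
        raise ValueError("at least one score is required")
    delta = scores[-1] - scores[0]
    return {
        "repository": repo,
        "latest": scores[-1],
        "delta": round(delta, 2),
        "regressed": delta < 0,
        "below_threshold": scores[-1] < min_score_threshold,
    }


summary = summarize_trend("ossf/scorecard", [7.5, 7.0, 6.0], min_score_threshold=6.5)
```

Here the series falls from 7.5 to 6.0, so `summary` reports a delta of -1.5 with both `regressed` and `below_threshold` set.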

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`results`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds value by disclosing data sources (OSSF Scorecard API, GitHub Archive) and output structure (score deltas, check failures, risk flags). No contradictions with annotations. The behavioral traits are well-covered without redundancy.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (4 sentences) with front-loaded purpose. Every sentence adds value: purpose, input/output, data sources, and ideal use case. No wasted words or repetition. Structure is clear and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, async option, output schema), the description covers purpose, inputs, and outputs. It mentions structured trend data and risk flags. The existence of an output schema reduces the need for detailed return value descriptions. However, it could briefly mention the async option to enhance completeness, but it's not a significant gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not add significant new meaning beyond the schema; it mentions 'Input GitHub repository names (owner/repo)' which matches the schema pattern. It does not elaborate on parameter details beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes OSSF Scorecard trends for dependencies to identify security regressions and deteriorating project health. It specifies the verb (analyze), resource (OSSF Scorecard trends), and target audience (CTO), distinguishing it from sibling tools like dependency_vulnerability_scan and oss_dependency_velocity_tracker by focusing on trend analysis over time.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context: 'proactive dependency management and risk assessment' for top 10-50 dependencies. It identifies the target user (CTO) and use case. However, it does not explicitly state when to avoid using this tool or mention alternatives, though the sibling list implies alternatives exist.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

outbound_sequencerD

Read-only

Séquences outbound — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub → CFO + CRO B2B SaaS France — Séquence 6 touches multi-canal · Taux réponse +180%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`icp`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`offer`	Yes
`excludedAngles`	No
`targetAccounts`	No

Tool Definition Quality

D1.5/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states 'Returns a structured, audited deliverable', implying generation or mutation, but annotations declare readOnlyHint=true. This is a direct contradiction. Additionally, no other behavioral traits (auth needs, side effects) are disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description includes a verbose reference case ('Gapup Hub → CFO + CRO B2B SaaS France ...') which is not general guidance. French phrases and jargon reduce clarity. It is not concise; the space could be used for clearer purpose and parameter explanations.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters with nested objects, no output schema, low schema coverage), the description is severely incomplete. It does not explain the output format, how to use parameters, or any contextual details beyond a vague reference case.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema description coverage, the description should compensate but does not. It only says 'send the documented case fields' without adding meaning to parameters like 'icp', 'offer', or 'excludedAngles'. The async parameter is mentioned in schema but not in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description mentions 'Séquences outbound' and 'returns a structured, audited deliverable', but it is vague and jargon-heavy (e.g., 'Gapup agent-payable C-suite expertise (CRO)'). It does not clearly state in plain English what the tool does, and it fails to distinguish it from siblings like 'sales_enablement_architect'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The sibling list includes many related tools (e.g., 'battle_plan', 'sales_enablement_architect'), but the description offers no contrast or when-not-to-use advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

paid_ads_optimizerC

Read-only

Optimiseur de publicités payantes — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Spendesk (Google + LinkedIn · €45k/mo) — €9k/mo gaspillés identifiés · ROAS LinkedIn ×2.4. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`campaigns`	Yes
`targetMetric`	Yes
`audienceDescription`	Yes
`totalMonthlyBudgetEur`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description mentions that inputs are validated server-side and that a deliverable is returned, but it discloses no behavioral details beyond the annotations. There is no information on latency, rate limits, or the exact format of the deliverable.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately concise, but it contains marketing language ('Gapup agent-payable C-suite expertise (CMO)') that is not essential for the agent. It could be more direct and focus on the essentials.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 6 parameters, no output schema), the description is incomplete. It does not specify what the deliverable contains or how to interpret the results, which limits its usefulness to the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is very low (17%) and the description text adds practically no meaning to the parameters. It only says to send the documented case fields, without describing any parameter individually.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description indicates that this is a paid-ads optimizer that returns a structured, audited deliverable. This gives a clear purpose, although it lacks an explicit distinction from similar marketing tools. The reference case provides additional context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool rather than other tools. The reference case is given but does not constitute a direct instruction for the agent. There is no mention of cases where the tool should not be used.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

partnership_synergiesA

Read-only · Idempotent

Identify and rank strategic partnership opportunities for a company. Returns 5-12 high-fit partnership targets, each scored on revenue lift, time-to-impact, integration complexity and regulatory risk, with a rationale and a recommended first-step outreach playbook. When to use this tool: the user wants business-development or alliance ideas, or M&A target screening before deeper due diligence. Inputs: the user's own company and the strategic axis to unlock through partnership (e.g. enter a new market via distribution, add AI infrastructure without rebuilding). Delivered by Antoine, the AI CSO of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`constraints`	No
`selfCompany`	Yes
`strategicAxis`	Yes	What strategic axis to unlock through partnership (e.g. 'enter US market via distribution', 'leverage AI infra without rebuild')
`currentPartnerships`	No	Existing alliances to factor in

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles
`sources`	No
`recommendations`	No	Prioritised next steps
`executiveSummary`	Yes	Board-ready partnership opportunity overview
`partnershipTargets`	Yes	5-12 ranked partnership targets

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, openWorldHint, idempotentHint, and no destructive hint. The description adds behavioral details beyond annotations: it returns a specific number of targets (5-12), scores on multiple dimensions, and includes a rationale and outreach playbook. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured, with a single paragraph that efficiently conveys the tool's purpose, outputs, usage context, and key inputs. Every sentence adds value, and there is no redundant or extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, output schema), the description covers the main return format, scoring criteria, and use cases. It does not explain optional parameters such as constraints and focus, and the schema leaves both undescribed, so an agent must infer them from their names. The description is otherwise sufficiently complete for an agent to decide when and how to invoke it.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 50% (3 of 6 parameters have schema descriptions: async, strategicAxis, currentPartnerships). The tool description adds value by explaining the two required inputs (selfCompany and strategicAxis) but does not detail optional parameters like constraints, focus, or async. It partially compensates for the schema gaps but could be more thorough.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool 'identifies and ranks strategic partnership opportunities' and specifies it returns 5-12 targets with scoring on revenue lift, time-to-impact, integration complexity, and regulatory risk. It also distinguishes from siblings like ma_deal_screener by explicitly mentioning BD/alliance ideas and M&A screening before deeper due diligence.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on when to use the tool: for business-development or alliance ideas, or M&A target screening before deeper due diligence. It also gives example strategic axes. However, it does not explicitly state when not to use it or contrast with specific alternatives, though the context is adequate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_landscapeA

Read-only

Search, analyze and map patent landscapes across major jurisdictions (US, EP, WO, CN, JP, KR). Three modes: (1) search — find patents by keywords, company name or inventor name; (2) landscape — aggregate distributions: top assignees, top inventors, CPC class breakdown, filings by year, citation leaders, white-space innovation opportunities; (3) lookup — retrieve a specific patent by number (e.g. US10000000B2, EP3456789A1, WO2023/123456). Primary source: WIPO PatentScope (WO PCT, keyless). Optional sources: USPTO PatentsView (US, env PATENTSVIEW_API_KEY), EPO OPS (EP/WO, env EPO_OPS_CONSUMER_KEY + EPO_OPS_CONSUMER_SECRET), Lens.org (global, env LENS_API_TOKEN). Use cases: freedom-to-operate (FTO) analysis, R&D gap identification, VC due diligence IP audit, competitor patent portfolio mapping, inventor network analysis. SLA: <=24s p95 (parallel fetches, 8s per source). Cache: 24h TTL (patent data stable). Quality score: 30 pts per retrieved source (max 90), +10 if >=10 patents, +10 bonus for landscape mode with non-empty top_assignees.
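
The quality-score rule at the end of the description can be read as simple arithmetic. The sketch below is one literal interpretation of that sentence; the function and argument names are hypothetical, and the description does not say whether the total is capped, so no cap beyond the stated 90-point source limit is applied.

```python
from typing import Sequence


def quality_score(
    sources_retrieved: int,
    patent_count: int,
    mode: str,
    top_assignees: Sequence[str],
) -> int:
    """Score a result using the rule quoted in the tool description."""
    # 30 points per retrieved source, capped at 90.
    score = min(30 * sources_retrieved, 90)
    # +10 when at least 10 patents were returned.
    if patent_count >= 10:
        score += 10
    # +10 bonus for landscape mode with a non-empty top_assignees list.
    if mode == "landscape" and len(top_assignees) > 0:
        score += 10
    return score


search_score = quality_score(2, 12, "search", [])
landscape_score = quality_score(4, 5, "landscape", ["Example Corp"])
```

Under this reading, two sources with 12 patents in search mode score 70, and four sources with 5 patents in landscape mode score 100 (the source component is capped at 90, plus the landscape bonus).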

ParametersJSON Schema

Name	Required	Description
`mode`	No	search: keyword/inventor/assignee search; landscape: aggregate distributions; lookup: fetch by patent number. Default: "search"
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keywords, company/inventor name, or patent number (e.g. "machine learning", "Tesla Inc", "US10000000B2")
`date_to`	No	ISO date YYYY-MM-DD — latest filing date
`date_from`	No	ISO date YYYY-MM-DD — earliest filing date
`max_results`	No	Max patents to return (5-50). Default: 20
`jurisdictions`	No	Jurisdictions to include. Default: ["US","EP","WO"]
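
The constraints in the table above (mode values, ISO dates, a 5-50 range for `max_results`) can be checked on the client before a call is made. The builder below is a sketch under assumptions: the helper name and the `max_allowed` argument are hypothetical, and the check that `date_from` does not fall after `date_to` is an added sanity check, not documented server behavior.

```python
from datetime import date
from typing import Any, Dict, List, Optional

ALLOWED_MODES = {"search", "landscape", "lookup"}


def build_patent_query(
    query: str,
    mode: str = "search",
    max_results: int = 20,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    jurisdictions: Optional[List[str]] = None,
    max_allowed: int = 50,  # hypothetical knob: 50 matches the sync tool's table
) -> Dict[str, Any]:
    """Validate arguments locally and return a dict ready to send to the tool."""
    if not query.strip():
        raise ValueError("query is required")
    if mode not in ALLOWED_MODES:
        raise ValueError(f"mode must be one of {sorted(ALLOWED_MODES)}")
    if not 5 <= max_results <= max_allowed:
        raise ValueError(f"max_results must be between 5 and {max_allowed}")

    # date.fromisoformat raises ValueError on malformed dates.
    start = date.fromisoformat(date_from) if date_from else None
    end = date.fromisoformat(date_to) if date_to else None
    if start and end and start > end:
        raise ValueError("date_from must not be later than date_to")

    args: Dict[str, Any] = {
        "query": query,
        "mode": mode,
        "max_results": max_results,
        "jurisdictions": jurisdictions or ["US", "EP", "WO"],
    }
    if start:
        args["date_from"] = start.isoformat()
    if end:
        args["date_to"] = end.isoformat()
    return args


example_args = build_patent_query("machine learning", mode="landscape", date_from="2020-01-01")
```

Failing fast on the client avoids a round trip (and, on a per-call payable server, possibly a charge) for a request the server would reject anyway.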

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`patents`	Yes
`sources`	Yes
`landscape`	No
`quality_score`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations readOnlyHint and openWorldHint are present and consistent. The description adds significant behavioral context: SLA (<=24s p95), cache TTL (24h), quality scoring formula, parallel fetches from multiple sources with environment variable requirements. This goes well beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long but well-structured: it opens with the core purpose, then covers modes, use cases, SLA, sources. Every sentence adds value, though some details (like quality score formula) could be in a separate notes field. It is front-loaded with the most important information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, 3 modes, multiple sources, async option, output schema present), the description covers all critical aspects: modes, use cases, SLA, caching, authentication requirements, and quality scoring. No obvious gaps exist.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions for all 7 parameters. The description adds meaning by explaining the three modes and their use cases and by listing the supported jurisdictions. It does not mention async, which only the schema documents. It does not repeat schema details but adds context that aids parameter choice.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states a specific verb 'Search, analyze and map' with a clear resource 'patent landscapes' and lists three distinct modes (search, landscape, lookup). It distinguishes from siblings like patent_landscape_async and patent_landscape_result by covering both sync and async capabilities (async parameter).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit use cases (FTO, R&D gap, VC due diligence, competitor mapping, inventor network) and mentions alternative sources and modes. It does not explicitly state when NOT to use the tool or compare with siblings like patent_ownership_audit, but the context is clear enough for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_landscape_asyncA

Read-only

Async extended variant of patent_landscape. Supports max_results up to 200 (vs 50 in sync mode) and an optional include_citation_graph flag that enriches each patent with its 2-level citation graph (parent patents that cite this one + child patents cited by this one). Returns immediately (<300ms) with a job_id. Poll the result with patent_landscape_result(job_id) after eta_seconds (~180s). Use for deep R&D white-space analysis, freedom-to-operate (FTO) audits, VC due diligence IP mapping, or large-scale competitor portfolio analysis. Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter.

ParametersJSON Schema

Name	Required	Description
`mode`	No	search / landscape / lookup. Default: "search"
`query`	Yes	Keywords, company/inventor name, or patent number (e.g. "machine learning", "Tesla Inc")
`date_to`	No	ISO date YYYY-MM-DD — latest filing date
`date_from`	No	ISO date YYYY-MM-DD — earliest filing date
`max_results`	No	Max patents to return (5-200). Default: 20
`jurisdictions`	No	Jurisdictions to include. Default: ["US","EP","WO"]
`include_citation_graph`	No	If true, enriches each patent with a 2-level citation graph (parents + children). Adds significant processing time — use for deep analysis only. Default: false.

Output Schema

Parameters

Name	Required	Description
`job_id`	Yes	Unique job identifier — pass to patent_landscape_result
`status`	Yes
`eta_seconds`	Yes
`submitted_at`	Yes

Tool Definition Quality

Grade A (4/5.0)

Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description itself is transparent about async behavior, polling, and webhooks. However, the annotations declare readOnlyHint=true, which contradicts the fact that this tool submits a job and is therefore not a read-only operation. Under the scoring rules, a score of 1 is required when the description contradicts the annotations.
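
The contradiction described here can be detected mechanically. The sketch below is one possible check, written in Python over a tool definition shaped like an MCP tool entry (`annotations`, `outputSchema`). The rule it applies, that a tool whose output requires a `job_id` creates server-side state, is an assumption made for illustration and is not the scorer's documented logic. `find_annotation_conflicts` is a hypothetical name.

```python
def find_annotation_conflicts(tool):
    """Return human-readable conflicts for one tool definition.

    Assumption: a tool whose output schema requires a job_id submits a job,
    so readOnlyHint=true is treated as a contradiction.
    """
    conflicts = []
    annotations = tool.get("annotations", {})
    required_outputs = tool.get("outputSchema", {}).get("required", [])
    if annotations.get("readOnlyHint") and "job_id" in required_outputs:
        conflicts.append(
            "readOnlyHint=true but the tool submits a job (returns job_id)"
        )
    return conflicts


# Example input mirroring the fields listed for patent_landscape_async.
tool = {
    "name": "patent_landscape_async",
    "annotations": {"readOnlyHint": True},
    "outputSchema": {
        "required": ["job_id", "status", "eta_seconds", "submitted_at"]
    },
}
conflicts = find_annotation_conflicts(tool)
```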

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but packs substantial information efficiently. It could be slightly more structured, but it is not overly verbose and every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (async, multiple parameters, output schema exists), the description is remarkably complete: covers behavior, use cases, polling, webhook, and key parameter differences. The output schema handles return values, so no need for further detail there.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds significant meaning: explains max_results difference from sync mode (200 vs 50), the citation graph flag (2-level, parents+children), and recommends use for deep analysis. This goes beyond the schema's descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an async extended variant of patent_landscape, with specific features (max_results up to 200, optional citation graph) and lists concrete use cases (R&D white-space analysis, FTO audits, etc.). It distinguishes from its sync sibling and the result polling tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use this tool vs alternatives: for deep analysis, large-scale portfolio analysis, and mentions the async pattern with polling or webhook registration. Provides clear alternatives like patent_landscape_result and webhooks_manage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_landscape_result (grade A)

Read-only, Idempotent

Poll the result of a patent_landscape_async job. Returns status=pending while running, status=completed with the full patent landscape report once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by patent_landscape_async (~180s).

Parameters

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by patent_landscape_async (prefix: patl_)

Output Schema

Parameters

Name	Required	Description
No output parameters
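
The four documented statuses map naturally onto a polling loop. The following is a minimal sketch in Python, assuming the result arrives as a dictionary with a `status` key. `call_tool` is a hypothetical callable `(tool_name, arguments) -> dict` supplied by whatever MCP client is in use, and the `interval` and `max_polls` values are illustrative choices, not documented limits.

```python
import time


def poll_patent_landscape(call_tool, job_id, eta_seconds=180,
                          interval=15, max_polls=20, sleep=time.sleep):
    """Poll patent_landscape_result until the job leaves the pending state."""
    sleep(eta_seconds)  # wait for the ETA hint before the first poll
    for _ in range(max_polls):
        result = call_tool("patent_landscape_result", {"job_id": job_id})
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"job {job_id} failed")
        if status == "not_found":
            raise LookupError(f"job {job_id} is unknown or expired (24h TTL)")
        if status != "pending":
            raise ValueError(f"unexpected status: {status!r}")
        sleep(interval)  # still pending: back off before the next poll
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

Passing `sleep` as a parameter keeps the loop testable without real delays. Registering a webhook, as the async tool's description suggests, would replace this loop entirely.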

Tool Definition Quality

Grade A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint, idempotentHint, destructiveHint, which the description supports. Description adds polling behavior, status transitions, and TTL, which are valuable beyond annotations. Could mention idempotency explicitly, but already implied.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose, no wasted words. Efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a polling tool with output schema present, the description covers all necessary context: statuses, TTL, and timing advice. No gaps given the complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema already fully describes the single parameter job_id (100% coverage), including prefix. The description adds no new information, meeting the baseline for covered schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it polls the result of an async job, enumerates all possible statuses (pending, completed, failed, not_found), and specifies TTL. The name 'result' differentiates it from the async submission tool. Highly precise.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises calling after the eta_seconds hint (~180s) from patent_landscape_async, and mentions TTL 24h for expiration. This provides clear when-to-use and implied when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_ownership_audit (grade A)

Read-only, Idempotent

Audits patent ownership for employees or contractors, identifying gaps where inventors may not have properly assigned patent rights to the company. Designed for CHROs to ensure IP compliance and mitigate legal risks. Inputs: employee/contractor names or IDs, optional date range. Outputs: list of patents, ownership status, flagged gaps, and assignment details. Sources: USPTO PatFT and EPO Espacenet public records. Keywords: patent audit, IP compliance, employee inventions, contractor agreements, CHRO.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`dateRange`	No	Optional date range for patent filings
`employeeIds`	No	List of employee or contractor IDs (optional if names provided)
`employeeNames`	Yes	List of employee or contractor full names to audit

Output Schema

Parameters

Name	Required	Description
`gaps`	No
`status`	Yes
`patents`	No
`sources`	No
`warnings`	No

Tool Definition Quality

Grade A (4.4/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds value beyond annotations by disclosing the data sources (USPTO PatFT and EPO Espacenet public records) and the output components (list of patents, ownership status, flagged gaps, assignment details). Annotations already indicate read-only, open-world, and idempotent behavior, which the description does not contradict.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but well-structured with purpose, target user, inputs, outputs, sources, and keywords. It is information-dense without being verbose, though minor trimming could improve conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's low complexity (0 enums, has output schema) and good annotations, the description covers all necessary aspects: purpose, target user, inputs, outputs, and data sources. It is complete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, so the description's mention of 'Inputs: employee/contractor names or IDs, optional date range' adds no new semantics beyond the schema. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'audits' and the resource 'patent ownership', with the specific goal of identifying gaps in patent rights assignment. It distinguishes itself from siblings like 'patent_landscape' which focuses on broader landscape analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies the target user (CHROs) and the context (IP compliance, legal risk mitigation), providing clear guidance on when to use. However, it does not explicitly state when not to use or list alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

payment_rails_cost_analyzer (grade A)

Read-only, Idempotent

As a CFO, compare cross-border payment rail costs (SWIFT, SEPA, local ACH, stablecoins) with FX conversion fees and settlement times. Input source/destination countries and amount, receive cost breakdown, FX rates, and settlement time estimates. Uses ECB FX rates and World Bank remittance price data for accurate cost analysis. Ideal for optimizing international payment strategies and reducing transaction expenses.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`amount`	Yes	Transaction amount in source currency
`source_country`	Yes	ISO 3166-1 alpha-2 country code of payment origin
`source_currency`	No	ISO 4217 currency code of source amount
`destination_country`	Yes	ISO 3166-1 alpha-2 country code of payment destination
`destination_currency`	No	ISO 4217 currency code of destination amount
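
The code formats named in the table lend themselves to a quick shape check before calling the tool. The sketch below is a minimal Python example, assuming a flat JSON argument object. `build_payment_rails_args` is a hypothetical helper, the positive-amount rule is an assumption, and the regular expressions only check the shape of the codes (two or three uppercase letters), not whether a code is actually assigned in ISO 3166-1 or ISO 4217.

```python
import re


def build_payment_rails_args(amount, source_country, destination_country,
                             source_currency=None, destination_currency=None):
    """Hypothetical helper: shape-check codes before calling the tool."""
    # Assumption: a transaction amount must be strictly positive.
    if amount <= 0:
        raise ValueError("amount must be positive")
    args = {"amount": amount}
    # Required ISO 3166-1 alpha-2 country codes (shape check only).
    for key, code in (("source_country", source_country),
                      ("destination_country", destination_country)):
        if not re.fullmatch(r"[A-Z]{2}", code):
            raise ValueError(f"{key} must be a two-letter uppercase code")
        args[key] = code
    # Optional ISO 4217 currency codes (shape check only).
    for key, code in (("source_currency", source_currency),
                      ("destination_currency", destination_currency)):
        if code is None:
            continue
        if not re.fullmatch(r"[A-Z]{3}", code):
            raise ValueError(f"{key} must be a three-letter uppercase code")
        args[key] = code
    return args
```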

Output Schema

Parameters

Name	Required	Description
`amount`	No
`status`	Yes
`fx_rate`	No
`sources`	No
`warnings`	No
`total_cost`	No
`source_country`	No
`settlement_time`	No
`source_currency`	No
`destination_country`	No
`destination_currency`	No

Tool Definition Quality

Grade A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds value beyond annotations by disclosing data sources (ECB FX rates, World Bank data) and output types (cost breakdown, FX rates, settlement time). Annotations already indicate read-only and idempotent behavior, so description enriches without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is four sentences, front-loaded with purpose, and every sentence adds value. No fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description covers expected inputs and outputs, mentions data sources, and provides sufficient context for a cost analysis tool. Minor omission of limitations (e.g., currency support) but overall complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description mentions source/destination countries and amount but does not add new semantic detail beyond the schema's existing descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it compares cross-border payment rail costs with FX fees and settlement times, using specific verbs and resources. It distinguishes itself from sibling tools by its unique focus on cost analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description indicates ideal use for CFOs optimizing international payment strategies but does not explicitly state when not to use it or mention alternative tools. The context is clear but lacks exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pentest_scope_estimator (grade A)

Read-only

Estimateur de scope pentest [Pentest scope estimator] — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: For a pentest on with assets, what is the effort and cost estimate? · How much should I budget for a web application + API penetration test for SOC 2 Type II compliance? · What is the standard engagement plan (PTES phases + deliverables) for a pentest? · Which engagement type (black-box/grey-box/white-box/red-team) is recommended for my context? · What are the prerequisites and risks for a pentest engagement on my cloud infrastructure? Reference case: Acme SaaS Inc — Fintech B2B EU · web-app + API REST · 12 microservices Node.js AWS · . Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`scope_type`	Yes
`tech_stack`	Yes
`asset_count`	No
`target_geos`	No
`engagement_type`	No
`retest_included`	No
`business_context`	Yes
`compliance_frameworks`	No

Tool Definition Quality

Grade A (3.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the tool is known to be read-only and may reference external knowledge. The description adds that it returns a structured, audited deliverable and answers questions, but does not disclose additional behavioral traits beyond these annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately sized at several sentences with multiple example questions. It front-loads the core purpose but includes verbose bullet points that could be trimmed. While informative, it lacks the conciseness of a high-scoring tool description.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, no output schema), the description provides enough context to understand its purpose and use via examples. However, it fails to detail the input format (e.g., how to use the async parameter) or the structure of the returned deliverable, leaving gaps for an agent to fill.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 11%, and the description adds minimal meaning to the parameters. It mentions scope_type and tech_stack implicitly via the reference case, but does not explain required fields like business_context or optional ones like compliance_frameworks. The instruction 'Inputs are validated server-side — send the documented case fields' is vague.
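
The coverage figure can be read as the share of input parameters that carry a description. The sketch below is a minimal Python example under that reading. The scorer's exact formula is not documented here, so `schema_description_coverage` is a hypothetical function, and treating an empty schema as fully covered is an assumption. For this tool, only `async` out of nine parameters is described, which gives 1/9, or about 11%.

```python
def schema_description_coverage(input_schema):
    """Share of top-level properties that carry a non-empty description."""
    properties = input_schema.get("properties", {})
    if not properties:
        return 1.0  # assumption: nothing to document counts as fully covered
    described = sum(
        1 for spec in properties.values()
        if spec.get("description", "").strip()
    )
    return described / len(properties)


# The nine parameters listed for pentest_scope_estimator; only async is described.
names = ["async", "scope_type", "tech_stack", "asset_count", "target_geos",
         "engagement_type", "retest_included", "business_context",
         "compliance_frameworks"]
schema = {"properties": {name: {} for name in names}}
schema["properties"]["async"] = {
    "description": "If true, returns a job_id immediately"
}
coverage_percent = round(schema_description_coverage(schema) * 100)
```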

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is an 'Estimateur de scope pentest' that returns a structured, audited deliverable. It answers specific questions about effort, cost, budgeting, engagement plans, and risk assessment for penetration testing. This distinguishes it from sibling tools like cyber_risk_auditor or cve_security_lookup, which focus on other aspects of cybersecurity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example questions ('How much should I budget for...', 'Which engagement type is recommended...') that imply when to use the tool. However, it does not explicitly specify when not to use it or mention alternative tools, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pitch_deck_storyline (grade A)

Read-only, Idempotent

Build a complete investor pitch-deck storyline for a company. Returns an 8-20 slide narrative tailored to the target audience (seed-vc / series-a-vc / growth-vc / strategic / bank / grant) — each slide carrying a title, key points, a speaker note and a visual hint — plus a Q&A bank of 10-15 likely board questions and traps to avoid. Output is deck JSON ready to export to Google Slides, Notion or Pitch.com. When to use this tool: the user is preparing a fundraise, a board meeting, or an investor presentation. Inputs: the company profile and the target audience type. Delivered by Sarah, the AI Fundraising lead of the Gapup portfolio.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`audience`	Yes	Target audience — adapts tone + emphasis + Q&A bank
`keyFacts`	Yes	Hard facts to weave into the deck (traction numbers, milestones, awards)
`slideCount`	Yes	12 = standard VC deck, 15 = bank-friendly with annexes, 20 = growth/strategic

Output Schema

Parameters

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles surfaced from keyFacts
`slides`	Yes	8-20 slide objects ready to export to Google Slides / Notion / Pitch.com
`qaBanks`	Yes	10-15 anticipated investor questions with recommended answers
`recommendations`	No	Fundraising preparation actions
`executiveSummary`	Yes	One-paragraph elevator pitch distilled from the deck

Tool Definition Quality

Grade A (3.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds useful details about the output structure (slides with key points, speaker notes, visual hints) and Q&A bank, but does not disclose potential limitations or side effects. Since the annotations cover the core behavioral traits, the description provides moderate additional value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description runs to six sentences, with the main purpose front-loaded. It includes minor fluff ('Delivered by Sarah...'), which is why it earns a 4 rather than a 5, but overall it is well-structured and free of redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (five parameters, nested objects, and a stated output schema), the description covers the main use case, output format, and when to use. It does not address error handling or edge cases, but with annotations providing idempotency and read-only hints, it is sufficiently complete for most scenarios.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 80%, and the description adds context for the 'company' parameter as 'the company profile'. However, it does not elaborate on the specific fields within the company object (name, pitch, stage) beyond what the schema provides. The description helps slightly but does not significantly enhance understanding of parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: building a complete investor pitch-deck storyline. It specifies the output as an 8-20 slide narrative with a Q&A bank, and the domain of fundraising or investor presentations. This distinguishes it from the many sibling tools, which cover diverse areas like cybersecurity, HR, and marketing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool: 'the user is preparing a fundraise, a board meeting, or an investor presentation.' It provides clear context but does not explicitly mention when not to use or alternatives, which would elevate the score to 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

positioning_strategist (grade C)

Read-only

Stratège de positionnement [Positioning strategist] — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub vs Tableau/Pigment/Looker — Angle de différenciation + 5 piliers messaging + battle plan [differentiation angle + 5 messaging pillars + battle plan]. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`market`	Yes
`company`	Yes
`product`	Yes
`aspirations`	No
`competitors`	Yes
`customerPains`	Yes
`currentWeaknesses`	No

Tool Definition Quality

Grade C (2.8/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds the fact that inputs are validated server-side and that output is a structured, audited deliverable. However, it does not mention the async option (present in schema), rate limits, or what happens to data. Given annotations, this is adequate but lacks additional behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short (four sentences) but mixes French and English, and the opening phrase 'Stratège de positionnement — Gapup agent-payable C-suite expertise (CMO)' is not immediately clear. It is somewhat front-loaded with the purpose but could be more concise and universal.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 8 parameters, no output schema), the description lacks completeness. It does not describe the structure of the output, how to handle the async parameter, or specify input format details beyond a vague reference to 'documented case fields'. This is insufficient for an agent to reliably invoke and interpret the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (13%)—only the async parameter has a description. The description says 'send the documented case fields' but does not explain any parameter in detail. It does not compensate for the lack of schema documentation, and the description adds little to no meaning for the individual parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description identifies the tool as a positioning strategist for C-suite (CMO) and mentions it returns a structured, audited deliverable with a reference case. However, it does not clearly differentiate from numerous sibling tools (e.g., pricing_strategist, market_entry_strategist) that have similar strategic focus. The verb/action is implied but not explicitly stated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, no prerequisites, and no exclusions. The only usage-related note is about server-side validation, which is technical but not contextually helpful for tool selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

press_influencer (grade C)

Read-only

Presse & influenceurs [Press & influencers] — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Agicap (levée Série C €70M) — CP + 12 contacts presse Tier-1 · plan de diffusion 14 jours [Agicap (€70M Series C raise): press release + 12 Tier-1 press contacts · 14-day distribution plan]. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`budget`	No
`company`	Yes
`targetMedia`	Yes
`announcement`	Yes
`targetAudience`	Yes

Tool Definition Quality

Grade C (2.4/5.0)

Behavior: 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description implies creation of a deliverable (mutation), contradicting the 'readOnlyHint': true annotation. This is a serious inconsistency, and no additional behavioral details are provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (4 sentences) but includes a confusing first sentence ('Gapup agent-payable C-suite expertise (CMO)') that does not earn its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite complex nested schema and no output schema, the description omits details on deliverable contents, async behavior, budget, and other parameters, making it incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description should compensate but does not mention any parameters. It merely states 'send the documented case fields', adding no meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates the tool is about press and influencers ('Presse & influenceurs') and returns a structured, audited deliverable, which differentiates it from sibling tools like 'social_influencer_fake_follower_detector'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description provides a reference case (Agicap) but does not compare with siblings or state prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pricing_in_dealC

Read-only

Pricing en Deal — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Agicap × Groupe Rocher — Deal €38k · stade négociation · contre-offre -30% · 3 scénarios pricing · ROI 12×. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`redLines`	Yes
`negotiationContext`	Yes
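
The `async` row above describes a start-then-poll flow: pass `async: true`, receive a `job_id`, then poll `job_result(job_id)`. A minimal Python sketch of that flow follows. The `call_tool` callable is a hypothetical stand-in for whatever MCP client is in use, and the response fields (`job_id`, and a `status` of `pending` or `running` while the job is unfinished) are assumptions, since the listing publishes no output schema for these calls.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client hook: takes a tool name and arguments, returns the parsed result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_async_tool(
    call_tool: ToolCaller,
    name: str,
    arguments: Dict[str, Any],
    poll_interval_s: float = 2.0,
    max_polls: int = 30,
) -> Dict[str, Any]:
    """Start a tool with async=True, then poll job_result until it finishes."""
    started = call_tool(name, {**arguments, "async": True})
    job_id = started["job_id"]  # assumed response field name
    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        # Assumption: unfinished jobs report status "pending" or "running".
        if result.get("status") not in ("pending", "running"):
            return result
        time.sleep(poll_interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Bounding the number of polls keeps an agent from waiting forever on a job that never completes; the interval and limit shown are arbitrary defaults.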

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds minimal value beyond annotations (readOnlyHint, openWorldHint). It mentions 'returns a structured, audited deliverable' but does not clarify side effects, authentication needs, or performance implications. Annotations already indicate read-only behavior, so the description does not significantly enhance transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is very concise (two sentences plus a reference case). However, the brevity sacrifices clarity and completeness. It front-loads the tool name but lacks structured information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, no output schema), the description is insufficient. It does not describe the output format, the meaning of the deliverable, or any usage constraints, making it hard for an agent to fully understand the tool's capabilities.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only the async parameter is described). The description does not add meaning for parameters like company, deal, negotiationContext, and redLines beyond their names and types. While the names are intuitive, the description should provide more context, especially for nested objects.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
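
The coverage figure quoted above matches a simple ratio: top-level properties that carry a description, divided by all top-level properties. The sketch below reproduces the 20% figure for pricing_in_deal under that assumption. The listing does not publish its exact formula, and the property types and the shortened description text in the sample schema are placeholders.

```python
from typing import Any, Dict


def description_coverage(input_schema: Dict[str, Any]) -> int:
    """Percentage of top-level properties that have a non-empty description."""
    props = input_schema.get("properties", {})
    if not props:
        return 100  # nothing to describe
    described = sum(
        1 for spec in props.values() if spec.get("description", "").strip()
    )
    return round(100 * described / len(props))


# Stand-in for the pricing_in_deal input schema: five properties, one described.
pricing_in_deal_schema = {
    "type": "object",
    "properties": {
        "deal": {"type": "object"},
        "async": {"type": "boolean", "description": "Return a job_id immediately."},
        "company": {"type": "object"},
        "redLines": {"type": "object"},
        "negotiationContext": {"type": "object"},
    },
}

print(description_coverage(pricing_in_deal_schema))  # 20
```

The same ratio gives 17% for one described property out of six and 14% for one out of seven, which are the other figures that appear in these reviews.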

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as 'Pricing en Deal' for generating a structured, audited deliverable with a reference case. However, it does not explicitly state the verb (e.g., 'generates', 'calculates'), and the purpose is somewhat implied rather than directly stated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks any guidance on when to use this tool versus alternatives (e.g., pricing_strategist, deal_coach). No context on prerequisites or scenarios where the tool is inappropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pricing_strategistB

Read-only

Stratège de pricing — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Vercel Pricing 2026 — 4 tiers + usage metering · 3 scenarios pricing chiffrés · ARPU +28% target. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`competitors`	Yes
`currentPricing`	Yes
`valueProposition`	Yes

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint, which are consistent with the description's claim of returning a deliverable without mutation. However, the description does not elaborate on async behavior (despite the async parameter in schema) or other behavioral traits beyond what annotations already provide. No contradictions are present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at four short sentences, with the purpose stated upfront. However, the reference case is highly specific and may not generalize to other inputs. The structure is efficient but could be more general.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of six parameters, nested objects, and no output schema, the description lacks essential context. It does not explain the deliverable's format, how to use the async option, or the purpose of the focus parameter. Validation hints are present but insufficient for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description adds no meaningful detail about parameters beyond what the schema provides. It vaguely says 'send the documented case fields' but does not explain the focus parameter, nested object semantics (e.g., company.arrEur, competitors.anchorPriceEur), or the significance of async. Schema descriptions are minimal, so the description fails to compensate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a pricing strategist tool that returns a structured, audited deliverable for C-suite expertise. It includes a specific reference case (Vercel Pricing 2026) and distinguishes itself from sibling tools like competitor_pricing_radar and pricing_in_deal by emphasizing strategic scenario planning.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions that inputs are validated server-side and provides a reference case, but does not explicitly state when to use this tool versus alternatives such as competitor_pricing_radar or pricing_in_deal. It gives no guidance on when not to use the tool and no direct comparisons, leaving the agent without clear direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

privacy_compliance_auditC

Read-only

Audit conformité vie privée — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Lemlist SAS — SaaS outreach B2B, transferts UE→US Schrems II, RGPD + CCPA + LGPD + UK GDPR. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`presenterScript`	No
`targetFrameworks`	Yes
`processingActivities`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true, which the description does not explicitly reinforce but also does not contradict. The description says it 'returns a structured, audited deliverable,' but does not elaborate on read-only behavior or acceptance of extra fields, adding minimal value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and front-loaded with the French title, which is concise. However, the mix of French and English and the brief mention of a reference case may reduce clarity for non-French speakers.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 6 parameters, no output schema), the description is inadequate. It fails to explain return values, behavior, or how to structure the complex input, leaving significant gaps for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is very low (17%), and the description only generically refers to 'the documented case fields' without explaining the purpose or constraints of parameters like company, processingActivities, or targetFrameworks. This does not compensate for the sparse schema documentation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a privacy compliance audit tool with a French title and mentions a structured deliverable. However, it does not strongly differentiate itself from sibling privacy tools like ai_act_* or lgpd_data_subject_rights_automator, which handle specific regulations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description only notes that inputs are validated server-side and gives a reference case example, but does not specify prerequisites, context, or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

process_mappingC

Read-only

Mapping des process opérationnels — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Decathlon France — process Retour produit en magasin · 1700 magasins · 200 retours/j/magasin · -30 à -50% temps cible. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`processes`	Yes
`presenterScript`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true, so the description correctly implies a read-only operation. It adds that inputs are validated server-side, but does not disclose return format or any limitations beyond what annotations already convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a lengthy reference case example that may not be essential. It could be more structured and focused.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 5 parameters, nested objects, low schema coverage, and no output schema, the description is insufficient. It does not describe the deliverable's structure, response format, or success criteria, leaving the agent underinformed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, yet the description does not explain any parameters beyond 'send the documented case fields'. The schema itself contains descriptions for many properties, but the low coverage metric indicates gaps that the description should address.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it maps operational processes and returns a structured deliverable, with a concrete reference case. However, it does not differentiate itself from the sibling tool 'process_mining', which likely has a similar purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description names its target audience ('C-suite expertise (COO)') and gives a reference case, but says nothing about when not to use the tool and does not compare it with other tools such as process_mining.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

process_miningC

Read-only

Mining des process — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Gapup Hub — 4 process · €320k gaspillage identifié · 3 quick wins · 5 automations. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`objectives`	Yes
`companyName`	Yes
`mainSystems`	Yes
`topProcesses`	Yes
`employeeCount`	Yes
`revenueLostEstimateEur`	No

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so no destructive behavior. Description adds that inputs are validated server-side and returns a deliverable, but does not mention async capability or output format. Adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is wordy and includes a case study reference that is not essential. Mixes languages and lacks clear structure. Could be more concise and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters and no output schema, the description is incomplete. It omits details on what the deliverable contains, async behavior, and parameter meanings.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14% (only the async parameter is described). The description fails to explain any of the 7 input parameters, including the required ones, and so does not compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it performs process mining and returns a structured deliverable, citing a reference case. However, it mixes French and English, and does not clearly differentiate from sibling 'process_mapping'. The purpose is somewhat clear but lacks specificity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives like 'process_mapping'. There is no mention of prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
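
One rough way to check for the kind of guidance described above is to test whether a description names any of its sibling tools at all. The sketch below is only a heuristic, not the scoring method used by this listing. The `current` string is an excerpt of the process_mining description, and the `revised` wording is hypothetical, with its condition left as a placeholder.

```python
import re
from typing import Iterable, List


def mentioned_siblings(description: str, sibling_names: Iterable[str]) -> List[str]:
    """Return the sibling tool names that the description refers to."""
    text = description.lower()
    found = []
    for name in sibling_names:
        # Accept both the identifier (process_mapping) and its spaced form.
        parts = (re.escape(part) for part in name.lower().split("_"))
        pattern = r"\b" + "[_ ]".join(parts) + r"\b"
        if re.search(pattern, text):
            found.append(name)
    return found


# Excerpt of the current process_mining description.
current = (
    "Gapup agent-payable C-suite expertise (COO). "
    "Returns a structured, audited deliverable."
)
# Hypothetical revision; the actual condition would come from the tool's author.
revised = current + " Use process_mapping instead of this tool when <condition>."

print(mentioned_siblings(current, ["process_mapping"]))  # []
print(mentioned_siblings(revised, ["process_mapping"]))  # ['process_mapping']
```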

procurement_okr_esg_alignerA

Read-only · Idempotent

Aligns procurement OKRs with ESG targets for COOs using GRI standards and EU TED procurement benchmarks. Inputs include procurement objectives and ESG focus areas (e.g., carbon reduction, supplier diversity). Outputs structured alignment scores, gap analysis, and actionable recommendations. Essential for COOs integrating sustainability into procurement strategy. Keywords: procurement, ESG, GRI, EU TED, OKR alignment, sustainability metrics.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`esgFocusAreas`	Yes
`industrySector`	No
`procurementObjectives`	Yes

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`alignmentScores`	No
`recommendations`	No
`benchmarkComparison`	No
`overallAlignmentScore`	No
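
Because only `status` is marked as required in the output schema above, a caller should treat every other field as possibly absent. The following Python sketch shows one defensive way to read a result. The field types (a numeric score, lists for recommendations and warnings) are assumptions, since the table lists names only.

```python
from typing import Any, Dict


def summarize_alignment(result: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the main fields from a result, tolerating missing optional ones."""
    if "status" not in result:
        raise ValueError("result is missing the required 'status' field")
    return {
        "status": result["status"],
        # Optional fields: fall back to None or an empty list when absent.
        "overall_score": result.get("overallAlignmentScore"),
        "recommendations": result.get("recommendations") or [],
        "warnings": result.get("warnings") or [],
    }


print(summarize_alignment({"status": "ok"}))
# {'status': 'ok', 'overall_score': None, 'recommendations': [], 'warnings': []}
```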

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. Description adds output format (scores, gap analysis, recommendations) but does not mention async behavior despite the parameter. Still, good coverage beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences plus a keyword line, front-loaded with purpose, then inputs and outputs, then audience. No redundant information; each sentence serves a clear purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main purpose and outputs, but misses async processing guidance. With output schema existing, return values are covered. Lacks prerequisites or process explanation. Adequate but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage at 25%, only async is described. Description mentions 'procurement objectives' and 'ESG focus areas' but lacks details on the structured procurementObjectives object (id, description, weight) and industrySector. Partially compensates with standards reference but not fully.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it aligns procurement OKRs with ESG targets using GRI and EU TED standards. Specific verb 'aligns', resource 'procurement OKRs with ESG targets', and distinct from siblings (e.g., supplier_esg_audit focuses on suppliers, not OKRs).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies use for COOs integrating sustainability, but no explicit when-to-use vs sibling tools like 'procurement_spend_optim' or 'manufacturing_esg_compliance_mapper'. Lacks exclusions or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

procurement_six_sigma_waste_hunterA

Read-only · Idempotent

Analyzes procurement waste for COOs using Six Sigma DMAIC framework and EU TED tender data. Identifies non-value-added activities, overprocessing, and inefficiencies in procurement workflows. Inputs include procurement category, time period, and organizational unit. Outputs waste classification, cost impact estimates, and process improvement recommendations. — pass async:true REQUIRED to avoid x402 timeout.

Parameters

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`time_period`	Yes	Time period for analysis (e.g., '2023-01-01/2023-12-31')
`six_sigma_tool`	No		DMAIC
`include_ted_data`	No
`organizational_unit`	No	Specific business unit or department (e.g., 'EMEA', 'Global Operations')
`procurement_category`	Yes	Specific procurement category to analyze (e.g., 'IT hardware', 'facilities')
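
As an illustration of the parameters above, the sketch below builds a JSON-RPC `tools/call` request for this tool with the two required arguments and `async` set to true, as the description asks. The envelope follows the general MCP request shape. The argument values are taken from the examples in the table, and anything related to x402 payment or transport headers is out of scope here.

```python
import json
from typing import Any, Dict, Optional


def build_waste_hunter_call(
    procurement_category: str,
    time_period: str,
    organizational_unit: Optional[str] = None,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a tools/call request that always sets async=True for this slow tool."""
    arguments: Dict[str, Any] = {
        "procurement_category": procurement_category,
        "time_period": time_period,
        "async": True,  # the description marks this as required to avoid timeouts
    }
    if organizational_unit is not None:
        arguments["organizational_unit"] = organizational_unit
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "procurement_six_sigma_waste_hunter",
            "arguments": arguments,
        },
    }


request = build_waste_hunter_call("IT hardware", "2023-01-01/2023-12-31", "EMEA")
print(json.dumps(request, indent=2))
```

Optional parameters such as `six_sigma_tool` and `include_ted_data` are left out so that the server-side defaults apply.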

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`ted_data_coverage`	No
`cost_impact_estimate`	No
`waste_classification`	No
`process_improvement_recommendations`	No

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint. The description adds critical behavioral context: the async flag is required to avoid x402 timeouts, indicating the tool is slow and supports asynchronous execution. It also specifies the data source (EU TED tender data) and output types, which go beyond the annotations. This adds significant value beyond the structured metadata.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences plus a brief note, all front-loaded. The first two sentences introduce the purpose, the framework, and the kinds of waste identified; the next two list inputs and outputs; and the note provides a critical usage constraint. Every sentence adds value without redundancy. Excellent conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, output schema, many siblings), the description covers the essential purpose, data source, and async requirement. However, it lacks detail on the six_sigma_tool enum values (e.g., when to use SIPOC vs ValueStreamMapping) and the role of include_ted_data. The output schema exists but is not referenced. Overall adequate but with gaps in parameter context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 67%, so the description should add meaning for undocumented parameters. However, it only repeats the three listed inputs (procurement category, time period, organizational unit) already documented in the schema. It does not explain the enum parameter six_sigma_tool or the boolean include_ted_data, leaving those parameters underdocumented. The async note is a usage guideline rather than parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: it analyzes procurement waste for COOs using the Six Sigma DMAIC framework and EU TED tender data. It specifies the types of waste identified (non-value-added activities, overprocessing, inefficiencies) and the outputs (waste classification, cost impact estimates, improvement recommendations). This is a specific verb-resource combination that distinguishes it from sibling procurement tools like procurement_spend_optim or procurement_okr_esg_aligner.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for waste analysis but does not explicitly state when to use this tool versus alternatives. There is no mention of when not to use it and no comparative guidance with other procurement tools. The async note provides a usage constraint but not contextual alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

procurement_spend_optimC

Read-only

Optimisation des achats / Spend strategy — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Tech SaaS €60M ARR — 200 fournisseurs analysés · 20 leviers chiffrés · -€2.4M opex/an target. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`topSuppliers`	Yes
`spendCategories`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint: true and openWorldHint: true. The description adds that inputs are validated server-side and it returns a report, which is consistent but adds minimal additional behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief with two sentences and a reference case. It is front-loaded with the purpose, though the reference case could be considered extraneous noise for a tool description.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 5 parameters, nested objects, and no output schema, the description is insufficient. It doesn't explain the deliverable structure, the 'focus' field, or how results are obtained, leaving the AI agent with many unknowns.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, leaving most parameters undocumented. The description does not explain any parameters, nor does it compensate by describing the required fields (company, spendCategories, topSuppliers).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it is for 'Optimisation des achats / Spend strategy' and returns a 'structured, audited deliverable', but the action verb is implicit. It doesn't clearly differentiate from similar procurement tools like procurement_okr_esg_aligner or procurement_six_sigma_waste_hunter.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only says 'send the documented case fields' without explaining when to use this tool versus alternatives. No exclusions or explicit context are provided, which is a significant gap given the large number of sibling procurement tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

programmatic_attribution_calibratorA

Read-only · Idempotent

For ad_revenue_ops persona: calibrates marketing mix models (MMM) by ingesting OpenRTB impression-level data from FreeWheel Marketplace and other programmatic sources. Accepts model parameters, date ranges, and impression IDs as input, returning structured calibration metrics and attribution adjustments. Useful for improving model accuracy with real-time bidding data and validating revenue attribution across programmatic channels.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	Yes	End date for impression data (ISO 8601)
`modelId`	Yes	Identifier of the MMM model to calibrate
`startDate`	Yes	Start date for impression data (ISO 8601)
`impressionIds`	No	List of OpenRTB impression IDs to include in calibration
`confidenceThreshold`	No	Confidence threshold for calibration metrics

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`calibrationMetrics`	No

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, openWorld, idempotent. The description adds context on data sources and outputs but does not reveal additional behavioral traits like rate limits or state changes. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, all necessary, front-loaded with persona. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description adequately covers purpose, input, and output types. Could be slightly improved by mentioning relationship to related tools, but overall sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description only highlights model parameters, date ranges, and impression IDs as input, without adding meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it calibrates MMM models using programmatic data and mentions inputs and outputs. However, it does not differentiate from sibling tools like 'retail_media_attribution_bridge' which may have similar functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies the target persona and use case (improving model accuracy, validating attribution), but lacks explicit guidance on when not to use this tool or when to use alternatives among the many sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

programmatic_brand_safety_auditorA

Read-only, Idempotent

Evaluates programmatic ad inventory for brand safety risks using IAB Tech Lab's standards and GDPR-compliant tracking methods. Designed for ad revenue operations teams to assess inventory quality before bidding. Inputs include domain, page URL, and optional contextual signals. Outputs a structured brand safety score with risk categorization and compliance warnings.

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Full page URL being evaluated
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domain`	Yes	Root domain of the inventory (e.g., 'example.com')
`categories`	No	Optional IAB content categories for contextual analysis
`gdprConsent`	No	GDPR consent string (TCF v2.0)
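As an illustration of how the parameters above fit together, the sketch below builds an MCP `tools/call` request (JSON-RPC 2.0) for this tool. The required-field check mirrors the table; the helper name and the sample values are hypothetical, and server-side validation may be stricter.

```python
from typing import Any, Dict

# Required parameters as listed in the parameter table above.
REQUIRED_FIELDS = ("url", "domain")


def build_brand_safety_call(arguments: Dict[str, Any], request_id: int = 1) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 `tools/call` request for programmatic_brand_safety_auditor."""
    missing = [name for name in REQUIRED_FIELDS if name not in arguments]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "programmatic_brand_safety_auditor",
            "arguments": dict(arguments),
        },
    }


# Hypothetical sample values, for illustration only.
request = build_brand_safety_call(
    {"url": "https://example.com/article", "domain": "example.com"}
)
```

Checking required fields on the client side avoids spending a paid call on a request the server would reject.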

Output Schema

ParametersJSON Schema

Name	Required	Description
`flags`	No
`score`	No	Brand safety score (0-100)
`status`	Yes
`sources`	No
`warnings`	No
`riskLevel`	No
`gdprCompliant`	No

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, openWorldHint. Description adds GDPR compliance and IAB standards, plus outputs structured scores and warnings, providing additional context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise four-sentence paragraph, front-loaded with purpose, no wasted words. Each sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and presence of output schema, the description covers purpose, inputs, outputs, and usage context completely. Agent can confidently select and invoke.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions like 'GDPR consent string (TCF v2.0)'. Description adds context by associating categories with IAB standards and gdprConsent with GDPR compliance, enhancing parameter understanding beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool evaluates programmatic ad inventory for brand safety risks using IAB Tech Lab standards and GDPR compliance. It includes target users and timing (before bidding), effectively distinguishing it from sibling tools like programmatic_attribution_calibrator.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Designed for ad revenue ops teams to assess inventory before bidding, providing clear context. Does not explicitly list when not to use or alternative tools, but the context is sufficient for most use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

proposal_generatorC

Read-only

Générateur de propositions commerciales — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk × Gapup Hub — Proposition 7 sections · ROI 3Y €1.8M · Payback 4 mois. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`offer`	Yes
`company`	Yes
`prospect`	Yes
`dealContext`	No

Tool Definition Quality

C2.4/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description states it returns a deliverable, contradicting the readOnlyHint=true annotation which implies no side effects. No additional behavioral traits are disclosed beyond what annotations provide, and the contradiction undermines transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise, front-loaded with the tool's purpose. However, the reference case includes extraneous details that could be omitted for brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description fails to describe the return format beyond 'structured, audited deliverable'. It also lacks parameter details, leaving the tool under-specified for reliable invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20%; the description adds no meaning to parameters beyond the schema. It only vaguely instructs to 'send the documented case fields' without detailing specific parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates commercial proposals and returns a structured deliverable. However, it does not differentiate itself from sibling tools, even though each has a distinct name and function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a reference case but provides no explicit guidance on when to use this tool versus alternatives, nor any conditions or prerequisites for use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

qa_pre_flightC

Read-only

Préparation Q&A investisseurs — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Agicap Série C €70M — 30 Q&A stratégiques · 8 questions pièges · Plan de préparation 21 jours. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`round`	Yes
`company`	Yes
`founderContext`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description confirms it returns a deliverable. It adds that inputs are validated server-side. However, no details are given about processing time, error handling, or rate limits beyond the async parameter in the schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise with two main sentences plus a reference case. It front-loads the purpose efficiently, though the case example adds extra detail that could be considered non-essential.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 4 parameters, no output schema), the description is insufficient. It does not explain the return format, content of the deliverable, or how to use the async parameter. Important contextual details are missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only async has a description). The description does not add meaning to any parameters, leaving the nested objects and their properties largely undocumented. This fails to compensate for the low schema coverage.
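The coverage figure can be reproduced with simple arithmetic. The sketch below assumes coverage is the share of top-level parameters with a non-empty description; the evaluator may also count nested properties, which the listing does not state.

```python
from typing import Dict


def description_coverage(params: Dict[str, str]) -> float:
    """Percentage of parameters whose description is non-empty."""
    if not params:
        return 0.0
    described = sum(1 for text in params.values() if text.strip())
    return 100.0 * described / len(params)


# qa_pre_flight parameters as listed above: only `async` carries a description.
qa_pre_flight_params = {
    "async": "If true, returns a job_id immediately instead of waiting for the result.",
    "round": "",
    "company": "",
    "founderContext": "",
}
coverage = description_coverage(qa_pre_flight_params)  # 25.0
```

The same ratio for a table with one described parameter out of eight is 12.5%, consistent with the 13% cited for qbr_auto.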

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for investor Q&A preparation with return of a structured deliverable. The mention of 'FUNDRAISING' and a reference case adds clarity, but it does not explicitly differentiate from sibling tools like 'audit_pre_flight' or 'pitch_deck_storyline'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description implies it is for fundraising Q&A prep but does not provide conditions, prerequisites, or mention sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

qbr_autoC

Read-only

QBR automatique CSM — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub × Alan — QBR Q1 2026 · Health score 82/100 · Upsell €18k détecté · Renewal low risk. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`wins`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`period`	Yes
`company`	Yes
`metrics`	Yes
`customer`	Yes
`challenges`	Yes
`nextQuarterGoals`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, which are consistent with the description. The description adds that inputs are validated server-side and returns a structured deliverable, but does not detail server-side processing or AI generation aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and front-loaded, with two sentences plus a reference case. While the reference case may be slightly extraneous, it does not detract significantly from conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, no output schema), the description lacks details on return format, error handling, and typical usage. The output is only described as 'structured, audited deliverable', which is insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 13% schema description coverage, the description adds minimal parameter context. The reference case mentions some fields (e.g., health score, upsell) but does not systematically explain each parameter's meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates an automated QBR report for CSM, with a specific reference case. However, it does not explicitly distinguish from similar auto-report tools like enps_auto.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions server-side validation but provides no guidance on when to use this tool over alternatives, when not to use it, or any prerequisites. The phrase 'send the documented case fields' is vague.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

real_estate_intelA

Read-only, Idempotent

Real estate intelligence aggregator with a best-in-class French dataset (DVF — Demandes de Valeurs Foncières — 100% of FR transactions since 2019, public, keyless) plus UK Land Registry Price Paid (all UK transactions 1995+). Four modes: (1) property — full transaction history for a specific address; (2) comparables — median/std price/m² within a radius (default 500m); (3) market — annual price series, YoY change, volume, trend by commune; (4) valuation — two-method estimate (comparables median + hedonic regression if n≥30) with confidence scoring (high/medium/low). All sources are free and require no API key. ICP: PropTech agents, REITs, fund managers, family offices, insurance. SLA: ≤25s p95 (sources fetched in parallel, 8s budget each). Cache: 24h TTL (DVF data is stable). Quality score: 30 pts DVF retrieved, 20 pts geocoding, 20 pts UK LR retrieved, 15 pts if comparables count ≥10, 15 pts if method quality achieved. Status: failed/<60/≥60 → failed/partial/final. No env vars required.
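The quality score and status mapping in the description above can be sketched as follows. The point values and the 60-point threshold come from the description. The function and argument names are hypothetical, and the condition that triggers "failed" is not documented, so it is modelled here as an explicit flag.

```python
def quality_score(
    dvf_retrieved: bool,
    geocoded: bool,
    uk_lr_retrieved: bool,
    comparables_count: int,
    method_quality_ok: bool,
) -> int:
    """Sum the documented points: 30 + 20 + 20 + 15 + 15 = 100 at most."""
    score = 0
    score += 30 if dvf_retrieved else 0
    score += 20 if geocoded else 0
    score += 20 if uk_lr_retrieved else 0
    score += 15 if comparables_count >= 10 else 0
    score += 15 if method_quality_ok else 0
    return score


def status_for(score: int, failed: bool = False) -> str:
    """Map a score to the documented status: failed, partial (<60), or final (>=60)."""
    if failed:  # Assumption: failure is signalled separately from the score.
        return "failed"
    return "final" if score >= 60 else "partial"
```

For example, a French lookup with DVF data, successful geocoding, and 12 comparables but no UK data and no method-quality bonus scores 30 + 20 + 15 = 65, which maps to "final".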

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	property: transactions at an address \| comparables: sample around a point \| market: commune/neighbourhood market stats \| valuation: price estimate for a given surface
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_to`	No	ISO date YYYY-MM-DD — latest transaction date
`location`	Yes	Location descriptor. One of: {address, city?, country?} \| {lat, lon, radius_m?} \| {insee_code} for FR communes.
`date_from`	No	ISO date YYYY-MM-DD — earliest transaction date
`max_results`	No	Maximum number of results to return (5–50, default 20)
`surface_max`	No	Maximum surface in m² (±20% tolerance applied for comparables)
`surface_min`	No	Minimum surface in m² (±20% tolerance applied for comparables)
`property_type`	No	Filter by property type (default: all)
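The `location` parameter accepts one of three shapes. A minimal sketch of a client-side check for those shapes follows; the function name and the returned labels are hypothetical, and the server's own validation rules are not documented here.

```python
from typing import Any, Dict


def classify_location(location: Dict[str, Any]) -> str:
    """Return which documented location shape a dict matches, or raise ValueError."""
    keys = set(location)
    # {address, city?, country?}
    if "address" in keys and keys <= {"address", "city", "country"}:
        return "address"
    # {lat, lon, radius_m?}
    if {"lat", "lon"} <= keys and keys <= {"lat", "lon", "radius_m"}:
        return "point"
    # {insee_code} for FR communes
    if keys == {"insee_code"}:
        return "insee"
    raise ValueError(f"unrecognised location shape: {sorted(keys)}")
```

Rejecting mixed shapes (for example, an address combined with coordinates) is a design choice of this sketch, since the schema describes the three forms as alternatives.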

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`market`	No	mode=market — commune-level market stats
`status`	Yes
`sources`	Yes
`property`	No	mode=property — transactions at the location
`valuation`	No	mode=valuation — price estimate
`comparables`	No	mode=comparables — aggregated comp stats
`quality_score`	Yes
`location_resolved`	Yes

Tool Definition Quality

A4.2/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds extensive behavioral context beyond annotations, including SLA (≤25s p95), cache TTL (24h), quality scoring methodology, status values (failed/partial/final), and that no API keys are required. This meaningfully extends what readOnlyHint, idempotentHint, and destructiveHint convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is structurally well-organized with clear sections and bullet points. It front-loads the core purpose and data sources. However, it is somewhat lengthy (several sentences) and could be slightly more concise without losing essential details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, 4 modes, nested location object, output schema), the description is thorough. It covers performance, caching, quality scoring, target users, and data sources, leaving no obvious gaps. The presence of an output schema reduces the need to explain return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description repeats some mode definitions but does not add significant new parameter-level information beyond what the schema already provides. The mode descriptions in the description are more narrative but not essential for parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a real estate intelligence aggregator, naming a specific role ('aggregator') and resource ('French DVF and UK Land Registry data'). It explicitly lists four modes (property, comparables, market, valuation), making the purpose unmistakable. While it doesn't mention sibling tools, the domain is unique enough that differentiation is inherent.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states the ICP (PropTech agents, REITs, etc.) but does not provide explicit guidance on when to use this tool versus alternatives, nor does it include when-not-to-use conditions. The usage is implied by the tool's purpose, but no comparative guidance is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realtime_data_streamsA

Read-only

High-frequency real-time market data for trading agents, market-making bots and fintech analysts. Returns FX ticks (bid/ask/spread), intraday OHLCV candles, crypto orderbook snapshots (depth 5-50), recent trades with VWAP, and sovereign bond yields. All sources are keyless public REST APIs (Binance, Coinbase, Kraken, OKX, open FX feeds, worldgovernmentbonds.com). Ultra-short cache: 10s for ticks/trades, 60s for orderbook. Use when an agent needs live market data as precise numeric inputs for trading logic, arbitrage detection, or portfolio valuation.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Data stream type: fx_tick (latest FX bid/ask/mid/spread), fx_history_intraday (OHLCV candles), crypto_orderbook (order book snapshot), crypto_trades_recent (last 50 trades + VWAP), bond_yields (sovereign yield %)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Orderbook depth (levels each side) for crypto_orderbook mode (default: 20)
`period`	No	Candle period for fx_history_intraday mode (default: 5m)
`symbol`	Yes	Market symbol. FX: EURUSD, GBPUSD, USDJPY. Crypto: BTCUSDT, ETHUSDT, BTC-USD. Bonds: US10Y, US2Y, DE10Y, FR10Y, UK10Y, JP10Y, IT10Y
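Two of the quantities named above, VWAP for recent trades and mid/spread for an FX tick, have standard definitions. The sketch below shows those definitions only; the input shapes are assumptions and do not reflect the tool's actual output fields.

```python
from typing import Iterable, Tuple


def vwap(trades: Iterable[Tuple[float, float]]) -> float:
    """Volume-weighted average price over (price, quantity) pairs."""
    notional = 0.0
    total_qty = 0.0
    for price, qty in trades:
        notional += price * qty
        total_qty += qty
    if total_qty == 0:
        raise ValueError("cannot compute VWAP with zero total quantity")
    return notional / total_qty


def mid_and_spread(bid: float, ask: float) -> Tuple[float, float]:
    """Mid price and absolute spread from a bid/ask quote."""
    return (bid + ask) / 2, ask - bid
```

For instance, one unit traded at 100 and three units at 102 give a VWAP of (100 + 306) / 4 = 101.5.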

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`symbol`	Yes
`fx_tick`	No
`sources`	Yes
`fx_history`	No
`bond_yields`	No
`crypto_trades`	No
`quality_score`	Yes
`crypto_orderbook`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, openWorldHint), the description adds cache durations (10s/60s) and states all sources are keyless public APIs. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loads the main purpose. It has no fluff but could be slightly shortened; the structure is adequate.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists, so return values are covered. However, the description omits mention of the 'async' parameter behavior, which is a significant operational trait. This gap reduces completeness for an agent invoking the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description repeats some parameter context (e.g., 'depth 5-50') but does not add significant new meaning beyond what the schema already provides for each parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides high-frequency real-time market data for trading agents, listing specific data types (FX ticks, OHLCV, orderbook, trades, bond yields) and sources. It distinguishes from sibling tools like fx_rate or historical_price_series by emphasizing real-time nature and specific use cases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use when an agent needs live market data as precise numeric inputs for trading logic, arbitrage detection, or portfolio valuation.' This provides clear context but does not compare directly to alternatives or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recruiting_architectC

Read-only

Architecte du recrutement — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Stripe France — 12 postes Q3 2026 · sourcing multi-canaux + employer brand + frameworks d'entretien + parcours candidat · time-to-hire -45%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`roles`	Yes
`budget`	Yes
`company`	Yes
`preferences`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The readOnlyHint annotation (true) and the description stating 'Returns a structured, audited deliverable' align, indicating no state mutation. The description adds that inputs are validated server-side but does not disclose the async parameter behavior or provide details on deliverable format or latency. With annotations covering key behavioral traits, the description offers minimal additional transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a lengthy reference case that may distract from core functionality. While not verbose, it could be more front-loaded with essential usage details. The mixed language (French/English) may reduce clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the input schema (6 parameters, nested objects) and absence of output schema, the description is insufficient. It only vaguely states 'structured, audited deliverable' without specifying format, content, or how to interpret the return value. The reference case provides context but does not substitute for complete documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is very low (17%), and the description does not explain any parameters beyond noting inputs are validated. Critical parameters like focus, roles, and preferences are left entirely to the schema's minimal descriptions, which are insufficient for correct invocation. The description adds no value to parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a structured, audited deliverable for recruitment architecture, targeting C-suite expertise. It provides a concrete reference case (Stripe France) that illustrates the scope. However, it does not explicitly differentiate from sibling tools like candidate_screening_ranking or talent_intelligence, which share the recruiting domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The description does not mention prerequisites, exclusions, or typical scenarios beyond the reference case. An agent must infer usage from the title and reference, leaving ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

re_deal_screenerA

Read-only

Screener deal immobilier (EU) [EU real estate deal screener] — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Screen this real estate deal: , , asking € — give me cap rate vs market, location score, risk flags, and deal recommendation. · Should I pursue this hotel investment at for € with keys? Run an EU deal screener with DVF comparables and Géorisques risk data. · What is the real estate market valuation for a at based on recent French DVF transactions? · Run a due diligence deal screen on this property: , €, sqm — flood risk, cap rate, price vs comparables. · Evaluate this commercial real estate deal for an investment committee: at , €, NOI €. Reference case: Hôtel boutique 45 keys · 12 rue de la Paix 75002 Paris · €12.5M · €277k/key · comp DVF €250-380k/key · location 92/100 · score 72 · pursue-with-conditions. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`address`	Yes
`deal_type`	Yes
`country_iso2`	Yes		FR
`units_or_keys`	No
`gross_area_sqm`	No
`current_noi_eur`	No
`asking_price_eur`	Yes
`investment_thesis`	No
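
As an illustration, the reference case in the description can be turned into an arguments payload using the parameter names from the table above. This is only a sketch: the helper name and the `deal_type` value `"hotel"` are assumptions, since the page does not list the accepted values. The price-per-key figure is recomputed to show how the reference case's €277k/key follows from €12.5M over 45 keys.

```python
# Sketch: arguments for re_deal_screener built from the reference case.
# Parameter names come from the table above; "hotel" as a deal_type value
# is an assumption (accepted values are not documented on this page).

def build_deal_args(address, deal_type, asking_price_eur,
                    country_iso2="FR", **optional):
    """Return an arguments dict, dropping optional fields left as None."""
    args = {
        "address": address,
        "deal_type": deal_type,
        "country_iso2": country_iso2,  # the table lists FR as the default
        "asking_price_eur": asking_price_eur,
    }
    args.update({k: v for k, v in optional.items() if v is not None})
    return args


args = build_deal_args(
    address="12 rue de la Paix 75002 Paris",
    deal_type="hotel",            # hypothetical value
    asking_price_eur=12_500_000,
    units_or_keys=45,
    gross_area_sqm=None,          # unknown here, so it is omitted
)

# Price per key, truncated to whole thousands of euros.
price_per_key_k = args["asking_price_eur"] // args["units_or_keys"] // 1000
in_comp_range = 250 <= price_per_key_k <= 380  # comp DVF range, in k EUR/key
print(price_per_key_k, in_comp_range)
```

Leaving unknown optional fields out of the payload, rather than sending nulls, is a design choice of this sketch; the server-side validation rules are not described on the page.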

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The tool definition adds value beyond annotations: the schema's 'async' parameter explains the asynchronous mode (a job_id is returned immediately), and the description notes input validation ('Inputs are validated server-side') and that the output is a 'structured, audited deliverable'. It does not contradict annotations (readOnlyHint=true). Details such as error handling and rate limits are missing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose with multiple example queries and a reference case. While it front-loads the main purpose, the examples add length. It could be more concise without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (9 parameters, no output schema), the description provides a good overall understanding of use cases and expected output fields (cap rate, location score, risk flags, recommendation). However, it lacks details on return format, error handling, and full parameter documentation. The output schema is missing, so the description should be more complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (11%). The description compensates somewhat by mapping example queries to parameters (address, deal_type, asking price, units/keys, sqm, NOI, investment thesis). The 'async' parameter is documented only in the schema, not in the description. However, the required parameter 'country_iso2' and optional ones like 'units_or_keys' are not fully explained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is an EU real estate deal screener that returns a structured, audited deliverable with cap rate, location score, risk flags, and recommendation. It distinguishes itself from siblings like 'ma_deal_screener' and 'real_estate_intel' by specifying EU focus and French data sources (DVF, Géorisques).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides multiple example queries (e.g., 'Screen this real estate deal:', 'Should I pursue this hotel investment?') that implicitly guide when to use the tool. It also mentions a reference case for further illustration. However, it does not explicitly state when not to use it or suggest alternative tools for non-EU deals.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
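
The page does not say how the six dimension scores combine into the overall figure. A plain average of the re_deal_screener scores (4, 3, 3, 3, 5, 4) gives about 3.7, not the 3.9 shown. The sketch below uses one weighting that happens to reproduce the overall scores of the tools whose six scores appear in full in this section. The weights are inferred from those numbers and are not documented on the page, so treat them as an assumption.

```python
# Assumed weights in percent, inferred from the scores in this section.
# They are NOT documented on the page.
WEIGHTS = {
    "purpose": 25,
    "usage": 20,
    "behavior": 20,
    "parameters": 15,
    "conciseness": 10,
    "completeness": 10,
}


def overall_score(scores):
    """Weighted mean of 1-5 dimension scores, rounded half up to 1 decimal."""
    hundredths = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    # Integer arithmetic avoids float rounding surprises (e.g. 2.55 -> 2.6).
    return (hundredths + 5) // 10 / 10


re_deal_screener = {
    "behavior": 4, "conciseness": 3, "completeness": 3,
    "parameters": 3, "purpose": 5, "usage": 4,
}
print(overall_score(re_deal_screener))  # 3.9
```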

renewal_optimizerC

Read-only

Optimiseur de renouvellements [Renewal optimizer] — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Renewals 10 comptes · €89k ARR à 90j · 3 comptes at-risk · Playbook 6 scénarios [10 accounts · €89k ARR at 90 days · 3 at-risk accounts · 6-scenario playbook]. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`horizon`	No
`product`	Yes
`accounts`	Yes
`targetRenewalRatePct`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and a deliverable is returned, but does not disclose more detailed behaviors such as rate limits, data retention, or result format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and front-loaded with the core purpose. It efficiently conveys the tool's value but could be more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, multiple parameters, no output schema), the description is incomplete. It lacks details on the deliverable's format, how to interpret results, and any prerequisites beyond input validation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 17% (only 'async' has a description). The description says 'send the documented case fields' without explaining each parameter. It adds minimal value beyond the schema.
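
The 17% figure is consistent with one described parameter out of six. A minimal sketch of that arithmetic follows. The function name and the round-half-up convention are assumptions, chosen because they also reproduce the other coverage figures quoted in this section (for example 13% for one described parameter out of eight).

```python
def schema_coverage_pct(descriptions):
    """Percent of parameters with a non-empty description, rounded half up."""
    described = sum(1 for text in descriptions.values() if text.strip())
    return int(100 * described / len(descriptions) + 0.5)


# renewal_optimizer parameters as listed in the table above.
renewal_optimizer = {
    "async": "If true, returns a job_id immediately (<200ms) "
             "instead of waiting for the result.",
    "company": "",
    "horizon": "",
    "product": "",
    "accounts": "",
    "targetRenewalRatePct": "",
}
print(schema_coverage_pct(renewal_optimizer))  # 17
```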

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is an 'Optimiseur de renouvellements' that returns a structured, audited deliverable, and provides a reference case. However, it does not differentiate itself from sibling tools with similar purposes like churn_defender or save_plays.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives no guidance on when to use this tool versus alternatives. It mentions server-side input validation and a reference case but lacks explicit usage context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

repo_rate_arbitrage_scannerA

Read-only, Idempotent

Scans for arbitrage opportunities between repo rates (ECB) and short-term funding markets (Treasury Direct). Designed for CFOs to identify cost-effective funding strategies. Inputs include optional date ranges and currency filters. Outputs structured arbitrage opportunities with rate differentials and confidence scores.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	No
`currency`	No
`startDate`	No
`minDifferential`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`opportunities`	No

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint, which cover safety and idempotency. The description adds that the tool scans for opportunities and provides structured outputs, but it does not disclose behavioral details such as data freshness, limitations, or that the async parameter can return a job_id immediately. The additional context is useful but not substantial beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, concise and to the point. It effectively communicates the tool's purpose, target audience, inputs, and outputs without unnecessary words. The structure is front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 5 optional parameters and an output schema. The description mentions the output structure but omits the async parameter's behavior (returning a job_id), which is a notable gap. Given that the tool is not highly complex, the description is adequate but misses a key invocation detail.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only async described). The description mentions 'optional date ranges and currency filters', partially covering startDate, endDate, and currency. However, it does not explain minDifferential or async behavior beyond the schema's minimal description. It adds some meaning but does not fully compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool scans for arbitrage opportunities between repo rates and short-term funding markets, targeted at CFOs. It specifies inputs (optional date ranges and currency filters) and outputs (structured opportunities with rate differentials and confidence scores). This distinguishes it from sibling tools like tariff_arbitrage_finder or ma_arbitrage_hunter by focusing on a specific market pair.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for CFOs seeking cost-effective funding strategies, but it does not explicitly state when to use this tool over alternatives. There is no mention of scenarios where it is inappropriate or comparisons to sibling tools. The guidance is only implicitly derived from the purpose, not directly addressing trade-offs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reputation_engineC

Read-only

Moteur de réputation [Reputation engine] — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: PayShield SaaS — Monitoring réputation Q2 2026 [reputation monitoring, Q2 2026]. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`channels`	Yes
`industry`	Yes
`keywords`	Yes
`historicalCrises`	No

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and open-world behavior. The description's mention of a structured deliverable adds some context and does not contradict the annotations, but it discloses no further behavioral traits. The async mode is documented only in the schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is reasonably short but includes jargon ('Gapup agent-payable C-suite expertise') and mixes languages (French and English), reducing clarity. It front-loads the purpose but could be more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 6 parameters, 4 required, and no output schema, the description is insufficient. It does not specify what the structured deliverable contains, how to interpret results, or error handling. The async behavior is already in schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, and the description fails to explain the meaning or usage of parameters like brand, keywords, channels, industry, and historicalCrises. It merely references 'documented case fields' without elaboration.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description indicates it's a reputation engine for C-suite expertise, returning a structured deliverable, but does not differentiate it from sibling tools like sentiment_news_pulse or brand_builder. The reference case provides some context but is vague.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states that inputs are validated server-side, but gives no guidance on when to use this tool versus alternatives, nor any exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

research_paper_qaB

Read-only

Synthèse littérature scientifique (PaperQA2) [Scientific literature synthesis] — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: Conduct a literature review on — what does the evidence show across recent papers? · Evaluate the current hypothesis that — supporting and contradicting evidence with citations. · Map contradictions in the literature on — which camps exist, how many papers per side? · What is the state-of-the-art understanding of as of ? · Perform an interdisciplinary synthesis on — findings from and . Reference case: Gut-brain axis · Cognitive performance in healthy adults · OpenAlex+SemanticScholar+CORE · Evidence synthesis · DOI-verified citations · Contradictions + gaps mapped. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`max_papers`	Yes
`year_range`	No
`focus_domain`	Yes		all
`include_preprints`	Yes
`research_question`	Yes
`evidence_grade_required`	Yes		standard
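
The table marks five fields as required while also listing defaults for two of them. One way a caller might reconcile this is to fill in the listed defaults client-side and check that nothing required is missing before sending. The sketch below does that. The helper name, the value types (integer `max_papers`, boolean `include_preprints`) and the sample question are assumptions, not taken from the schema.

```python
# Defaults and required names as shown in the table above.
DEFAULTS = {"focus_domain": "all", "evidence_grade_required": "standard"}
REQUIRED = (
    "max_papers",
    "focus_domain",
    "include_preprints",
    "research_question",
    "evidence_grade_required",
)


def build_paper_qa_args(**fields):
    """Merge caller fields over the listed defaults and check required ones."""
    args = {**DEFAULTS, **fields}
    missing = [name for name in REQUIRED if name not in args]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return args


args = build_paper_qa_args(
    research_question=("How does the gut-brain axis relate to cognitive "
                       "performance in healthy adults?"),  # hypothetical
    max_papers=20,            # hypothetical value and type
    include_preprints=False,  # hypothetical value and type
)
print(sorted(args))
```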

Tool Definition Quality

B3.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds value by detailing that the tool uses OpenAlex+SemanticScholar+CORE, returns DOI-verified citations, and maps contradictions. This contextualizes the annotations without contradicting them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose and includes extraneous details (e.g., 'Reference case: Gut-brain axis'), while the opening line is cryptic. It lacks a clear structure with headings or bullet points, making it harder to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters (5 required), no output schema, and low schema coverage, the description is incomplete. It does not explain return value structure, error behavior, or parameter constraints beyond minimal examples.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only 14% of parameters have descriptions in the schema. The description does not clarify the meaning of parameters like evidence_grade_required, focus_domain, or year_range. It mentions some parameter names in examples but provides no formal explanations, failing to compensate for low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs literature synthesis and evidence review, listing specific example questions (e.g., literature review, hypothesis evaluation, contradiction mapping). This makes its purpose highly specific and distinguishable from siblings like sci_literature_search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example queries that imply usage scenarios, but it lacks explicit guidance on when to use this tool over alternatives (e.g., sci_literature_search) and does not state prerequisites or exclusions. The 'Gapup agent-payable C-suite expertise (RISK)' line is cryptic and unhelpful.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

retail_media_attribution_bridgeA

Read-only, Idempotent

Provides unified attribution insights for retail media and programmatic campaigns by analyzing MMM signals from FreeWheel Marketplace and Common Crawl. Designed for ad revenue operations teams to bridge cross-channel performance gaps. Accepts campaign IDs, date ranges, and channel filters as input. Returns structured attribution data with source provenance and confidence scores.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	Yes	End date for attribution window (YYYY-MM-DD)
`channels`	No	Channels to include in analysis
`startDate`	Yes	Start date for attribution window (YYYY-MM-DD)
`campaignIds`	Yes	List of campaign identifiers to analyze
`confidenceThreshold`	No	Minimum confidence score for included signals
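
Because the schema states the date format (YYYY-MM-DD) and marks three fields as required, a caller can catch malformed input before the request is sent. The sketch below is one hedged way to do that. The helper names and the sample campaign identifier are hypothetical, and the rules that `campaignIds` must be non-empty and that `startDate` must not follow `endDate` are plausible client-side checks rather than documented server behavior.

```python
import re
from datetime import date

_DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")


def parse_day(value):
    """Parse a strict YYYY-MM-DD string into a date, or raise ValueError."""
    if not _DATE_RE.fullmatch(value):
        raise ValueError(f"expected YYYY-MM-DD, got {value!r}")
    return date.fromisoformat(value)  # also rejects dates like 2026-02-30


def build_attribution_args(campaign_ids, start_date, end_date, channels=None):
    """Validate the required fields and return an arguments dict."""
    if not campaign_ids:
        raise ValueError("campaignIds must contain at least one identifier")
    if parse_day(start_date) > parse_day(end_date):
        raise ValueError("startDate must not be after endDate")
    args = {
        "campaignIds": list(campaign_ids),
        "startDate": start_date,
        "endDate": end_date,
    }
    if channels is not None:
        args["channels"] = list(channels)
    return args


args = build_attribution_args(["cmp-001"], "2026-04-01", "2026-06-30")
print(args)
```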

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`attribution`	No

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. The description adds meaningful context about data sources (FreeWheel, Common Crawl) and output structure (provenance, confidence scores), enhancing transparency beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose, audience, and input/output summary. No extraneous information. Each sentence contributes value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present and strong annotations, the description covers core aspects. It could mention async behavior or performance expectations, but overall it is adequate for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and parameter descriptions are complete. The description only summarizes inputs (campaign IDs, date range, channels) without adding new meaning. For high schema coverage, baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides unified attribution insights for retail media and programmatic campaigns, with specific data sources and outputs. However, it does not explicitly distinguish itself from sibling tools like programmatic_attribution_calibrator, which slightly lowers clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for cross-channel attribution by ad revenue teams, but provides no explicit when-to-use or when-not-to-use guidance, nor does it mention alternatives. This leaves the agent without clear decision criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

retail_media_esg_complianceA

Read-only, Idempotent

Audits retail media networks for ESG compliance by analyzing ad placements, tracking cookies, and verifying ethical advertising standards. Designed for ad_revenue_ops teams to ensure GDPR and sustainability compliance across digital retail platforms. Accepts domain lists or network identifiers as input and returns structured compliance reports with warnings and source references. Requires async:true to avoid timeout errors.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domains`	No	List of retail media network domains to audit
`checkESG`	No	Enable ESG advertising standards compliance check
`checkGDPR`	No	Enable GDPR cookie tracking compliance check
`networkIds`	No	List of retail media network identifiers
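
Since the description asks for async:true, a caller would submit the job and then poll job_result(job_id), as the `async` parameter's description suggests. The sketch below shows that loop with an injected `call_tool` function so that it does not depend on any particular MCP client. The response shapes are assumptions: it presumes the submit call returns a dict containing `job_id`, and that job_result returns a dict whose `status` stays `"pending"` until the job finishes. Neither is confirmed by this page.

```python
import time


def run_async(call_tool, name, arguments, poll_interval=2.0, timeout=120.0,
              sleep=time.sleep, clock=time.monotonic):
    """Submit a tool call with async=True, then poll job_result until done.

    call_tool(name, arguments) -> dict is supplied by the caller (hypothetical
    signature). The "job_id" key and the "pending" status are assumptions.
    """
    job = call_tool(name, {**arguments, "async": True})
    job_id = job["job_id"]
    deadline = clock() + timeout
    while True:
        result = call_tool("job_result", {"job_id": job_id})
        if result.get("status") != "pending":
            return result
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still pending after {timeout}s")
        sleep(poll_interval)


def make_fake_client(polls_until_done=3):
    """Stand-in client for the example: finishes after a few polls."""
    polls = 0

    def call_tool(name, arguments):
        nonlocal polls
        if name != "job_result":
            return {"job_id": "job-1"}
        polls += 1
        status = "pending" if polls < polls_until_done else "ok"
        return {"status": status}

    return call_tool


report = run_async(make_fake_client(), "retail_media_esg_compliance",
                   {"domains": ["example.com"], "checkGDPR": True},
                   poll_interval=0.0)
print(report["status"])
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting; a production caller would likely add backoff and handle error statuses, which this page does not describe.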

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`summary`	No
`warnings`	No

Tool Definition Quality

A4.0/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint (true), openWorldHint (true), and idempotentHint (true). The description adds value by disclosing the async requirement to avoid timeouts, the scope of analysis (ad placements, cookies), and that it returns structured reports with warnings and source references. This complements the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at four sentences, with each sentence serving a distinct purpose: main action, target audience, input format, operational requirement. No redundant or extraneous information is present.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has 5 parameters and an output schema, and sits among 270 sibling tools, the description adequately covers input, output, target audience, and operational notes. It omits specifics about the compliance report structure, but the presence of an output schema compensates for this.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter already described clearly. The description merely summarizes the inputs as 'domain lists or network identifiers' and reiterates the async recommendation already in the schema. It adds no new semantic meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the verb 'audits' and the resource 'retail media networks for ESG compliance', detailing the analysis of ad placements, tracking cookies, and ethical advertising standards. It distinguishes itself from sibling tools by focusing solely on retail media networks, a unique niche.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context by stating it is designed for ad_revenue_ops teams and mentions the requirement for async:true to avoid timeouts. However, it does not explicitly contrast with alternative tools like vendor_esg_audit or manufacturing_esg_compliance_mapper, nor does it specify when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

revops_architectC

Read-only

Architecte RevOps [RevOps architect] — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Qonto — ARR €200M · 200 reps · forecast ±35% · fuite €4,2M/an identifiée · plan RevOps 12 semaines [€4.2M/year leakage identified · 12-week RevOps plan]. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`keyMetrics`	Yes
`objectives`	Yes
`revenueTeam`	Yes
`currentStack`	Yes
`horizonMonths`	Yes
`currentPainPoints`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description aligns with annotations: readOnlyHint (true) is consistent with 'returns a structured, audited deliverable' as a non-mutating action, and openWorldHint (true) is not contradicted. However, the description adds little beyond the annotations, such as execution time or side effects, and does not explain the async parameter's behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at four sentences, but it mixes French and English ('Architecte RevOps', 'fuite €4,2M/an'), which may confuse the agent. It is not optimally front-loaded; the most actionable instruction ('send the documented case fields') comes last. The reference case adds context but is not essential.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, 7 required, nested objects, no output schema), the description is insufficient. It does not describe the deliverable format, expected runtime, or how to handle the async parameter. The low schema coverage (13%) further burdens the description, but it fails to provide meaningful guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, meaning the input schema provides sparse descriptions for most parameters. The tool description does not compensate; it merely says 'send the documented case fields' without explaining any specific parameter's purpose, format, or relationship. Parameters like 'company' with nested fields remain undocumented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states it is an 'Architecte RevOps' that returns a structured, audited deliverable, targeting C-suite expertise (CRO). It provides a reference case (Qonto) to illustrate its scope. However, it does not differentiate from sibling tools like abm_architect or ld_architect, which also have 'architect' in their names.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only instructs to 'send the documented case fields' and notes that inputs are validated server-side. It provides no explicit guidance on when to use this tool versus alternatives, nor does it describe prerequisites or exclusions. The complex input schema with 7 required parameters suggests high specificity, but no usage context is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rfp_tender_architect C

Read-only

Architecte d'appels d'offres — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: AO DINUM — Plateforme IA souveraine. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`rfpType`	Yes
`rfpScope`	Yes
`budgetRange`	Yes
`deadlineISO`	Yes
`clientCompany`	Yes
`ourPositioning`	Yes
`compliancePoints`	No
`competitorsLikely`	Yes
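
Because only `async` carries a description, a caller may want to check a draft payload against the required/optional split in the table before sending it. The sketch below does only that. The value formats in `draft` are hypothetical placeholders, since the listing does not document them, and `check_rfp_arguments` is an illustrative helper, not part of the server.

```python
# Required/optional split copied from the parameter table above.
RFP_REQUIRED = (
    "rfpType", "rfpScope", "budgetRange", "deadlineISO",
    "clientCompany", "ourPositioning", "competitorsLikely",
)
RFP_OPTIONAL = ("async", "compliancePoints")


def check_rfp_arguments(arguments: dict) -> dict:
    """Report missing required keys and keys the table does not list."""
    known = set(RFP_REQUIRED) | set(RFP_OPTIONAL)
    return {
        "missing": [key for key in RFP_REQUIRED if key not in arguments],
        "unknown": sorted(set(arguments) - known),
    }


# Hypothetical draft: the values are placeholders, not documented formats.
draft = {
    "rfpType": "public",
    "rfpScope": "sovereign AI platform",
    "deadlineISO": "2026-09-30",
    "budget": "1-2M EUR",  # wrong key on purpose; the table says budgetRange
}
report = check_rfp_arguments(draft)
print(report)
```

A check like this only catches key-level mistakes; the server-side validation mentioned in the description remains the authority on value formats.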

Tool Definition Quality

C 2.7/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds minimal behavioral context: async support is documented only in the parameter schema, and the description does not detail what 'audited deliverable' entails or whether there are side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with two sentences plus a reference, no fluff. However, the first sentence is jargon-heavy and could be clearer.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 9 parameters (7 required) and no output schema, the description is insufficient. It does not explain how to construct inputs or what the deliverable looks like, leaving critical gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 11% (only async param has description). The description fails to explain the seven required fields or their meaning, relying on 'send the documented case fields' which is vague.
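
The coverage figure is consistent with counting described parameters over total parameters: one of nine here gives 11%. The sketch below reproduces that arithmetic. Rounding half up is an assumption that happens to match the percentages shown in this listing (for example 1 of 8 shown as 13%); the scorer's actual method is not documented here.

```python
import math


def description_coverage(params: dict) -> int:
    """Percentage of parameters with a non-empty description, rounded half up."""
    if not params:
        return 0
    described = sum(1 for text in params.values() if text and text.strip())
    return math.floor(100 * described / len(params) + 0.5)


# Parameter names from the rfp_tender_architect table; the async text is abbreviated.
rfp_params = {
    "async": "If true, returns a job_id immediately ...",
    "rfpType": "",
    "rfpScope": "",
    "budgetRange": "",
    "deadlineISO": "",
    "clientCompany": "",
    "ourPositioning": "",
    "compliancePoints": "",
    "competitorsLikely": "",
}
print(description_coverage(rfp_params))
```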

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured audited deliverable for tenders, referencing a specific case. However, the verb action is missing (e.g., 'analyze', 'architect'), and the purpose is vague compared to siblings like proposal_generator.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description does not mention exclusions or contextual triggers, leaving the agent to infer from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rse_policy_builder C

Read-only

Architecte de politique RSE — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: TechCorp SAS — Politique RSE 2025-2028 (500 FTE, €60M CA, SaaS B2B France). Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`values`	Yes
`company`	Yes
`ambitions`	Yes
`targetLabels`	No
`currentInitiatives`	No
`targetStakeholders`	Yes
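
The `async` row describes a submit-then-poll flow: the call returns a `job_id`, and the result is fetched with `job_result(job_id)`. A minimal sketch of that loop follows. The `call_tool` hook, the `job_id` response key, the `job_id` argument name for `job_result`, and the `"pending"` status value are all assumptions made for illustration; the listing does not document the response shapes.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client hook: (tool_name, arguments) -> decoded tool result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_async(
    call_tool: ToolCaller,
    tool_name: str,
    arguments: Dict[str, Any],
    poll_seconds: float = 2.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a tool call with async=true, then poll job_result until done."""
    submitted = call_tool(tool_name, {**arguments, "async": True})
    job_id = submitted["job_id"]  # assumed response key
    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        # Assumption: an unfinished job reports status == "pending".
        if result.get("status") != "pending":
            return result
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")


def make_fake_caller(pending_polls: int) -> ToolCaller:
    """Stand-in for a real MCP client so the sketch runs offline."""
    state = {"polls": 0}

    def fake(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        if tool_name != "job_result":
            return {"job_id": "job-1"}
        state["polls"] += 1
        if state["polls"] <= pending_polls:
            return {"status": "pending"}
        return {"status": "done"}

    return fake


result = run_async(
    make_fake_caller(2), "rse_policy_builder", {"values": []},
    sleep=lambda _seconds: None,
)
print(result["status"])
```

Bounding the number of polls keeps an agent from waiting indefinitely on a job that never completes.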

Tool Definition Quality

C 2.9/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds that inputs are validated server-side, which goes beyond the readOnlyHint and openWorldHint annotations. However, it does not disclose the output format, potential side effects, or behavior on invalid inputs, so it provides only modest additional transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences and relatively concise, but it mixes French and English ('Architecte de politique RSE') and uses jargon ('Gapup agent-payable C-suite expertise'), which slightly reduces clarity. Still, it is efficient with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, no output schema), the description is incomplete. It does not specify what the deliverable contains or its format (text, JSON, PDF), nor does it mention how to interpret the openWorldHint. This leaves significant gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at only 13%, the description must compensate, but it only vaguely references 'the documented case fields' without explaining any specific parameters. For a complex schema with 8 parameters and nested objects, this is insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an architect for CSR policies (politique RSE) and states it returns a structured, audited deliverable, with a reference case for context. However, it does not explicitly differentiate from sibling tools like sustainability_report or esg_audit_multi, though the domain specificity helps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions C-suite expertise and a reference case but lacks explicit when-to-use or when-not-to-use instructions, leaving the agent to infer usage from context alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sabbatical_policy_comparator A

Read-only, Idempotent

Enables CHROs to benchmark their company's sabbatical policies against peer organizations using data from SHRM, Payscale, and Mercer. Inputs include company size, industry, and current policy details. Outputs structured comparison with cost impact analysis, eligibility criteria, and duration benchmarks. Ideal for strategic HR planning and policy optimization.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	Yes	Industry classification code (NAICS)
`peerGroup`	No	List of peer company names for direct comparison
`companySize`	Yes	Number of employees in the company
`currentPolicy`	Yes

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`benchmark`	No

Tool Definition Quality

A 4.2/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint. The description adds value by detailing that the tool outputs structured comparison with cost impact analysis, eligibility criteria, and duration benchmarks, and uses external data sources. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, concise, and front-loaded with purpose. Every sentence adds relevant information about target user, data sources, inputs, outputs, and use case, with no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations, the description covers inputs, outputs, and strategic use case. It does not mention limitations or potential inaccuracies, but for a benchmarking tool with openWorldHint, this level of detail is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 80% and the description mentions key input parameters (company size, industry, current policy) covered in the schema. The description adds minimal meaning beyond the schema; the 'peerGroup' and 'async' parameters are not elaborated. Thus, a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool benchmarks sabbatical policies against peers, specifies the target user (CHROs) and data sources (SHRM, Payscale, Mercer). It distinctively focuses on sabbatical policies, differentiating from sibling tools like comp_benchmark_geo_delta or executive_comp_peer_benchmark.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (strategic HR planning, policy optimization) and lists required inputs (company size, industry, current policy). However, it does not explicitly state when not to use it or mention alternative tools for other benchmarking needs.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

safety_guardrail_breach_analyzer A

Read-only, Idempotent

Analyzes potential LLM guardrail breaches against IEEE 7000 ethical compliance standards. Designed for risk persona to evaluate safety violations in AI outputs. Accepts raw LLM responses or structured breach reports, returns compliance analysis with severity scoring and mitigation recommendations.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`context`	No	Contextual information about the prompt or conversation
`llmOutput`	Yes	Raw text output from LLM to analyze for guardrail breaches
`severityThreshold`	No	Minimum severity score to report (0-10 scale)
`includeMitigations`	No	Whether to include mitigation recommendations
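
Since this schema documents every parameter, a client can assemble the arguments with a few local checks before calling the tool. The sketch below is illustrative only: `build_guardrail_arguments` is a hypothetical helper, and treating the 0-10 scale as inclusive and rejecting blank `llmOutput` are assumptions, not documented server rules.

```python
from typing import Any, Dict, Optional


def build_guardrail_arguments(
    llm_output: str,
    context: Optional[str] = None,
    severity_threshold: Optional[float] = None,
    include_mitigations: Optional[bool] = None,
) -> Dict[str, Any]:
    """Assemble tool arguments, omitting optional fields that were not set."""
    if not llm_output.strip():
        raise ValueError("llmOutput is required and must not be blank")
    arguments: Dict[str, Any] = {"llmOutput": llm_output}
    if context is not None:
        arguments["context"] = context
    if severity_threshold is not None:
        # Assumption: both ends of the 0-10 scale are valid values.
        if not 0 <= severity_threshold <= 10:
            raise ValueError("severityThreshold must be on the 0-10 scale")
        arguments["severityThreshold"] = severity_threshold
    if include_mitigations is not None:
        arguments["includeMitigations"] = include_mitigations
    return arguments


example = build_guardrail_arguments(
    "model reply to analyze", severity_threshold=7, include_mitigations=True
)
print(example)
```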

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`breaches`	No
`warnings`	No
`complianceScore`	No

Tool Definition Quality

A 4/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations confirm readOnly, openWorld, and idempotent, reducing the burden. Description adds behavioral context: it returns compliance analysis with severity scoring and mitigation recommendations, and accepts flexible input formats. No contradictions or missing critical aspects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences efficiently convey purpose, standard, persona, input types, and output. No redundant or extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present, description adequately covers input types, main function, and intended use. Missing details about prerequisite knowledge of IEEE 7000 are acceptable for technical users. Mostly complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds high-level context (severity scoring relates to severityThreshold, mitigation recommendations to includeMitigations) but does not elaborate on parameter details beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool analyzes LLM guardrail breaches against IEEE 7000 ethical compliance standards, specifying the risk persona as target user. It distinguishes from siblings like jailbreak_attempt_detector or bias_amplification_tracker by focusing on IEEE 7000 compliance and severity scoring.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description mentions it is 'designed for risk persona' and accepts both raw LLM responses and structured breach reports, providing input type guidance. However, it does not explicitly state when to use this tool over alternatives or when not to use it, lacking comparative usage advice.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

safety_violation_incident_logger C

Read-only, Idempotent

Logs AI safety violations for compliance reporting, targeting risk management personas. Accepts incident details such as violation type, severity, description, and timestamp. Returns structured data with compliance categorization based on NIST AI RMF guidelines. Ideal for automated incident tracking and regulatory reporting workflows.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`metadata`	No
`severity`	Yes
`timestamp`	Yes
`description`	Yes
`violationType`	Yes

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`incidentId`	No
`nistReference`	No
`complianceCategory`	No

Tool Definition Quality

C 2.3/5.0

Behavior 1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description claims the tool logs violations (a write operation), but the annotations mark it as readOnlyHint=true, creating a direct contradiction. This severely undermines transparency. Additionally, the idempotentHint contradicts typical logging behavior.
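
A contradiction of this kind can be caught mechanically before a tool is published. The sketch below flags write-style verbs in a description whose annotations set `readOnlyHint` to true. The verb list and the `read_only_conflicts` helper are illustrative assumptions, not part of any scoring pipeline described here, and simple word matching will miss paraphrases.

```python
# Illustrative verb list; a real linter would need a broader vocabulary.
WRITE_VERBS = {"logs", "creates", "writes", "updates", "deletes", "stores", "sends"}


def read_only_conflicts(description: str, annotations: dict) -> list:
    """Return write-style verbs found in a description marked read-only."""
    if not annotations.get("readOnlyHint", False):
        return []
    words = {word.strip(".,;:()'\"").lower() for word in description.split()}
    return sorted(words & WRITE_VERBS)


logger_description = (
    "Logs AI safety violations for compliance reporting, "
    "targeting risk management personas."
)
conflicts = read_only_conflicts(
    logger_description, {"readOnlyHint": True, "idempotentHint": True}
)
print(conflicts)
```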

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences with reasonable front-loading of purpose and target audience. However, the phrase 'targeting risk management personas' is slightly extraneous and could be removed to improve conciseness without losing value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having an output schema, the description fails to explain how the structured compliance categorization works or what fields are returned. With low schema coverage and contradictions, the description leaves agents with an incomplete and potentially misleading understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only async has a description). The description lists parameter names but adds no semantic value about constraints like format, allowed values, or the meaning of severity levels. For a tool with 4 required params and enums, this is insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool logs AI safety violations for compliance reporting, but the verb 'logs' conflicts with the readOnlyHint annotation, introducing ambiguity about the tool's core purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Only a vague recommendation for automated incident tracking and regulatory reporting workflows is provided. No explicit guidance on when to avoid using this tool or how it compares to sibling tools like ai_incident_response or safety_guardrail_breach_analyzer.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sales_enablement_architect B

Read-only

Architecte Sales Enablement — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk — 45 reps · attainment 67% · ramp 5 mois → 3 mois · programme 8 modules · +€2,1M ARR. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`gaps`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`salesTeam`	Yes
`objectives`	Yes
`currentEnablement`	Yes

Tool Definition Quality

B 3.2/5.0

Behavior 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds context by stating the deliverable is 'structured' and 'audited' and inputs are validated server-side. However, it does not disclose the deliverable's format, side effects, or limitations, offering only marginal added value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (about 3 lines) and front-loaded with the tool's role. However, it lacks clear structure (e.g., bullet points) and could be more organized, though every sentence contributes useful information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, required parameters, no output schema) and low schema coverage, the description is insufficient. It omits details about the deliverable's structure, possible outputs, and how to interpret results, leaving an agent with significant ambiguity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage and 6 parameters including nested objects, the description fails to explain the parameters. It merely says 'send the documented case fields' without elaborating on the required fields (e.g., company, salesTeam, currentEnablement), adding no semantic value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an architect for sales enablement, targeting C-suite executives (CRO), and states it returns a structured, audited deliverable. A reference case with specific metrics (e.g., Spendesk, +€2,1M ARR) distinguishes it from similar tools like comp_plan_architect or revops_architect, confirming a unique purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for designing a sales enablement program but does not provide explicit guidance on when to use this tool versus alternatives. It mentions inputs are validated server-side but lacks exclusions or comparative context, leaving the agent to infer applicability.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sales_pipeline_forecast B

Read-only

Prévision de pipeline commercial — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Doctolib Enterprise — pipeline Q2 2026 · 50 deals enterprise/mid-market · forecast confidence par deal + commit/best-case/worst-case. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`pipeline`	Yes
`historicalConversionByStage`	No

Tool Definition Quality

B 3.1/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint and openWorldHint. Description adds that the deliverable is audited and premium (C-suite expertise), providing context beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences and a reference case. Purpose is front-loaded and overall concise for a complex tool. The reference case adds helpful context without excessive length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite complexity (nested objects, no output schema), the description omits deliverable structure, audit meaning, and return format. The reference case helps but does not replace a general explanation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (20%). The description does not explain any parameters, merely stating to 'send the documented case fields.' Required nested objects (company, pipeline) lack descriptions for their fields, leaving the agent uninformed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a structured, audited deliverable for pipeline forecasting. Title and reference case reinforce purpose. However, it does not distinguish from sibling tools like deal_coach or battle_plan that may overlap.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description mentions server-side validation but gives no context for choosing this tool over others, and it does not compare the tool with its siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screener_multi B

Read-only

Screening Sanctions Multi-listes — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: For , run full OFAC + EU + UK HMT + UN + SECO + Canada SEMA + PEP + adverse media screening with composite risk score and evidence trail. · Is <company/individual> on any major international sanctions list? · What is the composite AML risk score for across all major watchlists? · Screen this M&A target / supplier / LP against all major sanctions lists and give me a compliance recommendation. · Is a PEP or associated with a PEP? What Enhanced Due Diligence is required? Reference case: Veridian Trading Co. LLC (Cyprus) — 7 listes · PEP check · adverse media 2 ans · composite 52/100 · escalate-to-compliance → EDD requis. Inputs are validated server-side — send the documented case fields.

Parameters

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`address`	No
`aliases`	No
`entity_name`	Yes
`entity_type`	Yes
`context_note`	No
`date_of_birth`	No
`jurisdiction_focus`	Yes		all
`country_of_registration`	No
`adverse_media_lookback_days`	Yes
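
The table lists `jurisdiction_focus` as required with a default of `all`. One way a client might handle that combination is to fill the default locally and then check the four required fields. This is a sketch under assumptions: `prepare_sanctions_arguments` is a hypothetical helper, the `entity_type` value and the 730-day lookback (two years, echoing the reference case) are illustrative, and whether the server applies the default itself is not documented.

```python
SANCTIONS_REQUIRED = (
    "entity_name", "entity_type",
    "jurisdiction_focus", "adverse_media_lookback_days",
)
# Default copied from the Default column of the table above.
SANCTIONS_DEFAULTS = {"jurisdiction_focus": "all"}


def prepare_sanctions_arguments(arguments: dict) -> dict:
    """Fill documented defaults, then fail fast on missing required fields."""
    merged = {**SANCTIONS_DEFAULTS, **arguments}
    missing = [key for key in SANCTIONS_REQUIRED if key not in merged]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return merged


prepared = prepare_sanctions_arguments({
    "entity_name": "Veridian Trading Co. LLC",
    "entity_type": "company",            # hypothetical value
    "adverse_media_lookback_days": 730,  # two years, as in the reference case
})
print(prepared["jurisdiction_focus"])
```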

Tool Definition Quality

B 3.3/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds behavioral context beyond annotations, such as 'returns a structured, audited deliverable', mentions of composite risk scoring, and server-side validation. The readOnlyHint=true annotation is not contradicted (the tool likely reads existing data to generate reports). The description also includes a reference case illustrating outcome format. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise but includes unnecessary phrases like 'Gapup agent-payable C-suite expertise (RISK)' and a reference case that may not be universally helpful. The bullet-like questions are clear, but the overall structure could be tightened. It is not excessively long but could be more focused.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (10 parameters, 4 required, no output schema), the description covers the core use cases and what the tool returns (composite risk score, evidence trail, compliance recommendation). However, it omits details on the output structure, which forces the agent to infer. Considering the lack of output schema, the description should provide more concrete information about the deliverable format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 10% schema description coverage, the description partially compensates by listing required fields (entity_type, entity_name, jurisdiction_focus, adverse_media_lookback_days) and mentioning others (address, aliases, etc.). However, it does not explain the meaning or acceptable values for most parameters beyond what the schema provides. The description's added value is minimal, leaving the agent with insufficient guidance for parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool's purpose: screening entities against multiple sanctions lists (OFAC, EU, UK HMT, UN, etc.) and returning a composite risk score with evidence. It lists specific questions the tool answers, which helps the agent understand its function. However, it does not explicitly differentiate from sibling tools like kyc_screener or kyc_screener_batch, which may have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example queries and a reference case, giving some context on when to use the tool. However, it lacks explicit guidance on when not to use it or how to choose among alternatives (e.g., kyc_screener). No comparison to siblings or exclusion criteria are provided, leaving the agent to infer usage boundaries.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

save_plays (C)

Read-only

Plans de sauvetage clients [customer rescue plans]: Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Kyriba, Plan sauvetage 30j [30-day rescue plan] · ARR €11.988 · Champion parti [champion departed] · Script 6 actions [6-action script] · 3 concessions. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`account`	Yes
`company`	Yes
`product`	Yes
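
The `async` flag described in the table suggests a submit-then-poll pattern. The following is a minimal sketch of that pattern, not the server's actual client code. The `call_tool` callable, the `job_id` response field, and the `"pending"` status value are assumptions made for illustration; only the `async` parameter and the `job_result(job_id)` tool name come from the listing.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical transport: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_async(
    call_tool: ToolCaller,
    tool: str,
    arguments: Dict[str, Any],
    poll_seconds: float = 1.0,
    max_polls: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Submit a tool call with async=true, then poll job_result until it finishes."""
    submitted = call_tool(tool, {**arguments, "async": True})
    job_id = submitted["job_id"]  # assumed name of the returned field

    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        # Assumption: an unfinished job reports status "pending".
        if result.get("status") != "pending":
            return result
        sleep(poll_seconds)

    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

Bounding the number of polls keeps a stuck job from blocking the agent indefinitely; the caller can tune `poll_seconds` and `max_polls` per tool.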

Tool Definition Quality

C (2.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating no side effects. The description adds 'Returns a structured, audited deliverable' and 'Inputs are validated server-side' which are consistent but do not provide additional behavioral insight beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a few sentences but contains mixed languages and jargon (e.g., 'Gapup agent-payable C-suite expertise (CRO)') that could be streamlined. It is not overly long but lacks clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested input schema with low coverage and no output schema, the description fails to explain what the deliverable contains or how to use the parameters effectively. The reference case provides some context but is insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 25% schema description coverage, the description fails to clarify the nested parameters. It only says 'send the documented case fields' and gives a reference case, but does not explain what each parameter means or how to structure them.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description mentions 'Plans de sauvetage clients' and says it returns a structured deliverable, but the tool name 'save_plays' is ambiguous and the description mixes French and English with jargon like 'Gapup agent-payable C-suite expertise (CRO)', making the exact purpose unclear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description does not mention when not to use or any prerequisites, leaving the agent without context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sci_literature_search (A)

Read-only

Multi-source bibliographic search of the scientific literature. Sources: OpenAlex (200M+ works) · Semantic Scholar · arXiv · PubMed · CrossRef. Modes: search | meta_analysis | citation_network | expert_finder. Keyless / free tier. LRU cache, 12h.

Parameters

Name	Required	Description
`mode`	No	Search mode. Default: search
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keywords, title, author, or DOI (e.g. 10.xxxx/xxxx is accepted)
`domain`	No	Scientific domain. Default: all
`date_to`	No	ISO end date (YYYY-MM-DD)
`date_from`	No	ISO start date (YYYY-MM-DD)
`max_results`	No	5-50. Default: 20
`min_citations`	No	Minimum number of citations
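
To make the constraints in the table concrete, here is a hedged sketch of a client-side helper that assembles arguments for `sci_literature_search` and rejects values the table rules out. The function name `build_sci_search_args` is hypothetical, and the check that `date_from` does not fall after `date_to` is an added sanity check rather than a documented server rule.

```python
from datetime import datetime
from typing import Any, Dict, Optional

# Modes listed in the tool description.
MODES = {"search", "meta_analysis", "citation_network", "expert_finder"}


def build_sci_search_args(
    query: str,
    mode: str = "search",
    max_results: int = 20,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
    min_citations: Optional[int] = None,
) -> Dict[str, Any]:
    """Validate inputs against the documented ranges and return an arguments dict."""
    if not query.strip():
        raise ValueError("query is required")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if not 5 <= max_results <= 50:
        raise ValueError("max_results must be between 5 and 50")

    parsed = {}
    for name, value in (("date_from", date_from), ("date_to", date_to)):
        if value is not None:
            # strptime raises ValueError if the string is not a YYYY-MM-DD date.
            parsed[name] = datetime.strptime(value, "%Y-%m-%d").date()
    if len(parsed) == 2 and parsed["date_from"] > parsed["date_to"]:
        raise ValueError("date_from must not be after date_to")  # assumed rule

    args: Dict[str, Any] = {"query": query, "mode": mode, "max_results": max_results}
    if date_from is not None:
        args["date_from"] = date_from
    if date_to is not None:
        args["date_to"] = date_to
    if min_citations is not None:
        args["min_citations"] = min_citations
    return args
```

Optional fields are omitted when unset so the server-side defaults listed in the table apply.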

Output Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`papers`	Yes
`status`	Yes
`experts`	No
`sources`	Yes
`meta_analysis`	No
`quality_score`	Yes
`citation_network`	No

Tool Definition Quality

A (3.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare the tool read-only and open-world. The description adds non-obvious details: keyless/free tier, LRU cache (12h). This goes beyond annotations, though rate limits or throttling are not mentioned.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences plus a line of formatted lists) and front-loaded with purpose. Every sentence adds value, no extraneous text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-source, multiple modes) and that an output schema exists, the description covers key aspects but lacks details on result aggregation, error handling, or pagination. Adequate but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good descriptions in the schema itself. The tool description adds no parameter-level detail beyond the schema, so it meets the baseline without further enhancement.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs multi-source bibliographic search on scientific literature, listing specific sources and modes. This verb+resource combination distinguishes it from siblings, which are mostly business-oriented tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context (sources, modes, free tier) but does not explicitly state when to use this tool versus alternatives. There are no exclusions or usage heuristics.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sec_filing_decoder (B)

Read-only

SEC filing decoder: Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Read the 10-K of [company] and give me the material red flags, KPI movements, and a board-ready executive summary. · What has materially changed in [company]'s risk profile in its latest annual filing? Flag any going-concern or auditor-change signals. · Is there any M&A signal or strategic review hint in [company]'s most recent SEC filings? What's the evidence? · Prepare a due-diligence SEC filing brief for [company]: financial snapshot, red flags, governance changes, and recommended next actions. · What is the sentiment of [company]'s latest 10-K compared to its most recent 10-Q: bullish, neutral, or bearish? Reference case: SHOP · 10-K FY2024 · 4 red flags (1 critical: merchant concentration) · Revenue +24.7% YoY · . Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description	Default
`cik`	No
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	Yes		all
`ticker`	No
`filing_types`	Yes
`lookback_months`	Yes

Tool Definition Quality

B (3.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds value beyond the annotations (readOnlyHint, openWorldHint) by detailing the async behavior (returns job_id immediately) and stating that it returns a structured, audited deliverable. There is no contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is overly verbose, containing numerous example questions that could be moved to documentation. It lacks a concise, front-loaded structure; key information is buried in lengthy examples.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's output and async behavior, and provides example outputs. However, given the moderate complexity (6 parameters, no output schema), it does not fully compensate for missing parameter documentation or detail on return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at only 17%, the description should compensate by explaining parameters. It only mentions the async parameter indirectly and relies on examples to imply ticker and filing_types usage. No explicit details on each parameter are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool decodes SEC filings (10-K, etc.) and returns structured analyses including red flags, KPIs, and executive summaries. It also provides example queries that illustrate the purpose. However, it does not explicitly differentiate from sibling tools like earnings_reviewer, which might handle similar SEC filing analyses.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes example questions that implicitly guide usage, and mentions the async parameter. However, it does not provide explicit guidance on when to use this tool versus alternatives, nor does it state prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sentiment_news_pulse (B)

Read-only

Media & Sentiment Pulse: Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What is the current PR / brand sentiment for [entity] over the last 7 days? Show top headlines, trend signals, and recommended actions. · Is there a crisis building for [entity]? Detect early-warning signals in press coverage and flag emerging negative narratives. · Track launch media coverage for [entity]: what is the press sentiment and which topics dominate the conversation? · Compare media sentiment between [entity] and its competitors over the past week. · What should our communications director prioritize in the next 48h based on current press coverage of [entity]? Reference case: Velora Payments, 7-day media pulse · neutral sentiment (score +5) · emerging crisis detected · . Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_name`	Yes
`entity_type`	Yes		company
`sentiment_lens`	Yes		reputation
`date_range_days`	Yes
`language_filter`	Yes		en
`include_competitors`	No

Tool Definition Quality

B (3.1/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that the output is a 'structured, audited deliverable' and mentions server-side input validation, but does not disclose rate limits, auth requirements, or behavior on invalid inputs. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is long, disorganized, and includes jargon ('Gapup agent-payable C-suite expertise (CMO)') and a reference to a specific case study ('Velora Payments'). It is not front-loaded; the main purpose is buried in the middle. Excessive verbosity without clear structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters, no output schema, and moderate complexity, the description is incomplete. It lacks systematic parameter documentation, output format details, error handling, and behavior for edge cases. The examples help but do not compensate for missing structured information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 14% (only 'async' has description). The description implicitly covers entity_name, date_range_days, and sentiment_lens through examples, but language_filter and include_competitors are not mentioned. The description partially compensates for low schema coverage but is not comprehensive.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides media and sentiment analysis for entities like companies, products, etc., and lists example queries. However, it does not differentiate from sibling tools like reputation_engine, trend_watcher, or press_influencer, which may have overlapping functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides several example use cases (brand sentiment, crisis detection, launch tracking, competitor comparison) but does not explicitly state when to use this tool versus alternatives. No exclusions or conditions are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

seo_cro_audit (A)

Read-only

Full SEO + CRO audit of any public URL. Analyses technical SEO (HTTP status, HTTPS, title/meta/canonical/robots, H1-H2, JSON-LD structured data, sitemap, robots.txt, OG/Twitter cards), content SEO (word count, keyword density top-10, readability estimate, image alt coverage, internal/external links), performance signals (page size, estimated render time, inline scripts/styles, unoptimised images), and CRO (CTA detection, above-fold CTAs, forms, social proof, trust signals, pricing visibility). Optionally compares up to 5 competitor URLs. Returns 0-100 scores per dimension plus a prioritised (P0/P1/P2) recommendation list. ICP: marketing managers, SEO/CRO consultants, e-commerce ops, agency teams. Budget: 8s per URL. Cache TTL: 1h.
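
One of the content-SEO signals named above is "keyword density top-10". The listing does not say how the tool computes it, so the sketch below is only one plausible reading: density as each term's share of the non-stopword tokens on the page. The stopword list, the tokenizer, and the function name `keyword_density` are all assumptions.

```python
import re
from collections import Counter
from typing import List, Tuple

# Deliberately tiny stopword list for illustration; a real audit would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "you", "your", "our"}


def keyword_density(text: str, top_n: int = 10) -> List[Tuple[str, float]]:
    """Return the top_n terms with their share (in percent) of counted tokens."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    words = [w for w in tokens if len(w) > 2 and w not in STOPWORDS]
    if not words:
        return []
    counts = Counter(words)
    total = len(words)
    return [(word, round(100 * n / total, 2)) for word, n in counts.most_common(top_n)]
```

For example, `keyword_density("Pricing plans and pricing tiers")` counts `pricing` twice out of four retained tokens, so it leads the list at 50.0 percent.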

Parameters

Name	Required	Description
`url`	Yes	Fully-qualified URL to audit (e.g. https://stripe.com/pricing)
`mode`	No	Audit scope — defaults to 'full'
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`compare_competitors`	No	Optional list of competitor URLs to compare (max 5)

Output Schema

Name	Required	Description
`url`	Yes
`status`	Yes
`sources`	Yes
`audit_modes`	Yes
`content_seo`	Yes
`cro_signals`	Yes
`quality_score`	Yes
`technical_seo`	Yes
`overall_scores`	Yes
`recommendations`	Yes
`performance_signals`	Yes
`competitor_comparison`	No

Tool Definition Quality

A (4.1/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint) declare non-destructive, and description reinforces this by detailing safe audit outputs. It adds context on budget and cache, though could mention rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with purpose first, then listing components, optional features, and context. Slightly verbose but all sentences are informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and the presence of an output schema, the description covers key inputs, modes, async usage, and target audience. Could mention whether it handles redirects or authentication.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so description adds minimal extra meaning beyond parameter descriptions. It reiterates competitor compare limit but doesn't elaborate on how mode or async affect behavior.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies a comprehensive SEO + CRO audit of any public URL, listing exact components (technical, content, performance, CRO) and distinguishing it from sibling tools like 'seo_keyword_research' which focuses on keywords only.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Target audience (marketing managers, SEO/CRO consultants) and constraints (8s budget, 1h cache) are clear. It suggests async for slow operations, but lacks explicit when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

seo_keyword_research (A)

Read-only

SEO keyword research from a seed keyword or topic. Uses Google Suggest (public, keyless) to discover related queries at 2 expansion levels, then clusters them by intent: informational / commercial / transactional / navigational — via heuristic pattern matching. Search volume is bucketed (very_high / high / medium / low / very_low) and clearly labelled as ESTIMATED — no fabricated precise numbers. Returns all keywords, intent clusters, quality scores (0-100), and top 10 opportunities. Supports country (gl) and language (hl) targeting. 100% keyless. Cache TTL 6h. ICP: SEO managers, content strategists, SaaS founders, agency teams.
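
The description says keywords are clustered into four intents "via heuristic pattern matching" without giving the patterns. The sketch below shows what such a heuristic could look like; every pattern, the `brands` argument, the fallback to `informational`, and the function names are assumptions for illustration, not the tool's actual rules.

```python
import re
from typing import Dict, Iterable, List

# Illustrative patterns only; checked in order, first match wins.
INTENT_PATTERNS = [
    ("transactional", re.compile(r"\b(buy|price|pricing|coupon|discount|order|download)\b")),
    ("commercial", re.compile(r"\b(best|top|reviews?|vs|versus|alternatives?|compare|comparison)\b")),
    ("informational", re.compile(r"\b(how|what|why|when|guide|tutorial|examples?)\b")),
]
NAVIGATIONAL = re.compile(r"\b(login|log in|sign in|website|dashboard)\b")


def classify_intent(keyword: str, brands: Iterable[str] = ()) -> str:
    """Assign one of the four intents named in the tool description."""
    k = keyword.lower()
    if NAVIGATIONAL.search(k) or any(b.lower() in k for b in brands):
        return "navigational"
    for intent, pattern in INTENT_PATTERNS:
        if pattern.search(k):
            return intent
    return "informational"  # assumed default when nothing matches


def cluster_by_intent(keywords: Iterable[str], brands: Iterable[str] = ()) -> Dict[str, List[str]]:
    """Group keywords under their classified intent."""
    brand_list = list(brands)
    clusters: Dict[str, List[str]] = {}
    for kw in keywords:
        clusters.setdefault(classify_intent(kw, brand_list), []).append(kw)
    return clusters
```

Ordering the patterns from most to least purchase-oriented means a mixed query such as "best price invoice software" lands in the transactional cluster.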

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	No	ISO 3166-1 alpha-2 country code for Google Suggest (e.g. 'US', 'FR', 'DE'). Defaults to 'US'.
`language`	No	BCP-47 language code for suggestions (e.g. 'en', 'fr', 'de', 'es'). Defaults to 'en'.
`seed_keyword`	Yes	The seed keyword or topic to research (e.g. 'invoice software', 'project management tool')

Output Schema

Name	Required	Description
`status`	Yes
`country`	Yes
`clusters`	Yes
`language`	Yes
`warnings`	Yes
`all_keywords`	Yes
`seed_keyword`	Yes
`quality_score`	Yes
`total_keywords`	Yes
`top_opportunities`	Yes

Tool Definition Quality

A (4.4/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, etc.), the description details the methodology (Google Suggest, two expansion levels, heuristic clustering), volume bucketing with ESTIMATED labels, cache TTL, and keyless operation. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Five sentences, front-loaded with purpose, each sentence adding unique value (methodology, volume, returns, targeting, constraints, ICP). No redundant or irrelevant content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers key outputs (keywords, clusters, scores, opportunities), targeting, and constraints. With an output schema present, it does not need to list all return fields; however, the '2 expansion levels' are not fully explained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds shorthand references (gl, hl) and examples of seed keywords, providing additional context beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'SEO keyword research from a seed keyword or topic' with specific verbs and resources. It distinguishes itself from sibling tools like seo_cro_audit by focusing solely on keyword research, not broader SEO audits.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies target users (SEO managers, etc.) but does not explicitly state when to use this tool versus alternatives like seo_cro_audit or specify exclusions. Usage is implied through purpose but lacks direct guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sharia_compliance_screener (A)

Read-only

Sharia compliance screening engine for Islamic banks, Sukuk issuers, Gulf sovereign funds, halal investment managers and MENA family offices. Zero competing MCP on this vertical.

Standards supported: AAOIFI (default) | MSCI_Islamic | S&P_Sharia | DJIM

Four modes:
• company: full Sharia screen of a listed company, covering business activity (halal/haram/mixed) + AAOIFI financial ratios (debt/market-cap <30%, interest-assets <30%, non-compliant revenue <5%)
• instrument: Sukuk / halal fund classification by ISIN or name. Maps to known Sharia boards.
• sector_screen: industry classification (halal/haram/mixed) with rationale + examples. Static AAOIFI-based map covering 40+ sectors.
• financial_ratios: AAOIFI ratio computation on fetched or provided financials.

Prohibited activities screened: alcohol, gambling, pork, weapons, pornography, tobacco, conventional banking (riba), conventional insurance, adult entertainment, embryonic stem cells.

Output includes compliance_status (halal/haram/doubtful_mixed/purification_required), purification_pct when applicable, P0/P1/P2 signals, quality_score, and sources.
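
The three ratio thresholds quoted in the description (debt/market-cap <30%, interest-assets <30%, non-compliant revenue <5%) can be illustrated with a short sketch. This is not the tool's implementation: the `Financials` fields, the choice of market cap as the denominator for interest-bearing assets, total revenue as the denominator for non-compliant revenue, and the function name are assumptions. The sketch reports breaches only and does not attempt to map them onto the tool's `compliance_status` values.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Limits as quoted in the tool description; a ratio passes only if strictly below its limit.
LIMITS = {
    "debt_to_market_cap": 0.30,
    "interest_assets_to_market_cap": 0.30,
    "non_compliant_revenue_share": 0.05,
}


@dataclass
class Financials:
    """Hypothetical input shape; all amounts in the same currency."""
    total_debt: float
    interest_bearing_assets: float
    market_cap: float
    non_compliant_revenue: float
    total_revenue: float


def screen_ratios(f: Financials) -> Tuple[Dict[str, float], List[str]]:
    """Compute the three ratios and list the ones at or above their limit."""
    if f.market_cap <= 0 or f.total_revenue <= 0:
        raise ValueError("market_cap and total_revenue must be positive")
    ratios = {
        "debt_to_market_cap": f.total_debt / f.market_cap,
        # Assumed denominator: market cap.
        "interest_assets_to_market_cap": f.interest_bearing_assets / f.market_cap,
        # Assumed denominator: total revenue.
        "non_compliant_revenue_share": f.non_compliant_revenue / f.total_revenue,
    }
    breaches = [name for name, value in ratios.items() if value >= LIMITS[name]]
    return ratios, breaches
```

A company with debt at 35% of market cap and 6% non-compliant revenue would come back with two breaches, while one at 20% and 2% would come back with none.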

Parameters

Name	Required	Description
`mode`	Yes	Screening mode. company=full listed company screen, instrument=Sukuk/fund classification, sector_screen=industry halal/haram classification, financial_ratios=AAOIFI ratio check.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Entity to screen. Company name, ticker or ISIN (e.g. "Aramco", "AAPL", "tobacco", "XS1234567890").
`standard`	No	Sharia standard to apply. Default "AAOIFI" (most conservative, widely accepted by Islamic banks).

Output Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`company`	No
`signals`	Yes
`sources`	Yes
`instrument`	No
`quality_score`	Yes
`sector_screen`	No
`standard_used`	Yes

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and destructiveHint=false, and the description fully aligns as a read-only screening tool. It goes beyond annotations by detailing the asynchronous option (async param), the output structure (compliance_status, purification_pct, P0/P1/P2 signals, quality_score, sources), and the prohibited activities screened, providing comprehensive behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: overview, standards, four modes with brief explanations, prohibited activities, and output fields. It is front-loaded with the tool's purpose and target audience. While it is relatively long, every section adds necessary detail for a complex tool, so conciseness is not compromised.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers all essential aspects: target users, supported standards, all four modes with their specific use cases, prohibited activities, and the output fields. Because an output schema exists, the description does not need to re-explain return values, yet it still names the key output components for clarity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage (all parameters described), so baseline is 3. The description adds significant meaning beyond the schema: it explains the differences between the four modes with concrete examples (e.g., 'AAOIFI financial ratios (debt/market-cap <30%)') and lists all prohibited activities, which the schema does not cover. This rich context helps the agent understand parameter effects.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a 'Sharia compliance screening engine' for Islamic finance entities. It lists four specific screening modes (company, instrument, sector_screen, financial_ratios) and supported standards (AAOIFI, MSCI_Islamic, etc.), distinguishing it from any sibling tool in the list (none are related to Sharia compliance).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states 'Zero competing MCP on this vertical', implying it is the go-to tool for Sharia compliance. It details when each mode is appropriate (e.g., 'company' for full listed company screen) and the default standard (AAOIFI). The niche domain makes explicit when-not-to-use guidance less critical, but the absence of any named alternatives keeps it short of a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

social_engagement_velocity_tracker (A)

Read-only · Idempotent

Tracks hourly social engagement velocity (likes, shares, comments) across Twitter, LinkedIn, and Reddit for CMOs. Inputs include platform handles/subreddits and time range. Outputs engagement metrics, velocity trends, and platform-specific insights. Ideal for real-time marketing performance monitoring and competitive benchmarking. Keywords: social media analytics, engagement tracking, marketing KPIs, CMO dashboard.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hours`	No
`platforms`	Yes
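
The `async` row above describes a start-then-poll flow: the call returns a `job_id`, and the result is fetched with `job_result(job_id)`. A minimal Python sketch of that flow follows. The `call_tool` callable is a hypothetical stand-in for whatever MCP client function sends a tool call, and the `"pending"`/`"running"` status values, the polling interval, and the retry cap are assumptions, not documented behavior of this server.

```python
import time
from typing import Any, Callable, Dict


def run_async_tool(
    call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
    name: str,
    arguments: Dict[str, Any],
    poll_interval_s: float = 2.0,
    max_polls: int = 30,
) -> Dict[str, Any]:
    """Start a tool with async=true, then poll job_result until it finishes."""
    # Per the parameter description, async=true returns a job_id immediately.
    started = call_tool(name, {**arguments, "async": True})
    job_id = started["job_id"]
    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        # Assumption: unfinished jobs report status "pending" or "running".
        if result.get("status") not in ("pending", "running"):
            return result
        time.sleep(poll_interval_s)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")
```

Capping the number of polls keeps an agent from waiting indefinitely if a job never completes; the right interval depends on how slow the tool is in practice.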

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`trends`	No
`sources`	No
`warnings`	No
`engagement`	No

Tool Definition Quality

Grade A, 3.6/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds that outputs include metrics and trends, which is consistent but not additional behavioral context. No mention of data freshness, rate limits, or other runtime characteristics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences plus keywords, front-loading the core function. It is efficient with minimal redundancy, though the keywords section could be integrated or omitted without loss.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Output schema exists, so return details are handled. The description includes high-level outputs and use cases. However, it omits mention of the async parameter and its behavior, which is crucial for understanding long-running operations. Input structure is partially explained but not fully.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 33% (async param only). The description notes 'platform handles/subreddits and time range', adding meaning to the platforms and hours parameters. However, it does not explain the nested structure of platforms (name, handle, subreddit) or that name is required. More detail would help.
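
To make the nested `platforms` structure concrete, here is a hedged Python sketch of building the arguments. The field names `name`, `handle`, and `subreddit` and the rule that `name` is required come from the review text above; the helper name, the example values, and the exact value types are assumptions, since the schema itself is not shown here.

```python
from typing import Any, Dict, List, Optional


def build_tracker_arguments(
    platforms: List[Dict[str, str]],
    hours: Optional[int] = None,
) -> Dict[str, Any]:
    """Build arguments for social_engagement_velocity_tracker (illustrative helper)."""
    if not platforms:
        raise ValueError("'platforms' is required and must not be empty")
    for entry in platforms:
        # The review notes that 'name' is required on every platform entry.
        if not entry.get("name"):
            raise ValueError("each platform entry needs a 'name'")
    arguments: Dict[str, Any] = {"platforms": platforms}
    # 'hours' is optional, so it is only sent when the caller sets it.
    if hours is not None:
        arguments["hours"] = hours
    return arguments


example_arguments = build_tracker_arguments(
    [
        {"name": "twitter", "handle": "@example"},        # hypothetical handle
        {"name": "reddit", "subreddit": "marketing"},     # hypothetical subreddit
    ],
    hours=6,
)
```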

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it tracks hourly social engagement velocity across three specific platforms for CMOs, with defined outputs. It distinguishes itself from siblings like social_influencer_fake_follower_detector by focusing on engagement metrics rather than influencer authenticity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions ideal use cases (real-time monitoring, competitive benchmarking) but provides no explicit guidance on when not to use this tool or alternatives. The agent is left to infer usage from context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

social_influencer_fake_follower_detector (Grade A)

Read-only, Idempotent

Analyzes up to 10 social media influencers for fake followers by checking engagement velocity patterns (Trends24) and RSS feed anomalies. Returns authenticity scores, follower growth spikes, and suspicious activity flags. Optimized for CMOs evaluating influencer partnerships. Includes keywords: influencer marketing, fake follower detection, engagement analysis, social media audit.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`platform`	Yes	Social media platform of the influencers
`influencerHandles`	Yes	Array of up to 10 social media handles (e.g., ['@influencer1', 'user2'])
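
Because `influencerHandles` accepts at most 10 handles, a longer list has to be split across several calls. The Python sketch below shows one way to batch it; the helper name is hypothetical, and how the server responds to an oversized array is not documented here.

```python
from typing import Iterator, List

MAX_HANDLES_PER_CALL = 10  # limit stated in the parameter description


def batch_handles(
    handles: List[str], size: int = MAX_HANDLES_PER_CALL
) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` handles, preserving order."""
    for start in range(0, len(handles), size):
        yield handles[start:start + size]
```

Each yielded batch can then be sent as the `influencerHandles` value of a separate call.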

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`results`	Yes
`sources`	Yes
`summary`	No
`warnings`	Yes

Tool Definition Quality

Grade A, 4.3/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds behavioral details like the scope (up to 10 influencers) and methods (patterns, anomalies), enhancing transparency without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (three sentences plus keywords) with no fluff. It front-loads the core action and value proposition, making it easy to scan.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description appropriately mentions return fields. It covers purpose, methods, scope, and target audience, providing sufficient context for an AI agent to invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are well-documented in the schema. The description reiterates the limit of 10 handles but adds no new semantic value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verbs ('analyzes', 'checking', 'returns') and clearly identifies the resource ('social media influencers for fake followers'). It distinguishes from sibling tools by detailing unique methods (Trends24, RSS anomalies) and outputs, and includes keywords for disambiguation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states the tool is 'optimized for CMOs evaluating influencer partnerships,' providing a clear context of use. However, it does not explicitly mention when not to use or suggest alternative tools, which slightly limits guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sovereign_data_breach_impact (Grade A)

Read-only, Idempotent

Estimates financial impact of a data breach across three jurisdictions (US, EU, UK) for CFO strategic planning. Inputs include breach size, industry sector, and affected jurisdictions. Outputs include direct costs, regulatory fines, reputational damage, and cyber insurance premium adjustments. Ideal for cross-border risk assessment, financial contingency planning, and board-level reporting. Keywords: data breach cost, regulatory fines, cyber insurance, financial risk, cross-jurisdiction impact.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	No	Industry sector of the affected organization
`records_lost`	Yes	Number of records compromised in the breach
`jurisdictions`	Yes	Jurisdictions where the breach has legal or financial impact
`detection_time_days`	No	Time in days to detect the breach
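
As an illustration of how these parameters fit together, the Python sketch below wraps them in an MCP `tools/call` JSON-RPC request. The parameter names come from the table above; the argument values are invented for the example, and the accepted spellings for `jurisdictions` and `industry` are assumptions because the schema's enum values are not shown.

```python
import json
from typing import Any, Dict


def tools_call_request(request_id: int, name: str, arguments: Dict[str, Any]) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


payload = tools_call_request(
    1,
    "sovereign_data_breach_impact",
    {
        "records_lost": 250000,          # required; illustrative value
        "jurisdictions": ["US", "EU"],   # required; spelling is assumed
        "industry": "healthcare",        # optional; value is assumed
        "detection_time_days": 45,       # optional; illustrative value
    },
)
```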

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`total_cost_usd`	No	Estimated total financial impact in USD
`cost_per_record_usd`	No	Cost per compromised record in USD
`regulatory_fines_usd`	No
`cyber_insurance_impact`	No
`reputational_damage_usd`	No	Estimated reputational damage cost in USD

Tool Definition Quality

Grade A, 4.2/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. Description adds context that it 'estimates' impact, which is consistent. No additional behavioral traits needed beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four well-structured sentences plus keywords: core function, inputs, outputs, ideal use cases. No fluff, front-loaded with key purpose. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists, description adequately covers inputs and outputs. Missing mention of 'detection_time_days' and 'async' parameters but these are secondary. Overall sufficient for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions for each parameter. Description mentions 'breach size, industry sector, and affected jurisdictions' but does not add significant new meaning beyond schema. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb+resource: 'Estimates financial impact of a data breach across three jurisdictions (US, EU, UK) for CFO strategic planning.' It specifies unique scope (three jurisdictions) and distinguishes from sibling tools like 'cyber_risk_auditor' or 'incident_response_evidence_collector' which cover different aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Ideal for cross-border risk assessment, financial contingency planning, and board-level reporting,' giving clear context. However, it does not mention when NOT to use it or provide alternatives, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sre_slo_breach_predictor (Grade A)

Read-only, Idempotent

As a CTO, predict potential SLO breaches 24 hours in advance by analyzing public incident reports and MITRE ATT&CK techniques. Input your service's critical components and reliability thresholds to receive breach probability scores, top contributing TTPs, and recommended mitigations. Uses MITRE ATT&CK, GitHub Advisories, and Cloudflare Radar data. Pass async:true to avoid timeout.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`time_window_hours`	No
`service_components`	Yes

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`incident_reports`	No
`breach_probability`	No
`recommended_actions`	No
`top_ttp_contributors`	No

Tool Definition Quality

Grade A, 4.3/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, openWorldHint. The description adds behavioral details like data sources used and async timeout behavior. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: purpose first, then inputs and outputs, then data sources, then an async usage tip. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and annotations provide safety hints, the description covers purpose, input, output, data sources, and async usage. Sufficient for a CTO to understand and invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 33% (only async described). The description explains service_components as 'critical components and reliability thresholds' and hints at the time_window_hours default through '24 hours in advance', adding some meaning beyond the schema. However, the 'tags' subfield is not explained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool predicts SLO breaches 24 hours in advance using specific data sources. It specifies the output (breach probability scores, TTPs, mitigations) and differentiates from siblings by focusing on SLOs and reliability.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context (CTO, reliability engineering) and input requirements (critical components, thresholds). It also advises using async:true to avoid timeout, but does not explicitly state when not to use or compare with similar tools like cyber_risk_auditor.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

strategic_options_analyzer (Grade C)

Read-only

Strategic options analyzer (Analyseur d'options stratégiques): Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Aircall, 5 strategic options post-Series D (2023-2024). Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`optionHypotheses`	Yes
`strategicContext`	Yes
`founderConstraints`	Yes

Tool Definition Quality

Grade C, 2.9/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true. The description states it returns a structured deliverable and is 'agent-payable,' adding some context. However, it does not elaborate on the nature of the analysis or potential external dependencies, so transparency is moderate but adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, efficiently conveying purpose, target audience, a reference case, and input validation. Some redundancy exists ('Gapup agent-payable C-suite expertise' and 'CSO' repeat similar concepts), but overall it is concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is complex (5 parameters, nested objects, no output schema), but the description does not explain return values, deliverable structure, or typical usage context. The reference case helps but is insufficient for full understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema coverage is low (20%), yet the description provides no parameter explanations beyond 'send the documented case fields.' The schema has many nested objects without descriptions, so the description fails to compensate for the lack of parameter documentation.
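
The 20% figure matches one described parameter (`async`) out of five. The short Python sketch below reproduces that arithmetic under the assumption that coverage means "share of top-level parameters with a non-empty description"; the review does not state its exact formula, so treat this as an illustration rather than the scorer's method.

```python
from typing import Dict, Optional


def description_coverage(params: Dict[str, Optional[str]]) -> float:
    """Return the fraction of parameters that carry a non-empty description."""
    if not params:
        return 0.0
    described = sum(1 for text in params.values() if text and text.strip())
    return described / len(params)


# Parameter names from the table above; None marks a missing description.
strategic_options_params = {
    "async": "If true, returns a job_id immediately (<200ms) instead of waiting for the result.",
    "company": None,
    "optionHypotheses": None,
    "strategicContext": None,
    "founderConstraints": None,
}

coverage_pct = round(description_coverage(strategic_options_params) * 100)
```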

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an analyzer of strategic options for C-suite (CSO) and returns a structured, audited deliverable, referencing a case study. This distinguishes it from general tools, though some sibling tools like 'market_entry_strategist' or 'growth_path_architect' could overlap.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks explicit guidance on when to use this tool vs alternatives. It mentions 'Gapup agent-payable C-suite expertise' and input validation, but no exclusions or conditions for choosing this over similar strategic analysis tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

supplier_esg_audit (Grade C)

Read-only

Supplier ESG audit (Audit ESG des fournisseurs): Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: TechCorp, supplier ESG audit 2025 (5 suppliers, €1.37M spend). Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`suppliers`	Yes
`targetScore`	No
`auditCriteria`	Yes

Tool Definition Quality

Grade C, 2.7/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true, which are clear. The description adds that inputs are validated server-side, which is useful, but does not expand on behavioral traits like rate limits or result format. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (four sentences) and front-loaded with purpose. The reference case provides context but is not essential. It is concise overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex input schema with nested objects and no output schema, the description should explain what the deliverable contains or how to interpret results. It only says 'structured, audited deliverable' which is insufficient for an agent to understand the output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, and the description does not explain any parameters beyond the generic 'send the documented case fields'. It does not add meaning to the complex nested parameters (company, suppliers, auditCriteria).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Audit ESG des fournisseurs' (supplier ESG audit) and mentions it returns a structured deliverable. However, it does not differentiate from sibling tools like esg_audit_multi or sustainability_report, which are similar ESG audit tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives. The description only mentions 'send the documented case fields' without explaining any prerequisites or exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

supply_chain_fx_exposure_dashboard (Grade A)

Read-only, Idempotent

Provides real-time foreign exchange exposure dashboard for supply chain monitoring. Designed for COO persona to track currency risk across suppliers and regions. Inputs include supplier IDs, base currency, and target currencies. Outputs structured FX exposure data with risk indicators, exchange rates, and supplier impact analysis sourced from World Bank LPI and live FX rate APIs.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`supplierIds`	No	List of supplier identifiers to analyze
`baseCurrency`	Yes	Base currency code (ISO 4217) for exposure calculation
`riskThreshold`	No	Percentage threshold for high-risk exposure flagging
`targetCurrencies`	Yes	Target currency codes (ISO 4217) to compare against base
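
Since both currency parameters expect ISO 4217 codes, a client can catch malformed input before making a paid call. The Python sketch below checks only the three-uppercase-letter shape of each code, not whether the code is actually assigned; the helper name is hypothetical, and the server's own validation rules are not documented here.

```python
import re
from typing import Any, Dict, List, Optional

ISO_4217_SHAPE = re.compile(r"[A-Z]{3}")  # shape check only, not a registry lookup


def build_fx_arguments(
    base_currency: str,
    target_currencies: List[str],
    supplier_ids: Optional[List[str]] = None,
    risk_threshold: Optional[float] = None,
) -> Dict[str, Any]:
    """Build arguments for supply_chain_fx_exposure_dashboard (illustrative helper)."""
    if not target_currencies:
        raise ValueError("'targetCurrencies' is required and must not be empty")
    for code in [base_currency, *target_currencies]:
        if not ISO_4217_SHAPE.fullmatch(code):
            raise ValueError(f"not a three-letter uppercase currency code: {code!r}")
    arguments: Dict[str, Any] = {
        "baseCurrency": base_currency,
        "targetCurrencies": list(target_currencies),
    }
    # Optional parameters are only sent when the caller provides them.
    if supplier_ids is not None:
        arguments["supplierIds"] = list(supplier_ids)
    if risk_threshold is not None:
        arguments["riskThreshold"] = risk_threshold
    return arguments
```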

Output Schema

Parameters

Name	Required	Description
`data`	No
`status`	Yes
`sources`	No
`warnings`	No

Tool Definition Quality

Grade A, 4.0/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only, open-world, and idempotent behavior. The description adds context about real-time data sources (World Bank LPI, live FX rates) and output structure (risk indicators, exchange rates). No contradictions, and it enriches the agent's understanding beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences with no filler. Front-loaded purpose, persona, inputs, and outputs. Each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (5 params, 2 required) and presence of output schema, the description covers purpose, data sources, output types, and persona. It lacks mention of the async option, but that is covered in schema. Overall adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description mentions inputs like supplier IDs, base currency, and target currencies, but does not add significant new meaning beyond the schema descriptions. Output info is provided but not parameter-specific.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides a real-time FX exposure dashboard for supply chain monitoring, specifically designed for a COO persona. It distinguishes itself from siblings by focusing on supply chain currency risk tracking across suppliers and regions.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for COO persona tracking currency risk but does not explicitly state when to use this tool versus alternatives like fx_rate or working_capital_fx_hedge_optimizer. No when-not-to-use or alternative tooling is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sustainability_report (Grade C)

Read-only

Sustainability report (Rapport de durabilité): Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: GreenLoop Solutions, B-Corp sustainability report 2025 (95 FTE, €18M revenue). Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`pillars`	Yes
`stakeholders`	Yes
`targetLabels`	No
`existingLabels`	No
`audienceProfile`	Yes

Tool Definition Quality

Grade C, 2.5/5.0

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds only that it returns a deliverable and that inputs are validated, but does not disclose async behavior (noted in schema), rate limits, or side effects beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (4 sentences) but includes unnecessary marketing jargon and lacks a structured format. It is acceptable but not optimally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 params, nested objects, no output schema) and low schema coverage, the description is incomplete. It does not explain return format, async usage, or required fields sufficiently.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema coverage at 13%, the description does not explain parameters' meaning or usage beyond 'send the documented case fields'. It provides a vague hint via example but no direct parameter mapping.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a 'structured, audited deliverable' for sustainability reporting, with a reference case. However, it does not differentiate from sibling tools like 'sustainability_reporting_pilot'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. The description mentions a reference case but lacks context for selection among many similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sustainability_reporting_pilot (Grade C)

Read-only

Sustainability reporting pilot (Pilote de reporting durabilité): Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: AlphaTech Industries SAS, first CSRD wave 2 report (fiscal year 2025). Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`dataInputs`	Yes
`materiality`	Yes
`targetFrameworks`	Yes

Tool Definition Quality

Grade C, 2.0/5.0

Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description fails to add behavioral context beyond annotations (readOnlyHint, openWorldHint). It does not disclose side effects, auth needs, rate limits, or what 'audited deliverable' entails. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is 4 sentences but includes unnecessary details like a specific reference case (AlphaTech Industries SAS) and cryptic jargon, reducing clarity and conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema), the description is grossly insufficient. It does not explain input constraints, output format, error handling, or how validation works.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, meaning most parameters lack descriptions. The description does not explain any parameter meaning, only vaguely saying 'send the documented case fields.'

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
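The "schema description coverage" figure quoted above can be reproduced with simple arithmetic. The sketch below assumes coverage means the share of parameters with a non-empty description, rounded to a whole percent; the page does not state its exact formula, and the function name is hypothetical.

```python
from typing import Optional


def schema_description_coverage(parameters: dict[str, Optional[str]]) -> int:
    """Return the percentage of parameters that carry a non-empty description.

    ASSUMPTION: coverage = described parameters / all parameters, rounded.
    """
    if not parameters:
        return 0
    described = sum(1 for text in parameters.values() if text and text.strip())
    return round(100 * described / len(parameters))


# Parameter table of sustainability_reporting_pilot: only `async` is described.
pilot_params = {
    "async": "If true, returns a job_id immediately instead of waiting.",
    "focus": None,
    "company": None,
    "dataInputs": None,
    "materiality": None,
    "targetFrameworks": None,
}

coverage = schema_description_coverage(pilot_params)
print(f"{coverage}%")  # 1 of 6 parameters described
```

Under this assumption, one described parameter out of six gives the 17% reported for this tool.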

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description indicates it is a sustainability reporting pilot returning a structured deliverable, but it does not differentiate from similar tools like sustainability_report or esg_audit_multi. The cryptic phrase 'Gapup agent-payable C-suite expertise (RISK)' adds confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. It only mentions input validation server-side, but lacks prerequisites, context, or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

syndicated_loan_covenant_breach_alert (A)

Read-only, Idempotent

Monitors syndicated loan covenants for potential breaches by analyzing Tradeweb market data. Designed for CFOs to proactively identify financial compliance risks in loan agreements. Accepts loan identifiers, covenant thresholds, and reporting period as inputs. Returns structured breach alerts with market context and severity indicators.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`loanId`	Yes	Unique identifier for the syndicated loan
`currency`	No	ISO currency code for financial values
`reportingPeriod`	Yes	Time period for covenant compliance check
`covenantThresholds`	Yes

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`breaches`	No
`warnings`	No

Tool Definition Quality

A (4/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, idempotentHint. Description adds that it uses Tradeweb data and returns alerts, which is consistent and provides context but does not disclose new behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each serving a purpose: action, target user, inputs, and outputs. No unnecessary words. Front-loaded with the core function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given moderate complexity with nested objects and output schema, description gives adequate high-level overview. Includes target user, inputs, and output type. Missing details like severity indicator types, but output schema likely covers that.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is high (80%). The description mentions 'loan identifiers, covenant thresholds, and reporting period' which maps to required parameters but adds no detail on individual fields like the ratio names. Does not enhance beyond schema significantly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it monitors syndicated loan covenants for breaches using Tradeweb data. 'Monitors' is a specific verb, resource is 'syndicated loan covenants'. Distinguishes from siblings like bond_covenant_monitor by specifying loan type and data source.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Designed for CFOs to proactively identify risks, which implies when to use. Does not explicitly exclude alternatives or state when not to use, but context is clear. No explicit comparison to siblings like bond_covenant_monitor.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

syndicated_loan_pricing_benchmark (A)

Read-only, Idempotent

Provides CFOs with peer benchmarking for syndicated loan pricing by comparing current loan terms against market data from Tradeweb and FRED. Inputs include loan amount, tenor, credit rating, and currency. Outputs structured pricing benchmarks with spread, yield, and fee comparisons. Ideal for quick validation of loan competitiveness or negotiation preparation.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`tenor`	Yes	Loan tenor (e.g., '5Y', '3Y')
`region`	No	Region for benchmarking (e.g., 'US', 'EU')
`currency`	Yes	Currency code (e.g., 'USD', 'EUR')
`loanAmount`	Yes	Loan amount in millions
`creditRating`	Yes	Borrower credit rating (e.g., 'BBB', 'BB+')
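As a rough illustration of how the parameters above fit together, the sketch below wraps them in an MCP `tools/call` JSON-RPC 2.0 request. The argument values are illustrative only, the helper name `build_tool_call` is hypothetical, and transport details (session headers, x402 payment) are deliberately left out.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Wrap tool arguments in an MCP `tools/call` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "syndicated_loan_pricing_benchmark",
    {
        "tenor": "5Y",           # required, format as in the table examples
        "currency": "USD",       # required
        "loanAmount": 250,       # required, in millions per the table
        "creditRating": "BBB",   # required
        "region": "US",          # optional
    },
)

print(json.dumps(request, indent=2))
```

The optional `async` flag is omitted here, so the call would wait for the result rather than return a job_id.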

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`benchmarks`	No

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, and openWorldHint. The description adds value by naming data sources (Tradeweb, FRED) and specifying output (spread, yield, fee comparisons). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences: the first states the purpose and data sources, the second lists the inputs, the third describes the output, and the fourth gives the use case. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool has 6 parameters including async, and an output schema. The description covers purpose, inputs, output type, and use case. It does not explain the async parameter, but that is common across tools and not critical given annotations cover safety.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters described. The description lists loan amount, tenor, credit rating, and currency but does not add meaning beyond what the schema already provides (e.g., no format details for tenor or examples). Thus baseline score.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool's purpose: providing CFOs with peer benchmarking for syndicated loan pricing using data from Tradeweb and FRED. It clearly identifies the verb (provides benchmarking), resource (syndicated loan pricing), and specific use case (validation, negotiation). This distinguishes it from siblings like 'syndicated_loan_covenant_breach_alert'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description notes the tool is 'Ideal for quick validation of loan competitiveness or negotiation preparation,' giving clear context for use. However, it does not explicitly state when not to use it or mention alternative tools, so it lacks exclusion guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_contract_risk_mapper (A)

Read-only, Idempotent

For CHROs: analyzes employee contracts for non-compete, IP assignment, and confidentiality clauses, comparing against state labor laws and jurisdiction-specific precedents. Returns risk levels, conflicting statutes, and suggested revisions. Uses USPTO PatFT, CourtListener, and EUR-Lex for legal cross-referencing. Ideal for contract reviews, compliance audits, or policy updates.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`jurisdiction`	Yes	State or country jurisdiction (e.g., 'California', 'Germany')
`contract_text`	Yes	Full text of the employee contract or clause section to analyze
`employee_role`	No	Job title or role classification (e.g., 'Software Engineer', 'Executive')
`effective_date`	No	Contract effective date (YYYY-MM-DD)

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`risk_summary`	No
`suggested_revisions`	No
`conflicting_statutes`	No

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses data sources (USPTO PatFT, CourtListener, EUR-Lex) and output type (risk levels, conflicting statutes, suggested revisions). Annotations already indicate read-only and idempotent; description adds behavioral context beyond that.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, front-loaded with audience and action. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and presence of an output schema, the description covers key aspects: audience, input types, analysis scope, data sources, and use cases. Could mention limits or prerequisites but adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so descriptions adequately document parameters. Description does not add significant new semantics beyond what is in the schema, earning a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool analyzes employee contracts for non-compete, IP assignment, and confidentiality clauses, comparing against state labor laws and jurisdiction-specific precedents. Distinguishes from siblings like contract_risk_scanner by focusing on CHROs and specific clause types.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context: 'Ideal for contract reviews, compliance audits, or policy updates.' Does not explicitly state when not to use or compare to alternatives, but the description gives a clear sense of appropriate scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_intelligence (A)

Read-only

HR tech intelligence for CHROs, recruiters, VC teams, comp & benefits leads and workforce planners. Four modes powered by ESCO, O*NET, BLS OES and crowd-sourced salary data:

• salary_benchmark — cash-only salary medians (p25/median/p75) for 54+ roles across US/EU/Asia. Covers tech, finance, compliance, healthcare, marketing, ops and C-suite. Data from BLS OES, Levels.fyi and StackOverflow Developer Survey 2024.
• skills_taxonomy — maps a skill to its ESCO URI, O*NET codes, skill type (hard/soft/knowledge/cert), 8 related skills with similarity scores and typical roles.
• job_market_trends — YoY growth %, open positions estimate, top employers and leading skills per job category × country. Static 2024 data with BLS baseline fallback.
• adjacent_roles — up to 6 roles adjacent to a source role with ESCO taxonomy adjacency: similarity score, salary delta % and skills overlap %.

All salary data is cash-only (excludes equity/RSU/bonus). Cache TTL: 24h (stable labour market data). Optional env ONET_API_KEY for authenticated O*NET lookups (free registration at onetcenter.org).

Parameters

Name	Required	Description
`mode`	Yes	Analysis mode: salary_benchmark=compensation data, skills_taxonomy=ESCO/O*NET mapping, job_market_trends=market growth and demand, adjacent_roles=career path recommendations.
`role`	No	Job title (required for salary_benchmark, job_market_trends, adjacent_roles). Examples: "Senior Software Engineer", "Compliance Officer", "Data Scientist", "CFO".
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`skill`	No	Skill to classify (required for skills_taxonomy mode). Examples: "Python", "transformer architecture", "GDPR", "Kubernetes", "leadership".
`country`	No	ISO 2-letter country code. Default: US. Examples: US, FR, DE, GB, SG.
`seniority`	No	Seniority level. Default: senior. Affects salary benchmark ranges.
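The mode-dependent requirements in the table above (`role` for three modes, `skill` for skills_taxonomy) can be checked client-side before a call is made. This is a minimal sketch; the helper name is hypothetical, and the server's own validation may differ.

```python
from typing import Optional

# Modes that need `role`, per the parameter table; skills_taxonomy needs `skill`.
ROLE_MODES = {"salary_benchmark", "job_market_trends", "adjacent_roles"}
ALL_MODES = ROLE_MODES | {"skills_taxonomy"}


def build_talent_intelligence_args(
    mode: str,
    role: Optional[str] = None,
    skill: Optional[str] = None,
    country: Optional[str] = None,
    seniority: Optional[str] = None,
) -> dict:
    """Build an arguments dict, enforcing the per-mode required fields."""
    if mode not in ALL_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    args: dict = {"mode": mode}
    if mode in ROLE_MODES:
        if not role:
            raise ValueError(f"mode {mode!r} requires `role`")
        args["role"] = role
    else:
        if not skill:
            raise ValueError("mode 'skills_taxonomy' requires `skill`")
        args["skill"] = skill
    # Omitted values fall back to the documented defaults (US, senior).
    if country is not None:
        args["country"] = country
    if seniority is not None:
        args["seniority"] = seniority
    return args


salary_args = build_talent_intelligence_args("salary_benchmark", role="CFO", country="FR")
skill_args = build_talent_intelligence_args("skills_taxonomy", skill="GDPR")
print(salary_args)
print(skill_args)
```

Leaving `country` and `seniority` out of the payload, rather than hard-coding the defaults, keeps the client in step if the server-side defaults ever change.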

Output Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`quality_score`	Yes
`adjacent_roles`	No
`skills_taxonomy`	No
`salary_benchmark`	No
`job_market_trends`	No

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false. The description goes beyond by noting that salary data is 'cash-only (excludes equity/RSU/bonus)', cache TTL is 24h, and an optional API key is available. It also explains the static nature of 2024 data. This adds valuable behavioral context not present in annotations alone.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear introduction, bulleted mode details, and a closing note on limitations and configuration. It is front-loaded with the overall purpose and audience. While it is fairly long, every sentence serves a purpose and adds necessary detail. A slight reduction could be made without losing clarity, but overall it is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 modes, 6 parameters, multiple data sources), the description is highly complete. It covers each mode's purpose and data sources, required parameters per mode, limitations (cash-only, static data), cache behavior, and optional API key setup. With an output schema present, the description adequately prepares an agent for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds meaning by explaining which parameters are required for each mode (e.g., 'role required for salary_benchmark, job_market_trends, adjacent_roles'), provides example values, and clarifies defaults (e.g., country defaults to US, seniority defaults to senior). This goes beyond the schema's basic parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'HR tech intelligence' as the tool's domain and enumerates four distinct modes (salary_benchmark, skills_taxonomy, job_market_trends, adjacent_roles) with specific data sources and usage context. It names the intended audience (CHROs, recruiters, etc.) and distinguishes the tool from siblings by detailing its unique capabilities and data coverage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for each mode but does not explicitly tell when to use this tool versus alternatives. Usage guidance is implied through the mode descriptions (e.g., 'salary_benchmark — cash-only salary medians'), but no direct comparison to sibling tools like 'comp_benchmark_geo_delta' or 'global_salary_inflation_adjuster' is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_legal_dashboard (A)

Read-only, Idempotent

Generates a real-time legal risk dashboard for CHROs, covering contracts, intellectual property, and labor law compliance. Inputs include jurisdiction, employee count, and risk thresholds; outputs include risk scores, actionable alerts, and source citations. Ideal for proactive legal risk management and compliance monitoring. Pass async:true to avoid timeout.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`includeIP`	No
`jurisdiction`	Yes
`employeeCount`	Yes
`riskThreshold`	No
`includeLaborLaw`	No
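The `async` parameter describes a submit-then-poll flow: the call returns a job_id, and the result is fetched with job_result(job_id). The sketch below shows one way a client might drive that loop. Everything about the response shape is an assumption (the `job_id` key, a `"pending"` status value), `call_tool` is a hypothetical stand-in for a real MCP client call, and the stub exists only so the example runs on its own.

```python
import time
from typing import Callable

ToolCaller = Callable[[str, dict], dict]


def call_async(
    call_tool: ToolCaller,
    name: str,
    arguments: dict,
    poll_interval: float = 1.0,
    timeout: float = 60.0,
) -> dict:
    """Submit a tool call with async=True, then poll job_result until done."""
    job = call_tool(name, {**arguments, "async": True})
    job_id = job["job_id"]  # ASSUMPTION: the job id is returned under this key
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = call_tool("job_result", {"job_id": job_id})
        if result.get("status") != "pending":  # ASSUMPTION: pending marker
            return result
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")


def make_stub_caller() -> ToolCaller:
    """Stub server: reports 'pending' twice, then a made-up final payload."""
    polls = {"count": 0}

    def stub(name: str, arguments: dict) -> dict:
        if name == "job_result":
            polls["count"] += 1
            if polls["count"] < 3:
                return {"status": "pending"}
            return {"status": "ok", "riskScore": 42}  # illustrative values
        return {"job_id": "job-123"}

    return stub


result = call_async(
    make_stub_caller(),
    "talent_legal_dashboard",
    {"jurisdiction": "California", "employeeCount": 500},  # illustrative inputs
    poll_interval=0.0,
)
print(result)
```

A fixed polling interval keeps the sketch short; a real client would likely add backoff and handle error statuses explicitly.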

Output Schema

Name	Required	Description
`alerts`	No
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No
`lastUpdated`	No

Tool Definition Quality

A (4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint (all true). The description adds value by noting real-time generation and async support to prevent timeouts, which goes beyond annotation scope. It also specifies output types (risk scores, alerts, source citations). However, it does not disclose potential limitations like data latency or accuracy, which would warrant a higher score.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with the core function. Every sentence adds essential information (purpose, inputs/outputs, use case, usage tip). No redundant or extra words. It is highly concise while still being informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the main inputs and outputs, target audience (CHROs), and use case (proactive risk management). With an output schema present (not shown), return values are already partially documented. The description lacks mention of boolean parameter defaults (includeIP, includeLaborLaw default true) and riskThreshold range, but overall it provides sufficient context for a dashboard tool with good annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (17%, only async documented). The description compensates by naming jurisdiction, employeeCount, and riskThreshold as inputs, and it mentions async usage. However, includeIP and includeLaborLaw are not described, leaving their purpose implied by the broader context. This partial coverage adds some value beyond schema defaults but is not comprehensive.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it generates a real-time legal risk dashboard for CHROs, covering contracts, IP, and labor law compliance. It specifies inputs (jurisdiction, employee count, risk thresholds) and outputs (risk scores, alerts, source citations). This differentiates it from sibling tools like talent_contract_risk_mapper (specific to contracts) and talent_litigation_exposure (focused on litigation), making the purpose and scope distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context ('Ideal for proactive legal risk management and compliance monitoring') and an async tip to avoid timeout. However, it does not explicitly state when to use this dashboard versus sibling alternatives like talent_contract_risk_mapper or legal_clause_extractor. No exclusions or when-not-to-use guidance is given, so usage guidance is implied rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_litigation_exposure (A)

Read-only, Idempotent

Estimates litigation exposure risk for CHROs by analyzing past employee lawsuits, settlement amounts, and industry benchmarks. Inputs include company location, industry code, and employee count range. Returns exposure score, average settlement amounts, lawsuit frequency trends, and risk factors. Ideal for legal risk assessment, HR strategy planning, and board-level reporting. Pass async:true to avoid timeout.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry_code`	Yes	NAICS industry code (e.g., '541511' for IT services)
`employee_count`	No	Current number of employees
`lookback_years`	No	Number of years to analyze
`company_location`	Yes	State or region where company operates (e.g., 'CA', 'New York')

Output Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`avg_settlement`	No	Average settlement amount in USD
`exposure_score`	Yes	Normalized risk score (0-100)
`historical_trend`	No
`top_risk_factors`	No
`lawsuit_frequency`	No	Lawsuits per 1000 employees per year
`industry_benchmark`	No	Industry average exposure score
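The units documented in the output schema allow some simple arithmetic on a result. The sketch below assumes a response shaped like the table above; the sample values, the `"ok"` status and the list types for `sources` and `warnings` are invented for illustration, and `summarize_exposure` is a hypothetical helper.

```python
def summarize_exposure(result: dict, employee_count: int) -> dict:
    """Derive comparison figures from a talent_litigation_exposure result."""
    summary = {"exposure_score": result["exposure_score"]}  # 0-100, required
    benchmark = result.get("industry_benchmark")  # optional field
    if benchmark is not None:
        # Positive means the company scores above its industry average.
        summary["vs_industry"] = result["exposure_score"] - benchmark
    frequency = result.get("lawsuit_frequency")  # lawsuits per 1000 employees per year
    if frequency is not None:
        summary["expected_lawsuits_per_year"] = frequency * employee_count / 1000
    return summary


sample_result = {
    "status": "ok",
    "sources": [],
    "warnings": [],
    "exposure_score": 62,
    "industry_benchmark": 55,
    "lawsuit_frequency": 1.5,
}

summary = summarize_exposure(sample_result, employee_count=2000)
print(summary)
```

Because `industry_benchmark` and `lawsuit_frequency` are optional in the schema, the helper skips the derived figures when they are absent instead of failing.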

Tool Definition Quality

A (3.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint. Description adds important behavioral context: the async parameter to avoid timeout, and the return fields. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is five sentences, front-loaded with purpose, followed by inputs, outputs, use cases, and an async tip. Efficient with no wasted words, though could be slightly more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the annotations and output schema, the description covers purpose, inputs, outputs, use cases, and async behavior. It is sufficient for an agent to decide when and how to use the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so descriptions for all parameters exist. The tool description only lists parameter names without adding new meaning. Minor note: description says 'employee count range' but schema defines a single number.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool estimates litigation exposure risk for CHROs, specifying the verb 'estimates' and the resource 'litigation exposure risk'. It is distinct from sibling tools like talent_contract_risk_mapper or talent_poaching_risk.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description mentions ideal use cases (legal risk assessment, HR strategy, board reporting) but provides no explicit comparison to alternative tools. The async tip is helpful but does not exclude other contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_poaching_risk (A)

Read-only, Idempotent

Analyzes employee poaching risk for CHROs by evaluating LinkedIn profile activity (job searches, profile views) and comparing compensation against BLS benchmarks. Returns a ranked list of high-risk employees with risk scores and suggested retention actions. Ideal for proactive talent retention strategies. Keywords: employee retention, poaching risk, compensation benchmark, LinkedIn activity, CHRO analytics.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`location`	No	Geographic location filter (e.g., 'San Francisco, CA')
`department`	Yes	Department filter (e.g., 'Engineering', 'Sales')
`min_tenure_months`	No	Minimum tenure in months to include in analysis
`benchmark_job_title`	No	Specific job title for compensation benchmarking

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`risk_assessment`	No
`department_avg_risk`	No
`benchmark_comparison`	No

Tool Definition Quality

A (3.5/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint, idempotentHint) are all true and consistent. The description adds useful context about data sources and output format. However, it does not mention the async parameter behavior or any limitations such as performance or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with four sentences covering purpose, data sources, output, and keywords. It is front-loaded with the most important information, though the keyword list could be omitted or integrated.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters and an output schema, the description adequately covers the core functionality and return values. It lacks explanation of the async parameter, but overall it is complete enough for an agent to understand the tool's purpose and inputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description adds no new parameter-specific meaning beyond what is in the schema, achieving the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes employee poaching risk for CHROs using specific data sources. It specifies the output (ranked list with risk scores and actions). However, it does not differentiate from sibling HR tools like 'talent_intelligence' or 'talent_contract_risk_mapper', which could lead to confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it is 'Ideal for proactive talent retention strategies', which implies when to use it. However, it lacks explicit guidance on when not to use it or which alternatives exist among the many HR-related sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tariff_arbitrage_finder (grade A)

Read-only, Idempotent

As a COO, identify tariff reclassification opportunities to reduce import costs. Analyzes product HS codes against WTO TFA and USA Trade Online data to find lower-duty classifications. Inputs: product description, current HS code, country of origin, and annual import volume. Outputs: potential duty savings, alternative HS codes, and compliance considerations.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`annualVolume`	No
`currentHsCode`	Yes
`countryOfOrigin`	Yes
`currentDutyRate`	No
`productDescription`	Yes
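
The `async` parameter in the table above returns a `job_id` that is then polled with `job_result(job_id)`. A minimal sketch of that polling loop follows. It assumes a hypothetical `call_tool(name, arguments)` callable supplied by your MCP client, assumes `job_result` accepts a `job_id` argument, and assumes the payload reports `status == "pending"` while the job is still running; none of these details are documented in this listing.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical client hook: takes a tool name and arguments, returns the parsed result.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def run_async_tool(
    call_tool: ToolCaller,
    name: str,
    arguments: Dict[str, Any],
    poll_interval: float = 1.0,
    max_polls: int = 30,
) -> Dict[str, Any]:
    """Submit a tool call with async=True, then poll job_result until it finishes."""
    submitted = call_tool(name, {**arguments, "async": True})
    job_id = submitted["job_id"]  # field name taken from the parameter description

    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        # Assumption: any status other than "pending" means the job has ended.
        if result.get("status") != "pending":
            return result
        time.sleep(poll_interval)

    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")
```

Capping the number of polls keeps an agent from waiting indefinitely on a job that never completes.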

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`opportunities`	No

Tool Definition Quality

A (4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint. The description adds value by disclosing data sources (WTO TFA, USA Trade Online) and mentioning compliance considerations as an output, which are behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences covering purpose, data sources, inputs, and outputs without extraneous information. It is front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, inputs, outputs, and data sources. Given the complexity (6 params, output schema exists), it provides sufficient context for most usage. Minor gaps include lack of detail on compliance considerations and async parameter behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 17% schema description coverage, the description partially compensates by listing key inputs (product description, current HS code, country of origin, annual import volume), but it does not explain the current duty rate parameter or the async parameter fully. Some parameters lack semantic context.
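
The coverage percentages quoted in these reviews are consistent with a simple ratio: the number of parameters that carry a description divided by the total number of parameters. The sketch below reproduces that arithmetic; the function name and the rounding rule are assumptions, since the listing does not state how the figure is computed.

```python
from typing import Dict, Optional


def schema_description_coverage(params: Dict[str, Optional[str]]) -> int:
    """Return the share of parameters with a non-empty description, as a whole percent."""
    if not params:
        return 0
    described = sum(1 for text in params.values() if text and text.strip())
    return round(100 * described / len(params))


# Parameters of tariff_arbitrage_finder as listed above: only `async` is described.
tariff_arbitrage_params = {
    "async": "If true, returns a job_id immediately (<200ms) instead of waiting for the result.",
    "annualVolume": None,
    "currentHsCode": None,
    "countryOfOrigin": None,
    "currentDutyRate": None,
    "productDescription": None,
}

coverage = schema_description_coverage(tariff_arbitrage_params)  # 1 of 6 described
```

With one described parameter out of six, the ratio rounds to the 17% quoted in the review; one of seven gives 14% and one of four gives 25%, matching the figures reported for other tools on this page.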

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: identifying tariff reclassification opportunities to reduce import costs. It specifies the analysis against WTO TFA and USA Trade Online data, which distinguishes it from sibling tools like tariff_impact_simulator or trade_finance_eligibility.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for COOs but does not explicitly state when to use this tool versus alternatives. It lists inputs and outputs but lacks exclusion criteria or when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tariff_impact_simulator (grade A)

Read-only, Idempotent

As a COO, model how proposed tariff changes affect landed costs for imported goods. Inputs: HS code, current tariff rate, proposed tariff rate, product value, shipping cost, and country of origin. Outputs: detailed cost breakdown including new duties, taxes, and total landed cost impact. Sources include WTO TFA and US Census trade data.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hsCode`	Yes
`productValue`	Yes
`shippingCost`	No
`countryOfOrigin`	Yes
`currentTariffRate`	Yes
`proposedTariffRate`	Yes

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`costImpact`	No
`currentDuty`	No
`proposedDuty`	No
`dutyDifference`	No
`currentLandedCost`	No
`proposedLandedCost`	No
`costImpactPercentage`	No
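
To make the output fields above concrete, here is a rough sketch of the arithmetic such a simulation might perform. The formula is an assumption, not the tool's documented method: duty is taken as product value times the tariff rate (rates expressed as fractions), and landed cost as value plus shipping plus duty. The real tool also mentions taxes, which this sketch ignores, and some customs regimes assess duty on a value that includes shipping.

```python
from typing import Dict


def simulate_tariff_impact(
    product_value: float,
    current_tariff_rate: float,
    proposed_tariff_rate: float,
    shipping_cost: float = 0.0,
) -> Dict[str, float]:
    """Compare landed cost under the current and proposed tariff rates."""
    # Assumption: duty is assessed on the product value only.
    current_duty = product_value * current_tariff_rate
    proposed_duty = product_value * proposed_tariff_rate

    current_landed = product_value + shipping_cost + current_duty
    proposed_landed = product_value + shipping_cost + proposed_duty
    cost_impact = proposed_landed - current_landed

    return {
        "currentDuty": current_duty,
        "proposedDuty": proposed_duty,
        "dutyDifference": proposed_duty - current_duty,
        "currentLandedCost": current_landed,
        "proposedLandedCost": proposed_landed,
        "costImpact": cost_impact,
        # Percentage change relative to the current landed cost.
        "costImpactPercentage": 100 * cost_impact / current_landed if current_landed else 0.0,
    }


# Hypothetical shipment: 10,000 in goods, 500 shipping, tariff rising from 5% to 25%.
example = simulate_tariff_impact(10_000, 0.05, 0.25, shipping_cost=500)
```

The keys mirror the output schema names so the sketch can be compared field by field against a real response.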

Tool Definition Quality

A (4.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already declare readOnly, openWorld, and idempotent. The description adds value by specifying data sources (WTO, US Census) and the target role (COO), but it does not disclose rate limits or other behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences cover purpose, inputs, outputs, and sources without redundancy. The description is front-loaded with the role and the action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists and the annotations are strong, the description adequately covers inputs, outputs, and sources. One minor gap: the async parameter is not mentioned, although it is documented in the schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 14% schema coverage, the description compensates by listing all input parameters (HS code, rates, value, shipping cost, country) and their purpose, though it does not explain constraints such as minimum or maximum values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool models tariff impact on landed costs and specifies inputs and outputs. It is distinguishable from siblings like tariff_arbitrage_finder because it focuses on simulation rather than arbitrage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for tariff impact analysis but lacks explicit when-to-use or when-not-to-use guidance. It makes no comparison to sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tax_compliance_multi (grade A)

Read-only, Idempotent

Multi-jurisdiction tax compliance data for international SaaS, cross-border marketplaces and expat services. Five modes: (1) vat_lookup — validate EU VAT numbers live via VIES SOAP (27 EU countries) or UK VRN via HMRC; (2) sales_tax — US state sales tax rates, nexus thresholds (post-Wayfair 2018), digital goods taxability for all 50 states + DC; (3) gst — APAC GST/SST/consumption-tax rates for IN, SG, AU, NZ, MY, JP, KR, TH, ID, PH, VN with reduced rates and registration thresholds; (4) oss_ioss_eligibility — EU One-Stop-Shop and Import-OSS eligibility analysis (EUR 10k OSS threshold, EUR 150 IOSS per-consignment); (5) transfer_pricing_benchmark — OECD/JTPF operating-margin benchmarks by industry and country (20+ sectors, country-specific adjustments). Returns P0/P1/P2 compliance signals: P0=invalid VAT used for zero-rating, P1=taxable digital goods detected/audit risk, P2=filing deadlines/nexus alerts. Keyless — no API key required. Optional env: HMRC_VAT_API_KEY for UK VAT live validation. Cache TTL 24h.

Parameters

Name	Required	Description
`mode`	Yes	Tax mode to invoke.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Mode-specific query: vat_lookup -> VAT number with country prefix (e.g. 'FR40303265045'); sales_tax -> US state code or name (e.g. 'CA', 'California'); gst -> ISO country code (e.g. 'SG', 'IN', 'AU'); oss_ioss_eligibility -> annual EU B2C revenue in EUR or keyword (e.g. '5000', 'below'); transfer_pricing_benchmark -> industry name (e.g. 'manufacturing', 'saas', 'r&d').
`country`	No	ISO 3166-1 alpha-2 country code. Required for gst when query is ambiguous. Used in transfer_pricing_benchmark for country-specific OECD adjustments.
`transaction_type`	No	Transaction type for signal generation. 'digital' triggers GST/sales-tax digital goods warnings.
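
Because the meaning of `query` depends on `mode`, it can help to assemble the arguments through a small helper that rejects unknown modes before a paid call is made. The sketch below is illustrative only: the helper name and its validation rules are assumptions, while the mode names and parameter names come from the table above.

```python
from typing import Dict, Optional

# Mode names as listed in the tool description.
VALID_MODES = {
    "vat_lookup",
    "sales_tax",
    "gst",
    "oss_ioss_eligibility",
    "transfer_pricing_benchmark",
}


def build_tax_query(
    mode: str,
    query: str,
    country: Optional[str] = None,
    transaction_type: Optional[str] = None,
) -> Dict[str, str]:
    """Build the arguments object for a tax_compliance_multi call."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not query.strip():
        raise ValueError("query must not be empty")

    args = {"mode": mode, "query": query.strip()}
    if country is not None:
        # The schema asks for an ISO 3166-1 alpha-2 code (two letters).
        if len(country) != 2 or not country.isalpha():
            raise ValueError("country must be a two-letter ISO 3166-1 alpha-2 code")
        args["country"] = country.upper()
    if transaction_type is not None:
        args["transaction_type"] = transaction_type
    return args


# Example values taken from the parameter descriptions above.
vat_args = build_tax_query("vat_lookup", "FR40303265045")
gst_args = build_tax_query("gst", "SG", country="sg", transaction_type="digital")
```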

Output Schema

Name	Required	Description
`gst`	No
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`oss_ioss`	No
`sales_tax`	No
`vat_lookup`	No
`quality_score`	Yes
`transfer_pricing`	No

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and destructiveHint. The description adds value by detailing return signals (P0/P1/P2), cache TTL (24h), and keyless access, which are not evident from annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is comprehensive but slightly long. It is well-structured by modes and front-loaded with the primary purpose. Each sentence adds value, though a more concise summary could improve scanability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters with full schema coverage, an output schema, and no nested objects, the description covers all modes, return signals, optional environment variable, and caching behavior, leaving no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, baseline is 3. The description enriches each parameter with concrete examples (e.g., 'FR40303265045' for vat_lookup, 'CA' or 'California' for sales_tax) and explains the async parameter's purpose for slow operations.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool does multi-jurisdiction tax compliance with five distinct modes (vat_lookup, sales_tax, gst, oss_ioss_eligibility, transfer_pricing_benchmark), each precisely defined. It differentiates the tool from siblings like tax_optimization by focusing on compliance signals.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides detailed context for each mode, including data sources (VIES, HMRC), jurisdictional scope (EU, US, APAC), and key thresholds. However, it does not explicitly compare to alternative tools or specify when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tax_optimization (grade C)

Read-only

Tax optimization (Optimisation fiscale): Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Pennylane, optimized taxation · CIR €1.2M · IP Box France 10% · total savings €2.4M/year. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`ipAssets`	No
`activities`	Yes
`financials`	Yes
`jurisdictions`	Yes
`currentTaxOptimizations`	No

Tool Definition Quality

C (2.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, indicating a read-only, world-dependent operation. The description adds that inputs are validated server-side and that the tool returns a deliverable. This provides some behavioral context beyond annotations, but lacks details on error handling, limits, or response format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise, consisting of two main sentences plus an example and a validation note. Some jargon reduces clarity, but overall it is not overly verbose. The structure front-loads the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, nested objects, no output schema), the description is incomplete. It does not describe the output structure, interpretation, or potential issues like validation errors. The example helps but does not cover the full range of inputs or behaviors.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (14%), with only the 'async' parameter documented. The tool description does not explain any of the seven parameters, relying on the schema which is insufficient. The phrase 'send the documented case fields' implies external documentation, but the description fails to compensate for the schema's gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Optimisation fiscale' (tax optimization) and mentions returning a structured, audited deliverable, with a reference case indicating tax optimization strategies. It distinguishes itself from sibling tools like tax_compliance_multi and ma_tax_efficiency_mapper by focusing on optimization. However, it lacks a clear verb like 'analyzes' or 'optimizes', and the jargon 'agent-payable C-suite expertise (CFO)' obscures the purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance is provided on when to use this tool versus alternatives. The description includes a reference case but no when-to-use or when-not-to-use instructions. Sibling tools exist for tax compliance and M&A tax efficiency, but no comparative guidance is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

term_sheet_negotiation (grade C)

Read-only

Term sheet negotiation (Négociation term sheet): Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Agicap Series C €50M, 8 clauses analyzed · 3 flagged red · founder score 62/100 → plan to reach 81. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`round`	Yes
`company`	Yes
`termSheetClauses`	Yes

Tool Definition Quality

C (2.9/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds that inputs are validated server-side, and the schema documents that async mode is available via the 'async' parameter. This covers some behavioral traits but does not elaborate on side effects or other constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and front-loaded with the tool's purpose. The reference case provides useful context without being overly verbose. Could be slightly more structured but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of nested input objects and no output schema, the description is insufficient. It does not explain the structure of the deliverable or how to interpret results, leaving the agent with incomplete information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only 'async' is documented). The general instruction 'send the documented case fields' adds minimal meaning. Key parameters (company, round, termSheetClauses) lack any added context in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool does 'négociation term sheet' (term sheet negotiation) and mentions it returns a structured, audited deliverable. The reference case provides a concrete example. It does not explicitly differentiate the tool from siblings like 'deal_coach' or 'deal_structurer', although the fundraising context makes it reasonably distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description mentions 'FUNDRAISING' context but does not state when not to use it or suggest other tools for similar tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tool_recommend (grade A)

Read-only

Cross-tool recommendation system: given a free-text intent, returns the most appropriate tools from the 170+ Gapup MCP catalogue, ranked by confidence, with pre-filled input suggestions and an optimal multi-tool chain when applicable. Use this first when you are unsure which tool to call — it navigates the full catalogue for you. Supports 15+ static pre-designed chains for frequent intents (M&A due diligence, sanctions screening, ESG 360, AI Act compliance, FTO patent clearance, crypto wallet tracking, etc.). Domains: compliance | finance | intel | legal | content | data | trade | infra. Pure compute — $0.01/call, no external fetch. Ideal as a first call in any multi-step agent workflow.

Parameters

Name	Required	Description
`lang`	No	Optional ISO 639-1 language hint (fr, en, de, zh, es …). Used for language-aware boosting.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domain`	No	Optional domain hint to boost tools in this category.
`intent`	Yes	Free-text description of what you want to accomplish. E.g. 'Run a full M&A due diligence on Acme Corp' or 'Je veux vérifier qu'un fournisseur n'est pas sous sanctions OFAC'. FR/EN/DE/ZH supported.
`max_results`	No	Max number of recommendations returned (1-10). Default 5.
`include_chain`	No	Whether to include a suggested_chain of tools in the optimal sequence. Default true. Chain is always included for well-known intents (M&A, compliance, ESG, etc.).
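
Since this tool is meant to be the first call in a workflow, a short sketch of how its request might be assembled may be useful. MCP tool invocations are JSON-RPC 2.0 messages using the `tools/call` method; the helper name, the range check, and the choice to omit unset optional fields are assumptions made for illustration. Defaults follow the table above.

```python
import json
from typing import Any, Dict, Optional


def tool_recommend_request(
    intent: str,
    max_results: int = 5,
    include_chain: bool = True,
    domain: Optional[str] = None,
    lang: Optional[str] = None,
    request_id: int = 1,
) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call message for tool_recommend."""
    if not intent.strip():
        raise ValueError("intent must not be empty")
    # The schema documents a 1-10 range for max_results.
    if not 1 <= max_results <= 10:
        raise ValueError("max_results must be between 1 and 10")

    arguments: Dict[str, Any] = {
        "intent": intent,
        "max_results": max_results,
        "include_chain": include_chain,
    }
    # Optional hints are only sent when provided.
    if domain is not None:
        arguments["domain"] = domain
    if lang is not None:
        arguments["lang"] = lang

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "tool_recommend", "arguments": arguments},
    }


request = tool_recommend_request(
    "Run a full M&A due diligence on Acme Corp", max_results=3, domain="finance"
)
payload = json.dumps(request)  # body to send over the server's transport
```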

Output Schema

Name	Required	Description
`intent`	Yes
`status`	Yes
`sources`	No
`not_covered`	No
`quality_score`	Yes
`recommendations`	Yes
`suggested_chain`	No
`alternative_paths`	No

Tool Definition Quality

A (4.1/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, openWorldHint), the description adds valuable behavioral context: 'Pure compute — $0.01/call, no external fetch', and mentions pre-designed chains and domain support, enhancing agent understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is informative and front-loads the main purpose, but is slightly verbose. Each sentence adds value, though minor trimming could improve conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 params, output schema exists, annotations present), the description covers use case, when to use, domains, cost, and chains. It does not detail return values, but the output schema fills that gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, so parameters are already documented. The description reinforces overall functionality but adds little new parameter-specific meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: given a free-text intent, it returns recommended tools from a catalogue, ranked with suggestions and chains. It distinguishes itself from siblings by being a 'first call' when unsure, making the purpose specific and distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly advises using this tool first when unsure which tool to call and positions it as ideal for multi-step workflows. It provides clear context but lacks explicit exclusions or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

trade_finance_eligibility (grade A)

Read-only, Idempotent

Evaluates trade finance eligibility for CFOs by analyzing counterparty risk and jurisdiction using World Bank and BIS data. Inputs include counterparty country code (ISO 3166-1 alpha-3) and industry sector. Returns risk scores, eligibility flags, and financing terms. Ideal for assessing letters of credit, export credit agency guarantees, and other trade finance instruments. Keywords: trade finance, counterparty risk, jurisdiction risk, letters of credit, ECA guarantees.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industrySector`	Yes
`annualTradeVolumeUSD`	No
`counterpartyCountryCode`	Yes
`counterpartyCreditRating`	No

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`eligibility`	No
`financingTerms`	No
`countryRiskScore`	No
`maxFinancingAmountUSD`	No
`recommendedInstruments`	No

Tool Definition Quality

A (3.6/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description does not add behavioral details beyond stating the tool evaluates (consistent with read-only). It does not discuss data freshness, latency, authentication, or side effects. Since annotations carry the burden, a score of 3 is appropriate for not adding significant value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact: four sentences that front-load the purpose, followed by a keyword list. It avoids unnecessary repetition and is well-structured. Removing the keyword list at the end would make it slightly more efficient, but overall it is appropriately sized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters and an output schema, the description does not need to detail return values. However, it lacks explanation of how eligibility is determined or what factors influence the risk scores. For a financial tool, more context about data sources (World Bank, BIS) and limitations would improve completeness. It is adequate but not thorough.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (async has a description). The description explains counterpartyCountryCode must be ISO 3166-1 alpha-3 and lists industrySector as an input, but does not cover counterpartyCreditRating, annualTradeVolumeUSD, or async. It adds some context but leaves 3 of 5 parameters unexplained. Schema coverage is low, so the description should compensate more.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates trade finance eligibility by analyzing counterparty risk and jurisdiction using World Bank and BIS data. It specifies inputs (counterparty country code and industry sector) and outputs (risk scores, eligibility flags, financing terms), and lists ideal use cases (letters of credit, ECA guarantees). This distinguishes it from many sibling tools with different focuses.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the tool is 'ideal for assessing letters of credit, export credit agency guarantees, and other trade finance instruments' and targets CFOs, but it does not explicitly state when not to use it or compare it to alternative sibling tools like africa_trade_finance_esg_rater or tariff_arbitrage_finder. No exclusion criteria or prerequisites are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

transcribe_chapterize_media (grade A)

Read-only, Idempotent

Transcription and chapterization of long-form media (YouTube, podcasts, direct audio/video) for content marketing teams, podcast publishers, edu tech, journalists and accessibility/compliance.

Pipeline:
• YouTube → timedtext captions (keyless) + oEmbed metadata + native timecode chapters from description
• Podcast RSS → episode description + duration + timecodes if embedded in show notes
• Direct media → partial (requires Whisper API via OPENAI_API_KEY + force_whisper:true)
• Chapters: native YouTube timecodes preferred; heuristic TF-IDF segmentation as fallback
• Summary: extractive TF-IDF top-sentences (no LLM required)
• Language detection: character-set heuristic (CJK→zh, kana→ja, hangul→ko, accents→fr/de/es)

Output formats: json (full structured object) | text (plain transcript) | srt | vtt

SLA: ≤15s budget total. Cache: 24h TTL.
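
The character-set heuristic mentioned in the pipeline can be sketched with Unicode block ranges. This is an illustration of the general technique, not the tool's actual implementation: the function name is hypothetical, only the script-based cases (CJK, kana, hangul) are covered, and the accent-based fr/de/es case is left out because the listing does not say how those languages are told apart.

```python
from typing import Optional


def detect_script_language(text: str) -> Optional[str]:
    """Guess zh/ja/ko from the Unicode blocks present in the text, else None."""
    has_cjk_ideograph = False
    for ch in text:
        cp = ord(ch)
        # Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) indicate Japanese,
        # even when the text also contains kanji.
        if 0x3040 <= cp <= 0x30FF:
            return "ja"
        # Hangul Syllables block (U+AC00-U+D7AF) indicates Korean.
        if 0xAC00 <= cp <= 0xD7AF:
            return "ko"
        # CJK Unified Ideographs (U+4E00-U+9FFF) are shared by Chinese and Japanese,
        # so they only decide the result if no kana or hangul is found.
        if 0x4E00 <= cp <= 0x9FFF:
            has_cjk_ideograph = True
    return "zh" if has_cjk_ideograph else None
```

Kana is checked before falling back to `zh` because Japanese text normally mixes kanji with kana, while Chinese text contains ideographs only.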

Parameters

Name	Required	Description
`url`	Yes	YouTube URL, podcast RSS feed URL, or direct MP3/MP4 URL. Example: "https://www.youtube.com/watch?v=jNQXAC9IVRw"
`lang`	No	ISO 639-1 language hint (e.g. "en", "fr", "de"). Default "auto".
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`chapters_max`	No	Maximum number of chapters. Default 8.
`output_format`	No	Transcript format. Default "json".
`include_summary`	No	Include extractive summary. Default true.

Output Schema

Name	Required	Description
`url`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`summary`	No
`chapters`	Yes
`segments`	Yes
`key_topics`	Yes
`transcript`	Yes
`source_type`	Yes
`lang_detected`	Yes
`quality_score`	Yes
`duration_seconds`	Yes

Tool Definition Quality

Grade A, 4.6/5.0

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the pipeline steps, language detection heuristics, fallback methods (TF-IDF segmentation), SLA (≤15s budget), cache TTL (24h), and the async behavior. This adds significant context beyond the annotations (readOnlyHint, idempotentHint), which are already present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bullet points and clear sections, front-loading the main purpose. It is informative without being verbose, though slightly longer than necessary. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, 100% schema coverage, output schema exists), the description covers all key aspects: sources, pipeline, output formats, SLA, caching, and language detection. It is complete and leaves no major gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the baseline is 3. The description adds value by explaining the async parameter's purpose, the language detection logic, and the output formats, providing richer context than schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Transcription and chapterization of long-form media' and lists specific sources (YouTube, podcasts, direct audio/video), clearly distinguishing the tool's purpose. It differentiates from siblings by detailing the pipeline and supported media types.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context on when to use the tool (for long-form media from YouTube, podcasts, or direct URIs) and mentions necessary prerequisites (API key for direct media). However, it lacks explicit 'when not to use' guidance or direct comparisons to sibling tools.

treasury_optimizer (Grade A)

Read-only

Treasury optimizer: Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Alan, €380M treasury post-Series F · Optimal allocation across 4 instruments · Yield +145bp · +€5.5M/year. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`horizon`	No
`constraints`	Yes
`cashPosition`	Yes

Tool Definition Quality

Grade A, 3.6/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds behavioral details: inputs are validated server-side, async parameter behavior (returns job_id immediately if async=true), and output is a structured audited deliverable. No contradiction with annotations.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat verbose, including marketing language like the Alan case with yield improvement. It front-loads the purpose but could be more concise. The structure is reasonable but not tight.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, nested objects, no output schema), the description lacks detail on the return format beyond 'structured, audited deliverable.' It also does not cover when not to use the tool or prerequisites. The async behavior explanation helps but completeness is lacking.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20% (only 'async' has a description). The description does not explain the meaning or usage of the other parameters beyond 'send the documented case fields.' It adds minimal value over the schema for parameter semantics.
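The coverage figure is simple arithmetic: the share of parameters that carry a non-empty description. The Python sketch below reproduces the 20% reported here (1 of 5 parameters described) and the 17% reported for upsell_hunter (1 of 6). The function name and the dict-based input are illustrative assumptions, not the scorer's actual code.

```python
def description_coverage(params: dict) -> float:
    """Fraction of parameters whose description is non-empty."""
    if not params:
        return 0.0
    described = sum(1 for desc in params.values() if desc and desc.strip())
    return described / len(params)


# Parameter names taken from the tables in this listing; only `async` is described.
treasury_optimizer = {
    "async": "If true, returns a job_id immediately...",
    "company": "",
    "horizon": "",
    "constraints": "",
    "cashPosition": "",
}
upsell_hunter = {
    "async": "If true, returns a job_id immediately...",
    "company": "",
    "horizon": "",
    "product": "",
    "accounts": "",
    "targetUpsellEur": "",
}

treasury_pct = round(description_coverage(treasury_optimizer) * 100)  # 20
upsell_pct = round(description_coverage(upsell_hunter) * 100)  # 17
```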

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: treasury optimization for C-suite, returning a structured audited deliverable. It references a concrete case (Alan) and distinguishes itself from the many sibling tools, none of which focus on treasury optimization.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage context (finance executives) and mentions server-side validation, but does not explicitly state when to use this tool versus alternatives like working_capital or working_capital_esg_impact_rater among siblings. No when-not or alternative guidance is provided.

trend_watcher (Grade A)

Read-only, Idempotent

Monitor emerging trends, regulatory shifts and adoption signals for a given market sector. Returns 5-12 trend cards, each with a momentum score (rising/stable/declining), a 3-month and 12-month outlook, opportunity windows, and recommended actions. When to use this tool: the user asks what is heating up in a market, wants to time a product roadmap or content calendar, or needs an early read on a sector. Inputs: a sector to monitor and 3-8 keywords defining the watch perimeter. Delivered by Manue, the AI CMO of the Gapup portfolio.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No	Optional context (geography, language target, comparator window, etc.)
`sector`	Yes	Sector to monitor (e.g. 'B2B SaaS productivity', 'EU fintech', 'climate-tech hardware')
`keywords`	Yes	3-8 keywords describing the watch perimeter
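The inputs above carry one explicit constraint: 3-8 keywords. A client could check it before paying for a call, as in this Python sketch. The function name is hypothetical, and treating `keywords` as a list of strings is an assumption, since the listing does not show the parameter's JSON type.

```python
from typing import Any, Dict, List, Optional


def build_trend_watcher_args(
    sector: str, keywords: List[str], focus: Optional[str] = None
) -> Dict[str, Any]:
    """Validate and assemble arguments for a trend_watcher call."""
    cleaned = [k.strip() for k in keywords if k and k.strip()]
    if not sector.strip():
        raise ValueError("sector is required")
    if not 3 <= len(cleaned) <= 8:
        raise ValueError(f"expected 3-8 keywords, got {len(cleaned)}")
    args: Dict[str, Any] = {"sector": sector.strip(), "keywords": cleaned}
    if focus:
        args["focus"] = focus  # optional context, passed through unchanged
    return args
```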

Output Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles
`trends`	Yes	5-12 trend cards for the sector
`recommendations`	No	Prioritised strategic recommendations
`executiveSummary`	Yes	Board-ready sector overview prose

Tool Definition Quality

Grade A, 4.2/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only, open-world, idempotent, and non-destructive. Description adds behavioral details: returns 5-12 trend cards with specific structure (momentum, outlooks, etc.) and delivery by an AI persona. No contradictions.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Five short sentences covering purpose, output, usage, and inputs. The footer about 'Manue, the AI CMO' is extra but not harmful. Information is front-loaded and efficiently structured.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists and the annotations are clear, the description covers purpose, input constraints, output format, and usage guidance comprehensively. No gaps remain that would lead an agent to misuse the tool.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline 3. Description mentions sector and keywords but does not add significant meaning beyond the schema. It lightly restates the inputs without new details.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool monitors emerging trends, regulatory shifts, and adoption signals for a market sector. It specifies the output: 5-12 trend cards with momentum score, outlooks, opportunity windows, and actions. This distinguishes it from sibling tools like competitive deep dives or market sizing.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists when to use: when user asks what is heating up, wants to time a roadmap or calendar, or needs an early read. No direct exclusions or alternative tool mentions, but the context is clear enough for most scenarios.

ugc_moderation_classifier (Grade A)

Read-only, Idempotent

Multi-language UGC content moderation for marketplaces, social platforms and comment systems. Detects policy violations in text content across 9 policies and 12 languages without external API calls.

Policies checked:
• hate: hate speech, slurs, dehumanization (50+ terms × 12 languages)
• sexual: explicit sexual content, pornography references, nudity solicitation
• violence: threats, weapon references, graphic violence
• self_harm: suicidal ideation, self-injury, eating disorder promotion
• harassment: doxxing, stalking, cyberbullying, blackmail
• scam: phishing, investment fraud, romance scam, lottery fraud
• spam: bots, keyword stuffing, excessive caps, emoji storms, suspicious URLs
• copyright: piracy, leaked content, serial keys, streaming fraud
• minor_safety: grooming signals, CSAM references, minor + adult content combos

Languages: en / fr / de / es / it / pt / nl / zh / ja / ko / ar / ru (auto-detected)

Output includes severity (low/medium/high/severe), confidence (0-100), matched patterns, excerpt, recommended action, age appropriateness (adult/teen/child), and signals.
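Since severity is an ordered scale and confidence is numeric, a caller will often want the single most serious finding. The Python sketch below ranks violations by severity, then confidence. The per-violation keys (`policy`, `severity`, `confidence`) are assumptions, because the listing names the output fields but not the exact structure of `violations`.

```python
from typing import Any, Dict, List, Optional

# Ordering taken from the description: low < medium < high < severe.
SEVERITY_ORDER = {"low": 0, "medium": 1, "high": 2, "severe": 3}


def worst_violation(
    violations: List[Dict[str, Any]], min_confidence: int = 0
) -> Optional[Dict[str, Any]]:
    """Pick the most severe violation at or above a confidence floor."""
    candidates = [v for v in violations if v["confidence"] >= min_confidence]
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda v: (SEVERITY_ORDER[v["severity"]], v["confidence"]),
    )
```

A confidence floor lets a platform ignore weak pattern matches while still escalating anything rated high or severe.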

No API key required. Stateless — no content is stored or logged.

Parameters

Name	Required	Description
`lang`	No	Language override. If omitted, language is auto-detected.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`content`	Yes	Text content to moderate (comment, review, post, chat message).
`policies`	No	Policies to check. Default: all 9 policies.
`content_type`	No	Type of content. Affects recommended_action heuristic. Default: comment.

Output Schema

Name	Required	Description
`status`	Yes
`signals`	Yes
`sources`	Yes
`violations`	Yes
`lang_detected`	Yes
`quality_score`	Yes
`age_appropriate`	Yes
`content_preview`	Yes
`policies_checked`	Yes
`recommended_action`	Yes

Tool Definition Quality

Grade A, 4.6/5.0

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and idempotentHint. The description adds valuable behavioral details: statelessness, no logging, auto-detection of language, and the async behavior with job polling. This fully informs the agent of the tool's traits beyond the annotations.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bullet points for policies and languages. It is somewhat long but appropriate for the tool's complexity. Front-loads the purpose and scope, then details policies and languages.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters with full schema coverage and an output schema, the description covers all necessary context: purpose, supported policies, languages, async mode, and stateless behavior. No gaps are evident.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds meaning by explaining policy examples (e.g., '50+ terms × 12 languages' for hate) and output fields (severity, confidence, etc.), which enriches parameter understanding.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a multi-language UGC content moderation tool that detects policy violations in text. It lists 9 specific policies and 12 languages, providing a precise scope. There are no sibling tools with similar functionality, so it fully distinguishes itself.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains that no API key is required, no external calls are needed, and it is stateless with no content storage. It also mentions the async option for avoiding timeouts. However, it does not explicitly state when not to use this tool or suggest alternatives.

upsell_hunter (Grade C)

Read-only

Upsell hunter: Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub, upsell across 8 accounts · €127k potential · Top 3: Alan+Qonto+Pennylane · 5-step playbook. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`horizon`	No
`product`	Yes
`accounts`	Yes
`targetUpsellEur`	No

Tool Definition Quality

Grade C, 2.6/5.0

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint, indicating a safe read operation that may draw on external data. The description adds that it returns an audited deliverable and validates inputs server-side, but does not disclose other behaviors such as rate limits or auth needs.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, which is concise, but it includes extraneous details such as a reference case and marketing language that do not help an agent select or invoke the tool. It could be more focused.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema), the description is incomplete. It does not explain what the deliverable contains, how results are structured, or how to interpret outputs, leaving the agent without critical context.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, and the description does not compensate by explaining any parameters. The mention of 'send the documented case fields' is vague and does not clarify the purpose or usage of parameters like accounts, product, or horizon.

Purpose: 3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description indicates it is an upsell hunter that returns a structured deliverable, with a reference case. However, the purpose is somewhat vague and not stated as a clear verb+resource. It mentions 'agent-payable C-suite expertise' which is not directly actionable.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The sibling list includes many related tools like cross_sell_reco and account_expansion_mapper, but the description does not differentiate. There is no mention of prerequisites or conditions.

usdc_x402_payments_intel (Grade A)

Read-only

Real-time analytics on x402 protocol USDC micropayments for MCP endpoints on Base network. Unique competitive advantage: aggregates internal production telemetry (our own traffic data) with on-chain USDC Transfer events and Bazaar marketplace listings, data no external competitor can access.

Four modes:
(1) facilitator_stats: Coinbase x402 facilitator settlement statistics (volume, count, top payees/payers). Uses Coinbase CDP API if COINBASE_X402_API_KEY is set; falls back to Base mainnet RPC scan of USDC transfers to known facilitator addresses.
(2) endpoint_intel: Per-MCP-endpoint analytics: tx count, USDC volume, unique callers, success rate, catalog size. For gapup-mcp.io endpoints: reads internal JSONL telemetry (richest data source, unique).
(3) agent_caller_profile: Anonymous profile of a calling agent wallet: tx count, USDC spent, top endpoints, inferred persona (depth-seeker / bulk-scanner / generalist / researcher / explorer). Wallet anonymised via SHA-256.
(4) price_radar: USDC price distribution by tool category (data_lookup / synthesis / compliance / competitive) from Bazaar + internal catalog. Returns median, P25, P75.

Network: Base mainnet. USDC contract: 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913. Cache: 30 min LRU. Timeout per source: 8s. Optional env: COINBASE_X402_API_KEY (higher-fidelity facilitator stats).
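The price_radar mode reports median, P25 and P75 per tool category. As a rough illustration of that aggregation, the Python sketch below groups hypothetical listing prices by category and computes quartiles with the standard library. The input shape (pairs of category and USDC price), the output keys, and the choice of the "inclusive" quantile method are assumptions; the server's own method is not documented.

```python
from collections import defaultdict
from statistics import quantiles
from typing import Dict, Iterable, Tuple


def price_radar(listings: Iterable[Tuple[str, float]]) -> Dict[str, Dict[str, float]]:
    """Return p25 / median / p75 of prices for each category."""
    by_category = defaultdict(list)
    for category, price in listings:
        by_category[category].append(price)

    summary = {}
    for category, prices in by_category.items():
        if len(prices) < 2:
            continue  # too few data points to compute quartiles
        # n=4 yields the three quartile cut points.
        p25, median, p75 = quantiles(prices, n=4, method="inclusive")
        summary[category] = {"p25": p25, "median": median, "p75": p75}
    return summary
```

Quartiles are a sensible choice for micropayment prices because a handful of expensive tools would distort a plain average.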

Parameters

Name	Required	Description
`mode`	Yes	Analytics mode: facilitator_stats=network-wide settlements | endpoint_intel=per-URL analytics | agent_caller_profile=per-wallet analytics | price_radar=price distribution by category
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`category`	No	Tool category for price_radar mode. Defaults to all.
`period_days`	No	Lookback window in days (5-90, default 30)
`endpoint_url`	No	MCP endpoint URL for endpoint_intel mode (e.g. https://mcp.gapup.io/mcp)
`wallet_address`	No	EVM wallet address for agent_caller_profile mode (0x...)
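Several of these parameters only make sense in one mode. The Python sketch below assembles arguments per mode and enforces the documented 5-90 day window. Requiring `endpoint_url` and `wallet_address` in their respective modes is an assumption (the schema marks both optional), and the function name is hypothetical.

```python
import re
from typing import Any, Dict, Optional

MODES = ("facilitator_stats", "endpoint_intel", "agent_caller_profile", "price_radar")
# An EVM address is "0x" followed by 40 hexadecimal characters.
_EVM_ADDRESS = re.compile(r"^0x[0-9a-fA-F]{40}$")


def build_payments_intel_args(
    mode: str,
    period_days: int = 30,
    endpoint_url: Optional[str] = None,
    wallet_address: Optional[str] = None,
    category: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate inputs and keep only the parameters relevant to the mode."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if not 5 <= period_days <= 90:
        raise ValueError("period_days must be between 5 and 90")
    args: Dict[str, Any] = {"mode": mode, "period_days": period_days}
    if mode == "endpoint_intel":
        if not endpoint_url:
            raise ValueError("endpoint_intel needs endpoint_url")
        args["endpoint_url"] = endpoint_url
    elif mode == "agent_caller_profile":
        if not wallet_address or not _EVM_ADDRESS.match(wallet_address):
            raise ValueError("agent_caller_profile needs a valid wallet_address")
        args["wallet_address"] = wallet_address
    elif mode == "price_radar" and category:
        args["category"] = category
    return args
```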

Output Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`price_radar`	No
`quality_score`	Yes
`endpoint_intel`	No
`facilitator_stats`	No
`agent_caller_profile`	No

Tool Definition Quality

Grade A, 4.2/5.0

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond the annotations (readOnlyHint=true), the description adds significant behavioral context: caching (30 min LRU), timeout per source (8s), fallback behavior for facilitator_stats (Coinbase CDP API vs. Base RPC), network and USDC contract address, optional env variable, and wallet anonymization via SHA-256. No contradictions with annotations.
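Wallet anonymisation via SHA-256 can be pictured as hashing a normalised address, as in this minimal Python sketch. Lowercasing the address first and hashing without a salt are assumptions; the description only states that SHA-256 is used.

```python
import hashlib


def anonymise_wallet(address: str) -> str:
    """Return a SHA-256 hex digest of the normalised wallet address."""
    # Lowercasing makes checksummed and plain forms hash identically (assumed).
    normalised = address.strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()
```

An unsalted hash is stable across calls, which allows per-wallet aggregation, but anyone who already knows an address can recompute its digest, so it offers pseudonymity only.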

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with enumerated modes and bullet points, making it scannable. However, it is verbose, including multiple sentences on competitive advantage and detailed mode descriptions that could be condensed. The essential information is front-loaded with the purpose.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with four modes, multiple data sources, caching, and fallback behavior, the description covers most operational details. It mentions return values for price_radar but not explicitly for other modes (though an output schema exists). Overall, it provides sufficient context for an agent to use the tool effectively.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds value by explaining how parameters relate to specific modes (e.g., endpoint_url for endpoint_intel, wallet_address for agent_caller_profile, category for price_radar), which aids in understanding parameter relevance. This additional context justifies a score above baseline.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides 'Real-time analytics on x402 protocol USDC micropayments for MCP endpoints on Base network' and enumerates four specific modes with distinct purposes. It distinguishes itself from siblings by highlighting its unique competitive advantage of aggregating internal telemetry with on-chain data.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives clear guidance on when to use each mode (facilitator_stats, endpoint_intel, agent_caller_profile, price_radar) and what each mode returns. However, it does not explicitly advise when not to use this tool or how it compares to related sibling tools like x402_liquidity_monitor or x402_payment_fraud_detector, limiting its utility for tool selection among alternatives.

vendor_esg_blacklist_monitor (Grade A)

Read-only, Idempotent

As a COO, quickly check if a vendor is blacklisted for ESG non-compliance using CDP and GRI data. Input the vendor's legal name or identifier to receive their ESG risk score, blacklist status, and compliance violations. Returns structured data including CDP disclosure score, GRI alignment, and any regulatory flags. Ideal for vendor due diligence, risk assessment, and sustainability reporting. Keywords: ESG, vendor risk, compliance, CDP, GRI, sustainability, blacklist.

Parameters

Name	Required	Description
`year`	No	Reporting year (default: current year)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`vendorId`	No	Optional identifier (e.g., LEI, DUNS)
`vendorName`	Yes	Legal name of the vendor to check

Output Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`vendorId`	No
`warnings`	Yes
`griAligned`	No
`vendorName`	Yes
`violations`	No
`blacklisted`	Yes
`esgRiskScore`	No
`cdpDisclosureScore`	No
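Half of the output fields above are optional, so a consumer should not assume that scores or violations are present. The Python sketch below checks the required fields and reads the optional ones defensively. The field types (a boolean `blacklisted`, a list for `violations`) and the function name are assumptions, since the listing gives field names only.

```python
from typing import Any, Dict

# Fields marked "Yes" in the output schema table.
REQUIRED_FIELDS = ("status", "sources", "warnings", "vendorName", "blacklisted")


def summarize_vendor_check(response: Dict[str, Any]) -> str:
    """Build a one-line summary, tolerating missing optional fields."""
    missing = [f for f in REQUIRED_FIELDS if f not in response]
    if missing:
        raise ValueError(f"response is missing required fields: {missing}")
    verdict = "BLACKLISTED" if response["blacklisted"] else "not blacklisted"
    parts = [f"{response['vendorName']}: {verdict}"]
    # Optional fields are reported only when present.
    if response.get("esgRiskScore") is not None:
        parts.append(f"ESG risk score {response['esgRiskScore']}")
    if response.get("cdpDisclosureScore") is not None:
        parts.append(f"CDP disclosure score {response['cdpDisclosureScore']}")
    violations = response.get("violations") or []
    if violations:
        parts.append(f"{len(violations)} violation(s)")
    return "; ".join(parts)
```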

Tool Definition Quality

Grade A, 3.5/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint, openWorldHint, and idempotentHint as true. The description adds value by detailing the output structure (CDP score, GRI alignment, regulatory flags) and emphasizes the quick check nature, complementing the annotations without contradiction.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is 5 sentences, front-loading the purpose and key details. It is relatively concise but includes some redundant phrases (e.g., 'As a COO') and a keyword list at the end that could be integrated or removed. Overall efficient.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, 1 required) and the presence of an output schema and detailed annotations, the description adequately covers the tool's behavior and return value structure. It does not explain how to use the async feature, but this is a minor gap.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema coverage is 100%, with all 4 parameters described. The description only loosely covers vendorName and vendorId ('legal name or identifier') and mentions year implicitly via 'reporting year' context. It does not explain the async parameter beyond the schema, so it adds minimal value.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'check' and the resource 'vendor blacklist status' using specific data sources (CDP, GRI). It differentiates from siblings like 'supplier_esg_audit' and 'vendor_esg_diversity_scanner' by focusing on blacklist monitoring, though it does not explicitly name alternatives.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies a use case ('As a COO', 'quickly check') but provides no explicit guidance on when not to use the tool or which sibling tool to use instead. No alternatives or exclusions are mentioned, leaving the agent without clear decision criteria.

vendor_esg_diversity_scanner (Grade A)

Read-only, Idempotent

For COOs: scans vendor ESG reports to identify suppliers lacking diversity disclosures in GRI or CDP filings. Input a supplier name or identifier to receive a structured assessment of gender, ethnicity, and board diversity metrics. Returns compliance gaps, missing data flags, and source references from CDP open data and GRI standards. Ideal for vendor risk assessment and ESG compliance tracking.

Parameters

Name	Required	Description
`year`	No	Reporting year to check (default: current year)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`supplierId`	No	CDP or GRI identifier for the supplier (e.g., CDP company ID)
`supplierName`	Yes	Exact or partial name of the supplier to scan

Output Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`reportLinks`	No	URLs to relevant ESG reports
`supplierName`	Yes
`complianceScore`	Yes	Percentage compliance with diversity disclosure standards
`diversityDisclosures`	Yes

Tool Definition Quality

Grade A, 4.3/5.0

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. Description adds value by specifying outputs ('structured assessment... compliance gaps, missing data flags, source references') and confirming no destructive effects. No contradiction with annotations.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each serving a purpose: audience and goal, input, output, and ideal use case. Front-loaded with the key phrase 'For COOs' and the actionable verb 'scans'. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema existence and rich annotations, the description is complete for an agent to select and invoke the tool. It covers purpose, inputs, outputs, and target users, and differentiates from a large set of sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and description aligns with key parameters (e.g., 'supplier name or identifier' matches supplierName/supplierId). However, description does not elaborate on 'year' or 'async' parameters beyond schema, so it adds minimal additional meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description uses specific verb ('scans') and resource ('vendor ESG reports'), explicitly states purpose ('identify suppliers lacking diversity disclosures'), and differentiates from siblings like 'supplier_esg_audit' or 'vendor_esg_blacklist_monitor' by focusing on diversity metrics and specific standards (GRI, CDP).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description clearly identifies target user ('For COOs') and use cases ('vendor risk assessment and ESG compliance tracking'). While it does not explicitly state when not to use or list alternatives, the context is sufficiently clear for an agent to infer appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vendor_management (Grade C)

Read-only

Vendor management: Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Qonto (12 vendors · €2.4M/year), with €290k in savings identified · 4 priority renegotiations. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`vendors`	Yes
`objectives`	Yes

Tool Definition Quality

C (2.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true, and the description confirms it returns a deliverable, not modifying data. It adds server-side validation context. No contradiction found. The description could mention more about what happens if inputs fail validation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (four sentences) and front-loads the tool's purpose. However, the reference case takes up space that could carry more essential information. Overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex input schema (nested objects, 4 parameters) and no output schema, the description is insufficient. It does not explain what the 'audited deliverable' contains, how to interpret results, or how to use the async parameter. The agent lacks critical usage details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 25%, yet the description adds no parameter-level explanation. 'Send the documented case fields' is vague and does not clarify the purpose of each field or the nested structure. The async parameter is not mentioned.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is for vendor management with a specific output: a structured, audited deliverable. It gives a concrete reference case. However, it does not distinguish from sibling tools like vendor_risk_assessor or procurement_spend_optim, which may have overlapping purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no guidance on when to use this tool vs alternatives. The phrase 'Gapup agent-payable C-suite expertise (COO)' implies a strategic context, but no explicit context or exclusions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vendor_risk_assessor (Grade C)

Read-only

Vendor risk assessor: Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Gapup Hub, with 15 vendors · €1.8M spend · 3 critical · heatmap + remediation plan. Inputs are validated server-side; send the documented case fields.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`vendors`	Yes
`riskFramework`	No
`assessmentPurpose`	No

Tool Definition Quality

C (2.8/5.0)

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns a deliverable, but does not mention async behavior or potential external data use beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences and moderately concise, but it includes a reference case that adds context without being essential. Could be leaner.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has complex inputs (nested objects, async) and no output schema, but the description omits output structure, async usage, and required fields beyond 'send the documented case fields'. Incomplete for effective invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (20%) per context signals. The description does not explain any parameters, leaving the agent to rely solely on the schema, which is insufficient for complete understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a supplier risk assessor that returns a structured, audited deliverable, with a reference case. However, it does not differentiate from sibling tools like vendor_management or supplier_esg_audit.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description hints at C-suite expertise but provides no explicit guidance on when to use this tool versus alternatives. No exclusions or recommended use cases are given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vertical_ai_agent_governance (Grade A)

Read-only, Idempotent

Generates a comprehensive vertical AI agent workforce integration plan for CHROs, including governance frameworks, human-AI collaboration metrics, and upskilling recommendations. Inputs: industry vertical, workforce size, and current AI adoption level. Outputs: role-specific AI integration roadmaps, skill gap analysis, and performance benchmarks. Uses O*NET skill taxonomies and Gartner AI adoption trends. For best results with large datasets, pass async:true to avoid timeout.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	Yes
`target_roles`	No
`workforce_size`	Yes
`ai_adoption_level`	No
`include_benchmarks`	No

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`skill_gap_analysis`	No
`integration_roadmap`	No
`collaboration_metrics`	No
`governance_recommendations`	No

Tool Definition Quality

A (4.5/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint) declare read-only and idempotent behavior. The description adds context on data sources (O*NET, Gartner) and async behavior to avoid timeout, complementing annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (five sentences) and well-organized: purpose, inputs, outputs, data sources, async tip. No unnecessary words, and every sentence adds meaningful information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (6 parameters, enums, output schema exists), the description covers all critical aspects: purpose, inputs, outputs, data sources, and a usage optimization tip. It enables correct agent execution without gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description compensates by explaining three key inputs ('industry vertical, workforce size, and current AI adoption level') and their role. It also mentions async parameter behavior, adding value beyond the schema's enum values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Generates a comprehensive vertical AI agent workforce integration plan for CHROs', specifying inputs, outputs, and data sources. It differentiates from sibling tools like ai_governance_pilot by focusing on workforce integration.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates when to use (for CHROs, with specified inputs) and includes an async usage tip for large datasets. However, it does not explicitly exclude alternatives or compare to sibling tools, which would strengthen guidelines.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vuln_exploitability_forecast (Grade B)

Read-only, Idempotent

As a CTO, assess the exploitability risk of CVEs using EPSS scores and cloud asset exposure data. Input a CVE ID (e.g., CVE-2021-44228) to receive exploitability likelihood, affected cloud services, and threat intelligence context. Returns structured risk metrics for prioritization. Sources: CVE NVD, OpenCVE, GitHub Advisories. Pass async:true to avoid timeout.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`cveId`	Yes
`cloudProvider`	No
`includeDetails`	No
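
Since only `async` carries a schema description here, a caller may want to validate `cveId` before spending a paid call. The sketch below checks the standard CVE identifier syntax (`CVE-`, a four-digit year, then four or more digits). It is an assumption that the server's own pattern matches this; the listing mentions a pattern but does not show it. `build_forecast_args` is a hypothetical helper, and the types of `cloudProvider` and `includeDetails` are likewise guesses.

```python
import re
from typing import Any, Dict, Optional

# Standard CVE ID syntax; assumed (not confirmed) to match the server pattern.
CVE_ID = re.compile(r"CVE-[0-9]{4}-[0-9]{4,}")


def build_forecast_args(
    cve_id: str,
    cloud_provider: Optional[str] = None,    # assumed to be a string
    include_details: Optional[bool] = None,  # assumed to be a boolean
) -> Dict[str, Any]:
    """Normalize and validate a CVE ID, then assemble the argument object."""
    normalized = cve_id.strip().upper()
    if not CVE_ID.fullmatch(normalized):
        raise ValueError(f"not a valid CVE identifier: {cve_id!r}")
    args: Dict[str, Any] = {"cveId": normalized}
    # Optional fields are only sent when the caller supplies them.
    if cloud_provider is not None:
        args["cloudProvider"] = cloud_provider
    if include_details is not None:
        args["includeDetails"] = include_details
    return args
```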

Output Schema

Parameters

Name	Required	Description
`cveId`	Yes
`status`	Yes
`sources`	Yes
`warnings`	Yes
`epssScore`	No
`lastUpdated`	No
`cloudExposure`	No
`epssPercentile`	No

Tool Definition Quality

B (3.2/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as read-only and idempotent. The description adds value by explaining the async behavior to avoid timeouts and listing external data sources (CVE NVD, OpenCVE, GitHub Advisories). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise at five sentences, front-loading the purpose and role. Some phrases (e.g., 'Sources: ...') could be better integrated, but overall there are no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has multiple parameters and an output schema, the description covers the main purpose and async option but fails to explain key parameters like 'cloudProvider' and 'includeDetails'. The presence of an output schema reduces the need to describe return values, but parameter gaps hurt completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 25% schema description coverage, the burden is on the description to explain parameters. The description addresses the 'async' parameter and gives an example value for 'cveId' (CVE-2021-44228), but leaves 'cloudProvider' and 'includeDetails' unexplained. The schema gives a pattern for 'cveId' but no semantic context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool assesses exploitability risk for CVEs using EPSS scores and cloud exposure data. It specifies the target user (CTO) and provides a concrete example (CVE-2021-44228). However, it does not explicitly differentiate this tool from sibling tools like 'cve_security_lookup' or 'vuln_patch_priority_engine', which could serve similar purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies high-level risk assessment but gives no guidance on when to use this tool vs. alternatives. It does not list prerequisites, exclusions, or decision criteria. Even the hint about async use is more of a technical note than a usage guideline.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vuln_patch_priority_engine (Grade A)

Read-only, Idempotent

As a CTO, quickly prioritize unpatched CVEs by combining exploitability scores (EPSS) with cloud asset criticality. Input a list of CVE IDs and your AWS service types (e.g., EC2, RDS) to receive a ranked patching order with risk scores and estimated cloud impact. Uses public NVD, OpenCVE, and AWS pricing data. Ideal for vulnerability management and cloud security posture improvement.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`cveIds`	Yes	List of CVE identifiers to analyze (e.g., ["CVE-2021-44228", "CVE-2023-3824"])
`maxResults`	No	Maximum number of prioritized CVEs to return (default: 10)
`awsServices`	No	AWS service types affected by these CVEs (e.g., ["EC2", "RDS", "Lambda"])
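
The description says the tool combines EPSS exploitability scores with cloud asset criticality, but the listing does not disclose the scoring formula. As an illustration only, the sketch below multiplies an EPSS probability by a per-service criticality weight and returns the top `max_results` identifiers. The `Finding` type, the `prioritize` function, the multiplication rule, and the weights are all hypothetical, and the CVE IDs and scores in the usage note are placeholders rather than real data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Finding:
    cve_id: str
    epss: float   # exploit probability in [0, 1]
    service: str  # e.g. "EC2", "RDS", "Lambda"


def prioritize(
    findings: List[Finding],
    criticality: Dict[str, float],
    max_results: int = 10,
) -> List[str]:
    """Rank CVE IDs by epss * service weight (illustrative formula only)."""
    scored = [
        # Services without an explicit weight default to 1.0.
        (f.epss * criticality.get(f.service, 1.0), f.cve_id)
        for f in findings
    ]
    # Highest score first; ties broken by CVE ID for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [cve_id for _, cve_id in scored[:max_results]]
```

For example, with placeholder inputs `Finding("CVE-0000-0001", 0.9, "Lambda")` and `Finding("CVE-0000-0002", 0.5, "RDS")` and weights `{"RDS": 3.0}`, the second finding ranks first because a moderate exploit probability on a heavily weighted service outscores a high probability on a default-weighted one.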

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`prioritizedCves`	No

Tool Definition Quality

A (4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, openWorldHint=true, and idempotentHint=true. The description adds behavioral context by mentioning data sources (NVD, OpenCVE, AWS pricing) and output details (ranked patching order with risk scores and cloud impact), which goes beyond what annotations provide. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, each earning its place: the first sets role and purpose, the second details inputs and outputs, the third states data sources, and the fourth names use cases. Front-loaded and concise with no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, annotations, full schema coverage, and existence of an output schema, the description adequately covers inputs, outputs, data sources, and use cases. It could mention the async parameter behavior, but the schema covers that. Overall, it provides sufficient context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters described. The description reaffirms the inputs (CVE IDs and AWS services) but does not add new meaning beyond the schema. The explanation of the process (combining scores) adds general context but not parameter-specific semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool prioritizes CVEs by combining exploitability scores with asset criticality. It uses a specific verb ('prioritize') and resource ('unpatched CVEs'), and distinguishes itself from siblings like 'cve_security_lookup' and 'vuln_exploitability_forecast' that focus on lookup or forecasting.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage context ('ideal for vulnerability management and cloud security posture improvement') but does not explicitly state when to use this tool versus alternatives or when not to use it. It lacks exclusions or comparisons with sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_climate_intel (Grade A)

Read-only

Physical climate intelligence for insurance underwriting, agritech, logistics, energy trading and ESG/climate risk disclosure. Three modes: (1) forecast — 14-day daily weather forecast with temperature, precipitation, wind and humidity; (2) historical — daily records and monthly aggregates for any date range since 1940, with anomaly detection (P90/P95 heat events, extreme precipitation days); (3) climate_risk — long-term physical risk scoring combining CMIP6 ensemble projections (2020-2050), altitude, FEMA flood zones (US) and historical baselines. Risk dimensions: flood, heat (days >35°C/year), drought (SPI), wildfire, sea-level. Overall score 0-100 (100 = severe). Location: city string or lat/lon coordinates. Sources: Open-Meteo (keyless, global, 1940→2050), Open-Elevation, FEMA NFHL (US), NOAA CDO (optional NOAA_API_KEY env var for US+global station data). SLA: ≤25s p95. Cache: 1h forecast / 24h historical / 7d climate_risk.

Parameters

Name	Required	Description
`mode`	Yes	'forecast' (14 days), 'historical' (date range since 1940), 'climate_risk' (long-term physical risk score)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_to`	No	ISO date YYYY-MM-DD — end of date range (required for historical/climate_risk)
`metrics`	No	Weather metrics to include. Default: all metrics.
`location`	Yes	Geographic location. Provide either {city, country?} or {lat, lon}.
`date_from`	No	ISO date YYYY-MM-DD — start of date range (required for historical/climate_risk)
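
The table above encodes a mode-dependent contract: `date_from` and `date_to` are required for `historical` and `climate_risk`, and `location` takes either `{city, country?}` or `{lat, lon}`. A minimal client-side sketch of those checks follows. `build_weather_args` is a hypothetical helper, and the 1940 lower bound is applied to `historical` only, following the description's "since 1940"; how the server treats dates in other modes is not documented.

```python
from datetime import date
from typing import Any, Dict, Optional

MODES = {"forecast", "historical", "climate_risk"}
RANGE_MODES = {"historical", "climate_risk"}


def build_weather_args(
    mode: str,
    location: Dict[str, Any],
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
) -> Dict[str, Any]:
    """Assemble weather_climate_intel arguments, enforcing the table's rules."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    has_city = "city" in location
    has_coords = "lat" in location and "lon" in location
    if not (has_city or has_coords):
        raise ValueError("location needs either 'city' or both 'lat' and 'lon'")
    args: Dict[str, Any] = {"mode": mode, "location": location}
    if mode in RANGE_MODES:
        if date_from is None or date_to is None:
            raise ValueError(f"mode {mode!r} requires date_from and date_to")
        if date_from > date_to:
            raise ValueError("date_from must not be after date_to")
        if mode == "historical" and date_from.year < 1940:
            raise ValueError("historical records start in 1940")
        # The schema asks for ISO dates in YYYY-MM-DD form.
        args["date_from"] = date_from.isoformat()
        args["date_to"] = date_to.isoformat()
    return args
```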

Output Schema

Parameters

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`forecast`	No
`location`	Yes
`historical`	No
`climate_risk`	No
`quality_score`	Yes

Tool Definition Quality

A (4.6/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behavioral traits: it is read-only (consistent with readOnlyHint annotation), provides SLA (≤25s p95), cache durations per mode, data sources (Open-Meteo, FEMA, etc.), and risk dimensions. No contradiction with annotations; the description adds valuable context beyond structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with use cases and modes. While it is fairly long, every sentence carries valuable information (sources, SLA, cache, risk dimensions). Minor conciseness improvements possible, but overall efficient for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (three modes, multiple parameters, nested location object) and the presence of an output schema, the description is comprehensive. It covers location, date ranges, metrics, risk dimensions, data sources, and performance guarantees, leaving no critical gaps for an AI agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers 100% of parameters with descriptions. The description adds meaning by explaining the implications of mode (e.g., anomaly detection for historical, CMIP6 projections for climate_risk) and the location format (city or lat/lon). This enriches schema defaults without being redundant.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides physical climate intelligence with three distinct modes (forecast, historical, climate_risk) and lists specific use cases like insurance underwriting and agritech. It effectively distinguishes the tool's purpose from sibling tools by detailing unique features and data sources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly explains when to use each of the three modes, including necessary parameters like date ranges for historical and climate_risk. However, it does not mention when not to use this tool or suggest alternative sibling tools (e.g., climate_scenario_rcp), slightly limiting its guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

webhooks_manage (Grade A)

Manage HTTP webhook callbacks for async tools (T5/T6 batch flagships). Instead of polling every 5s, register a callback URL — Gapup posts the job result to your endpoint the moment it completes. Supported events: job.completed | job.failed | monitoring.alert | quota.threshold. Modes: register (add endpoint), list (view active webhooks), revoke (soft-delete), test (fire a test payload to verify your receiver), history (last 20 fires). Security: every delivery is signed with HMAC-SHA256 on the body — verify the X-Gapup-Signature header against sha256(secret, body).

Parameters

Name	Required	Description
`url`	No	(register) HTTPS/HTTP endpoint that will receive POST callbacks. Must return 2xx within 10s.
`mode`	Yes	register — add a webhook endpoint. list — view your active webhooks. revoke — soft-delete a webhook by webhook_id. test — fire a test payload to verify the receiver is alive. history — last 20 delivery attempts for a webhook.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`events`	No	(register, optional) Events to subscribe to. Defaults to all events if omitted.
`secret`	No	(register, optional) A secret string used to sign deliveries with HMAC-SHA256. Store it safely — verify X-Gapup-Signature header on your receiver.
`webhook_id`	No	(revoke / test / history) The webhook_id returned from register.
`caller_hash`	No	Optional caller identity override. If omitted, uses the internal session hash.
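
On the receiving side, the `secret` and the `X-Gapup-Signature` header described above translate into an HMAC-SHA256 check over the raw request body. The sketch below uses Python's standard `hmac` module. It assumes the header carries a plain hex digest; the listing does not say whether the value is hex, base64, or prefixed, so that detail should be confirmed against a `test` delivery. `verify_gapup_signature` is a hypothetical helper name.

```python
import hashlib
import hmac


def verify_gapup_signature(secret: str, body: bytes, signature_header: str) -> bool:
    """Return True if the header matches HMAC-SHA256(secret, body).

    Assumption: the header value is a hex digest with no prefix.
    """
    expected = hmac.new(secret.encode("utf-8"), body, hashlib.sha256).hexdigest()
    received = signature_header.strip().lower()
    # compare_digest runs in constant time to avoid leaking match length.
    return hmac.compare_digest(expected.encode("utf-8"), received.encode("utf-8"))
```

The check must run on the exact bytes received, before any JSON parsing or re-serialization, because re-encoding the payload can change whitespace or key order and invalidate the signature.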

Output Schema

Parameters

Name	Required	Description
No output parameters

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it discloses HMAC-SHA256 signing, soft-delete for revoke, test mode that fires a test payload, and security requirements. This fully informs the agent about the tool's behavior and side effects, with no contradictions to the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the core purpose. It covers all essential aspects in a compact manner, but could be slightly more concise by trimming redundant phrases. Still, every sentence adds value and there is no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 params, 1 required, output schema present), the description is thorough. It explains each mode, supported events, security mechanism, and constraints (e.g., 10s timeout for URL). No additional context is needed for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description enriches each parameter with practical context, such as the meaning of each mode, default events, and security notes. For example, it explains that 'secret' is used for HMAC-SHA256 and that 'webhook_id' is obtained from register. This goes beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool's purpose: managing HTTP webhook callbacks for async tools. It lists the supported events and modes (register, list, revoke, test, history), making it distinct from other tools. The verb 'manage' combined with the resource 'webhook callbacks' and the specific modes leaves no ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (instead of polling every 5s) and details each mode's use case. However, it does not explicitly mention when not to use it or provide alternatives, such as the job_result tool for polling, which could be considered a gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_search_multilangA

Read-only

Multi-language, multi-source web search that goes beyond Anglo-centric results. Supports 15 languages (fr/de/es/it/pt/nl/ja/zh/ko/ar/ru/sv/pl/tr/en) with automatic detection. Aggregates results from Mojeek (independent search engine, multilang) and Wikipedia (native multilang API), with DDG and HN as English-language complements. Returns deduplicated results ranked by cross-engine consensus. Use when you need non-English search results, when DDG fails, or for geographically-biased queries. Phase 2 #7 of the geo/lang expansion plan. Note: Brave/Bing/Searx are blocked from DO IPs — configure AICI_RESEARCH_PROXY_URL for residential proxy.

ParametersJSON Schema

Name	Required	Description
`lang`	No	2-letter language code. If omitted, auto-detected from query characters and lexical markers.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Search query in any language
`country`	No	ISO-3166-1 alpha-2 country code for geographic bias (e.g. FR, DE, JP, BR). Optional.
`max_results`	No	Maximum number of results to return (default 10).
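
As a rough illustration of the table above, the sketch below assembles an MCP `tools/call` JSON-RPC request for `web_search_multilang`. The helper name `build_search_call` and the client-side language check are assumptions for illustration and not part of the server; the language list is copied from the tool description.

```python
import json

# Language codes listed in the tool description (15 in total).
SUPPORTED_LANGS = {
    "fr", "de", "es", "it", "pt", "nl", "ja", "zh",
    "ko", "ar", "ru", "sv", "pl", "tr", "en",
}


def build_search_call(query, lang=None, country=None, max_results=10, request_id=1):
    """Build a JSON-RPC `tools/call` request (hypothetical helper)."""
    if not query:
        raise ValueError("query is required")
    if lang is not None and lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported lang: {lang!r}")
    arguments = {"query": query, "max_results": max_results}
    if lang is not None:
        arguments["lang"] = lang  # omit to let the server auto-detect
    if country is not None:
        arguments["country"] = country.upper()  # ISO-3166-1 alpha-2
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "web_search_multilang", "arguments": arguments},
    }


request = build_search_call("énergie solaire subventions", lang="fr", country="fr")
print(json.dumps(request, ensure_ascii=False, indent=2))
```

Leaving `lang` out of the arguments, rather than sending an empty value, keeps the documented auto-detection path available.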

Output Schema

ParametersJSON Schema

Name	Required	Description
`query`	Yes
`status`	Yes
`results`	Yes
`sources`	Yes
`by_engine`	Yes
`lang_used`	Yes
`country_used`	No
`quality_score`	Yes
`total_unique_results`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds significant behavioral context beyond annotations: it aggregates results from multiple engines, deduplicates them, ranks them by cross-engine consensus, and notes which engines are blocked from DO IPs and the proxy they require. Annotations only indicate readOnlyHint and openWorldHint.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is five sentences plus a note, each providing essential information without redundancy. Front-loaded with core purpose, then sources, usage guidance, and technical configuration.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of multi-language, multi-source search, the description covers purpose, sources, languages, usage guidance, and configuration needs. Output schema exists, so no need to detail return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema already has descriptions for all 5 parameters (100% coverage). Description adds minimal extra parameter detail (e.g., auto-detection of lang) but mainly focuses on overall behavior rather than individual parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it is a multi-language, multi-source web search that goes beyond Anglo-centric results. Lists specific languages and sources (Mojeek, Wikipedia, DDG, HN), distinguishing it from other search tools in the sibling list.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: when needing non-English results, when DDG fails, or for geographically-biased queries. Also provides proxy configuration note for blocked engines.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

win_loss_decoderB

Read-only

Analyse Win/Loss deals — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Win/Loss 32 deals Q1 2026 · Win rate 41% → 68% potentiel · Playbook 8 actions CRO. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deals`	Yes
`company`	Yes
`product`	Yes
`topCompetitors`	No
`primaryChallenge`	No
`salesCycleTargetDays`	No
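
Because only the `async` parameter is documented, a client may want to check the top-level keys before paying for a call. The sketch below does only that, under the assumption that the required/optional split in the table above is accurate; `check_arguments` is a hypothetical helper, and the real shapes of `deals`, `company`, and `product` are not documented on this page.

```python
# Required/optional split copied from the parameter table above.
REQUIRED = ("deals", "company", "product")
OPTIONAL = ("async", "topCompetitors", "primaryChallenge", "salesCycleTargetDays")


def check_arguments(arguments):
    """Return (missing_required, unknown_keys) for a win_loss_decoder call."""
    missing = [key for key in REQUIRED if key not in arguments]
    unknown = sorted(set(arguments) - set(REQUIRED) - set(OPTIONAL))
    return missing, unknown


# Placeholder values only: the nested structure is undocumented.
draft = {"company": "<company>", "product": "<product>", "dealz": []}
missing, unknown = check_arguments(draft)
print("missing:", missing, "unknown:", unknown)
```

This catches misspelled or absent top-level keys locally; it cannot replace the server-side validation the description mentions.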

Tool Definition Quality

B3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description reinforces that it returns an audited deliverable and inputs are validated server-side, adding context without contradiction. It does not reveal potential async behavior or rate limits, but annotations cover safety.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes unnecessary jargon like 'Gapup agent-payable C-suite expertise (CRO)' and a reference to a case study, adding noise. Key information could be presented more efficiently.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 7 parameters, nested objects, and no output schema, the description is insufficient. It does not explain the async parameter, return format, or provide examples of successful usage. The mention of 'validated server-side' hints at constraints but lacks specificity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14%, and the description provides no additional parameter guidance beyond 'send the documented case fields'. With nested objects and many required fields, the agent lacks clarity on how to construct inputs properly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes win/loss deals and returns a structured deliverable, making the purpose evident. The mention of C-suite expertise adds specificity. However, the jargon 'Gapup agent-payable' may confuse some agents.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like competitive_deep_dive or deal_coach. The description does not mention exclusions or prerequisites, leaving the agent to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workflow_orchestratorA

Read-only

Meta-tool that CHAINS multiple MCP tools sequentially into a named workflow, delivering a composite output in a single call. 10 predefined workflows: compliance_full_audit (6 steps: KYC+sanctions+AI_gov+privacy+ESRS+CSRD), deal_due_diligence (7 steps: deep_dive+registry+court+patents+KYC+financials+M&A), market_entry_brief (6 steps: country_study+regulations+procurement+tax+AGOA+market_brief), competitor_intelligence_pack (5 steps: deep_dive+intel+patents+earnings+pitch_deck), esg_360 (5 steps: ESG_audit+carbon+CSRD+ESRS+supplier_esg), ip_freedom_to_operate (4 steps: patent_search+async_deep+IP_audit+competitive), climate_property_assessment (3 steps: climate_risk+real_estate+geo), pharma_target_screen (4 steps: trials+adverse_events+patents+meta_analysis), sanctions_360 (5 steps: KYC+Russian_sec+registry+crypto_wallet+court_filings), talent_market_brief (4 steps: salary+trends+adjacent_roles+skills_taxonomy). Returns steps_executed, consolidated P0/P1/P2 signals, overall_status, estimated_cost_usd, and raw outputs per step. Cache: 1h LRU per (workflow, target). Budget: 60s global timeout → partial if exceeded. Use when an agent needs a composite deliverable without orchestrating tools manually.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`params`	No	Optional overrides passed to sub-tools. Keys depend on workflow (e.g., country, sector, role, drug, technology, wallet_address, acquirer).
`target`	Yes	The entity to analyze. A company name for most workflows; location for climate_property_assessment; role+country for talent_market_brief.
`workflow`	Yes	Named workflow to execute. Each workflow chains 3-7 tools sequentially.
`skip_failed_steps`	No	Default true: continue on step failure. Set false to abort on first error.
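
A minimal sketch of assembling `workflow_orchestrator` arguments follows. The workflow names and step counts are copied from the tool description; the helper `build_workflow_arguments`, its `run_async` flag, and the client-side name check are assumptions for illustration.

```python
# Workflow names and step counts as listed in the tool description.
WORKFLOW_STEPS = {
    "compliance_full_audit": 6,
    "deal_due_diligence": 7,
    "market_entry_brief": 6,
    "competitor_intelligence_pack": 5,
    "esg_360": 5,
    "ip_freedom_to_operate": 4,
    "climate_property_assessment": 3,
    "pharma_target_screen": 4,
    "sanctions_360": 5,
    "talent_market_brief": 4,
}


def build_workflow_arguments(workflow, target, params=None,
                             skip_failed_steps=True, run_async=False):
    """Validate the workflow name locally and return the arguments dict."""
    if workflow not in WORKFLOW_STEPS:
        raise ValueError(f"unknown workflow: {workflow!r}")
    if not target:
        raise ValueError("target is required")
    arguments = {
        "workflow": workflow,
        "target": target,
        "skip_failed_steps": skip_failed_steps,  # documented default: true
    }
    if params:
        arguments["params"] = dict(params)  # overrides passed to sub-tools
    if run_async:
        arguments["async"] = True  # result is then fetched via job_result
    return arguments


workflow_args = build_workflow_arguments(
    "market_entry_brief", "Example Corp", params={"country": "BR"}, run_async=True
)
print(workflow_args, WORKFLOW_STEPS[workflow_args["workflow"]], "steps")
```

Given the 60s global budget, requesting `async` for the longer workflows seems a reasonable default, though the page does not state typical durations.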

Output Schema

ParametersJSON Schema

Name	Required	Description
`target`	Yes
`outputs`	Yes
`summary`	Yes
`workflow`	Yes
`overall_status`	Yes
`steps_executed`	Yes
`total_duration_ms`	Yes
`estimated_cost_usd`	Yes
`consolidated_signals`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses chaining behavior, composite output structure (steps_executed, signals, status, cost, raw outputs), cache (1h LRU), timeout (60s), partial results, and async option. Adds significant value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-structured with purpose, workflow list, return details, constraints, and usage tip. Dense but front-loaded; workflow list slightly redundant with schema enum.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers all key aspects: parameter semantics, return structure, caching, timeout, async support, and usage context. Output schema exists, so return details aren't needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but description adds concrete meaning for 'target' (varies by workflow) and 'skip_failed_steps' (default true). Also explains async parameter for job polling.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it's a meta-tool that chains MCP tools into named workflows, delivering composite output. Lists 10 predefined workflows with step counts, distinguishing it from individual sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use the tool ('composite deliverable without orchestrating tools manually'), but does not rule out any cases or mention alternatives. Context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

working_capitalC

Read-only

Optimiseur du BFR — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Agicap — BFR optimisation · DSO 52→38j · Cash libéré +€2.8M · 3 quick wins immédiats. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`industry`	No
`challenges`	Yes
`financials`	Yes
`topCustomers`	No
`topSuppliers`	No
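
The `async` flag described above implies a start-then-poll pattern. The sketch below shows one way a client might implement it. `call_tool` is a hypothetical client function, the 5-second interval is arbitrary, and the `"pending"` status value is an assumption: the page does not document the `job_result` payload.

```python
import time


def run_async_tool(call_tool, name, arguments,
                   poll_interval=5.0, max_polls=12, sleep=time.sleep):
    """Start a tool with async=True, then poll job_result until it finishes.

    ASSUMPTION: a running job reports {"status": "pending"}.
    """
    started = call_tool(name, {**arguments, "async": True})
    job_id = started["job_id"]
    for _ in range(max_polls):
        result = call_tool("job_result", {"job_id": job_id})
        if result.get("status") != "pending":
            return result
        sleep(poll_interval)
    raise TimeoutError(f"job {job_id} still pending after {max_polls} polls")


class FakeClient:
    """Offline stand-in for an MCP client so the sketch runs without a server."""

    def __init__(self, pending_polls):
        self.pending_polls = pending_polls
        self.calls = []

    def __call__(self, name, arguments):
        self.calls.append((name, arguments))
        if name != "job_result":
            return {"job_id": "job-123"}
        if self.pending_polls > 0:
            self.pending_polls -= 1
            return {"status": "pending"}
        return {"status": "done", "job_id": arguments["job_id"]}


client = FakeClient(pending_polls=2)
result = run_async_tool(client, "working_capital", {"company": "<company>"},
                        sleep=lambda seconds: None)
print(result["status"], len(client.calls))
```

Injecting `sleep` keeps the loop testable; a real client would leave the default in place.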

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that the tool 'returns a structured, audited deliverable' and that inputs are validated server-side. These are consistent with readOnlyHint. However, no additional behavioral traits beyond annotations are disclosed, and the description does not contradict annotations. Score is adequate because it maintains alignment but adds limited extra context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short (3-4 lines), but it includes a lengthy reference case that may not be essential. The first line is partially jargon ('Gapup agent-payable'), and the structure is not front-loaded with the most critical information. It could be more concise and better organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters including nested objects, no output schema), the description is insufficient. It does not explain the deliverable's format, how results should be interpreted, or the process beyond server-side validation. The reference case provides anecdotal context but lacks comprehensive guidance, leaving significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14% (only the async parameter), yet the description provides no information about any parameters. It merely states 'send the documented case fields' without specifying what those fields are or how they should be used. This fails to compensate for the low coverage, leaving parameters poorly explained for an agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool is an 'Optimiseur du BFR' (Working Capital Optimizer) for CFOs, indicating it analyzes and optimizes working capital. It mentions returning a structured, audited deliverable and provides a reference case. While the purpose is reasonably clear, it does not explicitly differentiate from sibling tools like 'working_capital_esg_impact_rater' or mention the specific resource being optimized, leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks explicit guidance on when to use this tool versus alternatives. It mentions 'C-suite expertise (CFO)' implying it's for high-level strategic analysis, but does not state when it is appropriate, when not, or how it differs from other working capital tools. No context on prerequisites or exclusions is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

working_capital_esg_impact_raterA

Read-only, Idempotent

As a CFO, assess how ESG factors (Environmental, Social, Governance) influence working capital efficiency using IMF SDR and BIS data. Inputs include company sector, geographic exposure, and ESG risk scores. Outputs provide a quantitative impact rating on working capital metrics like days sales outstanding (DSO) and inventory turnover, alongside IMF SDR-aligned liquidity risk indicators.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes	Primary geographic exposure (e.g., 'EU', 'APAC')
`sector`	Yes	Industry sector (e.g., 'manufacturing', 'energy')
`currency`	No	Reporting currency (ISO 4217 code, e.g., 'USD', 'EUR')
`esgRiskScore`	Yes	Aggregate ESG risk score (0-100)
`workingCapitalRatio`	No	Current working capital ratio (current assets / current liabilities)
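
The documented ranges can be checked before calling the tool. The following sketch builds the arguments for `working_capital_esg_impact_rater` and computes `workingCapitalRatio` from its stated definition (current assets / current liabilities). The helper names and the three-letter currency check are illustrative assumptions.

```python
def working_capital_ratio(current_assets, current_liabilities):
    """Current assets divided by current liabilities, as defined in the table."""
    if current_liabilities <= 0:
        raise ValueError("current_liabilities must be positive")
    return current_assets / current_liabilities


def build_esg_arguments(region, sector, esg_risk_score,
                        currency=None, ratio=None):
    """Validate documented ranges and return the arguments dict."""
    if not 0 <= esg_risk_score <= 100:
        raise ValueError("esgRiskScore must be between 0 and 100")
    arguments = {"region": region, "sector": sector, "esgRiskScore": esg_risk_score}
    if currency is not None:
        if len(currency) != 3 or not currency.isalpha():
            raise ValueError("currency must be a 3-letter ISO 4217 code")
        arguments["currency"] = currency.upper()
    if ratio is not None:
        arguments["workingCapitalRatio"] = ratio
    return arguments


esg_args = build_esg_arguments(
    "EU", "manufacturing", 42,
    currency="eur",
    ratio=working_capital_ratio(1_500_000, 1_000_000),
)
print(esg_args)
```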

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`impactRating`	No	ESG impact on working capital efficiency (-100 to +100)
`esgFactorBreakdown`	No
`liquidityRiskIndicator`	No	IMF SDR-aligned liquidity risk score (0-1)
`workingCapitalAdjustment`	No	Projected adjustment to working capital ratio (%)

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint) already indicate safe, non-destructive operation. The description adds value by specifying data sources (IMF SDR, BIS) and output metrics (DSO, inventory turnover), giving behavioral context beyond annotations. No contradiction or missing side effects for a read-only tool.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences: the first states the purpose, the second and third detail inputs and outputs. No fluff, appropriately sized, and front-loaded with the main action. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (ESG impact assessment, 6 parameters, output schema present), the description covers the essential aspects: purpose, data sources, input categories, and output metrics. It lacks notes on potential limitations or data freshness, but is sufficient overall for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%; all 6 parameters have descriptions. The tool description mentions key inputs (sector, region, ESG risk score) but does not add significant new meaning beyond the schema. Baseline score applies with no enhancement.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to assess ESG impact on working capital efficiency using specific data sources (IMF SDR, BIS). It identifies the role (CFO), inputs (sector, region, ESG risk scores), and outputs (impact rating on DSO, inventory turnover, liquidity risk indicators). This distinguishes it from sibling tools like 'working_capital' which likely focus on general working capital without ESG context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool should be used when a CFO needs an ESG-related working capital assessment, but it lacks explicit when-not-to-use guidance or alternatives. Despite numerous sibling tools (e.g., 'supplier_esg_audit', 'working_capital'), no comparison is provided. The context is clear, but there are no exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

working_capital_fx_hedge_optimizerA

Read-only, Idempotent

For CFOs managing multinational working capital, this tool analyzes real-time ECB and FRED foreign exchange rates to recommend optimal hedging strategies. Input base currency, target currencies, and working capital amounts to receive forward contract suggestions, natural hedge opportunities, and cost-benefit analysis of various hedging instruments (forwards, options, swaps). Outputs include hedge ratios, estimated cost savings, and risk reduction metrics.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`baseCurrency`	Yes	ISO 4217 code of the company's functional currency (e.g., 'USD', 'EUR')
`riskAppetite`	No	Company's risk tolerance for currency fluctuations	balanced
`timeHorizonDays`	No	Planning horizon in days	90
`targetCurrencies`	Yes	ISO 4217 codes of currencies to hedge against (e.g., ['EUR', 'GBP', 'JPY'])
`workingCapitalAmounts`	Yes	Working capital amounts in each target currency (e.g., { EUR: 5000000, GBP: 3000000 })
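
`targetCurrencies` and `workingCapitalAmounts` describe the same set of currencies, so a client can derive one from the other. The sketch below does this. The helper `build_hedge_arguments` and the rule that the base currency may not appear among the targets are assumptions; the page does not say how the server treats either case.

```python
def build_hedge_arguments(base_currency, amounts,
                          time_horizon_days=90, risk_appetite="balanced"):
    """Build arguments for working_capital_fx_hedge_optimizer.

    `amounts` maps ISO 4217 codes to working capital held in that currency.
    targetCurrencies is derived from its keys so the two fields cannot drift.
    """
    base = base_currency.upper()
    normalized = {code.upper(): value for code, value in amounts.items()}
    if not normalized:
        raise ValueError("at least one target currency is required")
    if base in normalized:
        # ASSUMPTION: hedging the base currency against itself is an error.
        raise ValueError("baseCurrency must not appear in targetCurrencies")
    if time_horizon_days <= 0:
        raise ValueError("timeHorizonDays must be positive")
    return {
        "baseCurrency": base,
        "targetCurrencies": sorted(normalized),
        "workingCapitalAmounts": normalized,
        "timeHorizonDays": time_horizon_days,   # documented default: 90
        "riskAppetite": risk_appetite,          # documented default: balanced
    }


hedge_args = build_hedge_arguments("usd", {"EUR": 5_000_000, "gbp": 3_000_000})
print(hedge_args)
```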

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`recommendations`	No

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description reinforces this by stating it analyzes and recommends (no modification). It adds value by naming data sources (ECB, FRED) and output types, though it omits details like rate limits or latency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each earning its place: first sentence states purpose and data sources, second specifies inputs, third lists outputs. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and schema coverage is 100%, the description covers target user, data sources, inputs, and outputs comprehensively. No gaps remain for an agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, baseline is 3. The description adds meaning by grouping inputs and describing outputs (forward contract suggestions, natural hedge opportunities), which goes beyond the schema details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes ECB and FRED rates to recommend optimal hedging strategies for working capital, differentiating it from related tools like 'fx_rate' which merely provides rates.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It targets CFOs managing multinational working capital and lists required inputs, providing clear usage context. However, it does not explicitly contrast with sibling tools or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x402_liquidity_monitorA

Read-only, Idempotent

Monitors real-time x402-USDC liquidity depth across 12 decentralized and centralized exchanges, providing slippage alerts and depth analysis for CFO liquidity risk assessment. Inputs include slippage thresholds and exchange selection; outputs liquidity depth, price impact estimates, and warning flags. Essential for optimizing trade execution and managing liquidity exposure. Keywords: liquidity monitoring, slippage analysis, DEX/CEX depth, x402-USDC pair, CFO financial tooling.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`exchanges`	No	List of exchanges to monitor (defaults to all 12 if empty)
`depthLevels`	No	Liquidity depth levels to analyze (percentage from mid-price)
`slippageThreshold`	Yes	Maximum acceptable slippage percentage (0-100)
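
A small sketch of the input checks implied by the table above follows. `build_liquidity_arguments` is a hypothetical helper; the positive-value check on `depthLevels` is an assumption, and no exchange names are shown because the page does not list the 12 supported exchanges.

```python
def build_liquidity_arguments(slippage_threshold, exchanges=None, depth_levels=None):
    """Validate ranges and return arguments for x402_liquidity_monitor."""
    if not 0 <= slippage_threshold <= 100:
        raise ValueError("slippageThreshold must be between 0 and 100")
    arguments = {"slippageThreshold": slippage_threshold}
    if exchanges:
        # Omitting the key (or sending an empty list) selects all exchanges.
        arguments["exchanges"] = list(exchanges)
    if depth_levels:
        if any(level <= 0 for level in depth_levels):
            raise ValueError("depthLevels must be positive percentages")
        arguments["depthLevels"] = sorted(depth_levels)
    return arguments


liquidity_args = build_liquidity_arguments(0.5, depth_levels=[2, 0.5, 1])
print(liquidity_args)
```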

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`midPrice`	No	Current x402-USDC mid-price
`warnings`	Yes
`priceImpact`	No
`liquidityDepth`	Yes
`slippageAlerts`	Yes

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds value by detailing inputs (slippage thresholds, exchange selection) and outputs (liquidity depth, price impact, warning flags), which conveys behavioral context beyond what the annotations provide. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with front-loaded purpose, followed by inputs/outputs and keywords. No wasted words; every sentence contributes essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, full schema coverage, strong annotations, and an output schema, the description covers the tool's role, inputs, and outputs adequately. It does not mention the async parameter's existence or result polling, but that is documented in the schema. Minor gap for edge cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good individual parameter descriptions. The description adds minimal extra meaning beyond summarizing inputs (e.g., 'slippage thresholds and exchange selection'), but does not significantly enhance understanding of depthLevels or async parameter. Baseline score is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific resource (x402-USDC liquidity depth) and action (monitors), with explicit mention of 12 exchanges and CFO liquidity risk assessment. It effectively distinguishes from sibling tools like usdc_x402_payments_intel and x402_payment_flow_analyzer by focusing on liquidity depth and slippage alerts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states it is 'Essential for optimizing trade execution and managing liquidity exposure,' providing clear context. However, it does not explicitly state when not to use this tool or mention alternatives among the 270 sibling tools, leaving some ambiguity about comparative usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x402_payment_flow_analyzerA

Read-only, Idempotent

As a CTO, analyze USDC payment flows involving x402 addresses to assess counterparty risk, trace transaction paths, and evaluate regulatory exposure. Input a wallet address or transaction hash to receive risk scores, flow diagrams, and compliance flags from Chainalysis and TRM Labs public APIs. Ideal for due diligence, fraud detection, and compliance reporting. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Hops to trace in payment flow
`txHash`	No	USDC transaction hash to trace
`address`	Yes	Ethereum wallet address to analyze
`includeRiskScore`	No	Include counterparty risk scoring
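
Since `address` is required and must be an Ethereum wallet address, a client can reject malformed input before the call. The sketch below checks the standard hex formats (20-byte address, 32-byte transaction hash) and sets `async` by default, as the description recommends. `build_flow_arguments` is a hypothetical helper, and the sample address is a dummy value.

```python
import re

ADDRESS_RE = re.compile(r"0x[0-9a-fA-F]{40}")   # 20-byte Ethereum address
TX_HASH_RE = re.compile(r"0x[0-9a-fA-F]{64}")   # 32-byte transaction hash


def build_flow_arguments(address, tx_hash=None, depth=None,
                         include_risk_score=None, run_async=True):
    """Validate hex formats and return arguments for x402_payment_flow_analyzer."""
    if not ADDRESS_RE.fullmatch(address):
        raise ValueError("address must be 0x followed by 40 hex characters")
    arguments = {"address": address}
    if tx_hash is not None:
        if not TX_HASH_RE.fullmatch(tx_hash):
            raise ValueError("txHash must be 0x followed by 64 hex characters")
        arguments["txHash"] = tx_hash
    if depth is not None:
        if depth < 1:
            raise ValueError("depth must be at least 1 hop")
        arguments["depth"] = depth
    if include_risk_score is not None:
        arguments["includeRiskScore"] = include_risk_score
    if run_async:
        arguments["async"] = True  # the description advises this to avoid timeouts
    return arguments


dummy_address = "0x" + "ab" * 20  # placeholder, not a real wallet
flow_args = build_flow_arguments(dummy_address, depth=2, include_risk_score=True)
print(flow_args)
```

The format check does not verify the EIP-55 checksum or that the address exists on chain; it only filters obviously malformed input.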

Output Schema

ParametersJSON Schema

Name	Required	Description
`flowId`	No	Unique identifier for this payment flow analysis
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No	Counterparty risk score (0-100)
`complianceFlags`	No
`exposureSummary`	No
`transactionPath`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint, idempotentHint) already indicate read-only, idempotent behavior with external data. The description adds value by explaining the async mode for timeout avoidance and the use of Chainalysis and TRM Labs public APIs. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is efficient and front-loaded: four sentences covering purpose, inputs and outputs, intended use cases, and the async usage note. Every sentence earns its place, with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has 5 parameters, an output schema, and rich annotations, the description covers the core functionality, use cases, and async behavior. It does not repeat output details (since an output schema exists) and provides enough context for an agent to decide when to use this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description mentions that you can input a wallet address or transaction hash, slightly adding to the schema's individual parameter descriptions. However, it does not detail other parameters like depth and includeRiskScore beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: analyzing USDC payment flows with x402 addresses for risk assessment, transaction tracing, and regulatory evaluation. It uses specific verbs (analyze, assess, trace, evaluate) and a specific resource (USDC payment flows, x402 addresses), which distinguishes it from sibling tools like x402_liquidity_monitor and x402_payment_fraud_detector.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions ideal usage scenarios (due diligence, fraud detection, compliance reporting) and provides guidance on using async to avoid timeout. However, it does not explicitly state when not to use this tool or compare it to sibling alternatives, such as the x402_payment_fraud_detector for fraud-specific cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x402_payment_fraud_detector A

Read-only, Idempotent

Risk-focused tool that analyzes x402-USDC payment transactions for fraud patterns using on-chain forensics. Takes a transaction hash or wallet address as input and returns risk scores, suspicious indicators, and historical patterns. Designed for risk management teams to quickly assess payment legitimacy. Includes keywords: fraud detection, USDC risk, blockchain forensics, transaction monitoring. Pass async:true to avoid timeout.

Parameters

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`walletAddress`	No
`includeHistory`	No
`amountThreshold`	No
`transactionHash`	Yes

Output Schema

Parameters

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`riskScore`	Yes
`isSuspicious`	Yes
`sanctionsMatch`	No
`fraudIndicators`	No
`transactionHistory`	No

Tool Definition Quality

A 3.8/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint. The description adds context about async behavior, on-chain forensics, and return values (risk scores, indicators), providing behavioral details beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is fairly concise (about 60 words) and front-loads the main purpose. However, the 'Includes keywords' sentence adds little value for an AI agent and could be removed or integrated more naturally.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that the tool has 5 parameters, an output schema, and annotations, the description provides a general overview but lacks parameter details and does not clarify when to use walletAddress versus transactionHash. It covers the core functionality but leaves gaps in parameter semantics.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20% (only async described). The description mentions inputs (transaction hash or wallet address) but does not explain walletAddress, includeHistory, amountThreshold, or their roles. It fails to add meaning for most parameters, which is insufficient for a low-coverage schema.
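
The coverage figure is simply the share of parameters that carry a non-empty schema description. A small sketch of that arithmetic, using the two parameter tables on this page (the function name is illustrative, and the description texts are abbreviated placeholders since only their presence matters):

```python
from typing import Dict


def description_coverage(param_descriptions: Dict[str, str]) -> float:
    """Fraction of parameters whose schema description is non-empty."""
    if not param_descriptions:
        return 0.0
    described = sum(1 for text in param_descriptions.values() if text.strip())
    return described / len(param_descriptions)


# x402_payment_fraud_detector: only `async` has a description.
FRAUD_DETECTOR_PARAMS = {
    "async": "If true, returns a job_id immediately",
    "walletAddress": "",
    "includeHistory": "",
    "amountThreshold": "",
    "transactionHash": "",
}

# x402_payment_flow_analyzer: every parameter is described.
FLOW_ANALYZER_PARAMS = {
    "async": "If true, returns a job_id immediately",
    "depth": "Hops to trace in payment flow",
    "txHash": "USDC transaction hash to trace",
    "address": "Ethereum wallet address to analyze",
    "includeRiskScore": "Include counterparty risk scoring",
}

if __name__ == "__main__":
    print(f"{description_coverage(FRAUD_DETECTOR_PARAMS):.0%}")  # 20%
    print(f"{description_coverage(FLOW_ANALYZER_PARAMS):.0%}")   # 100%
```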

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool analyzes x402-USDC payment transactions for fraud patterns using on-chain forensics. It specifies inputs (transaction hash or wallet address) and outputs (risk scores, indicators), distinguishing it from sibling tools such as the general fraud_detector or x402_payment_flow_analyzer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description targets risk management teams for quick payment legitimacy assessment and advises using async:true to avoid timeouts. However, it does not explicitly compare to alternatives or state when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
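
One way to generate the claim file is sketched below. It only reproduces the structure shown above; the function name and the basic email check are illustrative, and serving the output at /.well-known/glama.json is left to your web server.

```python
import json


def build_claim_file(email: str) -> str:
    """Return the glama.json body for a single maintainer email."""
    if "@" not in email:
        raise ValueError(f"not an email address: {email!r}")
    document = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }
    return json.dumps(document, indent=2)


if __name__ == "__main__":
    # Replace with the email associated with your Glama account.
    print(build_claim_file("your-email@example.com"))
```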