mcp-knowledge

Name: mcp-knowledge
Author: getgapup

by io.github.getgapup

Server Details

100+ MCP tools for AI agents: content metadata, trade intelligence, business-expertise analysis.

Status: Healthy
Last Tested: 2026-07-23 01:33
Transport: Streamable HTTP
URL
Repository: getgapup/gapup-mcp
GitHub Stars: 0
Server Listing: @gapup/mcp-knowledge

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client

Glama

MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

C2.6/5.0

Tool DescriptionsC

Average 3.7/5 across 271 of 271 tools scored. Lowest: 1.8/5.

Server CoherenceC

Disambiguation2/5

With 271 tools, many share similar descriptions and purposes, especially the numerous 'Gapup agent-payable' tools that all return structured deliverables. Tools like 'budget_variance_ai' and 'margin_doctor_finance' both analyze financial variances, making it difficult for an agent to select the correct one.

Naming Consistency2/5

Tool names mix French and English inconsistently (e.g., 'bp_narratif', 'churn_defender', 'anti_demissions_hr'). Some use verb_noun patterns, others are noun phrases, and the casing is irregular (e.g., 're_deal_screener' vs 'pricing_in_deal'). No consistent naming convention is followed.

Tool Count1/5

271 tools is far too many for a single server, regardless of domain. The count suggests a lack of scoping, as many tools seem to be fine-grained variations for specific personas (CMO, CFO, CRO, etc.). A server this large overwhelms agents and increases latency.

Completeness3/5

The tool set covers a vast array of domains (finance, sales, HR, legal, ESG, AI, etc.), but many areas lack depth. For example, there are few update/delete operations, and many tools are one-shot report generators without supporting CRUD endpoints. The coverage is broad but shallow.

Available Tools

271 tools

abm_architectC

Read-only

Inspect

Architecte ABM — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — ABM 20 comptes nommés · Budget €120k · Tier 1×5 + Tier 2×15 · Playbooks 3 niveaux. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`product`	Yes
`salesTeam`	No
`icpCriteria`	Yes
`abmBudgetEur`	No
`targetAccounts`	Yes
`currentChannels`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and openWorldHint, indicating safe read behavior. The description adds that inputs are validated server-side. No contradiction between description and annotations. However, it does not elaborate on other behavioral aspects like processing time or result storage.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a reference case example. It is front-loaded with the title. However, mixing French and English may reduce clarity for some agents. It could be more concise by removing the example and focusing on essential usage.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, no output schema), the description lacks completeness. It does not explain what the deliverable contains, how results are returned, or any post-processing steps. The absence of output schema description leaves a significant gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%. The description does not explain individual parameters; it only mentions 'documented case fields' and provides a reference case. Agents must infer parameter meaning from names and types alone, which is insufficient for complex nested objects.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an 'Architecte ABM' that returns a structured, audited deliverable for C-suite expertise (CMO). It includes a reference case to illustrate scope. This distinguishes it from sibling tools like 'abm_lookalike_account_finder' and 'account_expansion_mapper' which focus on specific sub-tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description implies it's for C-level ABM strategy but does not compare to similar tools or state prerequisites. Lacks when-not-to-use or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

abm_lookalike_account_finderA

Read-onlyIdempotent

Inspect

As a CMO, discover 50 B2B accounts that closely match your top 10 customers' tech stacks and firmographics. This tool analyzes public web data including robots.txt and OpenGraph metadata to identify lookalike accounts for targeted ABM campaigns. Input your top customer domains and desired firmographic filters to receive a ranked list of potential targets with matching technologies and company attributes.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`tech_stack_keywords`	No	Specific technologies to match in lookalike accounts
`firmographic_filters`	No
`top_customer_domains`	Yes	List of top 10 customer domains to use as seed accounts

Output Schema

ParametersJSON Schema

Name	Required	Description
`stats`	No
`status`	Yes
`sources`	Yes
`warnings`	Yes
`lookalike_accounts`	Yes
`matched_technologies`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint. The description adds valuable behavioral context: 'analyzes public web data including robots.txt and OpenGraph metadata' and returns a 'ranked list of potential targets with matching technologies and company attributes.' No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, each serving a distinct purpose: stating the goal, explaining the analysis method, and specifying inputs and outputs. It is front-loaded with the most critical information and contains no redundant language.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 params with nested objects, high schema coverage, existing output schema, and rich annotations), the description covers all essential aspects: purpose, data sources, input format, and output. It is complete enough for an agent to understand and use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is high (75%), and the schema descriptions for parameters like 'top_customer_domains' are already detailed. The description briefly reiterates input types ('top customer domains' and 'desired firmographic filters') but does not add significant meaning beyond the schema. It's adequate but not exceptional.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'discover 50 B2B accounts that closely match your top 10 customers' tech stacks and firmographics.' It uses a specific verb ('discover') and defines the resource ('lookalike accounts'). The tool is well-differentiated from siblings like 'account_expansion_mapper' and 'competitor_profiles' by focusing on lookalike discovery based on tech stacks and firmographics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage guidance: 'Input your top customer domains and desired firmographic filters to receive a ranked list of potential targets.' It implies when to use (for targeted ABM campaigns) but does not explicitly state when not to use or mention alternative tools. Given the sibling list, the context is sufficient for most agents to select appropriately.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

account_expansion_mapperC

Read-only

Inspect

Mapping d'expansion comptes — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Notion B2B Enterprise — top 30 strategic accounts · expansion plays NRR 130%+ target · Snowflake/Shopify/Vercel/Stripe analyzed. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`accounts`	Yes
`ownership`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint, so the description's mention of server-side validation and returning a deliverable aligns with a safe, read-only operation. It adds some context but doesn't fully describe behavior beyond what annotations convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short (three sentences), but it includes somewhat jargony phrases like 'Gapup agent-payable C-suite expertise (CRO)' which may not be helpful. Front-loading is acceptable.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 5 parameters (including nested objects) and no output schema, the description fails to explain return format or what the deliverable contains. It leaves significant gaps for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (20%), and the description does not explain parameters or their roles. The schema has some descriptions, but the description adds no extra meaning, leaving many parameters underdocumented.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Mapping d'expansion comptes' and mentions returning a structured, audited deliverable, with a specific reference case for B2B enterprise expansion. It gives a clear verb and resource, but the purpose could be more precise. Sibling tools like abm_architect or champion_mapping could overlap, but the description doesn't explicitly distinguish.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The reference case implies it's for top strategic accounts in B2B enterprise settings, but no when-not-to-use or alternative tool names are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

action_plan_esgC

Read-only

Inspect

Plan d'action ESG — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: TechCorp SAS — Plan ESG 36 mois (500 FTE, €60M CA, score 54→76/100). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`horizon`	Yes		36 mois
`ambitions`	Yes
`targetLabels`	No
`currentScores`	No
`availableResources`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark readOnlyHint=true, so the description's mention of returning a deliverable aligns. It adds that inputs are validated server-side, but does not disclose other behavioral traits like potential async execution (via async parameter) or any limitations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a few sentences and includes a reference case which adds some value but also verbosity. It is reasonably concise but could be more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 parameters, nested objects, no output schema) the description is insufficient. It does not explain the output format or return value structure, and lacks details on how to interpret the deliverable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, and the main description does not explain any parameters beyond a vague 'send the documented case fields'. The complex nested schema requires much more explanation for effective use.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool produces a structured, audited ESG plan deliverable. It includes a reference case for context. However, it does not differentiate from similar sibling tools like esg_audit_multi or sustainability_report.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. Given the large number of ESG-related siblings, the lack of usage context or exclusions is a significant gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

adversarial_input_stress_testerA

Read-onlyIdempotent

Inspect

An asynchronous risk assessment tool that evaluates AI model resilience against adversarial inputs following NIST AI Risk Management Framework (RMF) red-teaming protocols. Designed for security and compliance personas, it accepts model outputs or decision boundaries and returns structured risk scores, failure modes, and adversarial examples. Requires async:true to avoid timeout errors. Outputs include status, warnings, and source references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`maxTests`	No	Maximum number of adversarial tests to run
`modelOutput`	Yes	The AI model's output or decision to be stress-tested
`adversarialDataset`	No	Optional custom adversarial inputs to test
`sensitivityThreshold`	No	Threshold for flagging high-risk adversarial examples

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No	Normalized risk score from adversarial testing
`failureModes`	No
`adversarialExamples`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses key behavioral trait: being asynchronous with timeout considerations, which annotations (readOnlyHint=true, idempotentHint=true) do not cover. Adds context about following NIST RMF protocols and output structure. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose and key constraints (async). Well-structured and concise, though could be slightly more compact by merging some points.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present, description adequately covers inputs, async behavior, and output components (status, warnings, source references). Sufficient for a security tool with moderate complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so description does not need to detail individual parameters. It adds high-level context about accepting 'model outputs or decision boundaries' and async usage, but does not provide meaning beyond schema for each parameter.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description specifies it's an 'asynchronous risk assessment tool' for evaluating 'AI model resilience against adversarial inputs' following 'NIST AI RMF red-teaming protocols'. This clearly distinguishes it from sibling tools, which cover diverse domains like marketing, finance, and compliance, but not adversarial input testing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Requires async:true to avoid timeout errors' and mentions outputs include 'status, warnings, and source references'. Provides clear context for when to use async mode, but does not specify exclusions or alternative tools for similar tasks.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

affiliate_fraud_clickstream_detectorA

Read-onlyIdempotent

Inspect

Analyzes affiliate clickstream data from Common Crawl to flag potential fraud patterns (duplicate IPs, rapid clicks, device spoofing). Designed for CMOs to validate affiliate traffic quality and prevent budget waste. Inputs: affiliate network name and date range. Outputs: fraud probability score, suspicious IP list, and pattern analysis. Keywords: affiliate fraud detection, clickstream analysis, marketing attribution, traffic validation.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`threshold`	No	Fraud probability threshold (0.1-0.99)
`date_range`	Yes
`affiliate_network`	Yes	Name of the affiliate network to analyze (e.g., 'CJ Affiliate', 'Rakuten')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`suspicious_ips`	No
`fraud_probability`	No	Overall fraud probability score (0-1)
`patterns_detected`	No
`total_clicks_analyzed`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already set readOnlyHint=true, indicating safety. The description adds behavioral details: uses Common Crawl data, outputs fraud probability score, suspicious IP list, and pattern analysis. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences plus keywords. The first sentence immediately states the core purpose, with no wasted words. Ideal for quick agent scanning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters (including nested objects) and an output schema, the description provides a solid high-level overview. It misses guidance on async usage, but the output schema likely covers return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description lists only affiliate_network and date_range as inputs, ignoring async and threshold. Schema descriptions cover 75% of parameters, so baseline is 3. The description adds minimal value beyond schema for parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes affiliate clickstream data to flag fraud patterns like duplicate IPs, rapid clicks, and device spoofing. It specifies the target user (CMOs) and distinguishes itself from generic fraud detectors by focusing on affiliate clickstream.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It explicitly targets CMOs for validating affiliate traffic quality, providing good context. However, it does not mention when not to use this tool or alternatives like the generic 'fraud_detector' or 'social_influencer_fake_follower_detector'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_barrier_breakerA

Read-onlyIdempotent

Inspect

As a COO, analyze non-tariff trade barriers (NTBs) across African trade corridors using WITS and UNCTAD STAT data. Input origin/destination countries and product HS codes to receive barrier mapping with severity scores and actionable mitigation strategies. Returns structured risk assessment, regulatory compliance gaps, and supply chain optimization recommendations. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hs_code`	No	6-digit Harmonized System product code
`origin_country`	Yes	ISO 3-letter country code for export origin
`destination_country`	Yes	ISO 3-letter country code for import destination
`include_regulatory_details`	No	Whether to include detailed regulatory text in output

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`barrier_summary`	Yes
`trade_flow_impact`	No
`regulatory_details`	No
`mitigation_strategies`	Yes

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnly, openWorld, idempotent hints. The description adds context about data sources (WITS, UNCTAD STAT) and the async option, which informs behavior beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with purpose and key details. It is concise but could be more structured (e.g., bullet points for outputs). Still efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present, the description adequately covers inputs, outputs (structured risk assessment, gaps, recommendations), and the async option. Missing details like rate limits or constraints on HS code length, but overall complete given tool complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so parameters are well-defined. The description reinforces the use of origin/destination countries and HS codes, and adds guidance for the async parameter (avoid timeout). However, it does not add significant new meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes non-tariff trade barriers (NTBs) across African corridors using specific data sources, with input of origin/destination countries and HS codes, and outputs barrier mapping with severity and mitigation strategies. This is specific and distinct from sibling tools like 'africa_trade_finance_esg_rater' or 'africa_trade_preference_arbitrage'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context (target user: COO) and notes the async option for timeout avoidance, but lacks explicit guidance on when to use this tool versus alternatives like trade barrier or finance tools. It does not mention exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_finance_esg_raterA

Read-onlyIdempotent

Inspect

As a COO, evaluate ESG compliance of African trade finance providers using World Bank WITS trade statistics and CDP climate disclosure data. Input the financial institution's name or identifier, and receive an ESG rating with breakdown across environmental, social, and governance dimensions. Ideal for due diligence on trade partners or portfolio risk assessment. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`year`	No	Assessment year (2018-2023)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`countryCode`	No	ISO 2-letter country code (e.g., 'ZA' for South Africa)
`institutionName`	Yes	Full name of the trade finance provider (e.g., 'Standard Bank Group')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`esgRating`	Yes
`socialScore`	No
`tradeVolume`	No	Annual trade finance volume (USD)
`carbonIntensity`	No	CO2 emissions per million USD financed (tons)
`governanceScore`	No
`environmentalScore`	No

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, openWorldHint. Description adds that the tool uses external data sources and returns a rating with breakdown. Notably includes async guidance: 'Pass async:true to avoid timeout', which is behavioral context beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, each serving a distinct purpose: data sources and input, output format, use case, async guidance. No redundancy or filler. Front-loaded with key action (evaluate ESG compliance). Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (ESG rating using external data), the description adequately covers purpose, input, output, and async behavior. Output schema exists for details. No mention of error handling or rate limits, but not critical for a read-only tool with idempotency.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. Description adds meaning by clarifying async usage ('to avoid timeout') and specifying that institutionName expects 'Full name or identifier'. Provides context for year range implicitly by mentioning assessment year, but schema already defines range. Overall adds extra value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description explicitly states 'evaluate ESG compliance of African trade finance providers' using specific data sources (World Bank WITS, CDP climate disclosure data) and outputs 'an ESG rating with breakdown across environmental, social, and governance dimensions'. This clearly distinguishes it from sibling tools like supplier_esg_audit or esg_audit_multi by specifying target (African trade finance) and data sources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

States 'Ideal for due diligence on trade partners or portfolio risk assessment', providing clear context for use. Lacks explicit when-not-to-use or explicit alternatives among siblings, but the description confines it to African trade finance providers, implying exclusion of other sectors.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_preference_arbitrageA

Read-onlyIdempotent

Inspect

Analyzes AGOA (African Growth and Opportunity Act) and EBA (Everything But Arms) trade preference arbitrage opportunities for COOs evaluating export strategies. Compares tariff rates, trade volumes, and preference utilization across eligible African countries using WITS and OECD trade data. Returns structured analysis of potential duty savings, market access advantages, and compliance requirements. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description
`year`	No	Reference year for trade data
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hs_code`	Yes	6-10 digit Harmonized System product code
`exporting_country`	Yes	ISO 2-letter country code of African exporter
`importing_country`	No	ISO 2-letter country code of target market (US/EU)
`preference_scheme`	No	Trade preference scheme to analyze

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`duty_savings_pct`	No	Estimated duty savings percentage under preference scheme
`trade_volume_usd`	No	Annual trade volume in USD for given HS code
`market_access_score`	No	Composite score of market access advantage (0-100)
`compliance_requirements`	No	List of compliance requirements for preference eligibility
`preference_utilization_rate`	No	Percentage of eligible exports utilizing preference

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, idempotentHint, and openWorldHint. The description adds value by noting the potential for x402 timeout and requiring async:true, which is a behavioral trait beyond what annotations provide. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences: main purpose, details, and a usage note. It front-loads the key information and avoids redundancy. However, it could be slightly tighter.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity, full schema coverage, and presence of an output schema, the description adequately covers purpose, data sources, output type, and critical async requirement. It is sufficiently complete for an agent to understand the tool's role.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The tool description does not add further meaning to individual parameters beyond the schema, but this is acceptable given the baseline of 3 for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes AGOA and EBA trade preference arbitrage opportunities for COOs, using specific data sources and returning structured analysis. It distinguishes the tool's purpose well, though it does not explicitly differentiate from sibling tools like 'africa_trade_preference_optimizer' or 'agoa_eba_intelligence'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the target user (COOs evaluating export strategies) and includes a critical usage note about passing async:true to avoid timeout. However, it does not provide explicit when-to-use or when-not-to-use guidance relative to similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

africa_trade_preference_optimizerA

Read-onlyIdempotent

Inspect

As a COO, analyze AGOA/EBA duty savings opportunities with HS code-level trade route optimization. Input origin country, destination country, and HS code to receive duty savings estimates, optimal trade routes, and preference utilization recommendations. Uses UN Comtrade trade flow data, WCO tariff schedules, and African Union trade agreement rules. Ideal for export market evaluation, supply chain optimization, and trade agreement compliance analysis. Keywords: AGOA, EBA, duty savings, trade optimization, HS code, African trade, export strategy.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hsCode`	Yes	6-10 digit Harmonized System code (e.g., '010121' for live horses)
`quantity`	No	Estimated annual export quantity in units
`valueUsd`	No	Estimated annual export value in USD
`originCountry`	Yes	ISO 3166-1 alpha-3 country code of export origin (e.g., 'KEN' for Kenya)
`destinationCountry`	Yes	ISO 3166-1 alpha-3 country code of import destination (e.g., 'USA' for United States)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`dutySavings`	No	Estimated annual duty savings in USD under optimal preference program
`optimalRoute`	No
`alternativeRoutes`	No
`complianceWarnings`	No	Potential compliance risks or documentation requirements

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, openWorldHint=true, and idempotentHint=true, indicating safe, read-only behavior. The description adds no new behavioral traits beyond stating it 'analyzes', which aligns with annotations but does not provide additional detail.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (approximately 70 words) and front-loaded with the primary purpose. Subsequent sentences add value with data sources and use cases without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, 3 required, with output schema), the description adequately covers purpose, inputs, and sources. It does not need to explain return values due to the presence of an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and parameter descriptions are already detailed. The description reinforces the role of HS codes but adds no new semantic information beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: analyze AGOA/EBA duty savings opportunities with HS code-level trade route optimization. It specifies inputs and outputs and distinguishes from sibling tools like africa_trade_barrier_breaker and africa_trade_preference_arbitrage by focusing on optimization and duty savings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for when to use the tool, such as 'export market evaluation, supply chain optimization, and trade agreement compliance analysis,' but does not explicitly mention when not to use it or compare with alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

agoa_eba_intelligenceA

Read-only

Inspect

Intelligence préférentielle AGOA (US→Africa) et EBA/GSP (EU→Africa). Vérifie l'éligibilité d'un pays africain aux programmes tarifaires préférentiels, l'éligibilité d'un produit par code HS, identifie les meilleures opportunités d'export Afrique→US/EU, et fournit les règles de conformité (rules of origin, valeur ajoutée, docs). Différenciateur Africa diaspora : 39 pays AGOA + 47 LDCs EBA encodés. Sources : AGOA.info · EU EBA · EU GSP+ · WTO Tariff · UN Comtrade.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Mode d'analyse : 'country_eligibility' (statut AGOA/EBA/GSP d'un pays africain) \| 'product_eligibility' (éligibilité d'un produit par code HS) \| 'trade_opportunity' (top opportunités export Afrique→US/EU) \| 'compliance_check' (rules of origin, seuils valeur ajoutée, documentation)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hs_code`	No	Code HS (Harmonized System) 6+ chiffres (requis pour product_eligibility). Exemple : '620342' = pantalons coton homme, '090111' = café arabica non torréfié, '060310' = fleurs fraîches.
`country_iso`	No	Code ISO 2-lettres du pays africain (requis pour country_eligibility). Exemples : KE=Kenya, NG=Nigeria, ZA=Afrique du Sud, ET=Éthiopie, LS=Lesotho, GH=Ghana.
`destination`	No	Marché de destination pour trade_opportunity : 'US', 'EU', ou 'both' (défaut). Ignoré pour les autres modes.

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the safety profile is clear. The description adds value by detailing the tool's sources (AGOA.info, EU EBA, etc.) and the scope of data (39 AGOA countries, 47 LDCs). It does not contradict annotations and provides useful context about the tool's knowledge base.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that efficiently conveys the tool's purpose, capabilities, and differentiators. It is front-loaded but could be slightly more concise (3 sentences). All information is relevant and earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, 4 modes, trade data from multiple sources), the description is fairly complete. It covers all modes, mentions data sources and country counts, and notes the tool's specialization. It lacks explicit guidance on mode selection, but the schema covers that. Overall, an agent can understand the tool's capabilities and context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for all 5 parameters. The description does not add new meaning beyond what the schema provides; it only generally mentions the four modes. The parameter descriptions in the schema are already detailed (e.g., enum values, examples for hs_code and country_iso). Thus, the description adds minimal semantic value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose (intelligence on AGOA/EBA/GSP preferential trade programs for Africa) and lists specific capabilities: checking country/product eligibility, identifying export opportunities, and providing compliance rules. It also distinguishes itself from siblings by specifying the geographic scope (Africa→US/EU) and the number of encoded countries (39 AGOA + 47 LDCs EBA).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains what the tool does but does not explicitly state when to use it versus alternatives. While it mentions 'Différenciateur Africa diaspora' to hint at its focus, there is no guidance on when not to use it or which sibling tool to choose for related but different tasks (e.g., tariff_arbitrage_finder, africa_trade_preference_arbitrage).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_act_incident_responseA

Read-onlyIdempotent

Inspect

Generates EU AI Act incident response playbooks with regulator notification templates for risk management teams. Inputs include incident severity, AI system type, and affected stakeholders. Outputs structured playbook steps, regulator notification drafts, and compliance checklists. Essential for high-risk AI system breaches requiring formal EU notification — pass async:true REQUIRED to avoid x402 timeout. Keywords: AI Act compliance, incident response, regulator notification, risk management, ISO 27035, NIST SP 800-61.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`severity`	Yes
`incident_type`	Yes
`ai_system_type`	No
`incident_description`	No
`affected_stakeholders`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`next_steps`	No
`playbook_steps`	No
`compliance_checklist`	No
`regulator_notification`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and idempotentHint. Description adds critical async requirement and mentions potential x402 timeout, which is valuable behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences plus keywords. Front-loaded with core purpose and target audience. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Has output schema to cover return values, description explains key inputs and async need. For a 6-param tool with 3 enums, it provides sufficient context for selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema coverage, description lists key inputs (severity, AI system type, affected stakeholders) but does not detail all parameters like incident_type or incident_description. Adequate but not comprehensive.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it generates EU AI Act incident response playbooks with regulator notification templates for risk management teams. Distinct from sibling tools like ai_act_training_data_audit and incident_response_evidence_collector.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly identifies use case: high-risk AI system breaches requiring formal EU notification. Also mandates async usage to avoid timeouts. Lacks explicit when-not-to-use or alternatives, but clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_act_sandbox_regulatory_sandboxA

Read-onlyIdempotent

Inspect

A legal-focused tool for simulating EU AI Act regulatory sandbox submissions. Provides structured feedback on compliance, risk levels, and required documentation based on EUR-Lex and OECD AI Policy Observatory sources. Accepts AI system descriptions, intended use cases, and technical specifications as input. Returns detailed assessment with warnings, citations, and actionable recommendations for legal teams and AI developers.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`sector`	No	Primary sector of application
`riskLevel`	Yes	Self-assessed risk level of the AI system
`intendedUse`	Yes	Primary and secondary use cases of the AI system
`documentation`	No	List of provided documentation types (e.g., 'technical', 'ethical', 'data')
`systemDescription`	Yes	Detailed description of the AI system including purpose, architecture, and data sources

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`assessment`	No

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds context about using EUR-Lex and OECD sources but does not disclose additional behavioral traits such as rate limits or that it is a simulation, not an official determination.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences covering purpose, sources, and input/output. Efficient and front-loaded with key information, no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and presence of an output schema, the description covers core aspects. It could mention that it is a simulation and not legally binding, but overall it is sufficient for an agent to understand the tool's role.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters described. The description loosely maps inputs (systemDescription, intendedUse, riskLevel) but adds minimal new meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool simulates EU AI Act regulatory sandbox submissions and provides structured feedback on compliance, risk levels, and documentation. It distinguishes itself from sibling tools like ai_act_incident_response and ai_act_training_data_audit by focusing on sandbox simulation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for legal teams and AI developers but does not explicitly state when to use this tool versus alternatives like ai_act_incident_response or ai_act_training_data_audit. No exclusions or when-not scenarios are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_act_training_data_auditA

Read-onlyIdempotent

Inspect

As a CTO, audit AI training datasets for EU AI Act compliance with bias detection and regulatory risk assessment. Inputs: dataset identifier (Hugging Face ID or URL) and optional risk thresholds. Outputs: compliance score, bias metrics, regulatory warnings, and source references. Ideal for pre-deployment risk evaluation. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`dataset_id`	Yes	Hugging Face dataset identifier or direct URL to dataset
`risk_threshold`	No
`include_bias_metrics`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`bias_metrics`	No
`compliance_score`	No
`dataset_metadata`	No
`regulatory_warnings`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, idempotentHint. Description adds specifics: bias detection, regulatory risk assessment, and outputs (compliance score, bias metrics, warnings, references). Also mentions async behavior. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with no wasted words, front-loaded with the main purpose, and structured as clear sentences covering inputs, outputs, and usage tip.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 4 parameters, output schema exists, and complexity, the description covers essential context: inputs, outputs, use case, and async advice. No gaps for agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 50%. Description explains dataset_id (Hugging Face ID or URL) and optional risk_threshold, but does not detail async or include_bias_metrics. Adds meaning beyond schema for dataset_id but not fully.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it audits AI training datasets for EU AI Act compliance with bias detection and regulatory risk assessment, and distinguishes from sibling tools like ai_act_sandbox_simulator and ai_governance_full_report by focusing on training data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context of use: 'Ideal for pre-deployment risk evaluation' and advises using async to avoid timeout. Does not explicitly state when not to use or list alternatives, but gives clear usage direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_governance_full_report_asyncA

Read-only

Inspect

Audit EU AI Act complet (Règlement UE 2024/1689) — implémentation native audit-grade. Classifie le système IA selon les 4 tiers de risque (unacceptable/high_risk/limited_risk/minimal_risk/gpai) sur la base de l'Annexe III et de l'Article 5. Produit : (1) classification tier + justification + articles applicables, (2) checklist conformité Articles 9-15 + 50 + 53-55, (3) gaps documentation Annexe IV, (4) mapping ISO 42001, (5) deadlines EU AI Act 2025-2029, (6) estimation coût et effort, (7) top 10 recommandations P0/P1/P2. Retourne immédiatement (<300ms) un job_id. Poller avec ai_governance_full_report_result(job_id) après eta_seconds (~90s). Cache 7 jours pour inputs identiques. Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter. DISCLAIMER : non substitutif à un avis juridique professionnel.

ParametersJSON Schema

Name	Required	Description
`company_size`	No	Taille entreprise : startup (≤50), smb (51-250), mid (251-1000), large (1001-5000), enterprise (>5000)
`data_sources`	No	Sources de données utilisées par le système IA
`affected_persons`	No	Catégories de personnes affectées par les décisions du système (ex: candidats, employés, clients)
`geographic_scope`	No	Zones géographiques de déploiement (ex: 'EU', 'France', 'Global')
`intended_purpose`	Yes	Finalité prévue du système IA : à quoi sert-il concrètement
`deployment_context`	No	Contexte de déploiement : interne (usage employés), public, B2B, B2C
`ai_system_description`	Yes	Description détaillée du système IA : ce qu'il fait, comment il fonctionne, quelles décisions il prend

Output Schema

ParametersJSON Schema

Name	Required	Description
`job_id`	Yes	Identifiant unique du job — passer à ai_governance_full_report_result
`status`	Yes
`eta_seconds`	Yes	Durée estimée avant disponibilité du résultat
`submitted_at`	Yes	Timestamp ISO-8601 de soumission

Tool Definition Quality

A3.7/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotation `readOnlyHint: true` contradicts the description, which indicates the tool submits an async job and triggers computation, thus modifying state (creating a job and caching results). The description does not clarify this discrepancy, making it misleading for an agent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose and provides essential details in a structured bullet list. While slightly verbose in French, it is well-organized and each sentence adds value. Could be slightly trimmed but efficient overall.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (async, webhook, output schema exists), the description covers all necessary aspects: what it does, what it returns initially, how to get results, caching, and optional webhook integration. It explains the 7 output components without relying on output schema details.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds little meaning beyond the schema—it mentions required parameters but does not elaborate on format, constraints, or usage tips for any parameter. No extra semantic value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs a full EU AI Act audit, classifies risk tiers, and produces 7 specific outputs. It distinguishes itself from the sibling tool `ai_governance_full_report_result` by being the async submission endpoint, and from `ai_governance_pilot` by being the full report.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to poll with `ai_governance_full_report_result` or register a webhook via `webhooks_manage` for callbacks. Mentions cache duration (7 days) and provides a disclaimer. Offers clear when-to-use guidance compared to synchronous alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_governance_full_report_resultA

Read-onlyIdempotent

Inspect

Poll the result of an ai_governance_full_report_async job. Returns status=pending while running, status=completed with the full EU AI Act governance audit report once done (risk_tier, compliance checklist Articles 9-15/50/53-55, Annex IV documentation gaps, ISO 42001 alignment, deadlines 2025-2029, cost estimate, top-10 recommendations P0/P1/P2, compliance_score), status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by ai_governance_full_report_async (~90s).

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by ai_governance_full_report_async (prefix: aigfr_)

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already show readOnlyHint, idempotentHint, destructiveHint. Description adds specific statuses (pending, completed, failed, not_found), expiration TTL (24h), and content of completed report, going well beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences cover purpose, statuses, content, and usage hint. Front-loaded with purpose, no redundancy. Every part earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema existing, description covers polling behavior, statuses, and hints. Schema coverage is 100% for the single parameter. Complete for a poll-result tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Only one parameter (job_id) with 100% schema coverage. Description adds the prefix 'aigfr_' for validation, adding value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it polls the result of an async job, distinguishing it from the async version and other tools. Verb 'Poll' is specific, and resource 'result of ai_governance_full_report_async' is unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises calling after the eta_seconds hint from the async tool (~90s). Explains statuses and expiration (24h TTL). Lacks explicit 'when not to use', but context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ai_governance_pilotC

Read-only

Inspect

Pilotage de gouvernance IA — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: TalentScope SAS — scoring IA candidats RH (EU AI Act Annex III §4, high-risk). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`aiUseCases`	Yes
`targetFrameworks`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, indicating no side effects. The description adds that inputs are validated server-side and returns a deliverable, but does not disclose rate limits, authentication, or failure behaviors. It is consistent with annotations and adds some context, so a score of 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (two sentences) and front-loads the name, but the jargon and lack of structure reduce clarity. It is concise but not particularly informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of sibling tools and complex input schema with no output schema, the description is incomplete. It does not explain the output structure, how 'pilot' differs from 'full_report', or how to handle the async option. The agent lacks context to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' parameter described). The description does not explain the complex nested parameters (company, aiUseCases, targetFrameworks) beyond a vague 'send the documented case fields'. The reference case provides a hint but not a clear mapping to input fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool performs AI governance piloting and returns a structured, audited deliverable, with a specific reference case. The verb 'pilotage' and mention of 'RISK' indicate a risk-focused assessment, distinguishing it from full report tools. However, the exact nature of the output is not fully clear, and the jargon 'Gapup agent-payable C-suite expertise' may confuse agents.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus siblings like 'ai_governance_full_report_async' or 'ai_act_training_data_audit'. The description mentions 'reference case' but does not specify optimal scenarios or prerequisites. An agent would struggle to decide between governance tools based on this description.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

anti_demissions_hrC

Read-only

Inspect

Bouclier anti-démissions — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Buffer Inc — détection des at-risk parmi 80 FTEs (Q1 2026). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`signals`	Yes
`employees`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description confirms a read-only operation by stating it 'returns a structured, audited deliverable.' It also notes server-side validation, adding some behavioral context. However, it does not disclose the deliverable format, pagination, or any side effects, which would be desirable given no output schema.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences of moderate length, but the inclusion of a reference case (Buffer Inc) could be considered extraneous. It is front-loaded with the title line but lacks clear structure (e.g., bullet points or separate sections). Adequate but not exemplary.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, 3 required, nested objects, no output schema), the description is insufficient. It does not explain the return format, result interpretation, or the meaning of the deliverable. Without an output schema, the description should cover these; it does not.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' and 'focus' have descriptions). The description adds no parameter-level details, only a vague instruction to 'send the documented case fields.' It fails to compensate for the missing schema descriptions of the three required parameters (company, signals, employees).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool returns a structured deliverable for detecting at-risk employees ("détection des at-risk parmi 80 FTEs") and references a specific case. It distinguishes from siblings like 'churn_defender' by targeting C-suite retention. However, the French phrasing may reduce clarity for English agents, and the purpose could be more explicitly stated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It only mentions server-side validation and to 'send the documented case fields,' which does not help the agent decide between this and sibling tools like 'talent_intelligence' or 'enps_auto'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

arbitration_awards_lookupA

Read-onlyIdempotent

Inspect

Commercial arbitration intelligence for litigation lawyers, M&A due diligence teams, sovereign wealth funds and trade finance compliance. Covers 8 major institutions: ICC, AAA, LCIA, HKIAC, SIAC, CIETAC, DIAC, ICDR.

Three modes: • party_lookup — find awards by party name (searches 20 landmark public awards + JusMundi best-effort) • institution_index — browse awards and caseload stats per institution with date range filter • clause_check — audit an arbitration clause for missing elements (institution, seat, language, arbitrator count, governing law, binding nature)

Note: Most arbitration awards are confidential. This tool surfaces public awards (Yukos, Crystallex, Achmea, etc.) plus redacted statistics from institutional annual reports. Private awards are not accessible.

Cache: 24h (arbitration data is very stable). No API key required.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	party_lookup: search by party name or keyword. institution_index: browse awards by institution + stats. clause_check: audit an arbitration clause for issues.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	For party_lookup: party name or keyword (e.g. "Yukos", "Russia"). For institution_index: institution name or keyword. For clause_check: full text of the arbitration clause to audit.
`date_to`	No	ISO date filter to (YYYY-MM-DD). Applied to award_date.
`date_from`	No	ISO date filter from (YYYY-MM-DD). Applied to award_date.
`institution`	No	Filter by institution. Default 'all'.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`awards`	No
`status`	Yes
`sources`	Yes
`clause_check`	No
`quality_score`	Yes
`institution_stats`	No

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint, and destructiveHint false. Description adds valuable behavioral context: 24-hour cache, no API key required, data sources (public awards, redacted stats), and the note about private awards not being accessible. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured with a concise overview, bulleted mode list, and clear notes on confidentiality and cache. Every sentence adds value, and the most important information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, 2 enums, output schema exists), the description covers purpose, modes, data sources, limitations, cache, and authentication comprehensively. It is complete enough for effective tool selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description adds examples for the 'query' parameter (e.g., 'Yukos', 'Russia') and explains the three 'mode' enum values in context. This provides incremental value over the schema descriptions, but the schema already describes parameters adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool provides 'commercial arbitration intelligence' with specific audience and coverage of 8 major institutions. It explicitly lists three distinct modes, making the purpose unambiguous and differentiated from sibling tools like legal_clause_extractor.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly states target users (litigation lawyers, M&A teams, etc.) and provides detailed instructions for each mode. It also notes the limitation that most awards are confidential and provides cache and authentication info, giving clear guidance on when and how to use the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

attack_surface_monitorC

Read-only

Inspect

Surveillance surface d'attaque — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: Which Internet-facing assets of combine a critical CVE, an exposed service, and no WAF — top findings to fix in 14 days? · What is the attack surface of : subdomains, open ports, SSL/TLS grades, and associated CVEs? · Give me a CISO-ready ASM report with blast radius estimate and SLA-driven remediation plan for . · What is the email phishing risk for ? Assess SPF/DMARC posture and recommend improvements. · During M&A due diligence, what are the top cyber exposures on 's Internet-facing infrastructure? Reference case: Velora Payments — 8 assets exposés · 2 critiques (CVE-2023-44487 HTTP/2 RapidReset, Admin panel ouvert) · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`domain`	Yes
`exclusions`	No
`scope_cidrs`	No
`include_email_surface`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns a structured, audited deliverable and that inputs are validated server-side, but provides no details on auth needs, rate limits, or side effects beyond what annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose and unstructured, mixing French and English with bullet points in a single paragraph. It includes a reference case that is not essential, sacrificing conciseness for detail.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema and 6 parameters with low coverage, the description fails to explain the return structure, pagination, or how to interpret the deliverable. It remains incomplete for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, parameters like domain and focus are mentioned only indirectly via examples (e.g., 'domain appears in questions). Parameters exclusions, scope_cidrs, and async lack any explanation in the description, leaving the agent underinformed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates the tool monitors attack surface for a domain, listing concrete questions it answers (e.g., assets with critical CVEs, email phishing risk). However, it lacks a succinct verb and does not differentiate from siblings like cyber_risk_auditor or vuln_exploitability_forecast.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Examples of when to use the tool are provided (e.g., M&A due diligence, CISO report), but no explicit when-not or alternative tools are mentioned. The context implies security assessment but leaves exclusion unclear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

audit_pre_flightC

Read-only

Inspect

Pré-audit comptable — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Spendesk — Pré-audit commissaire · Readiness 74/100 · 4 findings critiques · Checklist 18 docs. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`audit`	Yes
`company`	Yes
`systems`	Yes
`financials`	Yes
`knownIssues`	Yes

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark this as readOnly and openWorld. The description adds little beyond a reference case and server-side validation note. No disclosure of auth needs, rate limits, or behavioral constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is reasonably concise (~50 words) but contains jargon ('Gapup agent-payable C-suite expertise') that may confuse an agent. The structure is a single paragraph with no clear separation of purpose, usage, or examples.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 params, nested objects, no output schema), the description is incomplete. It does not clarify the deliverable structure, required fields priority, or success criteria for the pre-flight audit.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (17%), and the description does not explain any parameter semantics. It merely states 'send the documented case fields', relying entirely on the schema, which is insufficient given the six parameters with nested objects.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description mentions 'Pré-audit comptable' and 'returns a structured, audited deliverable', giving a general sense of a pre-audit readiness check. However, it lacks a precise verb-resource statement and does not differentiate from hundreds of sibling tools, many of which could relate to audits or finance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The only usage hint is 'send the documented case fields', which is an instruction on input, not a selection criterion.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

banking_fee_negotiatorA

Read-onlyIdempotent

Inspect

As a CFO-focused tool, banking_fee_negotiator analyzes your bank's fee structures (account maintenance, wire transfers, credit lines) and provides data-driven negotiation recommendations. Input your current fees and bank details to receive benchmark comparisons from World Bank and ECB SDW, along with specific levers to reduce costs. Ideal for optimizing treasury operations and improving financial efficiency. Keywords: bank fees, cost optimization, treasury management, financial benchmarking, negotiation strategy.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	No	Industry classification (e.g., 'manufacturing', 'retail')
`bank_country`	Yes	ISO 2-letter country code of the bank
`credit_line_fee`	No	Current annual credit line fee percentage
`wire_transfer_fee`	No	Current domestic wire transfer fee in USD
`international_wire_fee`	No	Current international wire transfer fee in USD
`account_maintenance_fee`	Yes	Current monthly account maintenance fee in USD

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`negotiation_levers`	No
`credit_line_benchmark`	No	Industry benchmark for credit line fees percentage
`wire_transfer_benchmark`	No	Regional benchmark for domestic wire transfer fees in USD
`international_wire_benchmark`	No	Regional benchmark for international wire transfer fees in USD
`account_maintenance_benchmark`	No	Regional benchmark for account maintenance fees in USD

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, and openWorldHint. The description adds valuable behavioral context: it uses external data sources (World Bank, ECB SDW) and provides specific negotiation levers. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences plus a keyword line. It is front-loaded with the purpose and clearly structured. The keyword line is slightly redundant but not detrimental.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only analysis tool with output schema, the description adequately covers inputs (fees, country), process (benchmarking), and output (recommendations). The async parameter is explained in the schema. No major gaps given the tool's nature.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed parameter descriptions. The tool description adds minimal extra meaning beyond the schema—it mentions input fees and bank country context but does not elaborate on syntax or constraints. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes bank fee structures and provides negotiation recommendations. It specifies inputs (current fees, bank details) and outputs (benchmark comparisons, cost reduction levers). The CFO-focused positioning and keywords differentiate it from sibling tools like treasury_optimizer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when one has bank fees and wants to negotiate, and states it's ideal for treasury optimization. However, it does not explicitly mention when not to use this tool or suggest alternative tools for related tasks (e.g., broader treasury analysis).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

battle_cards_liveC

Read-only

Inspect

Fiche de combat live — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub vs McKinsey Lilli — Deal SaaS B2B €500k · Win rate +11 pts · 6 objections clés armées. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`ourOffer`	Yes
`competitor`	Yes
`dealContext`	Yes
`knownWeaknesses`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. Description adds that it is 'live', 'audited', and inputs are validated server-side, but these are minor additions. No destructive hints or further behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short and front-loaded with the key term. However, the reference case adds length without essential clarity. Could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema and the description only vaguely mentions a 'structured, audited deliverable'. For a tool with nested objects and 5 parameters, more detail on expected outputs and parameter relationships is needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%. The tool description does not explain the parameters beyond 'send the documented case fields'. Required fields like 'ourOffer', 'competitor', 'dealContext' lack semantic guidance. Async is explained in schema but not in description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states 'Fiche de combat live' (live battle card) and mentions returning a structured, audited deliverable for C-suite expertise. However, it does not differentiate from sibling tools like 'battle_plan', and the purpose could be more specific about the exact output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as 'battle_plan' or 'competitive_deep_dive'. The reference case is illustrative but does not set usage boundaries or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

battle_planC

Read-only

Inspect

Plan de bataille marketing — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Q3 2026 · Budget €120k · Pipeline €800k · 5 chantiers prioritaires. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`quarter`	Yes
`teamSize`	Yes
`arrTarget`	Yes
`budgetEur`	Yes
`arrCurrent`	Yes
`companyName`	Yes
`topChannels`	Yes
`icpDescription`	Yes
`currentBlockers`	Yes
`primaryObjective`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and returns an audited deliverable, which is consistent and provides some behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (two sentences) and front-loaded with the tool's purpose. The second sentence adds context about reference and validation, but the overall structure is effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 11 parameters, no output schema, and limited description, the tool definition lacks completeness. It does not describe the deliverable's format, success criteria, or how to interpret results, leaving significant gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 9% schema description coverage and 11 parameters, the description does not explain any parameter meanings. It only references 'documented case fields' without elaboration, failing to compensate for the schema's lack of descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function as a marketing battle plan that returns a structured deliverable, referencing a specific case. It distinguishes itself as a C-suite expertise tool, but does not explicitly differentiate from siblings like 'brand_builder' or 'marketing_roi_dashboard'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidance is provided. The reference case implies a scenario but does not mention alternatives among the many marketing-related sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bias_amplification_trackerA

Read-onlyIdempotent

Inspect

Tracks bias amplification in LLM outputs by analyzing fairness metrics from HuggingFace's model leaderboard. Designed for risk assessment personas to detect and quantify demographic, gender, or racial bias amplification in generated text. Accepts model identifiers or output samples, returns structured bias metrics and amplification trends.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`modelId`	No	HuggingFace model identifier (e.g., 'facebook/opt-1.3b')
`outputSamples`	No	Array of LLM output strings to analyze for bias amplification
`demographicGroups`	No	Specific demographic groups to monitor (e.g., ['gender', 'race'])

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`biasMetrics`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. The description adds that it uses HuggingFace's leaderboard and returns structured metrics and trends, providing useful context beyond annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no wasted words, front-loaded with the primary action. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters with full schema coverage and an output schema, the description adequately covers purpose, inputs, and output nature. It is complete for an agent to understand and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description says 'Accepts model identifiers or output samples' which maps to modelId and outputSamples, but adds little meaning beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool tracks bias amplification in LLM outputs using fairness metrics from HuggingFace's model leaderboard. It specifies the target persona (risk assessment) and differentiates from siblings like hallucination_confidence_meter by focusing on demographic/gender/racial bias.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates it's designed for risk assessment personas and accepts model identifiers or output samples. While it doesn't explicitly exclude alternatives, the context is clear enough for an agent to decide when to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bond_covenant_esg_compliance_checkerA

Read-onlyIdempotent

Inspect

As a CFO, quickly assess whether your bond covenants meet ESG compliance standards set by BIS and ECB. This tool analyzes covenant text against regulatory benchmarks, identifying potential ESG-related risks in carbon emissions, governance practices, and social impact clauses. Input bond covenant details and receive structured compliance insights with source references. Ideal for pre-issuance due diligence or ongoing monitoring of existing bond portfolios.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`couponType`	No	Type of bond coupon
`covenantText`	Yes	Full text of the bond covenant to analyze
`issuerSector`	No	Industry sector of the bond issuer (e.g., energy, finance)
`jurisdiction`	No	Legal jurisdiction governing the bond (e.g., EU, US)
`maturityDate`	No	Maturity date of the bond

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`riskAreas`	No
`complianceScore`	No
`recommendations`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint, which the description supports. The description adds valuable behavioral context, such as analyzing against regulatory benchmarks and returning structured compliance insights with source references, but does not mention the optional async parameter or detailed output format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, front-loaded with purpose and role, and contains no unnecessary information. Each sentence adds value: target user, what it does, and when to use it.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and full parameter coverage, the description provides sufficient context for an agent to understand the tool's functionality and decide on its use. It covers purpose, inputs, and outputs, but could be improved by noting the async behavior or filtering capabilities.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description does not add extra meaning beyond the schema (e.g., how parameters like issuerSector influence analysis), but it does not need to since the schema is complete. A score of 3 reflects the baseline expectation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to assess bond covenant ESG compliance against BIS and ECB standards. It identifies specific risk areas (carbon emissions, governance, social impact) and distinguishes from similar siblings like 'bond_covenant_monitor' by focusing on ESG compliance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description positions the tool for CFOs and suggests use cases ('pre-issuance due diligence or ongoing monitoring'), but it does not provide explicit guidance on when not to use it or mention alternative tools. The usage context is implied rather than clearly defined.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bond_covenant_monitorA

Read-onlyIdempotent

Inspect

As a CFO, monitor bond covenant compliance by analyzing leverage ratios (debt-to-equity, debt-to-EBITDA) and interest coverage ratios using real-time financial data. Input a company's ticker symbol and optional covenant thresholds to receive compliance status, key financial metrics, and SEC filing references. Ideal for proactive debt management and regulatory compliance tracking. Keywords: bond covenants, leverage ratio, interest coverage, debt compliance, SEC filings, financial health.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`ticker`	Yes	Company ticker symbol (e.g., 'AAPL')
`covenantThresholds`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`debtToEquity`	No
`leverageRatio`	No
`lastFilingDate`	No
`complianceStatus`	Yes
`interestCoverage`	No
`nextFilingDeadline`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context beyond annotations by mentioning use of real-time data and output including compliance status, metrics, and SEC references. It aligns with readOnlyHint and idempotentHint. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured paragraph that front-loads the core purpose. It includes keywords for searchability without being verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description sufficiently covers inputs, ratios analyzed, and intended use case. It does not discuss limitations or prerequisites, but the information is adequate for typical use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains the ticker and optional covenant thresholds (e.g., leverage and coverage ratios), adding meaning beyond the schema. The async parameter is not highlighted, but schema coverage is moderately high.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it monitors bond covenant compliance by analyzing specific ratios (leverage and interest coverage) using real-time financial data. It distinguishes itself from sibling tools like 'bond_issuance_optimizer' by focusing on monitoring rather than optimization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly targets CFOs for proactive debt management and regulatory compliance, indicating when to use it. However, it does not specify when not to use or mention alternative tools, which could improve clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

bp_narratifC

Read-only

Inspect

Business Plan narratif — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Stripe Series A 2012. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`raise`	Yes
`company`	Yes
`keyMetrics`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already state readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and returns an audited deliverable, which is helpful but does not fully disclose behavior like execution time, async support (hinted by async param), or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (3 sentences) but the first sentence is redundant with the tool name. It front-loads a marketing phrase ('Gapup agent-payable C-suite expertise') but lacks clear structure. Could be more direct.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given nested input objects and no output schema, the description fails to explain the returned deliverable structure or how to interpret results. Incomplete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 25% schema description coverage, the description should compensate but only says 'send the documented case fields' without explaining specific parameters. No additional semantics beyond the schema are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a 'structured, audited deliverable' and references Stripe Series A 2012, giving some sense of purpose. However, it does not clearly distinguish from sibling tools like 'ftg_business_plan' or specify the exact output format.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The reference case is implicit but not actionable. No when-not-to-use or prerequisite information provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

brand_builderC

Read-only

Inspect

Architecte de marque — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Pennylane — brand identity SaaS fintech B2B FR/EU (2023). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`target`	Yes
`founder`	Yes
`existingAssets`	No

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds minimal context ('audited deliverable', 'inputs validated server-side') but does not disclose significant behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise but mixes French and English, and lacks structured formatting. It includes a helpful reference case but could be more organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 5 parameters, no output schema), the description is insufficient. It does not explain the return format, error handling, or provide usage examples.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, and the description does not explain the parameters. It merely says 'send the documented case fields' without adding meaning to the nested objects (founder, brand, target, etc.).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a brand architect tool for C-suite CMO expertise, returning a structured deliverable. However, it does not explicitly differentiate from sibling tools like 'positioning_strategist' or 'brand_equity_voice_share_calculator'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. It briefly mentions C-suite CMO expertise but lacks when-not-to-use or alternative references.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

brand_equity_voice_share_calculatorA

Read-onlyIdempotent

Inspect

Calculates brand equity voice share for CMOs by analyzing mentions across 500K+ news articles and forums from Common Crawl and Wayback Machine. Inputs include brand name, competitors, and time range. Outputs voice share percentage, sentiment distribution, and top sources. Ideal for competitive benchmarking and brand visibility tracking. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`time_range`	Yes
`competitors`	No
`include_forums`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`top_sources`	No
`total_mentions`	No
`brand_voice_share`	No
`sentiment_distribution`	No
`competitor_voice_shares`	No

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint, so the safety profile is covered. The description adds useful behavioral context: it mentions using Common Crawl and Wayback Machine, and warns about potential timeouts via the async hint. However, it does not detail rate limits, pagination, or whether results are cached, which would enhance transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three concise sentences: first defines the action, second lists inputs and outputs, third states use case and a tip. No fluff, every sentence adds value. Well-structured and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 params, nested objects, output schema), the description covers main purpose, data sources, outputs, and a usage hint. However, it misses the include_forums parameter entirely. Since output schema exists, return values need not be explained, but the missing parameter is a notable gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only async and time_range have descriptions). The description lists three inputs (brand, competitors, time_range) but provides no additional meaning beyond their names. It omits the include_forums parameter entirely. While it partially compensates for the low coverage, it does not fully explain parameter formats or behaviors.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool calculates brand equity voice share for CMOs, specifies data sources (500K+ articles from Common Crawl and Wayback Machine), and lists exact outputs (voice share percentage, sentiment distribution, top sources). This distinguishes it from siblings like brand_builder or competitive_deep_dive by focusing on voice share metrics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: 'Ideal for competitive benchmarking and brand visibility tracking.' It also gives a practical tip ('Pass async:true to avoid timeout'), but does not explicitly state when not to use this tool or mention alternatives, leaving room for ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

budget_variance_aiA

Read-only

Inspect

Analyse d'écart budgétaire — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Explain the key drivers of the budget vs actual variance for in — what are the top 10 narrative explanations? · Which cost categories drove the budget overrun for in , and what corrective actions should management take? · Revise the Q4 forecast based on observed Q3 variances for — give me 3 scenarios (base, optimistic, conservative). · Prepare a board-ready budget variance memo for — , budget €M vs actual €M, with management actions. · What are the quick wins to reduce budget overspend for by end of quarter without impacting growth targets? Reference case: Doctolib Q3 2026 — budget €38.5M vs actual €41.2M (+7.0%) — cloud + headcount + deals timing. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`entity`	Yes
`budgetContext`	Yes

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true, indicating the tool is read-only and handles unknown inputs. The description adds behavioral context: it is an AI-powered CFO-level analysis, returns a structured deliverable, supports async execution, and validates inputs server-side. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose but is verbose, containing multiple example questions and a reference case. Some sentences are list-like, and the overall length could be reduced without losing clarity. It earns its sentences, but not all are essential.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 parameters, nested objects, no output schema), the description provides a good overview of capabilities and examples but lacks detail on return format (only 'structured, audited deliverable') and does not explain all parameters or their usage (e.g., focus, notes). It is adequate but not fully complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (25%), with 4 parameters: async, focus, entity, budgetContext. The description only hints at some fields (company, period, budget amounts) via examples but does not explicitly map them to parameters or explain their semantics beyond what the schema provides. This fails to compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs budget variance analysis and provides a structured, audited deliverable. It gives example questions and a reference case, distinguishing it from siblings like margin_doctor_finance or financial_model_3statement by focusing specifically on budget vs. actual variance with narrative explanations, corrective actions, and forecasting scenarios.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description outlines when to use the tool (for variance analysis, driver explanations, corrective actions, forecasting, board memos, quick wins) and mentions input validation. However, it does not explicitly state when not to use it or provide alternatives from the sibling list, so guidance on exclusions is absent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

candidate_screening_rankingA

Read-onlyIdempotent

Inspect

AI-powered candidate screening and ranking for recruiters, hiring managers, ATS providers and recruitment AI agents. Ingests a job description and 1-50 candidate resumes, returning a ranked shortlist with score breakdowns across five weighted criteria: skills_match (tech stack and soft skills extracted from JD vs resume), experience_match (years vs seniority level inferred from JD), education_match (degree level + top-school detection), role_progression (Junior to Senior to Lead patterns), culture_fit_estimate (remote/hybrid, startup vs enterprise). Per candidate: overall_score 0-100, matched/missing skills, red_flags (job hopping, employment gaps, seniority mismatch), green_flags (long tenure, promotions), 3-5 interview questions, fit_summary. Diversity signals are first-name proxies ONLY with mandatory ethical WARNING. All processing is local -- no external API calls, instant response, privacy-preserving.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`candidates`	Yes	Array of candidate objects. Maximum 50.
`role_country`	No	Optional ISO 2-letter country code for regional context (informational).
`job_description`	Yes	Full text or summary of the job description and role requirements.
`criteria_weights`	No	Optional weighting per criterion. Default: skills=0.4, experience=0.2, education=0.1, progression=0.15, culture=0.15.

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`nice_to_have`	Yes
`quality_score`	Yes
`required_skills`	Yes
`candidates_ranked`	Yes
`diversity_signals`	No
`shortlist_recommended`	Yes
`job_description_summary`	Yes

Tool Definition Quality

A4.3/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint, idempotentHint, and destructiveHint. The description adds significant behavioral context beyond these: 'All processing is local -- no external API calls, instant response, privacy-preserving' and 'Diversity signals are first-name proxies ONLY with mandatory ethical WARNING.' These details disclose important operational traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single lengthy paragraph (around 150 words). While it front-loads the purpose, it could be more concise and better structured (e.g., bullet points for criteria and output). It is not overly verbose but lacks the brevity that would earn a higher score.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, output schema), the description is comprehensive. It explains the output format (overall score, matched/missing skills, red flags, etc.) and covers key behavioral aspects (privacy, local processing). The presence of an output schema reduces the need to describe return values in detail, but the description still provides sufficient context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the input schema already documents all parameters. The description summarizes the scoring criteria but does not add new meaning beyond what the schema provides. The narrative helps contextualize, but the baseline of 3 is appropriate given high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts by clearly stating the tool's purpose: 'AI-powered candidate screening and ranking for recruiters...' It specifies the verbs (screening, ranking) and resources (job description, candidate resumes). The detailed breakdown of criteria and output distinguishes it from sibling tools like 'talent_intelligence', which likely focuses on different aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use the tool (for screening and ranking candidates with a job description) and outlines the criteria used. However, it does not explicitly state when not to use it or mention alternative tools. The context of many diverse sibling tools implies its specific use case, but explicit guidance would improve clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

capacity_planningC

Read-only

Inspect

Planification capacitaire — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — 22→48 FTE en 12m · ARR €480k→€1.7M · Plan d'embauches par département. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`benchmarks`	No
`financials`	Yes
`constraints`	No
`currentTeam`	Yes
`hiringBudgetEur`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint, establishing safe read-only behavior. The description adds that the tool returns an 'audited deliverable' with server-side input validation. This is consistent and adds some context beyond annotations, but does not significantly expand behavioral disclosure.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively brief but uses mixed languages and jargon (e.g., 'Gapup agent-payable C-suite expertise'). It is front-loaded with the tool's purpose but lacks clear structure. Some sentences are fragmented.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, nested objects, no output schema), the description is highly incomplete. It does not explain the return format, how to use the async parameter, or the expected structure of 'documented case fields.' The agent would likely need additional information to invoke this tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14%, and the description adds no information about individual parameters. It merely says to 'send the documented case fields,' leaving the agent to infer the meaning of all 7 parameters from the schema alone, which is insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for capacity planning, targeting CHRO/senior management, and returns a structured deliverable. The reference case provides concrete context. However, it does not explicitly differentiate from a large set of sibling tools, but the topic is specific enough.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives. No prerequisites, exclusions, or comparisons to siblings are provided. The description assumes the agent knows the tool's role.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

capital_strategyC

Read-only

Inspect

Stratégie de financement — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Alan assurance santé SaaS — séquence Seed→A→B→C (2016-2022). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`growthPlan`	Yes
`financialPosition`	Yes
`founderConstraints`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint and openWorldHint) already indicate the tool is non-destructive and may use external data. The description adds 'server-side validation' and 'audited deliverable', but does not disclose costs, rate limits, or other behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes extraneous details like the reference case. It is front-loaded with the purpose, but the mixed French/English and jargon reduce clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested input schema and no output schema, the description fails to explain the deliverable format or how to interpret results. The agent lacks information about what the tool returns, which is critical for a tool that produces a structured report.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only the async parameter described). The description says 'send the documented case fields' but does not elaborate on the meaning of the nested object properties, leaving the agent to infer from names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description indicates the tool provides a financing strategy and returns a structured deliverable, but it is vague about the exact action and output. It mentions a reference case but does not clearly differentiate from siblings like cap_table_strategist or deal_structurer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks explicit guidance on when to use this tool versus alternatives. The reference case implies SaaS context but no when-not-to-use or exclusion criteria are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cap_table_strategistC

Read-only

Inspect

Stratège du cap table — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Aleph AI Series B — modèle dilution multi-rounds + simulations secondaires + hygiène equity · 5 scenarios. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`plannedRounds`	Yes
`currentCapTable`	Yes
`founderObjectives`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=true, openWorldHint=true) already cover safety and external data use. Description adds server-side validation and reference case, but does not contradict annotations. No further behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with purpose. The reference case example adds value but could be shortened. Generally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has complex nested parameters (company, currentCapTable, plannedRounds) and no output schema. The description lacks detail on return format, expected results, or how to interpret the deliverable. Insufficient for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (17%). The description does not add meaning to parameters beyond the schema; it only refers to 'documented case fields' without elaboration.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool is a cap table strategist for fundraising, returning a structured deliverable. However, it does not differentiate from sibling tools like capital_strategy or ma_deal_screener.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. Only mentions server-side validation of inputs, but lacks context on prerequisites or exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_footprint_calculatorA

Read-onlyIdempotent

Inspect

Calculate a company's greenhouse-gas footprint under the GHG Protocol (Scope 1 + 2 + 3, in tCO2eq, tier-2 accuracy ±20%). Returns the emissions breakdown, hotspot identification, 5-8 reduction levers each with capex and payback, an SBTi-aligned reduction trajectory over 5-25 years, the 15 Scope-3 categories in detail, and CSRD/ESRS reporting readiness. When to use this tool: the user needs a carbon assessment for CSRD compliance pre-audit, green-finance access, or supplier ESG scorecards. Inputs: the company profile and its activity data. Delivered by Émilie, the AI Sustainability lead of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`perimeter`	Yes
`scope1Sources`	No
`scope2Sources`	Yes
`reductionTargets`	No
`scope3Activities`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline ESG KPI bubbles
`hotspots`	Yes	Top emission sources ranked by contribution
`breakdown`	Yes	Emissions breakdown by scope
`csrdReadiness`	Yes	CSRD/ESRS reporting readiness assessment
`sbtiTrajectory`	No	SBTi-aligned annual reduction trajectory
`reductionLevers`	Yes	5-8 actionable reduction levers with financial analysis
`executiveSummary`	Yes	Board-ready GHG assessment prose
`scope3Categories`	No	GHG Protocol 15 Scope-3 categories detail
`totalEmissionsTco2eq`	Yes	Total GHG footprint in tCO2eq (Scope 1+2+3 combined, ±20% tier-2 accuracy)

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. Description adds accuracy and output details but fails to mention the async parameter behavior or any potential rate limits or computational delays. This gap reduces transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose but includes unnecessary fluff (e.g., 'Delivered by Émilie, the AI Sustainability lead'). While informative, it could be more concise by removing extraneous details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a complex tool with 8 parameters and nested objects, the description covers the return values (emissions breakdown, reduction levers, etc.) but omits details on input parameters and the async workflow. With an output schema, return value explanation is acceptable, but input guidance is insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, meaning most parameters lack descriptions. The description only vaguely mentions 'company profile and its activity data' without detailing parameters like scope1Sources, scope3Activities, reductionTargets, or the async flag. It does not adequately compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it calculates a company's GHG footprint under GHG Protocol with specific accuracy and return details. It distinguishes itself from sibling tools like carbon_roadmap by specifying scope 1+2+3 and CSRD focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists when to use: for CSRD compliance, green-finance, supplier ESG scorecards. However, it does not mention when not to use or provide specific alternatives, though the 'when to use' is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

carbon_roadmapC

Read-only

Inspect

Roadmap carbone — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: Cas démo — Roadmap carbone. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`perimeter`	Yes
`scope1Sources`	No
`scope2Sources`	Yes
`reductionTargets`	No
`scope3Activities`	No

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the description adds value by noting inputs are validated server-side. However, it does not disclose behavioral traits like response time, pagination, or potential errors. The deliverable content is also left vague.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences, but it is not front-loaded with critical information. It starts with a French title-like phrase, then says 'Returns a structured, audited deliverable'. The reference case and validation note are helpful but could be positioned better.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is complex with 8 parameters including nested objects, and no output schema exists. The description does not explain the output format, how parameters interact, or what constitutes a successful result. It is insufficient for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Parameter schema coverage is only 13%, meaning most parameters lack descriptions. The description says 'send the documented case fields' without explaining any parameters. This fails to compensate for the low schema coverage, leaving the agent without guidance on what inputs are needed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it 'returns a structured, audited deliverable' for a carbon roadmap, which is somewhat clear. However, it does not distinguish from sibling tools like 'carbon_footprint_calculator' or 'sustainability_report'. The verb 'returns' is passive and lacks an action-oriented verb.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. There are no when-to-use or when-not-to-use statements, nor any exclusions or prerequisites. The description is purely about what the tool does without usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

champion_mappingC

Read-only

Inspect

Cartographie du champion — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk × Decathlon (deal €120k/an) — Champion identifié : CFO Group · Plan 6 semaines multi-touch. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`knownContacts`	Yes
`sellerContext`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and the tool returns a 'structured, audited deliverable'. This provides some behavioral context beyond what annotations offer, but it does not detail any potential side effects or dependencies (e.g., rate limits, required permissions). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise but includes a specific reference case (Spendesk × Decathlon) that, while illustrative, adds length without being essential for understanding the tool's core function. The main purpose is front-loaded, but the reference case could be moved to a separate example field.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 4 parameters, no output schema, low schema coverage), the description does not sufficiently explain the input structure or output format. It promises a 'structured, audited deliverable' but does not specify its shape or contents. The openWorldHint annotation suggests external context may be required, but the description provides little to fill that gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 25% schema description coverage, the description fails to explain the purpose of key parameters like 'deal', 'knownContacts', and 'sellerContext'. It merely says 'send the documented case fields' without elaborating on the fields themselves. The async parameter is documented in the schema, but the description adds no parameter-level insight.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for 'Cartographie du champion' (champion mapping) in the context of C-suite sales expertise. It mentions returning a structured, audited deliverable and provides a specific reference case (Spendesk × Decathlon) that illustrates its output. However, it could be more explicit about what 'champion mapping' entails and how it differs from other sales tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternatives. It instructs to 'send the documented case fields' but does not indicate prerequisites or scenarios where champion mapping is appropriate compared to sibling tools like 'deal_coach' or 'competitive_deep_dive'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

change_failure_root_cause_classifierA

Read-onlyIdempotent

Inspect

Classifies root causes of change failures for CTO-level incident analysis. Uses GitHub PR metadata and Snyk vulnerability data to identify patterns like dependency vulnerabilities, configuration drift, or deployment process gaps. Inputs include GitHub PR URL or incident ID, and outputs structured root cause categories with confidence scores. Ideal for post-mortem analysis and change risk assessment.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`pr_url`	Yes
`incident_id`	No
`snyk_org_id`	No
`time_range_days`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`root_causes`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and openWorldHint. The description adds meaningful context: it explains it uses GitHub PR metadata and Snyk vulnerability data, and outputs structured categories with confidence scores. It does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no fluff. Purpose is front-loaded, methodology and inputs follow. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 params, low schema coverage) and presence of an output schema, the description covers the main intent and key inputs. It could mention the async parameter and explain snyk_org_id/time_range_days for full completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%. The description mentions pr_url and incident_id as inputs but does not explain snyk_org_id or time_range_days. It adds some meaning beyond the schema by linking pr_url to GitHub metadata, but fails to cover all parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states a specific purpose: classifying root causes of change failures for CTO-level incident analysis. It distinguishes from siblings by focusing on change failure root cause, different from general vulnerability scanning tools like cve_security_lookup or dependency_vulnerability_scan.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Ideal for post-mortem analysis and change risk assessment,' providing clear usage context. It does not explicitly exclude other uses or name alternatives, but the context is sufficient for an agent to infer when to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

china_ecommerce_intelA

Read-only

Inspect

Chinese e-commerce intelligence for the ZH diaspora (50M+), import-export teams, brand IP enforcement, MENA/Africa entrepreneurs sourcing from China, and brand monitoring. Covers Taobao, Tmall, JD.com, Pinduoduo, 1688.com (B2B) and AliExpress (cross-border).

Five modes: • product_search — search products by keyword across CN platforms. Returns title ZH/EN, price CNY + USD estimate, sales 30d, rating, seller info, product URL. • seller_profile — full seller/supplier dossier: factory vs reseller detection, certifications (ISO, BSCI, CE), rating, years in business, main categories. • price_history — 12-month price trend for a product (live current price + seasonal model for CN shopping festivals: 11.11, 6.18, CNY). • brand_monitoring — detect counterfeits and grey market listings: price anomaly detection (>50% below MSRP = suspicious), counterfeit keyword scan, risk score 0-100. • market_intel — category overview: top 5 sellers by market share, avg/median price, volume estimate, price range.

Data quality note: LIVE data from Taobao/Tmall/JD/Pinduoduo REQUIRES AICI_RESEARCH_PROXY_URL with CN residential routing (Bright Data -country-cn). Without proxy: AliExpress (cross-border) + curated category fallback available.

Input formats for seller_profile: 'platform:id' e.g. 'aliexpress:123456', '1688:87654321', 'tmall:apple-store-official'. Input formats for price_history: AliExpress product URL or numeric product ID.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode. product_search=find products, seller_profile=supplier dossier, price_history=price trend, brand_monitoring=counterfeit detection, market_intel=category overview.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keyword, product name, product_id, seller_id (platform:id), brand name, or category. Accepts Chinese characters (ZH) or English.
`region`	No	Market region. CN-domestic=full platform coverage, cross-border=AliExpress+1688 focus. Default: CN-domestic.
`platform`	No	Target platform. Default: all. Note: taobao/tmall/jd/pinduoduo require CN proxy.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`products`	No
`market_intel`	No
`platform_used`	Yes
`price_history`	No
`quality_score`	Yes
`seller_profile`	No
`brand_monitoring`	No

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and destructiveHint=false, which the description supports by stating it fetches live data. The description adds key behavioral details: proxy dependency for some platforms, data freshness (30-day sales, 12-month price history), and data quality notes (live vs curated fallback). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Well-organized with bullet lists for modes and input formats. Some redundancy (platform list repeated in description and schema), but every sentence adds useful context. Could be slightly more concise by avoiding repetition of platform names, but still efficient for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 modes, 6 platforms, proxy requirements, multiple input formats), the description covers all critical aspects: use cases, mode descriptions, data sources, input syntax, and limitations. Output schema exists, so return values are not needed. Complete for an agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers all 5 parameters with descriptions. The description adds substantial value beyond schema: explains input formats for seller_profile and price_history, describes mode-specific outputs (e.g., 'returns title ZH/EN, price CNY + USD estimate'), and clarifies platform defaults. Schema coverage is 100%, but the description enriches meaning significantly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it provides Chinese e-commerce intelligence, listing specific use cases (ZH diaspora, import-export, brand enforcement, etc.) and five distinct modes with descriptions. The verb 'covers' and detailed platform list make the purpose very clear, differentiating it from general market data tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides explicit guidance on when to use each mode and includes critical constraints: proxy requirement for domestic platforms, fallback without proxy, and input format examples. However, it does not explicitly differentiate from sibling tools like 'china_market_data'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

china_market_dataA

Read-only

Inspect

Chinese capital market intelligence for the ZH diaspora (50M+) and institutional investors. Covers A-Shares (SSE/SZSE), H-Shares (HKEX), and ADRs across four modes:

• company — full company profile: name ZH/EN, USCC (18-digit social credit code), exchange, industry (CSRC classification), chairperson, registered capital, SOE flag • market_quote — real-time quote: price (CNY or HKD), change%, volume, market cap, P/E ratio, dividend yield, last update timestamp • sector_overview — sector snapshot: top 5 companies by market cap, avg P/E, 30-day sector index change. Supported sectors: semiconductor, ev, battery, technology, finance, energy, realestate, consumer, pharma, telecom • regulatory_filing — recent regulatory disclosures (HKEX filings: annual, quarterly, announcements, mergers, IPOs) with title, date, document URL

Input formats accepted: • 6-digit A-Share ticker (e.g. '600519' for Moutai SSE) • HKEX ticker (e.g. '0700.HK' or '700' for Tencent) • Company name in EN or ZH (e.g. '腾讯', 'Kweichow Moutai') • Sector keyword (e.g. 'semiconductor', '半导体')

Data sources: Yahoo Finance (primary, always accessible), Eastmoney push2 + CompanySurvey (via Bright Data proxy when AICI_RESEARCH_PROXY_URL is set), HKEX filing API. Note: Eastmoney/CSRC/SSE are blocked from datacenter IPs without proxy — set AICI_RESEARCH_PROXY_URL to unlock full coverage.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode. company=full profile, market_quote=price data, sector_overview=top 5 by sector, regulatory_filing=recent filings.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Ticker (6-digit A-share, 4-digit HK, Yahoo format), company name (ZH or EN), or sector keyword.
`exchange`	No	Exchange filter. Default: all. Affects sector_overview ticker selection.
`period_days`	No	Lookback period in days for regulatory filings. Default: 30.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`company`	No
`sources`	Yes
`market_quote`	No
`quality_score`	Yes
`sector_overview`	No
`regulatory_filings`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Despite annotations already marking readOnlyHint=true, the description adds critical behavioral details: proxy requirement (AICI_RESEARCH_PROXY_URL), async mode (job_id return), and data source limitations (Eastmoney/CSRC/SSE blocked from datacenter IPs). This fully informs the agent of operational constraints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bullet points for modes and input formats, and is front-loaded with the purpose. However, its length (multiple paragraphs) could be trimmed for efficiency—every sentence is useful, but overall detail is high.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 modes, multiple input formats, proxy dependencies, async option), the description covers all necessary context: data sources, limitations, required environment variable, and job polling. The existence of an output schema means return values need not be detailed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage, but the description enriches parameter meaning by detailing field contents for each mode (e.g., company profile includes USCC, SOE flag; market includes P/E ratio). This adds value beyond the schema's short descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states 'Chinese capital market intelligence' and enumerates specific assets (A-Shares, H-Shares, ADRs) and four concrete modes (company, market_quote, sector_overview, regulatory_filing). This provides a specific verb+resource+scope, distinguishing it from generic data tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies that this tool should be used for Chinese market data queries, but does not explicitly state when to use it over sibling tools like india_market_data. It provides context about data sources and proxy requirements, but no comparative guidance or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

churn_defenderC

Read-only

Inspect

Bouclier anti-churn — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk — portefeuille 400 clients PME/ETI, détection churn Q2 2025 (€8M ARR). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`accounts`	Yes
`csrContext`	No
`analysisWindowDays`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds minimal behavioral context mentioning server-side validation and a structured deliverable, but does not elaborate on side effects, authorization needs, or cost implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but lacks clear structure; it combines purpose, a reference case, and input instruction in a single sentence. While concise, it could be better organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the tool (nested objects, no output schema), the description is incomplete. It does not explain the return format, how to interpret the deliverable, or supply enough context for correct use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema description coverage, the description should compensate by explaining key parameters. Instead, it only references 'documented case fields' without detailing them, adding no value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates the tool is for churn analysis/anti-churn via the title and reference case. However, it lacks explicit differentiation from similar sibling tools like 'renewal_optimizer', and the verb is implied rather than stated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool vs. alternatives, nor any exclusion criteria. It only instructs to send documented case fields, leaving the agent without context for appropriate invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

climate_scenario_rcpA

Read-only

Inspect

Projections climatiques long terme par scénario IPCC (RCP AR5 + SSP AR6) pour toute localisation. Scénarios : RCP_4_5, RCP_8_5 (AR5), SSP1_2_6, SSP2_4_5, SSP3_7_0, SSP5_8_5 (AR6), ou 'all' (compare tous). Horizons : 2030–2100. Métriques : température (delta vs baseline 1990-2010, jours >35°C, nuits chaudes), précipitations (delta%, événements extrêmes, sécheresses), hausse du niveau de la mer (cm vs 2000), événements extrêmes (ouragans, inondations P100, sécheresses), indice incendie. Sorties : comparaison multi-scénarios, probabilité IPCC, signaux d'impact business par secteur. Sources : Open-Meteo CMIP6 (keyless), IPCC AR6 Atlas lookup, NOAA SLR projections. Usages : TCFD/CSRD physical risk, due diligence actifs long terme, assurance catastrophe, planification infrastructure. Cache 7j. SLA ≤20s.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`metrics`	No	Métriques à inclure. Défaut : toutes.
`location`	Yes	Localisation : {city, country?} ou {lat, lon}
`scenario`	Yes	Scénario IPCC. 'all' génère une comparaison multi-scénarios.
`horizon_year`	Yes	Année horizon de la projection (2030–2100)
`compare_baseline`	No	Comparer vs baseline 1990-2010 (défaut true)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`location`	Yes
`scenario`	Yes
`projections`	Yes
`horizon_year`	Yes
`quality_score`	Yes
`baseline_period`	No
`ipcc_likelihood_label`	Yes
`business_impact_signals`	Yes
`multi_scenario_comparison`	No

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true. The description adds behavioral details: cached 7 days, SLA ≤20s, data sources (Open-Meteo CMIP6, IPCC AR6 Atlas, NOAA SLR), and output types (multi-scenario comparison, probability, business impact signals). No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but well-structured with purpose first, then details on scenarios, metrics, outputs, sources, and use cases. It is information-dense without being overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, existing output schema), the description covers all essential aspects: scenarios, horizons, metrics, location, output types, sources, use cases, caching, and SLA. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%. The description adds meaning: default metrics ('all' implied), location format clarification ({city, country?} or {lat, lon}), and scenario 'all' generates multi-scenario comparison. These details go beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states it provides long-term climate projections by IPCC scenario for any location, listing scenarios, metrics, outputs, and use cases. It clearly distinguishes from other climate tools by focusing on scenario-based projections.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies use cases (TCFD/CSRD physical risk, due diligence, insurance, infrastructure planning) and technical limits (cache 7 days, SLA ≤20s). It does not explicitly mention when not to use or alternative tools, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

clinical_evidence_brieferC

Read-only

Inspect

Brief évidence clinique (GRADE) — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: Review the clinical evidence for <drug/intervention> in — GRADE rating, key trials, safety signals. · Scan safety signals for in — adverse events, severity, frequency from FAERS and trial data. · Assess comparative effectiveness of versus for — what does the evidence show? · Is there evidence supporting drug repurposing of for — existing trials and GRADE quality? · What are the evidence gaps for in before formulary adoption? Reference case: Semaglutide 2.4mg · Chronic weight management in non-diabetic adults · GRADE high efficacy · studies found · nausea/GI signals · FDA approved · PubMed+ClinicalTrials+OpenFDA. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`topic`	Yes
`max_studies`	Yes
`intervention`	No
`evidence_focus`	Yes		all
`target_disease`	No
`date_range_years`	Yes
`intervention_type`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns a structured, audited deliverable and that inputs are validated server-side, which is useful. However, it does not disclose other behavioral traits such as rate limits, authentication needs, or how async polling works (though the async parameter is described in input schema).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose and includes many repetitive example questions. It lacks a clear structure, mixing French and English. The reference case is helpful but could be integrated better. Every sentence should earn its place; several sentences could be condensed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (8 params, 4 required, no output schema), the description is incomplete. It does not describe the output format in detail, nor does it explain all parameters. The reference case gives a partial example but does not cover all use cases. Additional context about evidence sources or limitations would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, so the description should compensate. The description implies the meaning of 'topic' and 'evidence_focus' through the example questions, but it does not systematically explain all 8 parameters. For instance, 'max_studies' and 'date_range_years' are not mentioned, and there is no mapping between parameters and the reference case.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool returns a structured, audited deliverable for clinical evidence briefs, covering GRADE ratings, key trials, safety signals, etc. It provides several example questions that illustrate its purpose. However, it does not explicitly differentiate from sibling tools like clinical_pharma_intel, but the purpose is specific enough.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists example queries that indicate when to use the tool, but it does not provide guidance on when not to use it or mention alternative tools. There is no explicit context for selection among siblings, which is important given the large sibling set.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

clinical_pharma_intelA

Read-only

Inspect

Clinical and pharmaceutical intelligence for biotech analysts, healthcare fund managers, pharma BD teams, catalyst-driven hedge funds and health journalists. Aggregates live data across five modes: • trials — active/completed clinical trials (ClinicalTrials.gov v2 + EU CTR in parallel, 450k+ records) • pipeline — full pipeline by sponsor: trial count by phase + top indications • approvals — FDA drug label approvals + mechanism of action (OpenFDA) • recalls — FDA enforcement recalls classified by severity (Class I/II/III) • adverse_events — FAERS aggregated reactions: top 10 reactions + serious%

Signal detection (P0/P1/P2): P0 if Class I recall OR trial terminated for safety reason P1 if serious adverse events >30% OR ≥3 recalls in 12 months P2 otherwise (standard monitoring)

All sources are public and keyless. Optional env OPENFDA_API_KEY raises daily quota from 1,000 to 120,000 requests. SLA: ≤16s p95 (parallel fetch, 8s budget per source). Cache: 6h trials, 24h approvals, 12h recalls, 6h adverse events.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Analysis mode. Default "trials". trials=clinical trials, pipeline=sponsor overview, approvals=FDA approvals, recalls=enforcement, adverse_events=FAERS
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`phase`	No	Filter trials by phase (1/2/3/4/NA). Only applies to modes trials and pipeline.
`query`	Yes	Drug name, indication, sponsor or molecule (e.g. "atezolizumab", "metastatic NSCLC", "Roche", "semaglutide")
`country`	No	ISO 2-letter country code to filter trial sites (e.g. US, FR, DE).
`max_results`	No	Maximum number of results to return. Default 20.
`status_filter`	No	Filter trials by status. Only applies to modes trials and pipeline.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`trials`	No
`recalls`	No
`signals`	Yes
`sources`	Yes
`pipeline`	No
`approvals`	No
`quality_score`	Yes
`adverse_events`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description goes beyond annotations by detailing data sources (ClinicalTrials.gov, EU CTR, OpenFDA, FAERS), cache durations, SLA (≤16s p95), signal detection logic (P0/P1/P2), and the optional OPENFDA_API_KEY effect. No contradiction with annotations (readOnlyHint=true).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-organized with sections, bullet points, and clear terminology. Information is front-loaded with the purpose. A few sentences could be trimmed (e.g., listing all siblings), but it remains efficient for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 modes, signal detection, SLA, caching, optional API key) and the presence of an output schema, the description is complete. It covers data sources, performance, filtering options, and result interpretation without needing additional context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema coverage, each parameter already has a description. The tool description adds nuances like mode-specific applicability and signal detection, but does not significantly enhance parameter understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides clinical and pharmaceutical intelligence with five distinct modes (trials, pipeline, approvals, recalls, adverse_events), specifying the target users (biotech analysts, hedge funds, etc.) and differentiating from sibling tools by domain.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use each mode and mentions the optional API key to increase quota, but does not explicitly state when not to use the tool or suggest alternatives. However, the modes are self-explanatory and cover common pharma use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cloud_cost_ri_optimizerA

Read-onlyIdempotent

Inspect

Analyzes AWS and Azure cloud pricing data alongside RIPE regional demand trends to generate Reserved Instance purchase recommendations for CTOs. Inputs include target cloud provider, instance family, region, and desired commitment term. Outputs include cost savings percentage, optimal RI quantity, and regional demand insights. Ideal for reducing cloud spend with data-driven decisions. Keywords: cloud cost optimization, reserved instances, AWS pricing, Azure pricing, RIPE demand trends.

ParametersJSON Schema

Name	Required	Description
`term`	No
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes
`utilization`	No
`cloud_provider`	Yes
`instance_family`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`ri_cost`	No
`sources`	No
`warnings`	No
`on_demand_cost`	No
`break_even_months`	No
`regional_demand_score`	No
`cost_savings_percentage`	No
`recommended_ri_quantity`	No

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description's claim of 'analyzes' and 'generates recommendations' is consistent. The description adds context about data sources (AWS, Azure, RIPE) and output types, but does not disclose behavioral aspects like rate limits or response size beyond what annotations cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of 4 sentences plus a keyword line, which is relatively concise. It front-loads the core purpose and includes key details. However, the keyword line is somewhat redundant.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers inputs, outputs, and purpose adequately. It mentions the output includes cost savings, RI quantity, and regional insights. With an output schema presumably present, the return values are likely documented. However, the missing 'utilization' parameter in the description is a minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (17%), with only 'async' having a description. The description explains 4 of 6 parameters (cloud_provider, instance_family, region, term) but omits 'utilization'. It adds meaning by describing the parameters' roles in generating recommendations, partially compensating for low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes AWS and Azure pricing data to generate Reserved Instance purchase recommendations for CTOs. It specifies inputs and outputs, and the verb 'generates recommendations' combined with specific resources (cloud pricing data, RIPE trends) makes the purpose unambiguous. No sibling tool overlaps with this functionality.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it is 'Ideal for reducing cloud spend with data-driven decisions,' which implies when to use it. However, it does not explicitly state when not to use or name alternatives. Despite the large sibling list, no other tool covers RI optimization, so the guidance is adequate but not explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

code_review_depth_optimizerA

Read-onlyIdempotent

Inspect

As a CTO, this tool analyzes your team's historical DORA metrics (deployment frequency, lead time, MTTR, change failure rate) and GitHub pull request data to recommend an optimal code review depth. Input your repository identifier and time range, and receive a structured recommendation on review rigor (light, standard, thorough) with supporting metrics and risk-adjusted rationale.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`teamSize`	No	Number of active developers in the team
`repository`	Yes	GitHub repository identifier in format owner/repo
`riskTolerance`	No	Organization's risk tolerance level
`timeRangeDays`	Yes	Number of days of historical data to analyze

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`recommendation`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only, open-world, idempotent. The description adds context about input data (DORA metrics, PR data) and output (structured recommendation with rationale), which goes beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, each earning its place: first explains purpose and audience, second details input and output. No wasted words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, the description adequately covers purpose, inputs, and output nature without needing to detail return format. It could mention data sources, but overall complete for a recommendation tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and parameters are well-described in the schema. The description reinforces repository and time range but adds no extra semantic value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes DORA metrics and PR data to recommend an optimal code review depth. It uses specific verbs ('analyzes', 'recommend') and specifies outputs (review rigor, metrics, rationale). The tool name and title are distinct from siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies the tool is for CTOs to optimize review depth but does not explicitly state when to use vs alternatives or provide exclusions. It mentions inputs but lacks guidance on when not to use or how it differs from siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

comp_benchmark_geo_deltaA

Read-onlyIdempotent

Inspect

Compares local compensation benchmarks against HQ standards for CHROs, adjusting for cost-of-living and tax differentials. Inputs include job role, local and HQ locations, and salary range. Outputs include adjusted benchmark delta, cost-of-living multiplier, and tax impact. Keywords: compensation benchmark, geographic pay equity, cost-of-living adjustment, tax differential analysis.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`jobRole`	Yes	Standardized job role (e.g., 'Software Engineer III')
`currency`	No	ISO 4217 currency code (e.g., 'USD')
`baseSalary`	No	Current base salary in local currency
`hqLocation`	Yes	HQ location (ISO 3166-2 code or city, country)
`localLocation`	Yes	Local work location (ISO 3166-2 code or city, country)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`taxImpact`	No	Estimated tax differential percentage
`adjustedSalary`	No	Salary adjusted for cost-of-living and taxes
`benchmarkDelta`	No	Percentage difference between local and HQ benchmark
`confidenceScore`	No	0-1 confidence in data quality
`costOfLivingMultiplier`	No	Local cost-of-living index relative to HQ

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnly, openWorld, and idempotent hints. The description adds value by specifying the adjustments (cost-of-living, tax) and outputs, providing behavioral context beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (three sentences plus keywords) and front-loaded with the core purpose. Every sentence contributes meaningfully, with no waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of a detailed schema and annotations, the description covers main inputs, purpose, and key outputs. It omits details like async parameter, but these are in the schema. Sufficient for an agent to understand the tool's role.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 100% schema description coverage, the description only summarizes input categories without adding new semantic details beyond the schema. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool compares local compensation benchmarks against HQ standards with cost-of-living and tax adjustments, which is a specific verb+resource. It does not explicitly distinguish from sibling tools, but the focus on CHROs and geographic equity provides differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for geographic compensation benchmarking for CHROs, but lacks explicit guidance on when to use this tool over alternatives or when not to use it. No exclusions or context for sibling tools are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitive_deep_diveA

Read-only

Inspect

Gold-standard competitive deep dive — STRUCTURED multi-source data (no LLM narrative). Pair tool: competitor_intel for LLM-narrated board briefing + slide script. Aggregates Wikipedia, Yahoo Finance, SEC EDGAR, Wayback Machine, DuckDuckGo, HackerNews, domain scraping — all keyless. Returns agent-shaped JSON: KPIs (funding, employees, revenue, market cap), P0/P1/P2 competitive signals, pricing radar, competitor comparison matrix, Wayback timeline, positioning (sector/industry/icp_hypothesis/moat_signals), quality score. Every field is sourced or marked unavailable — no hallucinated figures. SLA: p50 ~25s, p95 ~30s · score 80+ on listed targets (US/EU/foreign) · score ~40 on private companies (no EDGAR/Yahoo data). Use sync for batch agents (≤30s tolerance). Use competitive_deep_dive_async + competitive_deep_dive_result(job_id) for conversational agents. Inputs: company name or domain (required), optional competitor list (≤5), optional depth (easy/medium/hard).

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Research depth: 'easy' = Wikipedia + DDG (fast, ~15s); 'medium' = + Yahoo Finance + EDGAR + Wayback (default, ~45s); 'hard' = + HackerNews + domain surfaces + competitor deep dive (~120s)
`company`	Yes	Name or domain of the target company (e.g. 'Salesforce', 'notion.so', 'HubSpot CRM')
`competitors`	No	Optional list of competitor names or domains to include in the comparison matrix (max 5)

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	Yes	Key Performance Indicators sourced from public data
`company`	Yes
`quality`	Yes
`signals`	Yes	Competitive intelligence signals, severity-ranked P0 (critical) to P2 (informational)
`sources`	Yes
`comparison`	Yes	Feature/dimension comparison between target and each competitor
`depth_used`	Yes
`positioning`	Yes	Positioning analysis derived from public data
`generated_at`	Yes
`pricing_radar`	Yes	Pricing tiers extracted from public sources
`domain_resolved`	Yes
`wayback_timeline`	Yes	Historical snapshots of the company website from Wayback Machine

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses output structure (agent-shaped JSON with KPIs, signals, quality score), data sources, SLA, and guarantees against hallucinated figures. Annotations already indicate read-only and open-world, but description adds substantial behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and informative, though somewhat verbose. It efficiently conveys purpose, sibling differentiation, output, SLA, and usage advice. Almost every sentence adds value, but could be trimmed slightly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 params, output schema exists, annotations present), the description covers all essential aspects: purpose, output, data sources, SLA, limitations, and usage scenarios. It leaves no critical gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with detailed parameter descriptions (e.g., depth options with sources and timings). The description adds little new parameter meaning, but it does summarize inputs effectively. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs a 'competitive deep dive' using structured multi-source data, distinguishing it from the sibling tool 'competitor_intel' which provides LLM-narrated board briefings. It specifies verb+resource and scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly provides alternative tool ('competitor_intel') and async variants ('competitive_deep_dive_async' + result poll). Also gives SLA expectations and score ranges for different company types, guiding when to use sync vs async.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitive_deep_dive_asyncA

Read-only

Inspect

Async variant of competitive_deep_dive. Returns immediately (<200ms) with a job_id. The research runs in the background (p50≈25s, p95≈30s for depth=medium). Poll the result with competitive_deep_dive_result(job_id) after the eta_seconds hint. Use this instead of competitive_deep_dive when the agent cannot wait >15s for a response. Inputs: same as competitive_deep_dive — company (required), competitors (optional list, max 5), depth (easy/medium/hard, default medium). Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter.

ParametersJSON Schema

Name	Required	Description
`depth`	No	Research depth: 'easy'≈15s, 'medium'≈30s (default), 'hard'≈60s
`company`	Yes	Name or domain of the target company (e.g. 'Salesforce', 'notion.so')
`competitors`	No	Optional list of competitor names or domains to include in the comparison matrix (max 5)

Output Schema

ParametersJSON Schema

Name	Required	Description
`job_id`	Yes	Unique job identifier — pass to competitive_deep_dive_result
`status`	Yes	Always 'queued' on submission
`eta_seconds`	Yes	Estimated seconds until result is ready
`submitted_at`	Yes	ISO-8601 submission timestamp

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Describes async behavior: returns <200ms, background runtime p50≈25s, p95≈30s for depth=medium. Provides polling and webhook alternatives. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise single paragraph, front-loaded with key info, no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers async nature, timing, polling, webhook option, and inputs. Given output schema exists, return values are handled elsewhere. Complete for the tool's role.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description adds default value for depth (medium) which is not in schema, and restates parameters clearly. Adds value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it is the async variant of competitive_deep_dive, returns immediately with a job_id, and distinguishes from the sync sibling. Mentions inputs are same as competitive_deep_dive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'Use this instead of competitive_deep_dive when the agent cannot wait >15s for a response.' Also guides to poll with competitive_deep_dive_result or register a webhook.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitive_deep_dive_resultA

Read-onlyIdempotent

Inspect

Poll the result of a competitive_deep_dive_async job. Returns status=pending while running, status=completed with the full report once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by competitive_deep_dive_async.

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by competitive_deep_dive_async

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false; the description adds value by detailing the returned statuses, TTL, and the prerequisite call. No contradictions exist between description and annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with just two sentences, no extraneous words, and front-loads the core purpose. It efficiently conveys all necessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter, presence of an output schema), the description fully covers usage, statuses, and TTL. It even mentions the prerequisite call, making it complete for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The only parameter job_id has a description in the schema that matches the tool's description. Since schema_description_coverage is 100%, the baseline is 3 and the description does not add further semantic information beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool polls the result of a competitive_deep_dive_async job, listing all possible status outcomes. It distinguishes itself from sibling tools like competitive_deep_dive_async which starts the job, and competitive_deep_dive which may be synchronous, making the purpose unambiguous.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly instructs to call this tool after the eta_seconds hint from competitive_deep_dive_async, providing clear when-to-use guidance. It could improve by noting when not to use it, such as preferring the synchronous version, but the given instruction is sufficient for proper usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_intelA

Read-onlyIdempotent

Inspect

LLM-narrated competitive-intelligence BRIEFING — for human consumption (board meeting, pitch prep). Pair tool: competitive_deep_dive for raw structured multi-source data (agent-shaped JSON). Returns: recent competitor moves with severity (critical/high/medium/low), prioritised signals, pricing-radar comparison, 3-6 quantified recommendations (impact in € or %, 7/30/90/180-day horizons), and an 8-12 slide presenter script. Use when the buyer wants a narrative briefing or a deck. Inputs: your company (name + one-paragraph pitch) + 1-10 competitors. Delivered by Manue, AI CMO of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No	Optional — what the buyer wants to track first (e.g. pricing moves, hiring patterns)
`competitors`	Yes	1-10 competitors to analyze
`selfCompany`	Yes	Your company info

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles
`sources`	No	Cited sources
`pricingRadar`	No	Pricing comparison across competitors
`competitorMoves`	Yes	Recent moves per competitor with severity rating
`presenterScript`	Yes	8-12 slide board presenter script
`recommendations`	Yes	3-6 actionable strategic recommendations
`executiveSummary`	Yes	Board-ready prose summary (120-400 chars)

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, idempotentHint=true, destructiveHint=false, aligning with the description of a read operation. The description adds detail on the output structure (severity, recommendations, slides), but does not mention potential rate limits or other behavioral traits. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each serving a purpose: stating purpose and audience, pairing sibling, listing returns, and usage guidance. No unnecessary words, front-loaded with key information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity, the presence of annotated safety hints, and an output schema (indicated but not shown), the description covers inputs, outputs, usage, and pairing. It is complete for an AI agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds little beyond the schema: it mentions the one-paragraph pitch and optional focus, but schema already describes those. The added context ('one-paragraph pitch' matching schema) is marginal.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool produces an LLM-narrated competitive-intelligence briefing for human consumption, distinguishing it from the sibling tool 'competitive_deep_dive' which provides raw structured data. The verb 'briefing' and resource 'competitive intelligence' are specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'Use when the buyer wants a narrative briefing or a deck' and recommends pairing with 'competitive_deep_dive' for raw data, providing clear guidance on when to use this tool vs. alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_movesB

Read-only

Inspect

Mouvements concurrents — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What have my named competitors done recently — releases, pricing changes, hires, funding? · Which competitor signals are the most urgent right now and what should I do about them? Reference case: Notion — moves de ClickUp, Asana, Coda. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

B3.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds context beyond the readOnlyHint annotation by specifying the output is a 'structured, audited deliverable,' and mentions server-side input validation and async behavior. This complements the annotation well without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but contains some extraneous marketing language ('Gapup agent-payable C-suite expertise') and a reference case that may not be universally understood. It is functional but not optimally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Without an output schema, the description should explain the deliverable's structure, but it only calls it 'structured, audited' without details. It also lacks guidance on rate limits, result size, or the behavior of the async parameter, leaving gaps for a tool with nested objects and optional async mode.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%, and the description does not explain the meaning of 'selfCompany.pitch' or 'focus,' nor the structure of 'competitors' beyond what the schema provides. The vague phrase 'documented case fields' does not compensate for the lack of parameter-level detail.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a structured deliverable about competitor moves (releases, pricing, hires, funding) and urgent signals, targeting C-suite users. However, it does not explicitly differentiate from many sibling competitor tools, though the specificity of 'moves' and 'urgency' provides distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies when to use this tool via the questions it answers (recent competitor moves, urgent signals). It does not provide explicit guidance on when not to use it or mention alternatives, which would improve clarity given the large sibling set.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_pricing_radarB

Read-only

Inspect

Radar pricing concurrents — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: How do my competitors' pricing plans and monthly prices compare to mine? · Which competitor plan undercuts or out-features my equivalent tier? Reference case: Notion — pricing vs ClickUp, Asana, Coda. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

B3.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds that it returns a 'structured, audited deliverable' but does not disclose behavioral traits like cost, rate limits, or potential delays (though async parameter hints at slowness). Minimal additional value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is three sentences, concise but not optimally structured. First sentence is a fragment, second lists questions, third gives a reference case. Could be more front-loaded and clearer.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 4 parameters (including async and nested objects) and no output schema, the description lacks information on output format, error handling, and result interpretation. It mentions a 'structured, audited deliverable' but does not specify contents, leaving significant gaps for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 25% (low). The description mentions 'selfCompany' and 'competitors' but does not explain 'async' or 'focus' parameters. It provides no additional meaning beyond basic schema property descriptions, failing to compensate for low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: comparing competitors' pricing plans and monthly prices, and identifying undercutting or out-featuring competitors. It distinguishes from siblings like 'competitor_pricing_scrape' (scraping only) and 'competitive_deep_dive' (broader analysis).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for competitive pricing analysis but does not explicitly state when to use versus alternatives, nor provides scenarios when not to use. The reference case (Notion vs ClickUp, etc.) gives context but lacks direct guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_pricing_scrapeA

Read-only

Inspect

Scrape and parse a competitor pricing page from a URL or domain. Fetches via proxy-aware timedFetch (tries /pricing, /plans, homepage fallback), then extracts: plan names, prices, billing cadence (monthly/annual/usage-based/one-time), key features, free tier presence, enterprise tier, estimated price range. Returns structured pricing tiers. If unfetchable or no pricing found (anti-bot, SPA, auth wall): returns a clear degraded result with warnings and signals — never fake success. ICP: founders, product managers, pricing strategists, competitive intel teams. Proxy-aware (AICI_RESEARCH_PROXY_URL). Cache TTL 6h.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	Competitor URL or domain (e.g. 'https://notion.so/pricing', 'notion.so', 'https://www.example.com'). For best results, provide the direct pricing page URL.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.

Output Schema

ParametersJSON Schema

Name	Required	Description
`tiers`	Yes
`domain`	Yes
`status`	Yes
`warnings`	Yes
`url_fetched`	Yes
`has_free_tier`	Yes
`pricing_found`	Yes
`quality_score`	Yes
`raw_price_signals`	Yes
`has_enterprise_tier`	Yes
`plan_names_detected`	Yes
`billing_model_signals`	Yes
`estimated_price_range`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses proxy usage, timedFetch with fallback paths, extraction fields, degraded result behavior on failure, and cache TTL. Adds rich behavioral context beyond annotations (readOnlyHint, openWorldHint) with no contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single paragraph with clear flow: action, process, output, failure mode, audience, technical details. No redundant sentences; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers input, process, output (structured pricing tiers), failure handling, caching, proxy, audience, and async option. Complete for a scraping tool with structured output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage, baseline 3. Description adds context: async behavior returns job_id, and best practice for direct pricing URL. Adds value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool scrapes and parses competitor pricing pages from a URL or domain, with specific verb (scrape, parse) and resource (pricing page). Distinguished from siblings by focusing on single-URL scraping with detailed extraction logic.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly describes input format, fallback behavior, failure handling, and caching. Provides clear context on when to use (scraping a specific competitor pricing page) but does not explicitly name alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_profilesA

Read-only

Inspect

Profils concurrents — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What are the strengths, weaknesses and positioning of each of my competitors? · Give me a SWOT-style profile of a named competitor. Reference case: Notion — profils de ClickUp, Asana, Coda. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. Description adds that return is 'structured, audited deliverable' and mentions server-side validation. However, does not disclose async behavior despite async parameter in schema, nor polling or timeout details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is relatively concise (4 sentences) and front-loaded with purpose. Some jargon ('Gapup agent-payable C-suite expertise') may reduce clarity, but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 4 parameters (2 required, nested objects) and no output schema, description covers main purpose and output type but omits details on async parameter, return format, and constraints like max 10 competitors.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%. Description adds vague 'send the documented case fields' but does not explain individual parameters like selfCompany, competitors, focus, or async beyond schema. Fails to compensate for low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states tool returns a structured, audited deliverable answering competitor strengths/weaknesses/positioning, with a reference case. Differentiates from sibling tools like competitor_intel and competitor_moves by specifying SWOT-style profile.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear context for when to use (competitor analysis, SWOT-style), includes a reference case. Does not explicitly exclude alternatives or state when not to use, but purpose is sufficiently distinct.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

competitor_recommendationsB

Read-only

Inspect

Recommandations concurrentielles — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: Given my competitors, what strategic actions should I take and in what order? · What should my 7/30/90/180-day competitive response plan look like? Reference case: Notion — actions face à ClickUp, Asana, Coda. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`competitors`	Yes
`selfCompany`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that the deliverable is 'structured, audited' and that inputs are validated server-side. This provides some additional context about reliability but doesn't disclose any significant behavioral traits beyond what annotations indicate. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise at about 4 sentences, but it includes marketing fluff like 'Gapup agent-payable C-suite expertise (CMO)' that doesn't aid agent understanding. The core information is front-loaded, but there is some redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 4 parameters, with nested objects and no output schema. The description does not detail the output format or explain how to use parameters like 'focus' or what constitutes valid competitors. The reference case helps but is insufficient. For a tool of this complexity, the description lacks completeness in guiding correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only 'async' gets a description). The tool description does not explain the meaning or usage of the key parameters: selfCompany, competitors, or focus. It merely says 'send the documented case fields,' which is unhelpful. Given the nested objects and low schema coverage, the description fails to compensate, leaving parameter semantics unclear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides strategic competitive response plans with timelines, answering specific questions about actions and order. It gives a concrete reference case (Notion vs ClickUp, Asana, Coda). However, it does not explicitly distinguish from sibling tools like competitor_profiles or competitive_deep_dive, which may also offer strategic insights. The verb 'recommendations' and resource 'competitive' are present, but sibling differentiation is missing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies that the tool is appropriate when the user needs a structured action plan with timelines (e.g., 'what strategic actions should I take and in what order?'). It hints at a C-suite audience. However, it provides no explicit guidance on when not to use it or which alternatives (e.g., competitor_profiles for general info) to consider. The usage context is implied but not explicitly stated.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

comp_plan_architectB

Read-only

Inspect

Architecture plan de commissionnement — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Comp Plan 8 rôles commerciaux · OTE €65-280k · Budget comp €2.1M · Quota coverage 3.2×. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`targets`	Yes
`geography`	No
`salesTeam`	Yes
`currentChallenges`	Yes
`preferredStructure`	No

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint:true and openWorldHint:true, so the description's assertion that it 'Returns a structured, audited deliverable' aligns but adds little beyond confirming the output. It adds context about server-side validation and the reference case, but does not disclose additional behavioral traits such as authentication requirements or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description consists of three sentences, starting with the core purpose. However, the second sentence includes a lengthy reference case with specific numbers (e.g., OTE €65-280k, Budget comp €2.1M) that adds clutter and may not be essential for understanding the tool's function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the tool's purpose and output type ('structured, audited deliverable') but lacks key details such as what the deliverable contains, how to use inputs effectively, and any constraints. Given the complexity (7 parameters, nested objects) and absence of an output schema, more guidance would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at only 14%, the description fails to compensate. It merely says 'send the documented case fields' without explaining parameter purpose, constraints, or relationships. The schema's own descriptions are minimal for most parameters, especially nested objects like 'company' and 'salesTeam'.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Architecture plan de commissionnement' for C-suite expertise (CRO). It specifies that it returns a structured, audited deliverable and provides a concrete reference case. This distinguishes it from sibling tools like 'comp_benchmark_geo_delta' and 'executive_comp_peer_benchmark', which focus on benchmarking rather than plan construction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for architecting a compensation plan, mentioning 'Reference case' and 'send the documented case fields.' However, it does not explicitly state when to use this tool versus alternatives, nor does it provide when-not-to-use guidance or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_audience_profileA

Read-only

Inspect

Return the audience targeting profile of a content entity — its enrichment tags reframed as audience facets with confidence, corroboration and full provenance (verifiable, sourced). The response also carries an entity-level provenance block (average confidence, data freshness). When to use this tool: an ad-tech or marketing agent needs a machine-readable, verifiable audience descriptor for a franchise or work. Input: an entity_id and its type.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_id`	Yes	Entity id from content_catalog
`entity_type`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`entity_id`	Yes
`provenance`	Yes	Entity-level trust & freshness summary.
`entity_type`	Yes
`audience_facets`	Yes	Map facet → array of { label, confidence, corroboration, source_count, sources }

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds value by detailing the output: 'audience facets with confidence, corroboration and full provenance' and 'entity-level provenance block (average confidence, data freshness).' This goes beyond annotations, though it doesn't cover potential response size or latency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with purpose, then output details, usage context, and input. Every sentence adds essential information without redundancy or filler. It is appropriately sized and structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and annotations are present, the description adequately covers what the tool does and when to use it. It does not discuss performance guarantees or error handling, but for a read-only tool with output schema, it is reasonably complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 67%. The description adds that entity_id comes from 'content_catalog' and entity_type is 'franchise or work', but this largely mirrors the schema. The async parameter's behavior is described in schema. Minimal additional meaning beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'audience targeting profile' with 'audience facets' from enrichment tags, including confidence and provenance. It specifies the use case for ad-tech/marketing agents and distinguishes from sibling tools like content_enrichment by focusing on audience descriptors.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'When to use this tool: an ad-tech or marketing agent needs a machine-readable, verifiable audience descriptor for a franchise or work.' It provides clear context but does not mention alternative tools or when not to use it, which is a minor gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_catalogA

Read-only

Inspect

Browse the Gapup gold-standard content catalogue — video games, films, TV series and music. Returns franchises with their works (title, release year). When to use this tool: an agent needs structured, audited metadata for a cultural franchise, wants to resolve a title to a canonical entity, or browses a domain's catalogue before requesting enrichment. Inputs: a content domain and an optional case-insensitive name filter. Each franchise id can be passed to content_enrichment for its fine-grained tag profile.

ParametersJSON Schema

Name	Required	Description
`name`	No	Optional case-insensitive substring filter on franchise name
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum number of franchises to return (default 20)
`domain`	Yes	Content domain to browse

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`domain`	Yes
`franchises`	Yes

Tool Definition Quality

A4.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint. Description adds that it returns franchises with works (title, release year) and the relationship to content_enrichment, providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Concise, front-loaded with purpose, then usage, then inputs, then sibling tool relationship. No extraneous information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists and annotations are provided, the description covers purpose, usage, parameters, and relationship to sibling tool. Complete for its role.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%; description adds that the name filter is case-insensitive and explains inputs in simpler terms, providing modest added value over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool browses a specific catalog (video games, films, TV series, music) and returns franchises with works. It distinguishes from sibling content_enrichment by noting franchise IDs can be passed to it for fine-grained tags.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: for structured, audited metadata, resolving titles, or browsing before enrichment. Implies not for fine-grained tags, guiding use of content_enrichment.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_compareA

Read-only

Inspect

Compare the tag profiles of two content entities (franchises or works) and measure how similar they are. Returns a Jaccard similarity score, the list of shared tags, the tags unique to each entity, and a breakdown of shared tags by facet. When to use this tool: an agent needs to compare two franchises or works (e.g. 'how similar are Dark Souls and Elden Ring?', 'what do Street Fighter and Mortal Kombat have in common?', 'on which axes do these two games differ?'), find positioning overlap, identify cross-sell opportunities, or answer 'if you liked X you might like Y' questions backed by data. Works for any domain (video-games, music, film, tv).

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_a`	Yes	Id of the first entity from content_catalog (e.g. 'game-dark-souls', 'music-daft-punk').
`entity_b`	Yes	Id of the second entity from content_catalog (e.g. 'game-elden-ring', 'music-justice').
`entity_type`	No	Whether both ids are franchises or works (applies to both). Defaults to 'franchise'.

Output Schema

ParametersJSON Schema

Name	Required	Description
`entity_a`	Yes
`entity_b`	Yes
`similarity`	Yes	Jaccard index = \|shared\| / \|union\|, rounded to 2 decimal places. 0 = no overlap, 1 = identical profiles.
`a_tag_count`	Yes
`b_tag_count`	Yes
`entity_type`	Yes
`shared_tags`	Yes	Tags present in both entities (up to 40).
`unique_to_a`	Yes	Tags present only in entity_a (up to 40).
`unique_to_b`	Yes	Tags present only in entity_b (up to 40).
`shared_count`	Yes
`shared_by_facet`	Yes	Count of shared tags per facet (e.g. { genre: 3, theme: 5 }). Shows which dimensions drive the similarity.

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the description adds context beyond that, such as the return structure (Jaccard similarity, shared/unique tags) and domain-agnostic nature. No contradictions; the description aligns with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with a clear structure: function statement, output summary, and usage guidance. While examples are helpful, they add some length; overall it's well-organized and not overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description adequately covers what the tool does and its outputs (similarity score, shared tags, unique tags, facet breakdown). It specifies prerequisites (ids from content_catalog) and domain applicability, making it complete for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with each parameter documented. The description reinforces parameter semantics (e.g., entity_a/b are ids from content_catalog) but does not add significant new information beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Compare') and resource ('tag profiles of two content entities'), clearly stating it measures similarity and returns Jaccard scores. It distinguishes itself from sibling tools like content_similar by focusing on tag profile comparison with detailed output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a dedicated 'When to use this tool' section with concrete examples and use cases (e.g., comparing franchises/works, identifying overlap, cross-sell opportunities). However, it does not explicitly state when not to use it or mention alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_discoveryA

Read-only

Inspect

Discover content franchises within a domain. Two modes: pass tag for a precise taxonomy match (every game tagged 'co-op'), or pass query for free-text SEMANTIC search powered by pgvector embeddings — finding franchises by meaning ('dark atmospheric games about isolation') even when no literal tag matches. Results are verifiable: tag mode carries tag confidence/corroboration, semantic mode carries a similarity score; both carry entity freshness. When to use: an agent wants a domain-scoped shortlist by tag or by intent. Inputs: a domain plus either a tag or a free-text query.

ParametersJSON Schema

Name	Required	Description
`tag`	No	Tag label to match precisely (e.g. 'thriller', 'co-op'). Mutually exclusive with `query`.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum franchises to return (default 25)
`query`	No	Free-text intent for semantic search (e.g. 'melancholic synth-pop about heartbreak'). Mutually exclusive with `tag`.
`domain`	Yes	Content domain to search within

Output Schema

ParametersJSON Schema

Name	Required	Description
`tag`	No
`count`	Yes
`query`	No
`domain`	Yes
`method`	Yes
`franchises`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds valuable context: it explains the two modes, result verifiability (tag confidence/corroboration, semantic similarity score, entity freshness), and the async option. No contradictions; adds meaningful behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is informative and well-structured, starting with purpose, then detailing modes, results, usage, and inputs. Each sentence adds value. Slightly longer than necessary but justified by the complexity of two modes.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (two modes, async option, verifiable results) and presence of an output schema, the description covers all essential aspects: input parameters, mutual exclusivity, result characteristics, and when to use. It is sufficiently detailed for correct selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant value by explaining the mutual exclusivity of tag and query, the nature of each mode (precise taxonomy vs. semantic search with pgvector embeddings), and the output characteristics. This enriches parameter understanding beyond the schema's field descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Discover content franchises within a domain.' It specifies two distinct modes (tag for precise taxonomy and query for semantic search), effectively distinguishing it from sibling tools like content_catalog or content_enrichment. The verb 'discover' with resource 'content franchises' is specific and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'an agent wants a domain-scoped shortlist by tag or by intent.' While it does not name specific alternatives, it implies when not to use (e.g., full catalog) and covers the primary use cases. Clear guidance for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_engineC

Read-only

Inspect

Moteur de contenu — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Notion — content engine 2026 (productivity B2B). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`months`	Yes
`cluster`	Yes
`maxArticlesPerMonth`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, consistent with returning a deliverable. The description adds no further behavioral details (e.g., latency, side effects).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences are brief, but the first sentence is front-loaded. However, it omits essential details, making it less effective.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given nested objects, low schema coverage, and no output schema, the description fails to explain return structure or parameter roles, leaving gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20%, and the description does not explain any parameters. It merely says 'send the documented case fields', leaving the agent to parse an undocumented schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The name 'Moteur de contenu' and description indicate a content engine returning a structured, audited deliverable. It is distinct from siblings but could be more explicit about the deliverable type.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description only mentions server-side validation, providing no context for appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_enrichmentA

Read-only

Inspect

Return the enriched tag profile of a content entity — the Gapup moat. Each tag carries a facet (genre, theme, play-mode, perspective…), a confidence score, a corroboration score and its full provenance (which sources corroborated it, when). The response also carries an entity-level provenance block (average confidence, data freshness). When to use this tool: an agent has a franchise or work id (from content_catalog) and needs a fine-grained, machine-readable, verifiable characterisation for matching, recommendation, contextual targeting or analysis. Inputs: an entity id and its type.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_id`	Yes	Entity id from content_catalog (e.g. 'music-daft-punk', 'film-the-dark-knight-collection:the-dark-knight')
`entity_type`	No	Whether the id is a franchise or a work (default franchise)

Output Schema

ParametersJSON Schema

Name	Required	Description
`tags`	Yes
`entity_id`	Yes
`tag_count`	Yes
`provenance`	Yes	Entity-level trust & freshness summary.
`entity_type`	No

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, so the description doesn't need to repeat safety. It adds valuable behavioral context by detailing the response structure (provenance, confidence scores) and the fact that it returns a 'fine-grained, machine-readable, verifiable' profile, which enriches transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with purpose. It has a logical flow: what it does, output details, when to use, inputs. Each sentence adds value, but it could be slightly more concise without losing information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description need not explain return values. It covers inputs, usage context, and the nature of the output. The tool is sufficiently complex, and the description provides all necessary contextual information for an agent to use it correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%. The description adds semantic value by explaining entity_id comes from content_catalog with examples, entity_type defaults to franchise, and async parameter for slow tools to avoid timeouts. This goes beyond the schema's property descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns the enriched tag profile of a content entity with specific fields like facet, confidence score, corroboration score, and provenance. It distinguishes itself from siblings by calling it the 'Gapup moat' and specifies its use for characterization needs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: when an agent has a franchise or work id from content_catalog and needs fine-grained characterization. It mentions the source tool content_catalog, but doesn't explicitly mention when not to use, leaving room for slight improvement.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_evergreen_score_analyzerA

Read-onlyIdempotent

Inspect

Evaluates content evergreen potential for CMOs by analyzing historical traffic patterns and backlink authority. Takes a content URL and optional time range, returns an evergreen score (0-100), traffic trend analysis, and backlink profile. Ideal for content strategy planning, SEO optimization, and identifying high-value evergreen assets. Uses Wayback Machine and Common Crawl public APIs.

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Content URL to analyze
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`toDate`	No	End date for historical analysis (YYYY-MM-DD)
`fromDate`	No	Start date for historical analysis (YYYY-MM-DD)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`lastSeen`	No
`warnings`	Yes
`firstSeen`	No
`trafficTrend`	Yes
`backlinkCount`	No
`evergreenScore`	Yes
`backlinkDomains`	No

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint. The description adds that it uses Wayback Machine and Common Crawl public APIs, indicating external dependencies and potential rate limits, and mentions the output structure (score, trend, backlinks).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with four sentences, each adding value: core purpose, input/output, use cases, and data sources. No fluff or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters, annotations, and an output schema, the description covers the return values (evergreen score, traffic trend, backlink profile) and use cases comprehensively. It is complete for the tool's intended use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 4 parameters. The description reinforces the meaning by noting 'takes a content URL and optional time range,' which maps to url, toDate, and fromDate. It adds context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates content evergreen potential using historical traffic and backlink authority, specifying the target user (CMOs) and returning a score with analysis. This is a specific verb+resource that distinguishes it from siblings like content_audience_profile.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Ideal for content strategy planning, SEO optimization, and identifying high-value evergreen assets,' providing clear context. However, it lacks explicit 'when not to use' or comparisons to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_provenanceA

Read-only

Inspect

Audit the full data provenance of a content entity — all its enrichment tags with their extraction source, corroboration score, source list and last verification date, plus an entity-level freshness summary. Use this tool before citing or relying on enriched content data in a high-stakes context (ad targeting, editorial, analysis). Inputs: entity_id (required) and entity_type (franchise or work).

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_id`	Yes	Entity id from content_catalog (e.g. 'video-game-elden-ring')
`entity_type`	No	Whether the id is a franchise or a work (default: franchise)

Output Schema

ParametersJSON Schema

Name	Required	Description
`lineage`	Yes	Full tag lineage from v_data_lineage — one entry per tag.
`entity_id`	Yes
`entity_type`	Yes
`freshness_summary`	Yes	Entity-level freshness & trust summary from v_entity_freshness.

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds behavioral context about the tool's audit nature and what it returns (freshness summary, source list). No contradictions, but does not discuss auth requirements or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose and contents, followed by usage context and inputs. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema present (not shown but indicated), description covers all needed context: purpose, contents, use case, inputs. For a read-only audit tool, it is fully adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description adds value by specifying 'entity_id from content_catalog (e.g. 'video-game-elden-ring')' and default entity_type, going beyond schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Audit the full data provenance of a content entity' and enumerates specific components (enrichment tags, extraction source, corroboration score, etc.). It differentiates from siblings like content_catalog and content_enrichment by focusing on provenance and freshness.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises 'Use this tool before citing or relying on enriched content data in a high-stakes context (ad targeting, editorial, analysis).' This provides clear when-to-use guidance, but lacks explicit when-not-to or references to alternative tools among many content-related siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_rankingA

Read-only

Inspect

Return the TOP-ranked content entities in a category, by a chosen criterion — the direct answer to superlative / decision queries: 'best video games', 'top RPGs', 'cheapest games', 'best value RPGs', 'best FPS playable right now', 'most popular music artists'. Criteria: critic_score, popularity, price, value (critic score per unit price). direction flips it (asc = cheapest/lowest first). available_only restricts to entities currently buyable. Sliceable by genre and release-year window; every result carries its score, price and source. When to use: an agent must produce a ranked shortlist to support a recommendation, a purchase or a 'what is the best X' decision.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`genre`	No	Optional genre filter, e.g. 'RPG', 'FPS', 'thriller'
`limit`	No	Number of ranked results (default 20)
`domain`	Yes	Content domain to rank within
`year_to`	No	Optional latest release year
`criterion`	No	critic_score (0-100, default) · popularity · price · value (critic score per unit price)
`direction`	No	desc = best/highest first (default); asc = cheapest/lowest/least first. Defaults to asc for price.
`year_from`	No	Optional earliest release year
`available_only`	No	If true, restrict to entities currently available to buy/play (default false)

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`genre`	No
`domain`	Yes
`ranking`	Yes
`year_to`	No
`criterion`	Yes
`direction`	No
`year_from`	No
`available_only`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds significant behavioral context: explains that results carry score, price, and source, how direction and available_only work, and sliceable by genre/year. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured, starting with core function, then criteria details, then usage guidance. It is front-loaded with the main purpose. It could be slightly more concise (e.g., 'critic score per unit price' repeated twice), but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (9 params, output schema exists, many siblings), the description covers ranking criteria, direction, filtering options, and usage context. It mentions output includes score, price, source. It is complete enough for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% (all 9 parameters have schema descriptions). The description adds further meaning by explaining criterion values, direction defaults, and the meaning of available_only, going beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns top-ranked content entities by a chosen criterion, with specific verb 'Return the TOP-ranked content entities'. It distinguishes itself from siblings by focusing on superlative/decision queries like 'best video games', 'top RPGs', etc. The scope is well-defined.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'an agent must produce a ranked shortlist to support a recommendation, a purchase or a 'what is the best X' decision'. It provides clear context but does not explicitly mention when not to use or alternatives, though the sibling list is large.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_similarA

Read-only

Inspect

Find content entities similar to a given one. For embedded franchises this uses SEMANTIC vector similarity (pgvector) over the enrichment profile — surfacing entities that feel alike even when their tags differ literally. Falls back to shared enrichment-tag overlap for works or non-embedded entities. Each result carries a similarity score and its entity-level freshness/confidence (verifiable, sourced). When to use this tool: an agent wants recommendations or lookalikes for a franchise or work. Input: an entity_id and its type.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`entity_id`	Yes	Entity id from content_catalog
`entity_type`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`method`	Yes	How similarity was computed.
`similar`	Yes
`entity_id`	Yes
`source_provenance`	Yes	Provenance of the source entity used to compute similarity.

Tool Definition Quality

A4.2/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds behavioral details beyond annotations: two algorithms, fallback behavior, output includes similarity score and freshness/confidence. Annotations already declare readOnlyHint and openWorldHint; no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

One dense paragraph that covers purpose, method, output, and usage. Could be more structured (e.g., bullet points), but it is front-loaded with the core action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an existing output schema, return values are partly covered. However, the description omits guidance on the async parameter and the limit parameter, which are important for invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 50% (async, entity_id have descriptions). Description adds that entity_type is needed, but doesn't explain limit or async behavior. Minor inconsistency: schema lists entity_type as optional, description implies both required.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool finds similar content entities using specific methods (semantic vector similarity for embedded franchises, tag overlap for others). It distinguishes from sibling content tools by focusing on recommendations/lookalikes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'When to use this tool: an agent wants recommendations or lookalikes for a franchise or work.' Lacks exclusion criteria or alternative tool references, but the use case is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

content_taxonomyA

Read-only

Inspect

Return the enrichment taxonomy of a content domain — every tag grouped by facet (genre, theme, mood, play-mode…). When to use this tool: an agent needs the controlled vocabulary to filter, classify or query content. Input: a domain.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domain`	Yes	Content domain

Output Schema

ParametersJSON Schema

Name	Required	Description
`domain`	Yes
`taxonomy`	Yes	Map facet → array of tag labels
`tag_count`	Yes
`facet_count`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and openWorldHint=true, so the safety profile is clear. The description adds behavioral context by stating the output is 'every tag grouped by facet', which goes beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise: two sentences that front-load the purpose and usage. Every sentence adds value with zero waste. Ideal for quick agent interpretation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description, combined with annotations and a present output schema (context signals indicate output schema exists), provides a complete picture. The output is described as tags grouped by facet, and the usage guidance is clear. Could optionally mention output format or pagination, but not required given the output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, with both parameters ('async' and 'domain') well-described in the JSON schema. The description adds minimal value for parameters, just 'Input: a domain.' Baseline 3 is appropriate since the schema does the heavy lifting.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns the enrichment taxonomy of a content domain, grouping tags by facets like genre, theme, mood. It uses a specific verb 'Return' and identifies the resource ('enrichment taxonomy') and scope ('every tag grouped by facet'), effectively distinguishing it from sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states 'When to use this tool: an agent needs the controlled vocabulary to filter, classify or query content.' This provides clear context for usage. However, it does not mention when not to use or specify alternatives, which would make it stronger.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

contract_risk_scannerC

Read-only

Inspect

Scanner de risques contractuels — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Salesforce MSA — revue d'un client SaaS B2B EMEA. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`contractText`	Yes
`contractContext`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and returns a structured, audited deliverable. However, it does not disclose other behavioral traits like latency, error modes, or asynchronicity (async parameter is documented only in schema). Adequate but not rich.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (two sentences, one fragment) and relatively concise. However, some jargon ('Gapup agent-payable C-suite expertise') may confuse. Structure is front-loaded with purpose. Efficient but could be clearer.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Tool has 4 parameters, nested objects, no output schema. The description is brief and does not explain the output format, error handling, or effect of the 'focus' parameter. The reference case is an example but not comprehensive. Incomplete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (25%). The description does not compensate by explaining the parameters beyond the schema. It only generically says 'send the documented case fields', adding no semantics for contractContext, contractText, focus, or async. Little added value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool scans contractual risks and returns a structured deliverable. It provides a reference case, but does not explicitly differentiate it from sibling tools like 'vendor_risk_assessor' or 'talent_contract_risk_mapper'. The purpose is clear but lacks sibling distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. No prerequisites, exclusions, or context about when not to use it. The reference case implies use for contract risk scanning, but that's implicit. Missing explicit usage instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

corporate_registry_lookupA

Read-onlyIdempotent

Inspect

Resolve legal information about a company from its national corporate registry. Returns a normalised, sourced company profile: legal status, registration number, directors, shareholders, recent filings, registered address, share capital, and a quality score (0–100). Coverage: France (INPI, keyless — full SIREN/SIRET with directors), 3M+ entities worldwide via GLEIF LEI (keyless, large companies), UK (Companies House, optional key), Netherlands (KvK, optional key), and OpenCorporates (token required since 2026). Sources are tried in cascade; quality_score increases with each source that succeeds. When to use: due-diligence, KYC screening, supplier verification, M&A research, or any workflow needing verified company identity and legal status. Optional env vars: COMPANIES_HOUSE_API_KEY (UK), KVK_API_KEY (NL), OPENCORPORATES_API_TOKEN (OpenCorporates token).

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	No	ISO 3166-1 alpha-2 country code (e.g. 'FR', 'GB', 'NL', 'DE', 'SG', 'AU', 'US'). If omitted, inferred from legal suffix in company name, then falls back to global search.
`identifier`	No	Optional registry identifier for a fast direct lookup: SIREN (FR, 9 digits), Companies House number (GB, 8 chars), KvK number (NL, 8 digits), etc.
`company_name`	Yes	Company name or trading name to look up (e.g. 'Sanofi', 'Tesco PLC', 'Notion Labs Inc')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`registry`	Yes
`directors`	Yes
`freshness`	Yes	ISO timestamp
`identifier`	Yes
`legal_form`	No
`legal_name`	No
`company_name`	Yes
`jurisdiction`	Yes
`shareholders`	Yes
`quality_score`	Yes	0-100 confidence score
`share_capital`	No
`filings_recent`	Yes
`incorporation_date`	No
`registered_address`	No

Tool Definition Quality

A4.8/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only, idempotent, nondestructive behavior. The description adds valuable context: sources are tried in cascade, quality_score increases with successful sources, and OpenCorporates requires a token since 2026. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is detailed but well-structured, starting with the core purpose and then expanding on sources and usage. It is front-loaded and each sentence adds information, though slightly longer than strictly necessary.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of multiple sources, optional keys, cascade logic, and an output schema, the description covers all essential aspects: coverage, data returned, usage guidance, and key requirements. It equips an agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description adds value by explaining country inference from legal suffix and providing examples for identifier (SIREN, Companies House number). This enhances understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool resolves legal information from national corporate registries and lists specific data points returned (legal status, registration number, directors, etc.). It names coverage areas and distinguishes from siblings by its focused use case in due diligence and KYC.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists when to use the tool (due-diligence, KYC screening, supplier verification, M&A research) and mentions optional API keys for improved access. This gives agents clear context for invocation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

court_filings_multiA

Read-only

Inspect

Aggregate court filings, judgments and litigation records for a company or individual across five major legal jurisdictions: US (CourtListener / PACER), UK (National Archives — EWHC/EWCA/UKSC/UKUT), EU (ECHR HUDOC — European Court of Human Rights), France (Légifrance / Cour de cassation) and Germany (BGH / BVerfG). Returns structured case records with type classification (civil/criminal/antitrust/bankruptcy/administrative/unknown), status (filed/pending/decided/appealed/unknown), parties extracted from case titles, opinion URLs and verbatim snippets. Cross-case pattern recognition produces severity-ranked signals (P0–P2) for criminal, antitrust, bankruptcy, regulatory, data-breach and IP categories. Use when: due diligence on a counterparty, vendor risk assessment, competitive intelligence (litigation history), regulatory exposure mapping. All sources are public and keyless. Optional env var COURTLISTENER_API_KEY raises US rate limits beyond the default 5 req/s anonymous tier. SLA: ≤25s p95 (all jurisdictions fetched in parallel, 8s budget per source). Quality score: 20 pts per jurisdiction with ≥1 case retrieved, +10 if signals detected, +5–10 if ≥2–3 distinct sources contributed.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_to`	No	ISO date YYYY-MM-DD — latest filing or decision date to include
`date_from`	No	ISO date YYYY-MM-DD — earliest filing or decision date to include
`party_name`	Yes	Name of the company or individual to search (e.g. "Apple Inc", "TotalEnergies", "Volkswagen AG")
`jurisdiction`	No	Jurisdictions to search. Defaults to all ["US","UK","EU","FR","DE"].

Output Schema

ParametersJSON Schema

Name	Required	Description
`cases`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`party_name`	Yes
`quality_score`	Yes
`by_jurisdiction`	Yes
`jurisdictions_searched`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds substantial behavioral context: returns structured records with classification, status, parties, opinion URLs, snippets; cross-case pattern recognition produces severity-ranked signals; SLA of ≤25s p95; quality scoring formula. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the main purpose. Every sentence adds value: jurisdictions, output details, use cases, authentication, SLA, quality scoring. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple jurisdictions, pattern recognition), the description is comprehensive: covers purpose, inputs, outputs, performance, quality, and use cases. It lacks explicit mention of error handling or empty results, but these are minor. Output schema exists, so return value details are covered.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description does not elaborate on individual parameters beyond what the schema provides, but it gives overall context for the jurisdiction and date parameters. No additional parameter-level meaning is added.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it aggregates court filings across five major jurisdictions, lists specific sources, and provides a specific verb ('aggregate') and resource. It is distinct from sibling tools which focus on other domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states use cases: due diligence, vendor risk assessment, competitive intelligence, regulatory exposure mapping. Also notes that all sources are public and keyless, and mentions optional rate-limit improvement via API key.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

crm_connectorAInspect

Push, update, search and log activities in HubSpot, Salesforce or Pipedrive. 4 modes: push_lead (create contact/lead), update_opportunity (update deal stage/amount), search_contact (lookup by email), log_activity (call/email/meeting/note). Returns resource_id, direct CRM URL, signals and quality_score. If credentials are absent, returns a mock result with a warning signal. Auth: HubSpot via Bearer access_token; Salesforce via access_token + base_url; Pipedrive via api_key.

ParametersJSON Schema

Name	Required	Description
`data`	Yes	Payload depending on mode. push_lead: {email,first_name,last_name,company,phone,job_title}. update_opportunity: {deal_id/opportunity_id,stage,amount,close_date}. search_contact: {email}. log_activity: {type,body,contact_id/person_id,subject}.
`mode`	Yes	Action to perform in the CRM
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`provider`	Yes	CRM provider to target
`credentials`	No	Auth credentials. HubSpot: access_token. Salesforce: access_token + base_url. Pipedrive: api_key.

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	No
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`success`	Yes
`provider`	Yes
`data_synced`	No
`resource_id`	No
`quality_score`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description reveals key behavioral traits beyond annotations: returns mock results with warning if credentials absent, details auth patterns per provider, and explains async mode behavior. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with 4 sentences, each adding value. It front-loads the core purpose and modes, then returns and auth details. No redundant or wasted information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the tool's complexity (5 parameters, nested objects, enums, async, auth variations), the description covers all essential aspects: modes, payloads, return values, auth, and fallback behavior. The output schema exists, so return value details are sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, providing baseline 3. The description adds significant meaning by specifying the payload fields for each mode (e.g., push_lead requires email, first_name, etc.) and the credential structure per provider, which goes beyond the schema's generic descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool pushes, updates, searches, and logs activities in CRMs with specific modes (push_lead, update_opportunity, search_contact, log_activity). It distinguishes the tool from siblings by detailing the four distinct actions and supported providers (HubSpot, Salesforce, Pipedrive).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly describes when to use each mode, provides auth requirements per provider, and mentions the async option for avoiding timeouts. However, it does not explicitly exclude alternatives or specify when not to use the tool, though the context is clear enough for an agent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cross_sell_recoC

Read-only

Inspect

Recommandations cross-sell — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Alan × Gapup Hub — 3 produits recommandés · Fit 'perfect' × 2 · ARR potentiel +€18k. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`account`	Yes
`company`	Yes
`portfolio`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint: true, indicating no mutation. The description adds that it returns a structured, audited deliverable and that inputs are validated server-side, providing some behavioral context beyond annotations. However, it does not disclose any potential side effects or detailed behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes jargon ('Gapup agent-payable C-suite expertise (CRO)') and a reference case that may not be universally understood. It is not ideally front-loaded with a clear action, and every sentence does not clearly earn its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With complex parameters (nested objects) and no output schema, the description is insufficient. It does not explain the output format, how recommendations are generated, or what makes a valid input case. The reference case helps but leaves many gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only async has a description). The description does not elaborate on parameters; it only says 'send the documented case fields' without specifying what those fields are. Thus, it fails to add meaning beyond the schema, especially given the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly indicates it provides cross-sell recommendations, with a specific reference case. It states 'Returns a structured, audited deliverable.' While the phrasing is somewhat jargon-heavy and mixes languages, the core purpose is discernible and distinguishes it from sibling tools like upsell_hunter.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description offers no explicit guidance on when to use this tool versus alternatives. It mentions that inputs are validated server-side but does not specify context, prerequisites, or exclusions. There is no mention of when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

crypto_wallet_intelA

Read-only

Inspect

Multi-chain on-chain analytics for crypto trading agents, on-chain analysts, AML/compliance teams and DeFi BD. Covers Ethereum, Base, Polygon, BSC, Arbitrum, Optimism — EVM-compatible addresses only.

5 modes: • wallet_profile — full wallet summary: type (EOA/contract/CEX/protocol), inferred persona (whale/MEV-bot/DeFi-user/hodler…), age, tx count, native balance, ERC-20 count, NFT collections, OFAC sanctions flag • token_flows — ERC-20 inflows/outflows per token on the selected period, priced in USD via CoinGecko • pnl_estimate — FIFO realized + unrealized P&L on the period with confidence rating (high/medium/low) • counterparties — top 20 counterparties ranked by USD volume with CEX/DEX/protocol labels • defi_positions — active DeFi positions detected via Etherscan interaction history (Aave/Compound/Uniswap/Curve/Lido/Balancer/SushiSwap)

Signal detection (P0/P1/P2): P0 if OFAC SDN match OR direct Tornado Cash / sanctioned-protocol interaction P1 if >$1M volume on wallet <30 days old OR MEV-bot pattern OR >80% volume on single counterparty P2 informational (CEX wallet, new wallet, no anomaly)

Sources: Etherscan family (keyless free-tier, optional API key per chain), DefiLlama (keyless), public EVM RPC (keyless), CoinGecko free tier (keyless). Cache TTL: 5 min (wallet activity evolves fast). Budget: 8s per source.

Env vars (all optional, raise Etherscan rate-limit from 1 req/5s to 5 req/s): ETHERSCAN_API_KEY · BASESCAN_API_KEY · POLYGONSCAN_API_KEY BSCSCAN_API_KEY · ARBISCAN_API_KEY · OPTIMISM_API_KEY

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode. wallet_profile=full wallet summary + persona + sanctions flag. token_flows=ERC-20 inflows/outflows per token priced in USD. pnl_estimate=FIFO realized+unrealized P&L with confidence. counterparties=top 20 counterparties by volume. defi_positions=active positions on Aave/Compound/Uniswap/Curve/Lido/etc.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`chain`	No	Chain to analyze. Default "ethereum". Use "all" to scan all 6 chains (slower, ~30s).
`address`	Yes	EVM-compatible wallet address (0x... 40 hex chars). Works on all supported chains.
`period_days`	No	Lookback window in days for token_flows, pnl_estimate, counterparties, defi_positions. Default 30.
`min_value_usd`	No	Minimum USD value filter for token_flows and counterparties. Default $100.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`address`	Yes
`signals`	Yes
`sources`	Yes
`token_flows`	No
`pnl_estimate`	No
`quality_score`	Yes
`counterparties`	No
`defi_positions`	No
`wallet_profile`	No
`chains_analyzed`	Yes

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description fully discloses behavioral traits: read-only (consistent with readOnlyHint), data sources with rate limits, cache TTL (5 min), budget per source, and env vars for API keys. Signal detection thresholds are explicit. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with clear sections (modes, signal detection, sources). It is front-loaded with the core purpose. While lengthy, the detail is justified given the tool's complexity. Minor redundancy in mode listing (both in description and schema) but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, 5 modes, EVM constraint, signal detection), the description covers all necessary context: usage, data sources, cache, env vars, and output expectations (since output schema exists). It leaves no significant gaps for an agent to select and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description adds overall context for modes (e.g., wallet_profile includes persona and sanctions) but does not significantly augment individual parameter meaning beyond what the schema already provides (e.g., enums, patterns, defaults).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides multi-chain on-chain analytics for crypto wallets, lists 5 specific modes (wallet_profile, token_flows, etc.), and includes target users (trading agents, AML teams). It distinguishes from sibling tools by being focused on EVM-compatible addresses with detailed analytics modes, not just simple screening.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives clear context on when to use (crypto analysis, AML, DeFi BD) and includes signal detection levels (P0/P1/P2) guiding priority. It mentions EVM-only so agents know not to use for non-EVM chains. However, it does not explicitly exclude alternatives like sanctions_screener_multi or kyc_screener, leaving some ambiguity among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

customer_marketingC

Read-only

Inspect

Marketing clients & ambassadeurs — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — 12 clients analysés · 4 ambassadeurs identifiés · Programme + 6 case studies + référral. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`goals`	Yes
`company`	Yes
`product`	Yes
`customers`	Yes
`targetUseCases`	No
`contentBudgetEur`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the tool is read-only. The description adds that inputs are validated server-side. However, it does not mention the async behavior or the return format beyond 'structured, audited deliverable'. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise (3 sentences plus a reference case). It front-loads the primary purpose. However, the reference case may be less informative than structured details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 7 parameters, nested objects, and no output schema, the description lacks important details like output structure, async handling, and parameter relationships. It is insufficient for an agent to fully understand the tool's capabilities.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14%, but the description does not explain any parameters beyond generic 'send the documented case fields'. The schema has descriptions for some parameters, but the description adds no value for parameter meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it is for 'Marketing clients & ambassadeurs' and returns a structured deliverable. However, it is not fully specific about the exact operation, and it does not distinguish clearly from sibling tools like 'event_marketing' or 'growth_path_architect'. The reference case adds context but is not a clear definition.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description does not mention exclusions, prerequisites, or comparison with similar tools. Among many siblings, this is a gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

customer_voice_synthC

Read-only

Inspect

Synthèse voix client — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Alan (assurance santé) — 3 personas · Top 5 douleurs · Repositionnement messagerie recommandé. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`dataSources`	Yes
`targetSegments`	Yes
`repositioningFocus`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint. Description adds that inputs are validated server-side and returns a structured deliverable, but doesn't elaborate on auth, rate limits, or other behaviors beyond what annotations imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is relatively short and front-loaded with the main purpose. However, it includes unnecessary references like 'Gapup agent-payable' that may be confusing. Could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 parameters, nested objects, no output schema), the description is insufficient. It does not describe the deliverable structure, return format, or how to use results. Annotations partially fill safety gaps but overall incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20% (only async and dataSources.items.description have descriptions). The description says 'send the documented case fields' but does not explain the required parameters company, dataSources, targetSegments, or the optional repositioningFocus. Fails to add meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it produces a structured, audited deliverable on customer voice, targeting C-suite (CMO) with a reference case. However, it does not explicitly define the core function in simple terms, relying on French and jargon.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives. Mentions target audience (CMO) but no exclusions or prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cve_security_lookupA

Read-only

Inspect

Look up CVE vulnerability data for enterprise security teams, DevSecOps and SOC analysts. Supports two modes: exact CVE ID lookup (e.g. 'CVE-2024-3094') or keyword search by product/vendor (e.g. 'openssl', 'Apache Tomcat'). Cross-references four authoritative keyless sources: NVD NIST (official CVE database, CVSS v3 scores, affected CPEs), CISA KEV (Known Exploited Vulnerabilities catalog — exploit_in_wild flag), EPSS FIRST (exploit probability 0-1), GitHub Security Advisories (ecosystem-specific: npm/pypi/maven). Returns structured vulnerability records with CVSS v3 scores, affected product version ranges, CWE weakness classification, references and exploitation status. Signals engine produces P0/P1/P2 alerts: P0=CVSS>=9 + active exploitation, P1=CVSS>=7 or EPSS>=70%, P2=CWE pattern clusters. Relevant for EU NIS2 and DORA supply chain risk obligations. Optional env: NVD_API_KEY (raises NVD rate-limit 5→50 req/30s), GITHUB_TOKEN (raises GHSA GraphQL rate-limit). Cache TTL 6h. SLA <=25s p95.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Override auto-detection: "lookup" for exact CVE ID, "search" for product/vendor keyword.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	CVE ID (e.g. "CVE-2024-3094") or product/vendor keyword (e.g. "openssl", "Apache Tomcat"). Mode is auto-detected from the CVE-YYYY-XXXXX pattern.
`max_results`	No	Maximum number of vulnerabilities to return (default 20, max 50).
`severity_min`	No	Minimum CVSS v3 severity to include in results (default: no filter).
`published_after`	No	ISO date YYYY-MM-DD — only include CVEs published after this date. Defaults to 365 days ago for search mode.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`quality_score`	Yes
`vulnerabilities`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, openWorldHint), the description adds critical behavioral details: optional env vars for rate-limit increases, cache TTL (6h), async mode with job_id, SLA p95 <=25s, and P0/P1/P2 alert generation logic. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is detailed but well-structured: starts with main purpose, then modes, sources, alerts, compliance, optional keys, cache, SLA. Some minor extra detail (e.g., exact rate-limit numbers) could be trimmed, but overall it is efficient and front-loaded with essential info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 params, output schema, multiple modes, asynchronous option), the description covers all necessary aspects: purpose, usage modes, data sources, output alerts, optional API keys, caching, performance guarantees, and async behavior. The presence of an output schema means return values need not be detailed, and the description sufficiently fills any gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds value by explaining the auto-detection of mode from query pattern, the meaning of the P0/P1/P2 alerts, and the significance of results (scoring, references, exploitation status). This goes beyond the raw schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool does CVE vulnerability lookup with two explicit modes (exact CVE ID and keyword search). It specifies target users (enterprise security teams, DevSecOps, SOC analysts) and distinguishes from siblings by detailing authoritative sources and alert signals (P0/P1/P2).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context for when to use this tool (security vulnerability lookup) and mentions compliance obligations (NIS2, DORA). However, it does not explicitly state when not to use it or compare to sibling tools like cyber_risk_auditor or pentest_scope_estimator, so guidance on alternatives is absent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

cyber_risk_auditorB

Read-only

Inspect

Auditeur de risque cyber — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Qonto — Audit cyber risque B2B FinTech · Score 58/100 → roadmap 90j · 8 findings critiques/high · économie prime -28%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`techStack`	Yes
`currentPosture`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and gives a reference case with a score and findings, but does not disclose other behavioral traits like latency, rate limits, or what constitutes the deliverable. It meets the minimum with annotations supplementing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (three sentences) with a clear first sentence. The reference case adds specificity but could be reduced. The mix of French and English may be slightly confusing. Overall, it is concise but not perfectly front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters with nested objects, no output schema), the description is incomplete. It does not explain the return format, authentication needs, or how to interpret the deliverable. The reference case hints at outputs but is insufficient for full agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, yet the description does not explain any of the five parameters. It merely says 'send the documented case fields' without clarifying what each field means or how to construct them. This forces the agent to rely solely on the schema, which is insufficient for nested objects with many properties.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an 'Auditeur de risque cyber' that returns a structured, audited deliverable, with a specific reference case. The verb 'audit' and resource 'risque cyber' are explicit, distinguishing it from many sibling tools that deal with other audit areas.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs alternatives. It does not mention when not to use it, prerequisites, or alternatives among the many audit-related siblings. The reference case is not a usage guideline.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deal_coachC

Read-only

Inspect

Coach de deal MEDDIC — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Datadog Enterprise deal Société Générale €1.2M ARR — coaching MEDDIC + escalation plays + 14 next actions. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`knownContext`	Yes
`buyingCommittee`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true, suggesting no side effects and possible extra outputs. The description adds that inputs are validated server-side and returns a 'structured, audited deliverable', which aligns with read-only behavior. However, it does not elaborate on what the deliverable contains or any rate limits, so the added value beyond annotations is moderate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the tool's primary function but includes a lengthy reference case ('Datadog Enterprise deal Société Générale €1.2M ARR') that adds noise rather than clarity. The phrase 'Gapup agent-payable C-suite expertise' is obscure. A cleaner, more focused sentence would improve conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 5 parameters, no output schema), the description is insufficient. It lacks information on output structure, error handling, or parameter dependencies. The reference case does not compensate for missing guidance on how to construct valid inputs. Annotations provide minimal help.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20% (only 'async' has a description). The description says 'send the documented case fields' without explaining the parameters like 'deal', 'buyingCommittee', or 'knownContext'. The agent cannot infer parameter semantics from the description alone, requiring reliance on poorly covered schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Coach de deal MEDDIC' indicating deal coaching using MEDDIC methodology, and mentions returning a structured deliverable. However, the phrase 'Gapup agent-payable C-suite expertise (CRO)' is jargon and unclear. It does not clearly distinguish from sibling tools like meddic_scoring or deal_structurer, leaving the agent uncertain about when to use this specific tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no explicit guidance on when to use this tool versus alternatives. The description includes a reference case but does not specify prerequisites, contexts, or scenarios where deal_coach is preferred over other deal-related tools. The absence of usage recommendations forces the agent to infer applicability.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

deal_structurerB

Read-only

Inspect

Structuration de deal — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Agicap × Kyriba — Partenariat API Banking · 5 structures comparées · Term sheet 7 clauses · Score 83/100 JV. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint and openWorldHint, so the description's mention of returning a deliverable and server-side validation adds some context but does not significantly extend beyond what annotations already imply.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences with no fluff: states purpose, target audience, output type, reference case, and validation note. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Adequate but incomplete: output format and error handling are not described, despite no output schema. The reference case provides some context, but more detail on what the deliverable contains and validation outcomes would be helpful.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 33% (only async parameter documented). The description adds no detail about individual parameters like company or deal beyond their schema definitions, failing to compensate for low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool structures deals and returns a structured, audited deliverable, with a reference case. However, it does not explicitly differentiate from siblings like deal_coach or term_sheet_negotiation, which may perform similar tasks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. It mentions inputs are validated server-side and to send documented case fields, but lacks explicit when-to-use or when-not-to-use instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dependency_vulnerability_scanA

Read-only

Inspect

SCA (Software Composition Analysis) — scans a project dependency manifest and returns known vulnerabilities for each dependency. Supports: package.json (npm), requirements.txt (Python), go.mod (Go), Cargo.toml (Rust), composer.json (PHP), Gemfile.lock (Ruby), CycloneDX SBOM JSON. PRIMARY source: OSV.dev (keyless, free, covers npm/PyPI/Go/crates.io/Packagist/RubyGems + GHSA advisories federated). CVSS enrichment: NVD NIST (when OSV lacks score). Exploitation flag: CISA KEV (known-exploited-vulnerabilities catalog). Returns per-vuln CVE/GHSA IDs, severity, CVSS score, fixed version, and actionable upgrade recommendations. Relevant for EU NIS2 supply chain risk obligations, DORA, SOC 2 vendor assessments. Cache TTL 6h. Parallel OSV queries (concurrency=10). SLA <=30s p95.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Manifest type: "package_json"=npm, "requirements_txt"=pip, "go_mod"=Go modules, "cargo_toml"=Rust, "composer_json"=PHP, "gem_lock"=Ruby, "sbom_cyclonedx"=CycloneDX SBOM JSON.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`severity_min`	No	Minimum severity to include in results (default: "medium").
`manifest_content`	Yes	Raw text content of the manifest file to scan (e.g. full contents of package.json, requirements.txt, etc.).
`include_transitive`	No	Include transitive/indirect dependencies in results (default: true).

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`summary`	Yes
`ecosystem`	Yes
`quality_score`	Yes
`recommendations`	Yes
`vulnerabilities`	Yes
`dependencies_parsed`	Yes

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readonly, not destructive, and non-idempotent. Description adds cache TTL, concurrency, SLA, async support, and data source details, providing rich behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is dense but not overly long; front-loads key purpose and formats. Could benefit from structured formatting (e.g., bullet points) but remains efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With output schema existing and 100% parameter coverage, description adequately covers functionality, behavior, and compliance context. No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all 5 parameters. Description adds no extra semantic value for parameters beyond what schema provides, meeting baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool scans dependency manifests for known vulnerabilities, specifies supported formats, and lists data sources. It is distinct from siblings like cve_security_lookup.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context by mentioning compliance frameworks (EU NIS2, DORA, SOC 2), indicating relevance for supply chain audits. However, does not explicitly compare to alternatives or state when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

discovery_prepC

Read-only

Inspect

Préparation discovery — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Discovery Salesforce × Airbus — VP Digital Marc Legrand · Signaux achat confirmés · +28 pts conversion demo. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`contact`	Yes
`ourOffer`	Yes
`prospect`	Yes
`meetingGoal`	No

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and that a structured deliverable is returned, which aligns with annotations. However, no further behavioral details (e.g., response format, performance) are provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (3 sentences) but uses jargon ('Gapup agent-payable C-suite expertise (CRO)') and includes a case study reference that may not be universally understood. It could be more clearly structured for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the input schema (5 parameters, nested objects) and lack of output schema, the description is insufficient. It does not explain what the deliverable contains, how to interpret results, or provide examples. The reference case is too specific to generalize.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' has a description). The description does not explain the meaning or expected usage of 'contact', 'ourOffer', 'prospect', or 'meetingGoal'. It simply says 'send the documented case fields', which is unhelpful for an agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured audited deliverable, but the exact purpose (discovery preparation) is vague and not clearly differentiated from sibling tools like 'deal_coach' or 'battle_cards_live'. The reference case is specific but does not clarify the general use case.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description does not mention prerequisites, context, or exclusions. The phrase 'send the documented case fields' implies a familiar context but does not help an agent decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

diversity_inclusion_metricsC

Read-only

Inspect

Métriques diversité & inclusion — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: Cas démo — Métriques diversité & inclusion. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`ambitions`	Yes
`currentState`	Yes
`regulatoryContext`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that it returns a 'structured, audited deliverable' and that inputs are validated server-side, providing some behavioral context. However, it does not disclose specifics like output format or processing time, which are missing.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (3 sentences) and front-loads the purpose. However, the first sentence is essentially a title repetition of the tool name and title, which is redundant. It could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex input schema with nested objects and no output schema, the description should explain the return format and content more clearly. It only mentions a 'structured, audited deliverable' and references a demo case without elaboration. This is insufficient for an agent to fully understand the tool's output.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (17%). The description does not elaborate on any parameter beyond stating inputs are validated server-side. It fails to add meaning to parameters like 'async', 'focus', or nested objects, leaving the agent to rely on the sparse schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable on diversity & inclusion metrics for C-suite sustainability expertise, which is clear. However, it does not explicitly differentiate from sibling ESG-related tools, though the unique name and focus provide implicit distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives, when not to use it, or its prerequisites. The description mentions it is for 'Gapup agent-payable C-suite expertise' and includes a reference case, but this is insufficient as usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

domain_tech_fingerprintA

Read-only

Inspect

Empreinte tech d'un domaine — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What is the tech stack of — frontend, CMS, analytics, CRM, CDN, hosting? · What buying signals does 's technology footprint reveal for sales prospecting? · Analyze for supply-chain technology risk and third-party vendor exposure. · What is the best outreach angle for a sales rep targeting based on their detected stack? · Run a CISO-style technology fingerprint on — identify legacy tech, missing security headers, and vendor risk. · Has recently changed their marketing or analytics stack — any vendor adoption signals? Reference case: velora-payments.io · Next.js + Cloudflare + Stripe + GA4 + HubSpot · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	Yes		standard
`focus`	Yes		tech-buying
`target_domain`	Yes

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that it returns a 'structured, audited deliverable' and mentions input validation, which are useful behavioral details beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose with multiple questions and an example, which could be condensed. However, it is front-loaded with the primary purpose. A more streamlined version would improve clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description lists questions the tool answers, providing some expectation of output content. However, it does not specify output fields or format, and the complexity is moderate. The description is sufficient but not comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%, with only 'async' having a description. The main description does not explain the parameters 'depth', 'focus', or 'target_domain' meaningfully, leaving the agent to rely on enums and type hints. The reference case implies usage but does not clarify parameter values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs a technical fingerprint of a domain's tech stack, providing structured deliverables. It distinguishes itself from siblings by focusing on tech stack analysis, buying signals, and security, with a reference case example.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists specific use cases and questions the tool answers, such as tech stack identification, buying signals, and vendor risk. It does not explicitly state when not to use it or compare to alternatives, but the context is clear enough.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dora_metrics_deep_diveA

Read-onlyIdempotent

Inspect

Analyzes DORA metrics (Deployment Frequency, Mean Time to Recovery, Change Failure Rate) with deep correlation to code review patterns. Designed for CTOs to identify bottlenecks in software delivery pipelines. Inputs include GitHub repository identifiers and optional time ranges. Outputs structured metrics with trend analysis and code review depth insights.

ParametersJSON Schema

Name	Required	Description
`repo`	Yes	GitHub repository in format 'owner/repo'
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`since`	No	Start date for analysis (ISO 8601)
`until`	No	End date for analysis (ISO 8601)
`branch`	No	Branch name to analyze (default: main)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`metrics`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds context about output (structured metrics with trend analysis and code review depth insights), which is beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences) and front-loaded with the core purpose, target user, and key inputs/outputs. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With a complex domain (DORA metrics), the description covers purpose, inputs, and output format. It could mention the async parameter behavior, but the output schema exists and the tool has good annotations, so completeness is high.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents all 5 parameters. The description mentions 'GitHub repository identifiers and optional time ranges' but does not add new meaning beyond the schema. Baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool's function: analyzing DORA metrics (Deployment Frequency, Mean Time to Recovery, Change Failure Rate) with deep correlation to code review patterns. It specifies the target user (CTOs) and inputs (GitHub repo and time ranges), distinguishing it from siblings like mttr_breakdown_analyzer or code_review_depth_optimizer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states the tool is 'Designed for CTOs to identify bottlenecks in software delivery pipelines,' which implies usage but does not provide explicit guidance on when to use this vs. siblings or conditions to avoid. No alternative tools are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dora_operational_resilience_stress_tesA

Read-onlyIdempotent

Inspect

Assess DORA operational resilience by simulating ICT failure scenarios for financial entities. Designed for legal/compliance teams to evaluate ICT risk management under DORA Article 25. Inputs include failure scenario parameters (e.g., ICT service type, duration, impact radius) and entity profile. Outputs structured resilience scores, regulatory gaps, and mitigation recommendations with EUR-Lex/FTC enforcement references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entityType`	Yes
`impactRadius`	Yes
`ictServiceType`	Yes
`existingMitigations`	No
`failureDurationHours`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	No
`regulatoryGaps`	Yes
`resilienceScore`	Yes
`simulationTimestamp`	No
`recommendedMitigations`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint. The description adds value by specifying the output includes structured resilience scores, regulatory gaps, and mitigation recommendations with legal references, which goes beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with no wasted words. The first sentence states the core action, the second identifies the audience, and the third lists inputs and outputs. Perfectly front-loaded and efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the tool (DORA regulation, 6 parameters, output schema exists), the description covers purpose, users, regulation, inputs, and outputs comprehensively. It leverages the output schema to avoid detailing return format, making it complete for an agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema description coverage at only 17%, the description compensates by listing key parameter categories (ICT service type, duration, impact radius, entity profile) and their role in the simulation. It adds meaning beyond the schema, though it could be more specific about valid values.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly specifies the tool's purpose: assessing DORA operational resilience by simulating ICT failures for financial entities. It identifies the target users (legal/compliance teams) and the regulation (DORA Article 25), distinguishing it from sibling tools like dora_metrics_deep_dive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for DORA compliance but does not explicitly state when to use this tool versus alternatives, nor does it exclude scenarios where it is inappropriate. It provides context but no clear when/when-not guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dpdp_consent_artifact_generatorA

Read-onlyIdempotent

Inspect

Generates structured consent artifacts compliant with India's Digital Personal Data Protection Act (DPDP). Designed for legal teams to verify or create consent records with timestamped logs, purpose limitation, and data subject rights. Accepts data subject details, processing purpose, and legal basis as inputs. Returns a signed artifact with audit trail and validation status.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`legalBasis`	Yes	Legal basis for processing under DPDP
`dataSubjectId`	Yes	Unique identifier for the data subject
`dataCategories`	No	Categories of personal data being processed
`processingPurpose`	Yes	Specific purpose for data processing
`retentionPeriodDays`	No	Retention period in days

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`artifact`	No
`warnings`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds that it returns a signed artifact with audit trail and validation status, which is beyond annotations. Annotations already provide idempotentHint and readOnlyHint; description clarifies output nature. No clear contradiction despite 'generates' verb.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: first states purpose, second identifies audience and actions, third lists inputs and outputs. No redundant information, front-loaded with key purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of output schema, description adequately covers purpose, audience, input summary, and output type. It mentions key features (timestamped logs, purpose limitation, data subject rights) without being overly verbose.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage for all 6 parameters, so description does not need to add parameter details. It only lists inputs generically, adding no extra meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it generates structured consent artifacts compliant with India's DPDP Act. Differentiates from siblings by specifying jurisdiction and regulation, e.g., distinct from 'lgpd_data_subject_rights_automator'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly targets legal teams for verifying or creating consent records under DPDP. Implies when to use (Indian DPDP compliance) but does not explicitly state when not to use or name alternative tools for other regions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dual_use_export_risk_mapperA

Read-onlyIdempotent

Inspect

As a COO, quickly assess export compliance risks for components in your supply chain. This tool analyzes bills of materials (BOMs) against EU dual-use export control lists and ICAO/IMO restricted items data. Input a list of part numbers, descriptions, or HS codes to receive a risk assessment with actionable insights. Output includes risk levels, applicable regulations, and source references for audit trails.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`bomItems`	Yes
`includeSources`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`results`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, and open-world behavior. The description adds context by specifying the data sources (EU dual-use lists, ICAO/IMO) and output fields (risk levels, regulations, sources), which go beyond the annotations. No contradictions are present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, each delivering clear information: audience and purpose, operation, and output. No redundant words; front-loaded with the action verb.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, inputs, and outputs adequately for a risk assessment tool, given that an output schema exists. It does not mention asynchronous behavior or the 'includeSources' parameter, but those are detailed in the schema. Overall, it provides sufficient context for an agent to select the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 33%, requiring compensation. The description explains the main parameter 'bomItems' by mentioning part numbers, descriptions, or HS codes, but does not cover 'async' or 'includeSources'. Value is added for the primary input but incomplete for other parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool assesses export compliance risks for components in a supply chain, analyzing BOMs against specific control lists. The verb 'assess' and resource 'export compliance risks' are specific, and it distinguishes from siblings like 'dual_use_tech_diversion_monitor' by focusing on BOMs and actionable insights.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for supply chain compliance but does not explicitly state when to use this tool versus alternatives like sanctions screeners or other risk mappers. No 'when not to use' or direct comparisons to siblings are provided, though the input and output descriptions give some context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

dual_use_tech_diversion_monitorA

Read-onlyIdempotent

Inspect

Asynchronous T5-level tool for COO persona to detect unauthorized diversion of dual-use technologies. Cross-references shipment manifests, EU sanctions lists, and ICAO/IMO transport data to identify suspicious transfers. Inputs: shipment IDs, company identifiers, or geographic routes. Outputs structured diversion risk assessment with source provenance. Requires async:true to avoid 402 timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`route`	No
`companyId`	No	Company registration number or tax identifier
`shipmentId`	No	Unique shipment identifier (e.g., bill of lading number)
`techCategory`	No	Dual-use technology category

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`matches`	No
`sources`	No
`warnings`	No
`diversionRisk`	No	Calculated diversion risk score (0-100)

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds valuable behavioral details: it is asynchronous, requires async:true to avoid timeout, and outputs a structured risk assessment with source provenance. This goes beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences with no fluff. The first sentence states purpose, the second details functionality, the third lists inputs, and the fourth covers output and usage requirement. Every sentence earns its place, and it is front-loaded with the most critical information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and comprehensive annotations, the description adequately covers what an agent needs: purpose, inputs, behavioral constraints (async), and output nature. It is complete for a tool with this complexity and structured documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 80%, and the description compensates by listing input categories (shipment IDs, company identifiers, geographic routes) that map to parameters. It adds semantic context on what each input is used for, enhancing understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'detect unauthorized diversion of dual-use technologies' for a specific persona (COO). It uses a specific verb-resource pair and distinguishes from siblings like 'dual_use_export_risk_mapper' by emphasizing cross-referencing of transport data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage context (asynchronous, requires async:true) and lists input types, but does not explicitly specify when to use this tool versus alternatives like sanctions_screener_multi or dual_use_export_risk_mapper. The guidance is implicit rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

earnings_reviewerB

Read-only

Inspect

Earnings Reviewer — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Salesforce Q3 FY2026 — call transcript + 10-Q + guidance → analyst note. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`quarter`	Yes
`analystFocus`	No
`secFilingContext`	No
`transcriptExcerpt`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and output is a structured deliverable. It doesn't detail what happens with the async parameter or data retention, but doesn't contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise (3 sentences) with front-loaded purpose and a concrete reference example. Every sentence adds value with no redundancy or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 6 parameters including nested objects and an async option, the description is too brief. It doesn't explain the output structure, how to handle async polling, or the validation rules. The reference case helps but leaves many practical gaps for agents.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is very low (17%). The description only generically mentions 'send the documented case fields' without explaining the meaning or usage of individual parameters like company, quarter, or async. The async parameter has a good description, but others lack semantic context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's verb ('Reviews') and resource ('Earnings'), and adds context about the deliverable and a reference case. However, it does not differentiate from sibling tools like 'earnings_transcript_signals' or 'sec_filing_decoder', which could overlap.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description implies usage for fundraising and C-suite analysis, but fails to provide when-not-to-use or prerequisite conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

earnings_transcript_signalsA

Read-only

Inspect

Earnings call transcript signal extractor for equity research analysts, catalyst-driven hedge funds, and BD teams. Parses earnings transcripts (fetched or provided) to surface:

• signals (P0/P1/P2): guidance raise/cut, miss/beat vs consensus, buyback, dividend change, new product, executive change, capex shift, M&A intent, regulatory risk, competitive threat, supply chain, hiring • kpis_mentioned: Revenue, EBITDA, EPS, FCF, Gross Margin, Operating Margin with YoY/QoQ % • guidance: raised / maintained / cut / new_initiated items extracted • q_and_a_topics: top Q&A themes detected (AI strategy, China exposure, M&A pipeline, macro, etc.) • overall_tone: bullish / neutral / bearish

Sources fetched automatically: SEC EDGAR 8-K filings, Yahoo Finance earnings news, Motley Fool transcripts. If no transcript can be retrieved from any source, returns status:'failed' with an explicit warning and empty signals — never fabricated data. Accepts transcript_text override for direct analysis. Supports multilingual transcripts (de/fr/es/zh). European tickers (SAP.DE, BMW.DE) mapped to EDGAR-compatible equivalents automatically.

ParametersJSON Schema

Name	Required	Description
`lang`	No	Language hint for the transcript. Affects mock transcript language when fetch fails.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`quarter`	No	Fiscal quarter in format Q1-2026. Defaults to the most recent past quarter.
`transcript_text`	No	If provided, skips all external fetches and analyses this text directly. Minimum 100 characters.
`company_or_ticker`	Yes	Company name or ticker symbol (e.g. 'Tesla', 'TSLA', 'SAP', 'SAP.DE', 'Sanofi', 'SNY'). European tickers (SAP.DE, BMW.DE) are mapped to their ADR equivalents for EDGAR lookup.

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true. The description adds substantial behavioral detail: automatic source fetching (SEC EDGAR, Yahoo Finance, Motley Fool), explicit 'never fabricated data', failure mode (returns status:'failed' with empty signals), support for async polling, and European ticker conversion. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-organized with bullet points for outputs, but it is relatively long. It front-loads the core purpose and then lists features. A slightly more condensed version would improve conciseness without losing information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters, absence of output schema, and high schema coverage, the description covers all essential aspects: tool purpose, data sources, failure behavior, multilingual support, async mode, and European ticker mapping. It is complete for an agent to understand and use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions for all 5 parameters. The description reiterates some parameter intent (e.g., language support, async behavior, transcript_text override) but does not add significant new meaning beyond what the schema already captures.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an 'Earnings call transcript signal extractor' for specific user groups, listing distinct outputs such as signals, KPIs, guidance, Q&A themes, and tone. This specificity distinguishes it from sibling tools like 'earnings_reviewer' which likely focuses on review rather than concrete signal extraction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states target users (equity research analysts, hedge funds) and covers key scenarios: automatic fetching, transcript_text override, multilingual support, European ticker mapping, and failure behavior. It lacks explicit 'when not to use' guidance but provides sufficient context for appropriate selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

economic_indicatorA

Read-only

Inspect

Return a precise macroeconomic indicator for a country — the exact figure for a market-sizing, finance or strategy workflow. Indicators: gdp_usd, gdp_per_capita, gdp_growth, inflation, unemployment, population. Source: World Bank. When to use: an agent's analysis needs an authoritative country-level economic figure. Inputs: country (ISO-2 or ISO-3 code) and indicator name.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	Yes	Country code, ISO-2 or ISO-3 (e.g. FR, USA)
`indicator`	Yes	Macroeconomic indicator name

Output Schema

ParametersJSON Schema

Name	Required	Description
`year`	Yes
`value`	Yes
`source`	Yes
`country`	Yes
`indicator`	Yes
`source_url`	No
`indicator_code`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds the source (World Bank) and asserts precision ('exact figure'), which provides useful context beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise with three short sentences plus a list and usage note. Every sentence adds value, and the structure is front-loaded and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown), the description covers purpose, usage, inputs, and source. It could mention data recency or time range, but is otherwise sufficiently complete for the tool's function.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema already covers 100% of parameters with descriptions. The description re-lists the indicator enum and mentions ISO codes, but adds no new meaning beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a precise macroeconomic indicator for a country, listing specific indicators and source. However, it does not explicitly distinguish from sibling tools like fx_rate or interest_rate, though the context implies it's for broad macro figures.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: when an agent's analysis needs an authoritative country-level economic figure. It does not provide exclusions or alternatives, but the guidance is clear and contextual.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

email_domain_health_checkA

Read-onlyIdempotent

Inspect

Comprehensive email domain health check: MX routing, SPF authentication, DKIM signing, DMARC policy enforcement, DNSBL blacklist status (Spamhaus/SpamCop/Barracuda), TLS certificate validity, and WHOIS registration age. Aggregates a reputation score 0-100 and generates P0/P1/P2 deliverability signals. Accepts a domain (stripe.com) or email address (info@stripe.com). Detects role-based addresses (info@, support@, admin@, noreply@) that have higher bounce rates. Detects email provider (Google Workspace, Microsoft 365, Amazon SES, etc.). P0 signals: blacklisted / no MX / TLS expired / no SPF + DMARC none. P1 signals: SPF soft-fail / no DKIM selector / DMARC no reporting. P2 signals: role-based address / TLS expires <30d / domain age <90 days. All checks are keyless (no API keys required). Cache TTL 1h. SLA <=10s p95.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`email`	No	Full email address for additional checks: format validity, role-based detection (e.g. "ceo@stripe.com").
`checks`	No	Subset of checks to run. Defaults to all 8: ["mx","spf","dkim","dmarc","blacklist","whois","tls","reputation"]. Use a subset for faster responses (e.g. ["mx","spf","dmarc","reputation"] for quick scoring).
`domain`	Yes	Domain to check (e.g. "stripe.com" or "@stripe.com"). If an email address is provided here, the domain is extracted automatically.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mx`	Yes
`spf`	Yes
`tls`	No
`dkim`	Yes
`dmarc`	Yes
`whois`	No
`domain`	Yes
`status`	Yes
`sources`	Yes
`blacklist`	Yes
`email_valid`	No
`quality_score`	Yes
`reputation_score`	Yes
`email_is_role_based`	No
`deliverability_signals`	Yes

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint=true), the description details behavioral traits: runs up to 8 checks, defines P0/P1/P2 signals, states keyless operation, cache TTL (1h), SLA (<=10s p95), and async behavior. This adds significant context beyond annotations, fully aligning with them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is informative and mostly front-loaded with the purpose and check list. While slightly verbose, every sentence adds relevant detail. It could benefit from more explicit structuring (e.g., bullet points) but is acceptable in paragraph form.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple checks, signals, async mode), the description covers inputs, check details, output signals, performance characteristics, and usage hints. With output schema present, return values are handled. No gaps are apparent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so descriptions exist for each parameter. The description adds value by explaining the checks parameter's default and speed benefit, the domain/email acceptance, role-based detection, and async option. This enhances semantic understanding beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs a comprehensive email domain health check, listing specific checks (MX, SPF, DKIM, DMARC, DNSBL, TLS, WHOIS) and outputs (reputation score, deliverability signals). It distinguishes from siblings like 'domain_tech_fingerprint' by focusing on email deliverability.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for assessing email domain health but does not explicitly state when to use versus alternatives or provide exclusions. It mentions keyless operation and detection of role-based addresses, but lacks direct guidance on when not to use this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

enps_autoC

Read-only

Inspect

eNPS automatisé — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: BlaBlaCar — eNPS pulse mensuel · 700 FTE 8 pays · segments × tenure × manager · plays correctifs ciblés. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`context`	Yes
`toolStack`	Yes
`segmentation`	Yes
`presenterScript`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, which match the description's output focus. The description adds server-side validation and deliverable return, but does not detail auth needs, rate limits, or processing time. With annotations, additional context is minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, with a single sentence followed by a reference case. It front-loads the primary action, though the jargon slightly reduces clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, multiple nested objects, no output schema), the description is too brief. It omits details about the deliverable's structure, how to interpret results, or error handling, making it insufficient for reliable agent use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 14%, yet the description adds no new parameter explanations beyond 'send the documented case fields.' Most parameters lack descriptions in both schema and text, leaving agents with unclear semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool automates eNPS analysis and returns a structured deliverable, with a reference case for context. While the phrasing is somewhat jargon-heavy, the core purpose is identifiable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. Sibling tools include many HR-related options (e.g., 'champion_mapping', 'comp_benchmark_geo_delta'), but no differentiation or usage context is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

esg_audit_multiA

Read-only

Inspect

Multi-mode ESG intelligence for ESG analysts, sustainability officers and impact investing fund managers. Aggregates live data from CDP, SBTi, Wikipedia, Yahoo Finance and web search across five modes: • company_score — ESG score 0-100 with E/S/G breakdown + heuristic rating (AAA-CCC), from CDP grade + SBTi + sector profile • controversy_check — controversies detected via web search, classified P0/P1/P2 by type (greenwashing, emissions fraud, labour, governance) • emissions — GHG Scope 1/2/3 estimates, SBTi validation flag, net-zero target year, carbon intensity per M€ revenue • esrs_readiness — CSRD gap across 12 standards (E1-E5, S1-S4, G1-G3): readiness % + gap list + CSRD deadline + effort man-days • sfdr_classification — suggested SFDR Article 6/8/9 with rationale and sustainability indicators met

Signals: P0=critical (controversy/score<40), P1=significant (score<55/SBTi missing/ESRS<50%), P2=watch. Cache 24h.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Company name, ticker, ISIN or LEI (e.g. "Microsoft", "Sanofi", "Volkswagen").
`pillar`	No	ESG pillar filter (optional, default: all).
`framework`	No	ESG framework filter (optional, default: all).

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`emissions`	No
`company_score`	No
`controversies`	No
`quality_score`	Yes
`esrs_readiness`	No
`sfdr_classification`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description explains that the tool aggregates data from live sources (CDP, SBTi, etc.) and mentions a 24-hour cache. This aligns with the readOnlyHint=true annotation and adds behavioral detail beyond the annotations. No contradictory information is present.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear introductory sentence and bullet-pointed mode list. It is informative without being verbose; every sentence adds value. Minor redundancy could be trimmed, but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's multi-mode complexity and existence of an output schema, the description covers all modes, signal levels, and data sources comprehensively. It omits mention of the async parameter's behavior (returning job_id) and the job_result polling mechanism, but these are covered by the input schema annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description adds meaningful context for the 'mode' parameter by explaining each mode's output, but the other parameters (async, query, pillar, framework) are only briefly described in the schema itself, without additional explanation in the description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the purpose: aggregating live ESG data from multiple sources across five distinct modes. Each mode is explicitly listed with its output, making the tool's function unambiguous and well-differentiated from generic siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

While the description implies usage for comprehensive ESG analysis, it does not provide explicit guidance on when to use this tool versus alternative ESG tools like supplier_esg_audit or carbon_footprint_calculator. No when-not-to-use or comparative context is given.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

esrs_narrative_builderC

Read-only

Inspect

Architecte du narratif ESRS / CSRD — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: L'Oréal France — narratif ESRS E1+E5 + S1 + G1 · CSRD reporting 2025-2026 · double-matérialité chiffrée. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`scope`	Yes
`company`	Yes
`context`	Yes
`presenterScript`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and implies a read-only generation (returns a deliverable). No additional behavioral traits like error handling or rate limits are disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that includes a reference case and a phrase about server-side validation. It is somewhat wordy and includes extraneous details (e.g., the L'Oréal example) that may not be essential for the agent.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema), the description is incomplete. It does not explain the output structure, how to use the async parameter, or the purpose of the focus field. The lack of output schema information leaves a significant gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (17%), and the description does not explain any parameters. The schema defines types and constraints but lacks descriptions for most fields (e.g., company, scope, context). The description adds no parameter-level meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it builds ESRS/CSRD narratives and returns a structured audited deliverable, with a reference case. It is clear in purpose but could be more specific about the output format to differentiate from similar sibling tools like sustainability_report.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as sustainability_report or rse_policy_builder. The description lacks context on prerequisites or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

event_marketingC

Read-only

Inspect

Marketing événementiel — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Pennylane (€120k/an budget événements) — 7 événements sélectionnés · coût-MQL -38% vs année précédente. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`teamSize`	Yes
`geography`	Yes
`objectives`	Yes
`currentEvents`	Yes
`targetAudience`	Yes
`annualBudgetEur`	Yes

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint and openWorldHint, which the description does not contradict. Description mentions server-side validation and structured deliverable, adding some context. However, it doesn't disclose if the tool is slow or costs money despite 'agent-payable' wording.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is relatively short but includes a fluffy brand statement and a specific reference case that may not generalize. Could be more direct about the tool's functionality.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With complex nested schema and no output schema, the description is insufficient for an agent to fully understand invocation context and expected results. Missing details on how to use async parameter and return format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description adds no parameter-specific information. Input schema coverage is only 13%, and description fails to explain any of the 8 parameters beyond telling to send documented case fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description vaguely states it provides CMO-level event marketing expertise and returns a structured deliverable, but lacks specificity on what the tool actually computes or recommends. Reference case hints at event selection and cost improvement but doesn't clearly define the output.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this vs sibling tools (e.g., marketing_roi_dashboard, customer_marketing). The description focuses on input requirements but doesn't help an agent decide context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

executive_comp_peer_benchmarkA

Read-onlyIdempotent

Inspect

As a Chief Human Resources Officer (CHRO), benchmark executive compensation packages against peer companies using public SEC filings and private compensation data from Equilar and Bloomberg. Inputs include executive name, title, company ticker, and peer group criteria. Outputs structured compensation metrics (base salary, bonus, equity, total compensation) with source attribution and confidence scores.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`peerGroup`	No
`fiscalYear`	No
`companyTicker`	Yes
`executiveName`	Yes
`executiveTitle`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`compensation`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds behavioral context: it uses public and private compensation data, outputs confidence scores and source attribution, enhancing transparency beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with purpose and context. It uses no wasted words, efficiently covering the tool's role, inputs, and outputs.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested peerGroup, 6 parameters, low schema coverage) and presence of an output schema, the description provides a solid overview of purpose, data sources, input types, and output metrics. However, it lacks details on parameter semantics for peerGroup and fiscalYear, leaving some gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (17%), with only 'async' documented. The description lists main inputs (executive name, title, company ticker, peer group criteria) but lacks details on peerGroup structure, fiscalYear format, or required fields, providing partial but incomplete compensation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool benchmarks executive compensation against peer companies using specific data sources (SEC filings, Equilar, Bloomberg), and lists inputs and outputs. It distinguishes itself from siblings like comp_benchmark_geo_delta by focusing on peer benchmarking for CHROs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for executive compensation benchmarking but does not explicitly state when to use this tool versus alternatives or provide exclusions. No guidance on prerequisites or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

financial_model_3statementA

Read-only

Inspect

Pure-compute 3-statement financial model builder (Income Statement + Balance Sheet + Cash Flow). Feed assumptions (revenue growth, COGS%, OpEx, CapEx, working capital, tax rate, depreciation, debt schedule) → receive a full 3-5 year projection with integrated DCF valuation. Supports IFRS / US_GAAP / PRC_GAAP (中国会计准则) norms with bilingual ZH+EN labels for PRC. Modes: build (full 3-statement model) | scenario_analysis (base/bull/bear ±20% growth) | sensitivity (1 KPI × 1 input, 5-point grid). No external data needed — all computed from assumptions. ICP: VC due diligence, M&A analysts, CFO SMB, startup founders pitching investors, biotech/SaaS modeling. Returns balance_check_ok per year, DCF enterprise/equity value, and coherence warnings.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	build = full 3-statement model \| scenario_analysis = base/bull/bear \| sensitivity = 1 KPI × 1 input
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`assumptions`	Yes	Financial assumptions for the model
`sensitivity_kpi`	No	KPI to observe in sensitivity mode.
`sensitivity_input`	No	Assumption param to vary in sensitivity mode. E.g. 'growth_rates_pct[0]' or 'cogs_pct_of_revenue'.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`norms`	Yes
`status`	Yes
`sources`	No
`warnings`	Yes
`cash_flow`	No
`scenarios`	No
`sensitivity`	No
`balance_sheet`	No
`quality_score`	Yes
`valuation_dcf`	No
`income_statement`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, indicating no destructive actions. The description adds transparency by stating it is 'pure-compute', explaining what it returns ('balance_check_ok per year, DCF enterprise/equity value, and coherence warnings'), and noting 'No external data needed'. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is informative but somewhat verbose (multiple sentences). However, it is well-structured: core function first, then input list, modes, output summary, and ICP. The first sentence clearly states the tool's purpose, and each subsequent part adds useful context. It earns its length but could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple modes, assumptions output, DCF valuation) and the existence of an output schema (which precludes needing to describe return format), the description is thorough: it covers modes ('build', 'scenario_analysis', 'sensitivity'), output specifics, and even the accounting standards supported (IFRS, US_GAAP, PRC_GAAP). It addresses the ICP and states 'No external data needed', which is crucial for user understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description lists assumptions (e.g., 'revenue growth, COGS%') but does not add explanatory value beyond the schema descriptions. It does not clarify format or constraints for parameters like sensitivity_kpi or sensitivity_input beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with 'Pure-compute 3-statement financial model builder (Income Statement + Balance Sheet + Cash Flow)', clearly specifying the verb ('builds'), resource ('3-statement financial model'), and scope. It distinguishes itself from sibling tools (e.g., margin_doctor_finance, tax_optimization) by focusing on a full integrated model with DCF valuation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states when to use: 'Feed assumptions → receive a full 3-5 year projection with integrated DCF valuation' and lists the ICP (VC due diligence, M&A analysts, CFO SMB, etc.). It does not explicitly say when not to use or mention alternatives, but it implies the tool is for building a complete model from scratch, which is a distinct use case among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fraud_detectorC

Read-only

Inspect

Détecteur de fraude — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: TechManu SAS — Industriel FR €32M CA, 148 FTE · 30j · 21 anomalies · €487k à risque. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`analysisPeriodDays`	Yes
`transactionVolumes`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and that the tool returns a structured, audited deliverable, along with a concrete reference case showing expected output metrics (anomalies, risk amount). This adds moderate behavioral context beyond the annotations but does not detail side effects or scope.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences plus a reference case, with no filler. It front-loads the purpose and then adds a concrete example. Efficient, though the reference case could be integrated more seamlessly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has high complexity (nested objects, 5 parameters, many siblings, no output schema) but the description is minimal. It does not specify the return format, pagination, or how to interpret the structured deliverable. The reference case helps but is not a full specification.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, so the description should compensate by explaining parameters, but it only says 'send the documented case fields' without elaborating on the required company, analysisPeriodDays, or transactionVolumes objects. The optional async and focus parameters are not mentioned. The reference case implies certain fields but does not formally define them.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool is a fraud detector that returns a structured, audited deliverable, and provides a reference case (TechManu SAS) illustrating the output. This gives a clear verb (detect) and resource (fraud), and hints at the target (C-suite risk). However, it does not explicitly differentiate from sibling fraud detection tools like affiliate_fraud_clickstream_detector or x402_payment_fraud_detector.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes 'Gapup agent-payable C-suite expertise (RISK)' suggesting a high-level context, but provides no explicit guidance on when to use this tool versus the many siblings. There are no when-to-use or when-not-to-use conditions, nor mention of alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_business_ideasA

Read-only

Inspect

Return vetted, automation-scored business ideas from the FTG idea bank — each with an autonomy score, monetization model and conservative/median/optimistic MRR projections. When to use this tool: an agent or founder wants ranked, buildable business ideas. Input: optional category and limit.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`category`	No	Optional category filter

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`ideas`	Yes

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint) indicate safe read operation and dynamic data. The description adds detail about output (autonomy score, monetization model, projections) beyond annotations. No contradiction, but the description does not mention any additional behavioral traits like pagination or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences plus a usage sentence. Information is front-loaded, no redundant text. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown but indicated) and annotations covering safety, the description adequately covers purpose, usage, and basic parameters. Complexity is low, and the description is complete for an agent to understand and use the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 67% (async and category described, limit not). Description mentions 'optional category and limit' but does not explain limit's meaning beyond schema's min/max. async usage is fully explained in schema. Description adds minimal value beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'vetted, automation-scored business ideas' with specific attributes (autonomy score, monetization model, MRR projections). It distinguishes itself from siblings like ftg_business_plan and ftg_market_gap by focusing on a list of ranked ideas.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'When to use this tool: an agent or founder wants ranked, buildable business ideas.' This provides direct guidance on the appropriate context, and the sibling list offers alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_business_planA

Read-only

Inspect

Return the business plan for a market-gap opportunity — direct-trade or local-production, with CAPEX, OPEX, ROI, payback period, automation level and the full plan. Cache-first: returns the stored plan when available, otherwise reports that generation is required (the FTG platform produces plans on demand). When to use this tool: an agent has an opportunity_id (from ftg_market_gap) and needs the investable plan. Input: an opportunity_id.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`opportunity_id`	Yes	Opportunity id obtained from ftg_market_gap

Output Schema

ParametersJSON Schema

Name	Required	Description
`plans`	No
`status`	Yes
`message`	No
`plan_count`	No
`opportunity_id`	Yes

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true. Description adds cache-first behavior: returns stored plan or reports generation required, which is useful beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is three sentences, front-loaded with purpose, then cache behavior, then usage. Minor redundancy (third sentence partly restates first) but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers purpose, cache mechanism, prerequisite (opportunity_id from ftg_market_gap). Output schema exists so return values are documented. Could mention error handling but adequate for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with both parameters described. Description only reiterates 'Input: an opportunity_id' without adding new semantics. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states verb 'Return' and resource 'business plan for a market-gap opportunity', specifies what the plan includes (CAPEX, OPEX, ROI, etc.), and distinguishes from sibling tools like ftg_market_gap by requiring its opportunity_id.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says 'When to use this tool: an agent has an opportunity_id (from ftg_market_gap) and needs the investable plan', providing clear context. Does not mention when not to use or alternatives, but the use case is well-defined.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_country_regulationsA

Read-only

Inspect

Return import, trade and production regulations for a country — category, title, summary and source. When to use this tool: an agent checks regulatory or compliance requirements before trading or producing in a market. Input: a country, with an optional category.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	Yes	Country ISO code or name
`category`	No	Optional regulation category filter

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`regulations`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true, and the description is consistent. It adds behavioral context by specifying the output structure (category, title, summary, source) and indicating that the tool returns information rather than performing actions. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with no extraneous information. It is front-loaded with the core functionality, followed by usage guidance and input specification. Every sentence contributes value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations, the description is largely complete. It covers what the tool does, when to use it, and the input parameters. It could optionally mention that multiple regulations may be returned, but this is implied and not a significant gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 75%, so baseline is 3. The description highlights the key parameters (country, category) and their roles, but does not add detail beyond the schema. It omits async and limit, which are in the schema but not explained in the description. The description reinforces the main parameters adequately.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb 'Return' and specifies the resource 'import, trade and production regulations' for a country. It clearly distinguishes from sibling tools like ftg_country_study (which likely provides broader country info) and ftg_market_gap (focus on market opportunities).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'an agent checks regulatory or compliance requirements before trading or producing in a market.' This provides clear context, but it does not explicitly mention when not to use or suggest alternatives, which would improve the score.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_country_studyA

Read-only

Inspect

Return the in-depth FTG country study — multi-part structured analysis of a country's trade and production landscape. When to use this tool: an agent needs deep country context before a sourcing, export or investment decision. Input: a country.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	Yes	Country ISO code or name

Output Schema

ParametersJSON Schema

Name	Required	Description
`note`	No
`parts`	Yes
`country`	Yes
`part_count`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations readOnlyHint and openWorldHint already indicate safe read behavior. Description adds that it's a multi-part structured analysis, but does not elaborate on latency, pagination, or other traits. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no fluff, front-loaded purpose. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read tool with output schema, the description is sufficient. Could mention if there are limits, but not necessary given clarity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description's 'Input: a country' adds minimal value. Parameters are fully documented in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses specific verb 'Return' and resource 'in-depth FTG country study' with scope 'country's trade and production landscape'. It clearly distinguishes from siblings like ftg_country_regulations.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'needs deep country context before a sourcing, export or investment decision'. Does not mention when not to use, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_investor_directoryA

Read-only

Inspect

Return investors from the FTG directory — VC, PE and impact funds with type, firm, website, ticket-size range, sectors and stages of interest. When to use this tool: an agent builds a fundraising shortlist. Input: optional country and limit.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	No	Optional country ISO code or name

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`investors`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the safety profile is clear. The description adds input guidance (optional country and limit) but does not disclose behavioral traits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is extremely concise at two sentences plus a short phrase, with no unnecessary words. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description adequately covers purpose, inputs, and use case. It is complete for a simple lookup tool, though it could mention what it does not cover (e.g., non-FTG data).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 67%, and the description mentions 'optional country and limit' but does not add significant detail beyond the schema. The baseline of 3 is appropriate as the schema does most of the work.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool returns investors from the FTG directory, including specific attributes (type, firm, website, ticket-size range, sectors, stages). The verb 'Return' is clear, and the FTG prefix helps distinguish it from generic investor tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly states when to use: 'an agent builds a fundraising shortlist.' It does not provide exclusions or alternatives, but the context is explicit and actionable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_market_gapA

Read-only

Inspect

Return the import/production market-gap opportunities for a country — commodities where local demand outpaces local supply. Each opportunity carries the gap value (USD/year), the gap volume (tonnes/year), a 0-100 opportunity score and the potential margin. When to use this tool: an agent needs to know what a country structurally under-produces or over-imports, for trade sourcing, import/export or local-production investment decisions. Input: a country (ISO-2 code or name).

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum opportunities to return (default 20)
`country`	Yes	Country ISO-2 code (e.g. 'SN', 'KE') or name (e.g. 'Senegal')

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`country`	Yes
`opportunities`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint, so the description adds value by detailing the output fields (gap value, volume, score, margin) without contradicting annotations. It does not cover pagination or error handling, but the added context is sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (5 sentences) and front-loads the purpose, then provides usage guidelines and input summary. Every sentence adds value with no fluff or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers when to use, input format, and output structure. It does not address error handling (e.g., invalid country) or interpretation of scores, but given the presence of an output schema, it is mostly complete for a lookup tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, and the description largely repeats parameter details from the schema (e.g., country input as ISO-2 code or name, limit default 20). It adds no new meaning beyond what the schema provides, so baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns 'market-gap opportunities' for a country, specifying the verb 'return' and the resource (import/production gaps). It details the output fields (gap value, volume, score, margin), making the purpose unambiguous and distinguishing it from sibling ftg_* tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes an explicit 'When to use this tool' section that outlines scenarios (trade sourcing, import/export decisions). While it does not name alternative tools, the context is clear enough for an agent to decide when this tool is appropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_opportunity_scoutA

Read-only

Inspect

Rank the best countries for a given commodity — where the market gap, opportunity score and potential margin are highest. Cross-country scouting. When to use this tool: an agent has a commodity and needs to know WHERE to sell, export to or set up local production. Input: a commodity name.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum countries to return (default 20)
`commodity`	Yes	Commodity name (e.g. 'rice', 'soybean', 'poultry')

Output Schema

ParametersJSON Schema

Name	Required	Description
`note`	No
`count`	Yes
`commodity`	Yes
`countries`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and openWorldHint=true. The description adds that it ranks countries, which is consistent and provides a bit more context, but does not disclose additional behavioral traits beyond what annotations indicate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences, front-loading the main action and including a clear usage guideline. No unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description, combined with the schema and annotations, provides sufficient context for using the tool. It explains the input and output (ranking countries), though it could specify what 'opportunity score' means or the ranking criteria.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and descriptions for all parameters are already detailed. The tool description only mentions 'commodity name' as input, adding no further meaning to the parameter descriptions already in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to rank countries for a given commodity based on market gap, opportunity score, and potential margin. It uses specific verbs (rank, sell, export) and distinguishes from sibling tools like ftg_country_study by emphasizing cross-country scouting.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('an agent has a commodity and needs to know WHERE to sell, export to or set up local production'). It does not explicitly mention when not to use it or provide alternative tools, but the context is clear given the sibling names.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_production_economicsA

Read-only

Inspect

Return production cost benchmarks (CAPEX/OPEX per unit, value ranges, scenarios, quality tiers) and agronomic yields (t/ha, cycles per year) for a commodity. When to use this tool: an agent sizes the economics of producing a commodity. Input: a commodity, with an optional country.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	No	Optional country ISO code or name
`commodity`	Yes	Commodity name or slug

Output Schema

ParametersJSON Schema

Name	Required	Description
`yields`	Yes
`commodity`	Yes
`cost_benchmarks`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the bar is lower. The description adds no extra behavioral context (e.g., data freshness, authentication needs, rate limits). It does not contradict annotations, but it doesn't enrich beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no wasted words. It front-loads the key returns, then provides usage guidance and input specification. Every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, return values are not needed. The description covers the main functionality (cost benchmarks and yields) and mentions the required and optional inputs. It is complete for the complexity level, though it omits explanation of the limit parameter.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 75%, so the description adds marginal value. It restates that commodity is required and country is optional, which is already in the schema. It does not explain limit or async parameters, but the schema covers them. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns production cost benchmarks and agronomic yields for a commodity, specifying exact items like CAPEX/OPEX per unit, value ranges, and yield metrics. This is a specific verb+resource combination that distinguishes it from siblings like ftg_production_methods.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'When to use this tool: an agent sizes the economics of producing a commodity.' This provides a clear use case, though it does not mention when not to use or alternative tools, which would elevate it to a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_production_methodsA

Read-only

Inspect

Return the production methods for a commodity — each with a description, ordered process steps, pros/cons and a popularity rank. Methods are commodity-canonical: one curated set per commodity, reusable across every country. When to use this tool: an agent evaluates HOW a commodity is produced or processed, compares techniques, or builds a production plan. Input: a commodity slug or name.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`commodity`	Yes	Commodity slug or name (e.g. 'rice', 'tomato', 'cashew')

Output Schema

ParametersJSON Schema

Name	Required	Description
`note`	No
`methods`	Yes
`commodity`	Yes
`method_count`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true. Description adds that methods are 'commodity-canonical' but does not disclose other behavioral traits like rate limits or performance. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each with distinct purpose: return content, canonical nature, usage guidance. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a read-only tool with output schema, description covers purpose and usage adequately. Missing details on limitations or examples but sufficient given annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers both parameters fully. Description only restates 'Input: a commodity slug or name' which does not add new semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns production methods for a commodity, listing included details (description, steps, pros/cons, rank). It distinguishes from siblings by emphasizing commodity-canonical and cross-country reusability.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guidance: 'When to use this tool: an agent evaluates HOW a commodity is produced...' This gives clear condition, though no exclusion or alternative mention.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_seller_catalogA

Read-only

Inspect

Return seller catalogues registered on FTG — exporters and producers with their commodity, monthly capacity, certifications and target export markets. When to use this tool: an agent builds a supplier or sourcing shortlist. Input: optional seller country and commodity.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No
`country`	No	Optional seller country ISO code or name
`commodity`	No	Optional commodity filter

Output Schema

ParametersJSON Schema

Name	Required	Description
`count`	Yes
`sellers`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true (safe read) and openWorldHint=true (results may not be exhaustive). Description adds context about data content (exporters, producers, etc.) but does not discuss behavioral traits like pagination or rate limits. Adds some value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences: first describes output, second describes usage and inputs. Front-loaded, no wasted words. Efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown in input but context indicates true), the description need not detail return format. It covers purpose and input guidance well. Minor gap: does not mention the async parameter, but overall complete for a simple read tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 75% (3 of 4 parameters have descriptions). Description mentions optional country and commodity filters, matching schema. The async parameter is not mentioned. Description adds little beyond schema definitions, so baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns seller catalogs with specific fields (commodity, capacity, certifications, target markets) and explicitly mentions when to use it (building supplier/sourcing shortlist), distinguishing it from numerous diverse siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states 'When to use this tool: an agent builds a supplier or sourcing shortlist.' Provides clear context for usage, though does not mention exclusions or alternatives, which is acceptable given sibling diversity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ftg_sourcing_buyersA

Read-only

Inspect

Return verified local buyers in a country — companies sourcing a given commodity, with buyer type, city, website, annual volume range and certification requirements. When to use this tool: an agent builds a sourcing or export shortlist, or needs real B2B demand contacts in a market. Input: a country and an optional commodity filter.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`limit`	No	Maximum buyers to return (default 20)
`country`	Yes	Country ISO-2 code or name
`commodity`	No	Optional commodity slug to filter buyers by

Output Schema

ParametersJSON Schema

Name	Required	Description
`buyers`	Yes
`country`	Yes
`commodity`	No
`buyer_count`	Yes

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true, so the description does not need to state it is read-only. It adds value by disclosing that buyers are 'verified' and specifying the output fields (buyer type, city, etc.). However, it lacks details on data freshness or pagination behavior beyond the limit parameter.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, front-loading the main action and purpose, with no unnecessary words. It is efficient and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given that an output schema exists and describes the return fields, the description is complete. It covers the tool's purpose, usage context, and input requirements without needing to detail output structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description only briefly mentions 'Input: a country and an optional commodity filter,' which does not add much beyond the schema descriptions of those parameters. It does not provide additional context for the async or limit parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns verified local buyers in a country, with specific fields like buyer type, city, website, annual volume range, and certification requirements. It distinguishes itself from sibling tools like ftg_seller_catalog (sellers) and ftg_investor_directory (investors) by focusing on buyers and sourcing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool: 'when an agent builds a sourcing or export shortlist, or needs real B2B demand contacts in a market.' This is clear context, but it does not mention when not to use it or provide explicit alternatives, which would be stronger.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

funding_hunterC

Read-only

Inspect

Chasseur de financements — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: PME deeptech cleantech FR €8M CA — top 30 dispositifs BPI+France2030+EU+VC. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`project`	Yes
`financials`	Yes
`preferences`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true and openWorldHint=true. Description adds 'inputs are validated server-side' and 'returns a structured audited deliverable', offering moderate transparency. No contradiction but could detail cost or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short (3 sentences) but the first sentence mixes French and English ('Chasseur de financements — Gapup agent-payable C-suite expertise (CFO)') which is cryptic. Not optimally front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given complex nested inputs, no output schema, and many sibling tools, the description lacks details on the deliverable's structure, processing time, or how inputs affect results. Insufficient for an agent to fully understand the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema description coverage (only 'async' field has a description), the description merely says 'send the documented case fields' without explaining the nested object parameters. Fails to add meaningful semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The name 'funding_hunter' and description 'Returns a structured, audited deliverable' clearly indicate the tool finds funding opportunities. However, it does not explicitly distinguish from siblings like 'capital_strategy' or 'investor_list', and the deliverable nature is vague.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives such as 'capital_strategy' or 'investor_list'. The description lacks any when-to-use or when-not-to-use context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fx_rateA

Read-only

Inspect

Get the current or historical foreign-exchange rate for any currency pair — the exact exchange rate, FX rate or conversion rate an agent needs to convert a currency amount or feed a finance, trading, invoicing or pricing workflow. Covers EUR/USD, USD/JPY, GBP/EUR and every ISO-4217 currency pair. Returns the latest spot rate, or a historical rate by date. Use when a workflow needs a precise live or past currency exchange rate, or to convert money between two currencies. Source: European Central Bank reference rates via Frankfurter. Inputs: from/to ISO-4217 currency codes, optional date (YYYY-MM-DD).

ParametersJSON Schema

Name	Required	Description
`to`	Yes	Quote currency, ISO-4217 (e.g. USD)
`date`	No	Optional YYYY-MM-DD for a historical rate
`from`	Yes	Base currency, ISO-4217 (e.g. EUR)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.

Output Schema

ParametersJSON Schema

Name	Required	Description
`to`	Yes
`from`	Yes
`rate`	Yes
`as_of`	Yes
`source`	Yes
`source_url`	No

Tool Definition Quality

A3.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that it returns spot or historical rates and sources from ECB, but fails to mention the async parameter behavior (returns job_id when async=true). This omission reduces transparency for a critical behavioral trait.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main action and use cases. It is informative but slightly verbose. Every sentence adds value, though some redundancy exists (e.g., multiple mentions of currency pairs). Still, it's well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, return values need not be explained. The description covers purpose, usage, parameters, source, and date format. However, it omits the async parameter behavior, which is a notable gap for a tool with low latency and optional async.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description mentions from/to codes and date format, matching the schema. However, it does not add meaning beyond the schema, nor does it cover the async parameter. Thus, score remains at baseline.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it gets current or historical foreign-exchange rates for any currency pair, using specific verbs like 'Get' and listing concrete use cases. It distinguishes from sibling tools by focusing on FX rates and mentioning ISO-4217 coverage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use when a workflow needs a precise live or past currency exchange rate, or to convert money between two currencies,' providing clear context. It does not explicitly mention when not to use or alternatives, but the context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geographic_expansionC

Read-only

Inspect

Expansion géographique — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Expansion 4 marchés (DE/UK/ES/NL) · €1.8M budget · ARR cible €3.2M Y2. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`product`	Yes
`financials`	No
`constraints`	No
`targetMarkets`	Yes
`preferredEntryMode`	No
`expansionHorizonMonths`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint and openWorldHint. The description adds that inputs are validated server-side, which is minor. It does not disclose error handling or rate limits. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with a reference case; the first sentence is verbose and includes unclear jargon. Could be more direct and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 8 parameters, nested objects, and no output schema, the description lacks detail on return format, async behavior, and parameter semantics. Insufficient for an agent to use effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%. The description says only "send the documented case fields" without explaining any parameters, leaving the agent to interpret the complex nested schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable for geographic expansion, with a reference case. However, it uses jargon ("Gapup agent-payable C-suite expertise (CSO)") that may confuse. It distinguishes from siblings like market_entry_strategist by focusing on a specific deliverable and case study, but not clearly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like market_entry_strategist or growth_path_architect. The description merely presents a case study without exclusions or context for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

geo_logistics_intelA

Read-only

Inspect

Geospatial logistics intelligence for supply chain, maritime and transport agents. Four modes: (1) geocode_batch — resolve up to 50 addresses to lat/lon with confidence scores (OSM Nominatim + Open-Meteo fallback, 1 req/s rate-limit respected); (2) routing — road/cycling/walking route with distance_km, duration_seconds and ETA ISO timestamp between two addresses or lat/lon points (OSRM public, keyless, global); (3) port_congestion — congestion status for any UN/LOCODE port (e.g. NLRTM, SGSIN, CNSHA) with waiting vessel count, severity (low/medium/high/extreme) and average wait hours; (4) ship_tracking — AIS position, speed, course, destination and ETA for a vessel by its 9-digit MMSI. No API key required for geocode/routing/port. Optional env: AIS_STREAM_API_KEY for live ship data (otherwise MarineTraffic scrape best-effort). SLA: <=25s p95. Cache: 24h geocoding / 1h routing / 30min port / 5min ship. Quality score 0-100. Status: final/partial/failed.

ParametersJSON Schema

Name	Required	Description
`to`	No	routing only: destination address or 'lat,lon'
`from`	No	routing only: origin address or 'lat,lon'
`mode`	Yes	'geocode_batch': address -> lat/lon. 'routing': route + ETA. 'port_congestion': UN/LOCODE port state. 'ship_tracking': vessel by MMSI
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Primary input: address for geocode/routing, UN/LOCODE (e.g. NLRTM) for port_congestion, 9-digit MMSI for ship_tracking
`addresses`	No	geocode_batch only: up to 50 addresses (overrides query if provided)
`mode_transport`	No	routing only: transport mode. Default: driving

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`routing`	No
`sources`	Yes
`geocode_batch`	No
`quality_score`	Yes
`ship_tracking`	No
`port_congestion`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description goes beyond by disclosing SLA (≤25s p95), cache durations (24h geocoding, 1h routing, 30min port, 5min ship), rate limits (1 req/s for geocode), quality score (0-100), and status values (final/partial/failed). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is comprehensive but slightly verbose (approx. 150 words). It packs much information into a single paragraph; using bullet points or clearer mode separation would improve readability. Every sentence adds value, but some details (cache, SLA) could be more structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 modes, external APIs, caching, error states), the description covers SLA, cache durations, quality scoring, API key requirements, fallback sources, and output fields. With an output schema present, it does not need to detail return values. All critical aspects are addressed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% description coverage for all 7 parameters, so the schema itself provides full meaning. The description adds context about how modes use parameters (e.g., 'addresses' overrides 'query' in geocode_batch) but does not significantly enhance each parameter beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly defines the tool as geospatial logistics intelligence with four distinct modes (geocode_batch, routing, port_congestion, ship_tracking). Each mode is explained with its verb and resource (e.g., 'resolve up to 50 addresses to lat/lon'). This specificity distinguishes it from siblings, which are predominantly business or AI tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear usage context for each mode, including input types (addresses, UN/LOCODE, MMSI) and fallback mechanisms. It mentions API key requirements for ship tracking. However, it does not explicitly state when not to use this tool or compare with alternatives, which would elevate it to a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

global_salary_inflation_adjusterA

Read-onlyIdempotent

Inspect

Adjusts salary benchmarks for local inflation using OECD, IMF, and World Bank data. Designed for CHROs to normalize compensation across regions with accurate inflation adjustments. Inputs include country codes, base salary, and reference year. Outputs inflation-adjusted salary with data sources and warnings.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`baseSalary`	Yes
`targetYear`	No
`countryCode`	Yes
`referenceYear`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`targetYear`	No
`countryCode`	No
`inflationRate`	No
`referenceYear`	No
`adjustedSalary`	No

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint, so the description does not need to repeat these. However, beyond these, the description adds no additional behavioral context such as auth requirements, rate limits, or side effects. It only mentions output format.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, each earning its place: action, target audience, inputs/outputs. No redundancy or fluff. Ideal length for a tool with moderate complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of output schema and annotations, the description covers purpose, inputs, target audience, and output summary. It lacks details on the targetYear parameter and error handling, but for a CHRO audience, the level of detail is sufficient. The mention of data sources adds credibility.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only async described). The description lists three required inputs (countryCode, baseSalary, referenceYear) but does not explain the optional targetYear param or the async flag. It adds partial value by clarifying the required parameters' roles but leaves gaps.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool adjusts salary benchmarks for local inflation using OECD, IMF, and World Bank data, with a specific target audience (CHROs) and goal (normalize compensation across regions). This distinguishes it from sibling tools like comp_benchmark_geo_delta which focus on geographic deltas without inflation adjustment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for inflation adjustment but does not explicitly state when to use this tool vs. alternatives or when not to use it. It provides context (CHROs normalizing compensation) but lacks explicit guidance on exclusion or comparison with related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gl_reconcilerB

Read-only

Inspect

GL Reconciler — Réconciliation grand livre — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Identify the root causes of the GL breaks in 's ledger for — cluster them and rank by materiality. · For Q close: which accounts have unreconciled items over €? Provide a sign-off routing and resolution plan. · Run an automated GL reconciliation for — AR/AP/intercompany entries — flag open items, suggest journal entries. · What are the top 5 systemic control weaknesses causing recurring GL breaks at ? Recommend preventive controls. · Generate a month-end close reconciliation report for — breaks by account type, aging analysis, sign-off assignments. Reference case: Acme SaaS Q4 2026 — 47 breaks GL, €1.4M variance non postée. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`entity`	Yes
`ledgerContext`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds context about server-side validation and that the tool returns a structured deliverable. However, it does not disclose other behavioral traits such as authorization requirements, rate limits, or what happens to input data.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is overly long and mixes French and English. It includes multiple example questions and a reference case, which adds clutter without clear structure. It is not efficiently organized for quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of nested objects and absence of output schema, the description provides a general overview but lacks details on return format, error handling, and parameter constraints. It is adequate for basic understanding but incomplete for robust usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only one parameter described). The description does not explain the meaning of the entity and ledgerContext parameters beyond what their names imply. It refers to 'documented case fields' but does not clarify them, leaving ambiguity for the agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states it returns a structured, audited deliverable for GL reconciliation, identifying root causes of breaks, unreconciled items, and control weaknesses. It provides concrete examples and distinguishes the tool's purpose clearly from generic financial tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not provide any guidance on when to use this tool versus alternatives. It lists questions that the tool can answer but lacks explicit when-not or comparison to sibling tools, which is problematic given the large set of sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

gov_procurement_multiA

Read-only

Inspect

Aggregate public procurement tenders (calls for tender / appels d'offres) from multiple government sources simultaneously: TED Europa v3 (27 EU countries, keyless API), BOAMP France (opendatasoft, keyless), UK Contracts Finder (OCDS standard, keyless), SAM.gov United States (requires SAM_GOV_API_KEY env var), and bund.de Germany (HTML scraping, partial). Returns structured tender records with buyer authority, EU CPV sector code, estimated contract value converted to EUR via live FX rates, submission deadlines, and direct notice URLs. Use when: a B2G agent needs to find government contract opportunities matching keywords across multiple jurisdictions; building a pipeline of public tenders for bid/no-bid qualification; monitoring a domain by CPV code; market sizing public sector spend. Key inputs: query (keywords), countries (ISO-2 array), cpv_codes (EU standard codes, e.g. 72000000=IT services, 45000000=construction, 79000000=business services), min_value_eur (filter), published_after (ISO date, defaults to 30 days ago). SLA: <=25s p95 (all sources fetched in parallel, 8s budget per source). Optional env var SAM_GOV_API_KEY enables US federal tenders (free key at api.sam.gov). Quality score: 25 pts if TED EU retrieved, 15 pts per other source retrieved (max 60), 10 pts if >= 10 tenders returned, 5 pts if aggregates computed. Status: failed < 30 / partial 30-59 / final >= 60.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keywords to search for tenders (e.g. "cybersecurity audit", "construction", "consulting AI")
`countries`	No	Countries to search. Defaults to ["EU","US","FR","UK","DE"]. Use "EU" for all 27 EU member states via TED Europa.
`cpv_codes`	No	EU Common Procurement Vocabulary codes (e.g. ['72000000'] for IT services, ['45000000'] for construction). Optional.
`min_value_eur`	No	Minimum contract value in EUR. Tenders below this are excluded. Optional.
`published_after`	No	ISO date YYYY-MM-DD. Only return tenders published after this date. Defaults to 30 days ago.

Output Schema

ParametersJSON Schema

Name	Required	Description
`query`	Yes
`status`	Yes
`sources`	Yes
`tenders`	Yes
`by_source`	Yes
`by_country`	Yes
`quality_score`	Yes
`countries_searched`	Yes

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, openWorldHint), the description adds critical behaviors: quality score system (25/15/10/5 points), status levels (failed/partial/final), SLA <=25s, async option, and env var requirement. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with sections (sources, use when, key inputs, SLA, quality score) and front-loaded. Slightly verbose but every sentence adds value; no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-source, async, quality scoring, status, SLA, env var), the description covers all essential aspects: what it does, sources, inputs, output interpretation, async workflow, and performance expectations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All six parameters have schema descriptions at 100% coverage, and the description enriches them with usage details: example keywords, country defaults, CPV code examples, value filter, and date defaults. The async parameter is also explained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states it aggregates public procurement tenders from multiple government sources simultaneously, listing specific sources and return fields. It clearly distinguishes from sibling multi-source tools by focusing on procurement.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a 'Use when' section with four specific scenarios (B2G agent, pipeline building, CPV monitoring, market sizing), providing explicit guidance on appropriate usage contexts.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

growth_path_architectB

Read-only

Inspect

Architecte de croissance — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Pennylane (€30M ARR) — 3 voies de croissance · Mix recommandé : Organique + Geo EU · ARR cible €120M en 36 mois. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`constraints`	Yes
`growthTarget`	Yes
`currentDrivers`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that it returns an 'audited deliverable' and mentions server-side validation, which is consistent with read-only behavior. It does not contradict annotations but adds little beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, front-loaded with the tool's purpose. Every sentence adds value, no redundant or filler content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite complex nested parameters and no output schema, the description only hints at output via a reference case. It does not describe the deliverable's structure or return format, leaving the agent underinformed about what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only async has a description). The tool description does not elaborate on any of the 5 parameters or nested fields. It only says 'send the documented case fields', which is insufficient to clarify meaning for complex objects like constraints or growthTarget.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a growth architect tool for C-suite, returning a structured, audited deliverable. It references a specific case study (Pennylane) implying output includes growth paths and mix recommendations. However, it does not explicitly distinguish itself from sibling tools like strategic_options_analyzer or market_entry_strategist.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions inputs are validated server-side but gives no guidance on when to use this tool vs alternatives. No when-not-to-use or alternative tools are provided, leaving the agent to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hallucination_confidence_meterA

Read-onlyIdempotent

Inspect

Evaluates the likelihood of hallucination in LLM responses by comparing against HuggingFace model confidence scores. Designed for risk assessment personas to quantify response reliability. Accepts text snippets or model outputs, returns confidence metrics and potential hallucination warnings. Cross-references with top-performing models from the HuggingFace leaderboard.

ParametersJSON Schema

Name	Required	Description
`text`	Yes	The LLM-generated text to evaluate for hallucination risk
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`model_id`	No	Optional specific HuggingFace model ID to use for evaluation
`threshold`	No	Confidence threshold below which hallucination warnings are triggered

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`confidence_scores`	No
`hallucination_likelihood`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, openWorld, and idempotent hints. The description adds that it returns confidence metrics and potential hallucination warnings, and cross-references with top HuggingFace models. This provides behavioral context beyond annotations, though it does not detail rate limits or data persistence.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, efficiently covering purpose, intended audience, and output. No superfluous content, though it could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity and the presence of an output schema, the description adequately covers input (text) and output (confidence metrics, warnings). It provides context for risk assessment personas, though it could better differentiate from similar AI evaluation tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with clear descriptions for all 4 parameters. The description adds little beyond the schema, such as mentioning 'text snippets or model outputs' which aligns with the 'text' parameter. With high coverage, baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it evaluates hallucination likelihood in LLM responses using HuggingFace model confidence scores. This specific verb+resource combination distinguishes it from sibling tools like 'bias_amplification_tracker' or 'model_behavior_drift_monitor'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it is designed for 'risk assessment personas to quantify response reliability', providing some context. However, it does not explicitly guide when to use this tool over alternatives like 'ai_act_incident_response' or 'ai_governance_full_report_async', nor does it state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

historical_price_seriesA

Read-onlyIdempotent

Inspect

Fetch historical OHLCV price series for any ticker: stocks (AAPL, SAP.DE, 7203.T), ETFs, indices, commodities (GC=F for gold) or cryptocurrencies (BTC-USD). Returns a full date-indexed series of open/high/low/close/volume plus pre-computed statistics: total return, annualised return (CAGR), annualised volatility, max drawdown and Sharpe estimate (rf=4%). Automatically detects crypto tickers (→ CoinGecko) vs traditional assets (→ Yahoo Finance primary, Stooq fallback). Adjusts for dividends and splits when adjusted=true (default). Use cases: backtesting, factor analysis, performance attribution, charting, financial modelling. Sources: Yahoo Finance, CoinGecko, Stooq. All keyless. Optional env: AICI_RESEARCH_PROXY_URL for Bright Data routing (lifts Yahoo 429), TWELVE_DATA_API_KEY for higher Twelve Data quota.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`period`	No	Look-back period. Default: 1y.
`ticker`	Yes	Yahoo Finance ticker symbol. Examples: AAPL (US stock), SAP.DE (Frankfurt), 7203.T (Tokyo), BTC-USD (Bitcoin), GC=F (gold futures), ^GSPC (S&P 500).
`metrics`	No	Subset of fields to include (informational — all fields always returned).
`adjusted`	No	Adjust close prices for dividends and splits. Default: true.
`interval`	No	Bar interval. Default: 1d (daily).

Output Schema

ParametersJSON Schema

Name	Required	Description
`stats`	Yes
`period`	Yes
`series`	Yes
`status`	Yes
`ticker`	Yes
`sources`	Yes
`currency`	Yes
`interval`	Yes
`data_points`	Yes
`quality_score`	Yes
`splits_detected`	No
`resolved_exchange`	No
`dividends_detected`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, destructiveHint=false, idempotentHint=true, openWorldHint=true. Description adds key behaviors: auto-detects crypto vs traditional assets, adjusts for dividends/splits, sources (Yahoo, CoinGecko, Stooq), keyless access, optional env vars for proxy/quota. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Paragraph is moderately long but front-loads core purpose. Every sentence adds value. Could be slightly more concise but not overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, description appropriately explains return format (date-indexed series with stats) and sources. Covers all aspects for a financial data tool, leaving no gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so every parameter has a description. The description adds context like default period and examples, but these are also in the schema. No additional meaning beyond schema is provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it fetches historical OHLCV price series for various ticker types. Examples include stocks, ETFs, indices, commodities, and crypto. It distinguishes itself from sibling tools like fx_rate or interest_rate by focusing on historical price series.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists use cases: backtesting, factor analysis, performance attribution, charting, financial modelling. It does not mention when not to use or alternatives, but context is clear enough for an AI agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

hr_benefits_esg_alignerA

Read-onlyIdempotent

Inspect

Asynchronous tool for Chief Human Resources Officers (CHROs) to align employee benefits packages with ESG (Environmental, Social, Governance) goals. Uses Eurostat HR data, MSCI ESG ratings, and Sustainalytics metrics to generate actionable recommendations. Inputs include company location, industry, and current benefits structure. Outputs ESG-aligned benefits adjustments with sustainability impact scores. Requires async:true to avoid timeout errors.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`esgFocus`	No	Primary ESG pillars to prioritize
`industryCode`	Yes	NACE or ISIC industry classification code
`companyLocation`	Yes	ISO 2-letter country code of company headquarters
`currentBenefits`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`recommendations`	No
`overallESGAlignmentScore`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds the async nature and data source context, which go beyond the annotations. Annotations already cover readOnly, openWorld, and idempotent hints, so the description adds useful behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, front-loaded with purpose, and every sentence adds value. It efficiently covers purpose, data sources, inputs, outputs, and async note.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and the presence of an output schema, the description covers key aspects: inputs, outputs, async, and target user. It is complete enough for an agent to use correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description mentions inputs (location, industry, current benefits) but does not add significant meaning beyond the input schema, which already describes each parameter. With 80% schema coverage, baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it aligns employee benefits with ESG goals, specifying the target user (CHROs), data sources, inputs, and outputs. It distinguishes itself from sibling tools by its specific focus on benefits alignment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives clear context on when to use (benefits alignment with ESG) and notes the async requirement. However, it does not explicitly exclude alternatives or state when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

incident_response_evidence_collectorA

Read-onlyIdempotent

Inspect

As a CTO, gather forensic evidence (logs, network flows, MITRE TTPs) from public breach reports and threat intelligence sources to support incident response post-mortems. Inputs include incident identifiers, date ranges, or MITRE technique IDs. Outputs structured evidence with attack patterns, indicators of compromise, and source references. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_range`	No
`incident_id`	Yes	Unique identifier for the incident (e.g., CVE, GitHub Advisory ID)
`mitre_technique_ids`	No	List of MITRE ATT&CK technique IDs (e.g., T1059)
`include_network_flows`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`timeline`	No
`warnings`	No
`indicators`	No
`incident_id`	No
`network_flows`	No
`attack_patterns`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds valuable behavioral information: the mandatory async usage to prevent timeouts. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two substantive sentences plus an imperative note on async usage. It front-loads the main purpose, then lists inputs/outputs, and ends with a crucial operating instruction. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, output schema exists), the description covers the essential purpose, inputs, and a critical usage note. The output schema handles return types, so the description's lack of detail there is acceptable. It could mention the read-only nature, but that's annotated.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 60%, and the description reiterates parameter categories (incident identifiers, date ranges, MITRE technique IDs) without adding deeper semantic meaning. The async requirement is emphasized, but overall does not significantly enrich parameter understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool collects forensic evidence from public breach reports and threat intelligence sources for incident response post-mortems. It specifies input types (incident identifiers, date ranges, MITRE technique IDs) and output structure (attack patterns, IOCs, references), distinguishing it from siblings like cve_security_lookup or vuln_exploitability_forecast.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly requires passing async:true to avoid x402 timeout, providing a critical usage guideline. It does not explicitly state when not to use the tool or compare to alternatives, but the context is clear enough for an AI agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

india_market_dataA

Read-only

Inspect

Indian capital market intelligence for the IN diaspora (30M+) and investors. Covers NSE, BSE, and MCA corporate registry across four modes:

• company — full company profile: name, CIN, exchange, NSE/BSE tickers, industry, incorporation date, paid-up capital, registered office, status, directors • market_quote — real-time quote: price (INR), change%, volume, market cap, P/E ratio. Sources: Yahoo Finance (primary), BSE API, NSE API (proxy-gated) • sector_overview — Nifty/Sensex sector snapshot: top 5 companies by market cap. Supported sectors: it, banking, pharma, energy, auto, fmcg, realestate, metals, telecom, consumer • mca_filing — Ministry of Corporate Affairs filings. Requires CIN for direct lookup.

Input formats accepted: • NSE ticker (e.g. 'RELIANCE', 'TCS.NS') • BSE 6-digit code (e.g. '500325' for Reliance) • CIN 21-char (e.g. 'L17110MH1973PLC019786') • Company name EN (e.g. 'Reliance Industries', 'Tata Consultancy') • Sector keyword (e.g. 'IT services', 'banking', 'pharma')

ENV: AICI_RESEARCH_PROXY_URL with country-in routing unlocks NSE direct API and MCA.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	NSE/BSE ticker, CIN (21 chars), company name (EN), or sector keyword.
`exchange`	No	Exchange filter. Default: all.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`company`	No
`sources`	Yes
`mca_filings`	No
`market_quote`	No
`quality_score`	Yes
`sector_overview`	No

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and destructiveHint=false. The description adds context on data sources (Yahoo Finance, BSE API, NSE API), proxy requirements, and async behavior. No contradictions. Could mention rate limits but overall good.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bullet points for modes and input formats. It is concise (approx. 150 words) and front-loaded with the overall purpose. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (four modes, multiple input types, async, proxy), the description covers all essential aspects. An output schema exists, so return values are not needed. The description is thorough and includes source and environment specifics.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds significant value by detailing the meaning of each mode, accepted input formats for the query parameter, and the specific requirement for CIN in mca_filing mode.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides 'Indian capital market intelligence' covering NSE, BSE, and MCA, and enumerates four distinct modes with specific details. This distinguishes it from sibling tools like 'china_market_data' and 'corporate_registry_lookup'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear guidance on when to use each mode, including input formats and sources. It also explains the async parameter for avoiding timeouts. However, it does not explicitly state when not to use or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

industry_classifier_naics_sicB

Read-only

Inspect

Classificateur d'industrie NAICS/SIC/NACE — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What is the NAICS code for a company that does ? · Give me NAICS + SIC + NACE classification for this company description. · Which industry sector (GICS) does this company belong to for equity analysis? · What HS code applies to products manufactured by this company? · For EU procurement compliance, what NACE Rev. 2 code applies to this company? · Classify this business into NAICS + SIC + ISIC + GICS + NACE + HS with hierarchy and confidence. · I need to segment my ICP list by NAICS 4-digit subsector — classify these company descriptions. Reference case: Helios Cold Chain EU — Freight forwarding maritime réfrigéré · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company_url`	No
`company_name`	No
`company_description`	Yes
`focus_classifications`	No
`primary_revenue_source`	No

Tool Definition Quality

B3.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds value beyond annotations by noting 'returns a structured, audited deliverable' and 'inputs are validated server-side'. Annotations already provide readOnlyHint and openWorldHint, and description does not contradict them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy with multiple example questions and a reference case that could be condensed. Front-loaded with purpose but includes repetitive and less essential details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers basic purpose and examples but lacks details on output structure, error handling, or async behavior (async only in schema). Without an output schema, more description of the deliverable format would be beneficial.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only async has a description). The tool description provides examples but fails to explain most parameters' semantics (e.g., company_description, focus_classifications) beyond what the schema shows.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it classifies industries using NAICS/SIC/NACE and other codes, supported by example queries. However, it includes extraneous text ('Gapup agent-payable C-suite expertise (CMO)') and a reference case that dilute focus.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides example queries implying usage, but lacks explicit guidance on when not to use this tool or how it contrasts with sibling tools (e.g., market_sizing, market_entry_strategist).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

infra_blueprint_designerA

Read-only

Inspect

Architecte infra cloud — Gapup agent-payable C-suite expertise (CTO). Returns a structured, audited deliverable. Answers: Design a cloud infrastructure blueprint for a app with expected traffic and requirements. · What is the recommended AWS vs GCP vs Azure architecture for a SaaS multi-tenant app with EU data residency and SOC2? · How should I architect my cloud infra to stay under €5k/month with GDPR compliance and a junior DevOps team? · What cloud services do I need for a with load — compute, DB, cache, CDN, observability? · Give me an end-to-end cloud architecture with scaling plan, security baseline, and IaC tool recommendation. Reference case: Spinora fintech B2B SaaS — saas-multi-tenant · medium load (1k-100k req/d) · eu-west · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`team_size`	No
`expected_load`	Yes
`workload_type`	Yes
`business_context`	No
`cloud_preference`	No
`region_preference`	Yes
`budget_monthly_eur`	No
`compliance_required`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds context: returns a 'structured, audited deliverable' and validates inputs server-side, which aligns with the read-only nature. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long but structured with a clear purpose, example queries, and a reference case. Every sentence adds value, though some redundancy exists.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

No output schema is provided, so the description should detail the deliverable format. It mentions 'structured, audited deliverable' but lacks specifics on the output structure, leaving some gaps for a tool with moderate complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (11%). The description adds value by giving example inputs and referencing parameters like workload_type, load, compliance, but does not fully explain each of the 9 parameters. It partially compensates but not completely.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool designs cloud infrastructure blueprints with specific verb 'Architecte infra cloud' and resource 'cloud blueprint'. It lists example queries and distinguishes from siblings like cloud_cost_ri_optimizer which focuses on cost optimization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example queries and implies usage for cloud architecture design. However, it does not explicitly state when not to use or mention alternative tools, though sibling list shows no direct equivalent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

insurance_coverage_analyzerC

Read-only

Inspect

Analyseur de couvertures d'assurance — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Gapup Hub — 3 polices · €24k prime · Score 58/100 · 3 gaps critiques · RFP template. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`arrEur`	Yes
`sector`	Yes
`objectives`	Yes
`companyName`	Yes
`riskProfile`	Yes
`jurisdiction`	Yes
`employeeCount`	Yes
`currentPolicies`	Yes

Tool Definition Quality

C2.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint, which are consistent. The description adds that inputs are validated server-side and that it returns a structured deliverable, providing context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but contains unnecessary marketing fluff ('Gapup agent-payable C-suite expertise') and a specific reference case that may not generalize. It could be more concise and front-loaded with the tool's core function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex input schema (9 parameters, nested objects) and no output schema, the description lacks essential details: parameter formats, error handling, required data formats, and what 'structured deliverable' means. It is insufficient for an agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 11% (only 'async' documented). The description does not explain any required fields, their meaning, or valid values. It merely says 'send the documented case fields', which is insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it's an insurance coverage analyzer and mentions a deliverable, but the specific action is unclear due to marketing language and lack of a clear verb. The French phrasing and reference case add confusion rather than clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus the numerous sibling tools. The description does not specify use cases, prerequisites, or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

interest_rateA

Read-only

Inspect

Return a precise reference interest rate — the exact figure an agent injects into a treasury, lending, valuation or trading model. Available rates: fed_funds, sofr, us_10y, us_2y, us_3m, ecb_main, euribor_3m. Source: FRED (Federal Reserve Bank of St. Louis). When to use: an agent's computation needs a current benchmark rate as a precise input.

ParametersJSON Schema

Name	Required	Description	Default
`rate`	Yes	Reference rate name
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.

Output Schema

ParametersJSON Schema

Name	Required	Description
`rate`	Yes
`unit`	Yes
`as_of`	Yes
`value`	Yes
`source`	Yes
`series_id`	No
`source_url`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true, which the description does not contradict. The description adds useful behavioral context: it returns a precise figure for model injection, lists exact rates, and names the source (FRED).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences plus a list, front-loaded with purpose. Every sentence earns its place; no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, presence of output schema, and annotations, the description covers the purpose, available rates, usage context, and source completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description lists the enum values for 'rate' and implies the role of 'async', but adds little beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a precise reference interest rate, lists all available rates (fed_funds, sofr, us_10y, etc.), and specifies the source (FRED). It distinguishes itself from many sibling tools that serve different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'an agent's computation needs a current benchmark rate as a precise input.' Does not mention alternatives or when not to use, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

internal_communicationC

Read-only

Inspect

Communication interne — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Cas démo — Communication interne. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`context`	Yes
`audienceSegments`	Yes

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already declare readOnlyHint=true and openWorldHint=true. The description adds minimal extra behavior: 'returns a structured, audited deliverable' and 'inputs are validated server-side'. No contradictions with annotations, but little additional insight beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is only two sentences, but it includes French phrases ('Communication interne', 'Cas démo') that may not be universally understood. It is relatively concise but could be more streamlined and front-load critical information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 5 parameters, no output schema) and low coverage, the description is incomplete. It does not explain what the deliverable contains, how to use the parameters, or how to interpret results. The agent lacks sufficient context for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is very low (20%). The description provides no additional meaning for any parameter (company, context, audienceSegments). It merely says 'send the documented case fields', which is too vague to help the agent understand input semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable related to communication interne and C-suite expertise (CHRO). However, the verb is implicit and the domain is muddled by jargon like 'Gapup agent-payable'. It does not clearly differentiate from siblings, but the name itself suggests internal communication.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description only mentions a reference case and that inputs are validated server-side. There is no hint about prerequisites, exclusions, or context where other tools would be preferable.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

investor_listC

Read-only

Inspect

Liste d'investisseurs + warm intros — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Agicap Série D — 25 VCs matchés · Tier A: Balderton/Accel/Partech · Warm intro path chaque investisseur. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`round`	Yes
`company`	Yes
`existingInvestors`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true, consistent with a data retrieval tool. Description adds 'returns a structured, audited deliverable' but doesn't detail what gets destroyed or auth needs. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise (3 sentences) and front-loaded with purpose. Includes a reference case and validation note. However, mixing French and English slightly reduces clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema and 25% schema coverage, the description should provide more context about expected output format and how inputs are used. The reference case helps but 'documented case fields' is vague. Insufficient for agent to reliably invoke.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 25% (low), and the description does not explain any parameters beyond 'send the documented case fields'. It adds no meaning to the schema's 'company', 'round', 'async', or 'existingInvestors' fields, failing to compensate for low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a list of investors with warm intros for fundraising, and provides a reference case (Agicap) for context. It distinguishes from siblings like 'investor_shortlist' by emphasizing 'warm intros' and 'audited deliverable', but doesn't explicitly differentiate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives like 'investor_shortlist' or 'funding_hunter'. Implies it's for fundraising use cases but lacks when-not or exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

investor_shortlistA

Read-only

Inspect

Shortlist d'investisseurs ciblés — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Aleph AI — Series B €30M · 60 investisseurs EU/US matchés par stage/thèse · fit score + warm intro path + first message angle. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`round`	Yes
`company`	Yes
`preferences`	Yes

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so description aligns but adds little: 'Returns a structured, audited deliverable' confirms non-mutation. No additional behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with purpose and includes a useful reference case, but the 4-sentence paragraph could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has nested required objects and no output schema; description mentions deliverable types but does not address async behavior (even though an async parameter exists), leaving potential timeout issues unresolved.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' described). The description does not explain the other parameters (focus, round, company, preferences) beyond mentioning 'send the documented case fields', leaving semantics unclear.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: generating a targeted investor shortlist for fundraising (FUNDRAISING). It mentions key outputs like fit score, warm intro path, and first message angle, and gives a reference case, distinguishing it from simpler list tools like 'investor_list'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for fundraising but does not explicitly contrast with siblings like 'investor_list' or state when not to use. The context of sibling tools suggests alternatives, but no guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ip_contract_clause_extractorA

Read-onlyIdempotent

Inspect

For CHRO use: analyzes employment contract text to identify and extract IP-related clauses such as invention assignment, confidentiality, non-compete, and patent rights. Returns structured data with clause types, risk levels, and relevant legal context. Ideal for contract review workflows, compliance checks, and IP protection strategy. Sources: USPTO PatFT and EPO Espacenet public datasets. Keywords: employment contract, IP clause, invention assignment, confidentiality agreement, non-compete, patent rights, CHRO tool.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`contractText`	Yes	Full text of the employment contract to analyze
`jurisdiction`	No	Country/state jurisdiction for legal context (e.g., 'US-CA', 'DE')
`includeContext`	No	Whether to include legal context for each clause

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`clauses`	Yes
`sources`	No
`summary`	Yes
`warnings`	Yes

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds that it returns structured data with clause types, risk levels, and legal context, and cites external data sources (USPTO PatFT, EPO Espacenet). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the purpose and target user, and is generally concise. However, it includes a keywords list which adds little value for an AI agent, making it slightly less efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of annotations, output schema, and 100% parameter schema coverage, the description adequately covers purpose, use cases, sources, and output structure. It could be more explicit about return values but is sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description does not elaborate on parameter usage beyond schema, but it does mention output includes legal context which relates to the includeContext parameter. No additional parameter semantics are provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes employment contract text to extract IP-related clauses like invention assignment, confidentiality, and non-compete. It specifies the target user ('For CHRO use') and distinguishes from sibling tools like legal_clause_extractor by focusing on IP clauses.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear use cases ('contract review workflows, compliance checks, IP protection strategy') and implies context (CHRO use). However, it does not explicitly mention when to avoid using it or name alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ip_employee_invention_trackerA

Read-onlyIdempotent

Inspect

For CHROs: tracks employee patent filings and flags unassigned inventions. Input employee name or ID to retrieve their patent applications from USPTO and WIPO databases. Returns list of inventions with assignment status, filing dates, and potential ownership gaps. Useful for IP audits, inventor onboarding, and compliance checks. Keywords: patents, IP ownership, employee inventions, USPTO, WIPO.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	No	Filter patents filed before this date (YYYY-MM-DD)
`startDate`	No	Filter patents filed after this date (YYYY-MM-DD)
`employeeId`	No	Internal employee ID (optional if name provided)
`companyName`	Yes	Exact legal name of company for assignment check
`employeeName`	Yes	Full name of employee to track (e.g., 'John Doe')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`patents`	Yes
`sources`	No
`warnings`	Yes
`employeeId`	No
`companyName`	Yes
`employeeName`	Yes
`totalPatents`	Yes
`unassignedPatents`	Yes

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds value by specifying that the tool retrieves data from external databases (USPTO, WIPO) and outputs ownership gaps, which goes beyond the safety profile in annotations. No contradictions exist.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a concise paragraph of 4 sentences, front-loaded with the target user and purpose. It is well-structured, covering action, input, output, and use cases. While effective, it could be slightly more structured with bullet points for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, output schema exists), the description adequately covers purpose, input, output categories, and use cases. It does not mention prerequisites or error handling, but annotations and schema fill most gaps, making it sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with descriptions for all 6 parameters. The description mentions 'employee name or ID' and 'companyName', but does not add significant new meaning beyond the schema's descriptions. With full schema coverage, baseline is 3, and the description does not elevate it.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool tracks employee patent filings and flags unassigned inventions, targeting CHROs. It specifies the input (employee name or ID), databases (USPTO, WIPO), and output (list with assignment status, filing dates, and ownership gaps), distinguishing it from sibling tools like patent_landscape or patent_ownership_audit by focusing on individual employee-level tracking.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives clear usage context (CHROs, IP audits, inventor onboarding, compliance checks) and implies the tool is for employee-specific patent tracking. However, it lacks explicit exclusions or comparisons to alternative sibling tools, such as patent_ownership_audit, which might handle broader ownership audits.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ip_protection_pilotC

Read-only

Inspect

Pilote de protection IP — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Carbios SA — Deeptech FR recyclage PET enzymatique · 14 brevets EP/US/FR · 5 concurrents · licensing €2-8M potentiel. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`competitors`	Yes
`targetMarkets`	Yes
`patentPortfolioSummary`	Yes

Tool Definition Quality

C2.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. Description adds 'returns a structured, audited deliverable' and 'inputs validated server-side', which is mildly helpful but doesn't disclose risk or cost implications.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is relatively short but includes a long example case and mixed French/English. Structure could be improved by front-loading the purpose and separating the example.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 6 parameters, 4 required, and no output schema, the description should explain output format and parameter roles. Only a brief example and no mention of async behavior or focus usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only async described). Description does not explain the meaning of company, patentPortfolioSummary, targetMarkets, competitors, or focus parameters beyond their names and constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a 'structured, audited deliverable' for IP protection pilot, but lacks a specific verb-action (e.g., 'analyze', 'create report'). The example case provides context but the core purpose remains vague.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus siblings like patent_landscape or ip_contract_clause_extractor. Only mentions inputs are validated server-side, no prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

jailbreak_attempt_detectorA

Read-onlyIdempotent

Inspect

Detects potential LLM jailbreak attempts by analyzing user input against NIST AI Risk Management Framework adversarial patterns. Designed for persona risk assessment, this tool evaluates text for common jailbreak techniques such as prompt injection, role-playing, or obfuscation. Inputs include the user message and optional context, returning a risk assessment with confidence scores and pattern matches. Ideal for real-time moderation in chat applications or API gateways.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`context`	No	Optional conversation context for better pattern matching
`message`	Yes	User input text to analyze for jailbreak attempts
`threshold`	No	Confidence threshold for flagging attempts

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No	Confidence score of jailbreak attempt
`patternsMatched`	No	List of detected adversarial patterns
`isJailbreakAttempt`	No	Whether the input exceeds the risk threshold

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds that it analyzes text and returns a risk assessment with confidence scores and pattern matches, which aligns with and complements the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loading the purpose and key details. Every sentence is essential, with no filler or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Annotations and output schema (mentioned) cover safety and idempotency. The description provides sufficient context for inputs, outputs, and ideal use case, making the tool self-contained for agent understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all four parameters. The description mentions 'user message and optional context' but does not detail 'async' or 'threshold', which are already described in the schema. The description adds marginal value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool detects LLM jailbreak attempts using NIST AI RMF patterns, specifies techniques like prompt injection, and targets persona risk assessment. This distinguishes it from sibling tools like 'adversarial_input_stress_tester' or 'safety_guardrail_breach_analyzer'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises real-time moderation in chat applications or API gateways, but does not explicitly exclude other uses or mention alternatives. Given the sibling list includes related security tools, the guidance is adequate but could be more directive.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_postings_intelligenceA

Read-only

Inspect

Agrégation d'offres d'emploi publiques pour inférer les tendances de recrutement. Trois modes : (1) company_hiring — analyse des postings d'une société : volume, fonctions (engineering/sales/marketing/ops/finance/hr), seniorité, géographie, croissance vs période précédente, signaux stratégiques inférés ; (2) role_market — volume marché global pour un rôle (open positions estimate, top employeurs, compétences demandées, médiane seniorité) ; (3) competitor_hiring_comparison — comparaison multi-sociétés (total postings, growth%, focus areas). Sources : Adzuna (ADZUNA_APP_ID/KEY env), RemoteOK (keyless), Himalayas (keyless), baseline statique 40 top employeurs. Usages : due diligence VC, intelligence compétitive, benchmarks RH, signaux pivots stratégiques. Cache 6h. SLA ≤15s.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Mode d'analyse : 'company_hiring' \| 'role_market' \| 'competitor_hiring_comparison'
`role`	No	Intitulé de poste à analyser (pour role_market, ex. 'data scientist', 'compliance officer')
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	No	Nom de la société (pour company_hiring ou comme 1er concurrent)
`location`	No	Pays ou ville (ex. 'France', 'United States', 'London')
`competitors`	No	Liste de sociétés à comparer (pour competitor_hiring_comparison, min 2)
`period_days`	No	Fenêtre d'analyse en jours (défaut 30)

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`role_market`	No
`quality_score`	Yes
`company_hiring`	No
`competitor_comparison`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint=true, destructiveHint=false) already indicate safe read operation. Description adds valuable behavioral details: data sources with auth requirements, 6-hour cache, ≤15s SLA, and three distinct analysis modes. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured with bullet-style listing of modes and uses. It is somewhat lengthy but every sentence adds value and it is front-loaded with the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, description covers all necessary context: data sources, caching, SLA, mode-specific behaviors, and explicit use cases. No gaps for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with each parameter described. The description adds context for the 'mode' parameter by explaining what each mode does, but does not add significant meaning beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states aggregation of public job postings to infer recruitment trends, and details three distinct modes (company_hiring, role_market, competitor_hiring_comparison) with specific outputs. This clearly distinguishes it from any sibling tool.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly lists use cases: due diligence, competitive intelligence, HR benchmarks, strategic pivots. While it doesn't explicitly state when not to use, the detailed mode descriptions provide clear context for appropriate usage.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

job_resultA

Read-onlyIdempotent

Inspect

Poll the result of any tool called with async:true. Returns status=pending while running, status=completed with the full result once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h).

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by an async tool call

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it explains possible statuses (pending, completed, failed, not_found) and TTL (24h). Annotations already indicate readOnlyHint=true and idempotentHint=true, so no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, well-structured sentence that lists all possible outcomes. It is highly concise with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description sufficiently covers the tool's behavior: polling with statuses and TTL. It is complete for a simple polling tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

There is one parameter (job_id). The input schema description already states 'The job_id returned by an async tool call.' The tool description does not add new information beyond what the schema provides. Schema coverage is 100%, so a baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Poll the result of any tool called with async:true.' It uses a specific verb (poll) and resource (result), distinguishing it from sibling tools that are also async result retrievers (e.g., ai_governance_full_report_result) by being generic.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage after an async call ('any tool called with async:true'). It does not explicitly list when not to use, but the context is clear. No alternative tools are mentioned, but the role as a generic poller is well-defined.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

knowledge_base_autoC

Read-only

Inspect

Base de connaissance automatique — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Klarna — knowledge base auto · Slack+Notion+Drive · 12 articles seed + structure 8 catégories. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`sources`	Yes
`topPainPoints`	Yes

Tool Definition Quality

C2.2/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true, but description says 'returns a structured, audited deliverable' and 'inputs are validated server-side'. It does not clarify if the tool stores state, impacts external systems, or requires specific permissions. The readOnlyHint suggests no mutation, which aligns with 'returns', but the creation of a knowledge base might imply side effects, causing ambiguity.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is relatively short (3 sentences) but includes unnecessary jargon ('Gapup agent-payable C-suite expertise') and a detailed case reference that may be more verbose than needed. Core purpose is front-loaded, but efficiency could be improved.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 5 parameters including nested objects, no output schema, and sparse annotations, the description fails to explain the deliverable structure, input constraints, or output format. The case reference provides a minimal example but does not achieve completeness for effective agent invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' described). The description adds no parameter information beyond 'send the documented case fields', which is vague. Required parameters like 'company', 'sources', 'topPainPoints' lack any semantic hints about format or content.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured audited deliverable and references a use case (Klarna knowledge base auto), but the core action of automatically building a knowledge base is implied rather than explicitly stated. The phrase 'Gapup agent-payable C-suite expertise (COO)' is jargon-heavy and does not clearly convey purpose. It does not distinguish from many sibling tools like 'discovery_prep' or 'content_catalog'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The reference case (Klarna) hints at a scenario but does not state prerequisites, when not to use, or alternative tools. Context signals show many siblings, but description provides no differentiation criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyc_screenerC

Read-only

Inspect

Screening KYC / AML / Sanctions — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Q4 2026 onboarding — 8 entités (UBO chain LLC + SPV offshore), sanctions/PEP/adverse media. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entities`	Yes
`riskAppetite`	Yes		standard
`screeningScope`	Yes
`onboardingPacket`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, confirming no state modification. The description adds that inputs are validated server-side and returns an audited deliverable, but does not disclose additional behavioral traits like rate limits or auth needs.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short (two sentences plus a reference case) but includes jargon like 'Gapup agent-payable C-suite expertise (RISK)' which may confuse. It could be clearer and more front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested schema (5 parameters, no output schema) and low schema coverage, the description is insufficient. It lacks details on output format, async usage, and differentiation from many similar sibling tools.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema description coverage, the description should compensate but only vaguely references 'documented case fields'. It does not explain the meaning or usage of key parameters like entities, screeningScope, or riskAppetite.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs KYC/AML/sanctions screening and returns a structured deliverable. It references a specific case, but does not differentiate from sibling tools like kyc_screener_batch.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool vs alternatives. While the description mentions 'Gapup agent-payable C-suite expertise (RISK)', it does not compare with other risk-related siblings or provide exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyc_screener_batchA

Read-only

Inspect

Async batch variant of kyc_screener. Accepts 1-100 names and returns immediately (<300ms) with a job_id. The screening runs in the background (up to 10 parallel KYC calls). Poll the result with kyc_screener_batch_result(job_id) after the eta_seconds hint. Each entry can specify name, type (person/company/any), and an optional birthdate hint. Use for bulk client onboarding, UBO list screening, or periodic AML refresh batches. Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`names`	Yes	List of entities to screen (1-100). Each entry requires at minimum a name.

Output Schema

ParametersJSON Schema

Name	Required	Description
`job_id`	Yes	Unique job identifier — pass to kyc_screener_batch_result
`status`	Yes
`batch_size`	Yes	Number of names queued for screening
`eta_seconds`	Yes	Estimated seconds until result is ready
`submitted_at`	Yes	ISO-8601 submission timestamp

Tool Definition Quality

A4.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds behavioral details: returns in <300ms, runs up to 10 parallel KYC calls, provides eta_seconds hint, and mentions webhook registration. No contradiction exists.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with core functionality. It uses clear sentences but could be slightly more concise. It includes essential details without being overly verbose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For an async batch tool with polling and webhook options, the description covers input limits, response time, parallelism, polling, webhook registration, and use cases. It references sibling tools for polling and webhook setup. Output schema exists, so return value details are not required.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline 3 applies. The description reiterates parameter capabilities (name, type, birthdate) but does not add significant meaning beyond the schema. It mentions the batch size (1-100) which is already in the schema's minItems/maxItems.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an async batch variant of kyc_screener, specifying input (1-100 names), immediate return with job_id, and background screening. It distinguishes from sibling tools by naming the result polling tool and webhooks management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit use cases are listed: bulk client onboarding, UBO list screening, periodic AML refresh batches. It also advises on when to use webhooks vs polling, and compares to the synchronous variant ('Faster + lighter').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

kyc_screener_batch_resultA

Read-onlyIdempotent

Inspect

Poll the result of a kyc_screener_batch job. Returns status=pending while running, status=completed with the full array of KYC results once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by kyc_screener_batch.

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by kyc_screener_batch (prefix: kycb_)

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, non-destructive. Description adds polling behavior, status transitions (pending/completed/failed/not_found), and TTL of 24h, which is valuable beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with verb 'poll', efficient and no fluff. Every sentence adds essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a polling tool, description covers what, when, possible statuses, TTL, and error cases. Output schema exists, so return values are not needed in description. Complete and actionable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema describes job_id with 100% coverage. Description adds the prefix 'kycb_' detail, providing minor extra value. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool polls the result of a kyc_screener_batch job, listing all possible statuses. It distinguishes from siblings 'kyc_screener_batch' (initiation) and 'kyc_screener' (individual).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises 'Call this after the eta_seconds hint returned by kyc_screener_batch', providing clear when-to-use guidance. Does not explicitly state when not to use, but context is sufficient.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

labor_law_alert_geoA

Read-onlyIdempotent

Inspect

Provides CHROs with daily alerts on new labor law changes by jurisdiction (state/country). Inputs include jurisdiction (ISO country/state code) and optional date range. Outputs structured legislative updates with summaries, effective dates, and source links. Useful for compliance monitoring, risk assessment, and policy adjustments. Keywords: labor law, compliance, legislation, jurisdiction, CHRO, HR policy.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`since`	No	Optional start date for changes (YYYY-MM-DD). Defaults to 7 days ago.
`until`	No	Optional end date for changes (YYYY-MM-DD). Defaults to today.
`jurisdiction`	Yes	ISO 3166-1 alpha-2 country code or ISO 3166-2 state/province code (e.g., 'US-CA', 'FR')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`changes`	Yes
`sources`	Yes
`warnings`	Yes
`last_updated`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint, openWorldHint, and idempotentHint, indicating safe, non-destructive behavior. The description adds that outputs are structured with summaries, effective dates, and source links, providing additional behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose, but includes a trailing list of keywords that may be redundant. It is reasonably concise and well-structured for its content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers core purpose, inputs, and outputs. With an output schema present, the return value explanation is sufficient. It may lack some edge-case details, but is complete enough for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are fully described in the schema. The description adds minor context about jurisdiction format and date range defaults, but does not significantly enhance understanding beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides daily alerts on new labor law changes by jurisdiction, specifying inputs (jurisdiction ISO code, optional date range) and outputs (structured legislative updates). It distinguishes from siblings by its specific focus on labor law compliance monitoring.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions usefulness for compliance monitoring, risk assessment, and policy adjustments, providing context for when to use it. However, it does not explicitly state when not to use or name alternative tools, so it lacks explicit exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ld_architectC

Read-only

Inspect

Architecte formation & développement — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Pennylane (180 FTE) — Catalogue 8 formations · 3 parcours individuels · ROI €480k · Payback 7 mois. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`team`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`budget`	Yes
`company`	Yes
`learningNeeds`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the description's mention of 'returns a structured, audited deliverable' adds minimal behavioral context. The note about server-side validation is useful but not comprehensive. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with purpose, then a case example, then a validation note. It is efficient with no filler, though the structure could be improved by grouping related info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters (nested objects) and no output schema, the description is insufficient. It does not clarify what the 'structured, audited deliverable' contains or how to interpret results, leaving significant gaps for an AI agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' has a description). The tool description does not explain the meaning or usage of parameters beyond the schema, failing to compensate for the low coverage. The reference to 'case fields' is vague.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an L&D architect tool for building training programs, with a specific target (CHRO). It gives a reference case and mentions a structured deliverable. However, it does not explicitly differentiate from sibling tools like lnd_ai_skill_forecast or lnd_skill_taxonomy_builder.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Inputs are validated server-side — send the documented case fields' but provides no guidance on when to use this tool vs alternatives. There is no mention of prerequisites, exclusions, or alternative tools despite many L&D-related siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lead_magnetsC

Read-only

Inspect

Aimants à leads — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Spendesk — Guide trésorerie startup SaaS B2B FR/EU (2024). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`icp`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`leadMagnet`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true. The description adds that the deliverable is structured and audited, but does not disclose additional behavioral traits like whether the tool has side effects or needs permissions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but the reference case takes space. It could be more concise if the reference case were omitted or integrated.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Even with strong annotations, the description fails to specify the output structure or how to use the nested objects. For a tool with 4 parameters including complex objects, the description is too brief.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only async is described). The description does not explain the meaning or usage of the three required objects (icp, brand, leadMagnet) or their subfields, leaving the agent with minimal guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description indicates it generates lead magnets for C-suite expertise, with a reference case. It clearly states it returns a structured deliverable, but lacks a specific action verb like 'generates' or 'creates'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool vs. sibling tools. No mention of prerequisites or when not to use it. The reference case is an example but not a usage rule.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

legal_clause_extractorA

Read-onlyIdempotent

Inspect

Structured extraction of clauses, obligations and deadlines from legal documents (SaaS contracts, NDAs, employment agreements, loan agreements, leases, M&A deals, IP licences). Complements contract_risk_scanner with granular per-clause output.

ICP: legal ops, M&A lawyers, paralegals, contract managers, compliance officers.

Capabilities: • Auto-detects document type (7 types) and language (EN/FR/DE/ES/PT) • Extracts parties with roles (buyer, seller, licensor, employee, etc.) • Splits document into sections and classifies 16+ clause types • Per-clause: 20 obligation patterns (EN/FR/DE), 10 deadline patterns, 18 risk detectors • Document-level: red flags (liability cap, auto-renewal, IP overreach, etc.), missing clauses per doc type • Global deadline calendar with P0/P1/P2 severity • Cross-reference map between sections • Cache: 7 days (legal docs stable once provided)

100% pure compute — no external fetch required. Accepts 10k–100k char documents.

ParametersJSON Schema

Name	Required	Description
`lang`	No	Optional. Language hint (e.g. 'en', 'fr', 'de'). Defaults to auto-detection.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`document_text`	Yes	Full text of the legal document (10k–100k chars typical). Plain text or lightly HTML-formatted. EN/FR/DE/ES/PT supported.
`document_type`	No	Optional. Document type hint. Defaults to auto-detection. Use "auto" or omit to let the tool detect from content.
`target_clauses`	No	Optional. Filter extraction to specific clause types. E.g. ["term", "termination", "liability", "ip", "confidentiality", "governing_law", "indemnification"]. If omitted or empty, all clauses are extracted.

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`red_flags`	Yes
`word_count`	Yes
`jurisdiction`	No
`governing_law`	No
`lang_detected`	Yes
`quality_score`	Yes
`effective_date`	No
`cross_references`	Yes
`parties_detected`	Yes
`clauses_extracted`	Yes
`key_deadlines_global`	Yes
`document_type_detected`	Yes
`missing_clauses_expected`	Yes

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: it states the tool is '100% pure compute — no external fetch required', mentions auto-detection of document type and language, caching duration (7 days), and async support via parameter. These details align with annotations (readOnlyHint, idempotentHint) and provide a clear operational picture.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a front-loaded purpose statement, followed by ICP in a short sentence, then a bullet-point list of capabilities. Every sentence provides valuable information without redundancy, making it both concise and informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multi-language, multi-document-type, numerous extraction patterns), the description covers all essential aspects: document types, languages, capabilities, caching, async mode, and computational purity. An output schema exists, so return value details are not needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description does not add new meaning beyond the schema; it merely echoes constraints (e.g., document length, supported languages) already present in parameter descriptions. It adds no additional semantics to the parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with a specific verb-resource combination: 'Structured extraction of clauses, obligations and deadlines from legal documents'. It lists document types and explicitly distinguishes from a sibling tool (contract_risk_scanner) by noting it provides 'granular per-clause output', making its unique purpose clear.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies the target ICP (legal ops, M&A lawyers, etc.) and mentions it complements contract_risk_scanner, giving implicit guidance on when to use it. However, it does not explicitly state when not to use it or list alternatives, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lgpd_data_subject_rights_automatorB

Read-onlyIdempotent

Inspect

Automates LGPD Data Subject Access Requests (DSARs) for legal teams, handling Brazil-specific data retention, erasure, and access workflows. Accepts user identifiers, request type (access/rectification/deletion), and optional scope filters. Returns structured response with compliance status, warnings, and source references to Brazilian LGPD and CNIL decisions.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`scope`	No	Optional list of data categories to limit the request
`urgency`	No	Priority level for processing
`requestType`	Yes	Type of LGPD request
`userIdentifier`	Yes	CPF, email, or other unique identifier for the data subject

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`dataCategories`	No
`erasureDeadline`	No
`complianceStatus`	No
`retentionPeriodDays`	No

Tool Definition Quality

B3.2/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description claims the tool handles deletion and rectification (write operations), but annotations include readOnlyHint: true, which indicates read-only behavior. This is a direct contradiction. Additionally, openWorldHint and idempotentHint are not explained in the description.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, front-loaded with purpose and key details. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having an output schema, the description lacks crucial behavioral context (read-only contradiction, external calls from openWorldHint, idempotency implications). For a tool handling data subject rights, this is a significant omission.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline of 3 is appropriate. Description adds minimal extra meaning beyond paraphrasing parameter names and types, but does not enrich understanding of values or usage nuances.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it automates LGPD DSARs for legal teams, handling Brazil-specific workflows. This distinguishes it from many sibling tools, which cover diverse domains.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Specifies inputs (user identifiers, request type, scope filters) and output (compliance status, warnings, references), but provides no guidance on when to use this tool versus alternatives like privacy_compliance_audit or dpdp_consent_artifact_generator.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lnd_ai_skill_forecastA

Read-onlyIdempotent

Inspect

Forecasts AI skill demand trends for CHROs by analyzing patent filings (USPTO PatFT) and job postings (BLS API). Returns 12-month skill demand projections with confidence scores, helping HR leaders prioritize workforce upskilling. Inputs: target AI skills (e.g., 'machine learning', 'NLP'), geographic focus (US state/country), and forecast horizon. Outputs include skill growth rates, patent filing trends, and job posting volumes. Keywords: AI workforce planning, skill gap analysis, talent strategy, patent trends, labor market data.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes	Geographic focus (US state code or 'US' for national, e.g., 'CA', 'US')
`skills`	Yes	List of AI-related skills to forecast (e.g., ['machine learning', 'computer vision'])
`horizon_months`	No	Forecast horizon in months (3-24, default 12)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`forecast`	No
`metadata`	No
`warnings`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, idempotentHint, and openWorldHint. The description adds behavioral context by specifying it returns growth rates, patent filing trends, and job posting volumes, and uses confidence scores. It does not contradict annotations and provides additional detail beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long but well-structured, starting with the core function and then detailing outputs, inputs, and keywords. It is front-loaded with the primary purpose, but some redundancy exists in repeating input details already in the schema. Overall concise for the information conveyed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown) and 100% schema coverage, the description is quite complete, covering purpose, data sources, inputs, outputs, and keywords. It does not address rate limits or authentication, but annotations provide some context. The description adequately supports an AI agent in understanding the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and all parameters have descriptions. The description restates these in natural language ('target AI skills', 'geographic focus', 'forecast horizon') but does not add significant semantic value beyond what the schema provides. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool forecasts AI skill demand trends for CHROs, using specific data sources (patent filings, job postings) and providing 12-month projections with confidence scores. It distinguishes itself from siblings like job_postings_intelligence and patent_landscape by focusing on forecasting and skill demand specifically.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implicitly suggests usage for workforce planning and skill gap analysis but lacks explicit guidance on when to use this tool versus alternatives like job_postings_intelligence or lnd_skill_taxonomy_builder. No when-not-to-use or specific context is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lnd_skill_taxonomy_builderA

Read-onlyIdempotent

Inspect

Generates a dynamic skill taxonomy for CHROs by cross-referencing patent filings (USPTO), job postings (BLS), and learning & development data (OECD). Inputs include industry codes, job roles, or skill clusters; outputs structured skill hierarchies with demand trends and competency gaps. Essential for workforce transformation, talent pipeline optimization, and future-proofing organizational capabilities. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`jobRole`	No	Target job role or occupation (e.g., 'Data Scientist')
`industry`	Yes	NAICS industry code or sector name (e.g., '541511' for IT services)
`timeRange`	No	Time range for trend analysis
`skillCluster`	No	Optional skill cluster to focus taxonomy (e.g., 'AI/ML')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`skillTaxonomy`	No
`industryTrends`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and idempotentHint, ensuring safety. Description adds crucial behavioral context: risk of x402 timeout and required async:true parameter. This exceeds what annotations provide, though return format is not detailed (output schema covers that).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is three sentences plus a critical note, front-loading purpose and data sources. Every sentence adds value, though minor redundancy ('Essential for workforce transformation...') could be tightened. Overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema, the description adequately covers sources, inputs, outputs, and the async requirement. It doesn't detail the output structure, which is appropriate as the schema handles that. For a multi-parameter tool, it's sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are already documented. Description reiterates the types of inputs (industry codes, job roles, skill clusters) but adds no new meaning beyond the schema. The imperative note on async adds value but does not enhance parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states the tool generates a dynamic skill taxonomy for CHROs by cross-referencing USPTO, BLS, and OECD data. Specifies inputs (industry codes, job roles, skill clusters) and outputs (skill hierarchies with demand trends and competency gaps), distinguishing it from similar sibling tools like lnd_ai_skill_forecast.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a use case (workforce transformation, talent pipeline optimization) but lacks explicit when-to-use or when-not-to-use instructions relative to siblings. No mention of alternatives or exclusions, which would improve guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

logistics_esg_incident_trackerA

Read-onlyIdempotent

Inspect

Tracks real-time ESG incidents in logistics networks for COOs, including supply chain disruptions, regulatory violations, and sustainability risks. Inputs: geographic region, incident type (e.g., emissions, labor, deforestation), and time range. Outputs: structured incident data with severity, location, and source verification. Uses CDP open data and UNCTAD STAT for comprehensive coverage. Keywords: ESG, logistics, supply chain, sustainability, compliance, risk management.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes	Geographic region filter (e.g., 'Europe', 'Asia', 'Global')
`endDate`	No	End date for incident search (ISO 8601)
`severity`	No	Minimum severity level to include
`startDate`	No	Start date for incident search (ISO 8601)
`incidentType`	Yes	Type of ESG incident to track

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`summary`	No
`warnings`	No
`incidents`	No

Tool Definition Quality

A3.6/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds behavioral context by mentioning real-time tracking, data sources (CDP, UNCTAD STAT), and output structure (severity, location, source verification), enhancing transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with the main purpose and is concise (4 sentences). It includes examples and keywords without unnecessary repetition, earning its length.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters (2 required) and an output schema, the description covers inputs, outputs, data sources, and target user. It complements the schema well, providing a complete understanding for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds some context by mentioning inputs (region, incident type, time range) but does not cover all parameters (e.g., async, severity). The added value is limited.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool tracks real-time ESG incidents in logistics networks, specifying the target user (COOs) and providing examples of incidents. It distinguishes from sibling ESG tools by focusing on logistics, but lacks explicit differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives or when not to use it. The description only covers what the tool does, not the context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ma_arbitrage_hunterA

Read-onlyIdempotent

Inspect

As a CFO, identify cross-border M&A arbitrage opportunities by comparing target company valuations across different jurisdictions. Inputs include target company ticker, primary and secondary jurisdictions, and valuation metrics. Outputs include valuation gaps, FX-adjusted multiples, and jurisdiction-specific premiums/discounts. Uses real-time ECB FX rates, Yahoo Finance market data, and SEC EDGAR filings for public companies. Ideal for quick assessment of potential arbitrage in M&A scenarios.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`sector`	No	Industry sector for peer comparison (e.g., 'Technology')
`targetTicker`	Yes	Target company ticker symbol (e.g., 'AAPL')
`valuationMetric`	No	Valuation multiple to use for comparison
`primaryJurisdiction`	Yes	Primary jurisdiction for valuation comparison (e.g., 'US')
`secondaryJurisdiction`	No	Secondary jurisdiction for valuation comparison (e.g., 'DE')

Output Schema

ParametersJSON Schema

Name	Required	Description
`fxRate`	No
`status`	Yes
`sources`	No
`warnings`	No
`valuationGap`	No
`peerMultiples`	No
`targetCompany`	No
`primaryValuation`	No
`secondaryValuation`	No
`jurisdictionPremium`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, idempotentHint, openWorldHint), the description adds behavioral context: uses real-time ECB FX rates, Yahoo Finance, and SEC EDGAR, and lists output types. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (4 sentences), with no redundancies. It front-loads the core purpose and logically flows to inputs, outputs, and sources.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, output schema exists), the description covers all essentials: user, purpose, inputs, outputs, data sources, and use case context. Output schema handles return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description summarizes inputs generally but does not add detail beyond schema descriptions for individual parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'identify cross-border M&A arbitrage opportunities by comparing target company valuations across different jurisdictions'. It specifies the user role (CFO), inputs, outputs, and data sources, making the purpose highly specific and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes 'Ideal for quick assessment of potential arbitrage in M&A scenarios', which implies when to use. However, it does not explicitly exclude other contexts or mention alternatives like ma_deal_screener, missing the full guidance spectrum.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ma_deal_screenerB

Read-only

Inspect

M&A Deal Screener — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Salesforce M&A targets — 12 cibles screened · fit score + valuation + integration risk. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`acquirer`	Yes
`criteria`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds that it returns a structured deliverable and validates inputs, which is consistent. No contradictions, but no additional behavioral details beyond what annotations suggest.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise but includes jargon ('Gapup agent-payable C-suite expertise (CSO)') that adds noise. Front-loading is acceptable, but the extra phrase reduces clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex schema with nested objects and no output schema, the description lacks crucial details about output structure and parameter semantics. It does not sufficiently compensate for the schema's low coverage, leaving gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%, and the description does not explain the parameters (acquirer, criteria, focus) beyond 'send the documented case fields'. It fails to clarify the nested structures or the purpose of each field, leaving the agent underinformed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is an M&A Deal Screener that returns a structured deliverable with fit score, valuation, and integration risk, and provides a reference case. However, it does not differentiate from sibling tools such as 'ma_arbitrage_hunter' or 'ma_tax_efficiency_mapper', which could confuse the agent.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for M&A deal screening and mentions server-side validation, but lacks explicit guidance on when to use or not use this tool versus alternatives. No exclusions or alternative tools are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

manufacturing_esg_compliance_mapperA

Read-onlyIdempotent

Inspect

As a COO, quickly identify ESG compliance gaps across manufacturing facilities using EPA TRI emissions data and GRI sustainability standards. Input facility identifiers or geographic regions to receive a prioritized remediation roadmap with risk scores, regulatory violations, and suggested corrective actions. Ideal for sustainability reporting, regulatory risk assessment, and operational improvement planning. Keywords: ESG compliance, manufacturing facilities, EPA TRI, GRI standards, sustainability reporting, regulatory risk.

ParametersJSON Schema

Name	Required	Description
`year`	No	Reporting year (default: current year - 1)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	No	Geographic region (state, county, or ZIP code) for facility search
`includeGri`	No	Include GRI standards analysis (default: true)
`facilityIds`	Yes	List of EPA facility identifiers (e.g., TRIFID)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`summary`	No
`warnings`	Yes
`facilities`	Yes

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint, openWorldHint, idempotentHint, covering safety and repeatability. Description adds async behavior context (via parameter) but no other behavioral traits beyond what annotations provide. With annotations present, a score of 3 is appropriate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single paragraph of 4 sentences, front-loaded with purpose. No redundant or filler content. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Mentions output (remediation roadmap, risk scores) and input types. With output schema present, description does not need to detail return format. Could mention data freshness or limitations, but overall sufficient for a complex mapping tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

100% schema coverage with parameter descriptions. The tool description adds general context (e.g., using EPA TRI) but does not explain parameter-specific semantics beyond what schema provides. Baseline 3 is correct.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool identifies ESG compliance gaps across manufacturing facilities using EPA TRI and GRI standards, distinguishing it from siblings like supplier_esg_audit or climate_roadmap. Mentions input (facility IDs/regions) and output (remediation roadmap with risk scores).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Targets COOs for sustainability reporting and regulatory risk assessment. Provides context and keywords but lacks explicit when-not-to-use or alternative tools. Sibling list includes related ESG tools, but the description itself does not guide selection away from them.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

manufacturing_waste_heatmapA

Read-onlyIdempotent

Inspect

Generates manufacturing waste heatmaps for COOs using EPA TRI and FAOSTAT data. Input manufacturing site identifiers or geographic regions to analyze waste streams, emissions, and resource inefficiencies. Outputs include waste intensity maps, circular economy opportunity rankings, and cost-saving potential. Ideal for sustainability strategy and operational efficiency improvements. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`year`	Yes	Analysis year (2010-2023)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	No	Geographic region (country code or sub-national region) for aggregated analysis
`site_ids`	No	List of manufacturing site identifiers (EPA TRI IDs or FAO facility codes)
`waste_types`	No	Specific waste types to analyze (e.g., ['metals', 'chemicals', 'energy'])

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`heatmap_data`	No
`opportunities`	No
`benchmark_data`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint, assuring safe and idempotent behavior. The description adds value beyond these by mentioning the use of EPA TRI and FAOSTAT data, output specifics, and the suggestion to use async to avoid timeouts, implying potential slowness. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at four sentences, front-loaded with the main action. Each sentence contributes essential information: purpose, inputs, outputs, ideal use case, and async tip. No redundant or wasteful content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 5 parameters (1 required), high schema coverage, and an output schema, the description covers purpose, inputs, outputs, and usage context. It lacks details about data source limitations or how to interpret outputs, but the presence of an output schema mitigates this. Overall, it is complete for most use cases.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so all parameters are documented in the schema. The description adds minimal extra meaning by grouping parameters into 'site identifiers or geographic regions' and mentioning waste types implicitly. Since schema_description_coverage is high, baseline is 3, and the description does not significantly enhance understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool generates manufacturing waste heatmaps using EPA TRI and FAOSTAT data, targeting COOs. It specifies inputs (site identifiers or geographic regions) and outputs (waste intensity maps, circular economy opportunity rankings, cost-saving potential), clearly distinguishing it from the sibling tool 'manufacturing_esg_compliance_mapper' which likely focuses on compliance. The verb 'generates' combined with the resource 'waste heatmaps' is specific.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates when to use the tool: for sustainability strategy and operational efficiency improvements, and advises using async:true to avoid timeout. However, it does not explicitly state when not to use it or directly compare to alternatives like 'manufacturing_esg_compliance_mapper', leaving some ambiguity. The guidance is clear but lacking exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

margin_doctorC

Read-only

Inspect

Marge par deal — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — 8 deals pipeline · €28k ARR sous-marge détecté · Récupération €4.2k/an · Playbook 4 scénarios. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deals`	Yes
`company`	Yes
`product`	Yes

Tool Definition Quality

C2.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating no state changes. The description adds that inputs are validated server-side and a deliverable is returned, but does not elaborate on what the deliverable contains, processing time, or other behavioral traits. Minimal added value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but mixes languages and jargon (e.g., 'Gapup agent-payable C-suite expertise'). It includes a reference case that adds length without clarity. It is passable in conciseness but could be more structured and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (4 parameters with nested objects, array, no output schema), the description is insufficient. It does not explain return format, error handling, or how to use results. The reference case provides a hint but does not cover the full scope of inputs and outputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (only async parameter has a description). The description does not explain the meaning of company, product, or deals fields beyond what the schema constraints imply. It references 'documented case fields' but gives no specifics, failing to compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose2/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description mentions 'Marge par deal' and a reference case, but does not clearly state what the tool does in a general sense. The verb is implied ('Returns a structured, audited deliverable'), but it's vague and does not distinguish this tool from siblings like margin_doctor_finance or pricing_in_deal.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description only says 'send the documented case fields' and mentions server-side validation. No guidance on when to use this tool versus alternatives, no exclusions, and no context about prerequisites or best practices.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

margin_doctor_financeC

Read-only

Inspect

Médecin des Marges — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Alan — ARR €60M · marge brute 68% → 79% · €3,2M fuites identifiées · Rule of 40 : 14→38. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`costBreakdown`	Yes
`marginTargets`	Yes
`unitEconomics`	Yes
`incomeStatement`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, so the description adds marginal behavioral context: server-side validation and structured output. However, lacks details on data handling, side effects, or limitations beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a non-essential reference case (Alan) and French text that may not aid an English-speaking agent. Front-loads the purpose but could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of nested objects and 6 parameters with no output schema, the description is insufficient. It does not explain the deliverable's structure or the tool's scope, leaving key gaps for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, and the description does not explain any parameter meanings. It merely says to send the case fields, leaving the agent to infer parameter semantics from the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool provides C-suite financial expertise for margin improvement, returns a structured audited deliverable, and cites a reference case. However, it does not explicitly differentiate from the sibling 'margin_doctor' tool, leaving some ambiguity about its specific niche.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like margin_doctor or other financial tools. The description only instructs to 'send the documented case fields' without context on prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_entry_strategistB

Read-only

Inspect

Stratégie d'entrée marché — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: OpenAI Inde 2026 — entrée marché 1.4Md utilisateurs · 5 forces Porter + 4 entry modes + 18-month roadmap + risk register. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`preferences`	Yes
`targetMarket`	Yes

Tool Definition Quality

B3.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and openWorldHint=true. The description adds context about the tool being a paid, C-suite expertise service that returns a structured deliverable, which aligns with the read-only annotation. It does not contradict annotations, and provides useful behavioral context beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise at 4 sentences, but the opening line is somewhat jargon-heavy ('Gapup agent-payable C-suite expertise (CSO)') which may confuse agents. It is front-loaded with the tool's purpose, but the reference case sentence adds bulk without immediate clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite complexity (nested objects, no output schema), the description mentions the deliverable includes Porter's 5 forces, entry modes, roadmap, and risk register. However, it does not specify the exact output structure, pagination, or how to handle the async parameter for long running executions. This leaves gaps for an agent to understand the full behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, meaning most parameters lack descriptions in the schema. The description only generically tells to 'send the documented case fields' without explaining each parameter's meaning, purpose, or how to structure the nested objects. This is insufficient compensation for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides a market entry strategy deliverable (Stratégie d'entrée marché) and mentions a reference case. It uses a specific verb ('Returns a structured, audited deliverable') and resource. However, it does not differentiate from sibling tools like market_sizing or market_research_brief, so it misses distinction.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description guides users to 'send the documented case fields' and notes inputs are validated server-side, implying a specific input format. It does not explicitly state when to use this tool versus alternatives, nor when not to use it, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

marketing_roi_dashboardC

Read-only

Inspect

Dashboard ROI marketing — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — H1 2026 · 5 canaux · ROI 3.2× · Attribution W-shaped · Budget €60k. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`arpuEur`	Yes
`channelData`	Yes
`companyName`	Yes
`periodLabel`	Yes
`totalRevenueAttribEur`	Yes
`targetAttributionModel`	Yes
`currentAttributionModel`	Yes
`totalMarketingBudgetEur`	Yes

Tool Definition Quality

C2.3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and returns a structured deliverable, providing some extra context but not comprehensive behavioral traits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a specific reference case that may be unnecessary. It is front-loaded somewhat but could be more concise and structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 params, nested array, no output schema), the description is inadequate. It does not explain return format, field meanings, or usage context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 11% for 9 parameters, but the description provides no parameter-level explanation beyond 'send the documented case fields.' Fails to compensate for low schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it's a dashboard for ROI marketing and returns a structured deliverable, but lacks a clear verb+resource distinction. It references a case but doesn't explicitly differentiate from sibling tools like attribution calibrators.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives like programmatic_attribution_calibrator or retail_media_attribution_bridge. The mention of CMO-level suggests a target audience but not when to prefer it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_research_briefA

Read-only

Inspect

Generate a structured, sourced market research brief on any market, sector or industry. Returns a machine-readable note with six sections: an executive overview, a market-size estimate (with assumptions and sources — no invented figures), key players, demand & technology trends, risk factors, and a traceable source list. When to use this tool: an agent needs to assess a new market, validate a business opportunity, prepare a pitch, or benchmark a sector before a strategic decision. Data is assembled live from keyless public sources: Wikipedia (sector context), World Bank (macro GDP/population for market sizing), REST Countries (geo context). Fields that cannot be sourced are marked 'unavailable' rather than estimated. Inputs: topic (required), geo and sector (optional refinements).

ParametersJSON Schema

Name	Required	Description
`geo`	No	Optional geography to scope the brief (country name, region, or continent — e.g. 'France', 'Southeast Asia')
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`topic`	Yes	Market or sector to research (e.g. 'electric vehicle batteries', 'B2B SaaS CRM Europe', 'telemedicine Africa')
`sector`	No	Optional parent sector to disambiguate the topic (e.g. 'healthcare', 'energy', 'software')

Output Schema

ParametersJSON Schema

Name	Required	Description
`geo`	Yes
`risks`	Yes
`topic`	Yes
`sector`	Yes
`trends`	Yes
`sources`	Yes	All sources consulted, with URL and retrieval status
`overview`	Yes	Executive summary of the market
`key_players`	Yes
`generated_at`	Yes	ISO-8601 timestamp of generation
`market_size_estimate`	Yes	Market size estimate with hypotheses. All figures sourced or marked unavailable.

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds behavioral context: data is assembled live from public sources, no invented figures, and fields marked 'unavailable' if not found. This goes beyond the annotations without contradicting them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and concise, covering purpose, output, usage guidance, data sources, behavior, and inputs in a single paragraph. Every sentence contributes value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown) and the detailed description covering purpose, usage, sources, and behavior, the description provides sufficient context for an agent to correctly invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so each parameter is already documented. The description summarizes inputs (topic required, geo and sector optional) and provides example topics, adding value beyond the schema alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates a structured market research brief on any market, sector, or industry, listing six specific sections. It implicitly distinguishes itself from the sibling 'market_sizing' by being a comprehensive brief rather than just sizing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes explicit guidance on when to use the tool: for assessing new markets, validating opportunities, preparing pitches, or benchmarking sectors. It does not mention when to avoid it or provide alternative tools, but the usage context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

market_sizingC

Read-only

Inspect

Dimensionnement marché TAM/SAM/SOM — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub — TAM/SAM/SOM IA décisionnelle C-suite Europe · TAM €48Md · SOM €280M Year-3. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`target`	Yes
`horizon`	No
`product`	Yes
`approach`	No
`competitorComps`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that it returns an audited deliverable and that inputs are validated server-side. It does not contradict annotations but provides limited additional behavioral insight beyond what annotations already convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, relatively concise, and front-loads the core purpose. However, it includes jargon ('Gapup agent-payable') that may not be universally clear, and could be streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema), the description is incomplete. It does not describe the returned deliverable structure, error handling, or prerequisites beyond input validation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, and the description does not elaborate on parameter meanings. It mentions 'send the documented case fields' but does not explain them. The schema has complex nested objects, but the description adds no extra semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool sizes TAM/SAM/SOM markets and returns a structured deliverable. It specifies the domain (C-suite, CMO) and provides a reference case. However, it does not differentiate from sibling tools like market_entry_strategist or market_research_brief.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description does not specify when to use this tool versus alternatives. It mentions 'Gapup agent-payable C-suite expertise (CMO)' which hints at the target audience but lacks explicit guidance on context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ma_tax_efficiency_mapperA

Read-onlyIdempotent

Inspect

For CFOs evaluating cross-border M&A deals: analyzes tax efficiency by mapping withholding tax rates, transfer pricing regulations, and permanent establishment risks across specified jurisdictions. Inputs include acquirer/target jurisdictions, deal structure, and transaction value. Outputs jurisdiction-specific tax exposure, efficiency scores, and risk flags. Uses World Bank Tax Rates API, IMF SDR data, and SEC EDGAR filings for corporate tax disclosures. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deal_structure`	No	Type of M&A transaction structure
`transaction_value`	No	Deal value in USD millions
`target_jurisdiction`	Yes	ISO 3166-1 alpha-3 country code of the target entity
`acquirer_jurisdiction`	Yes	ISO 3166-1 alpha-3 country code of the acquiring entity
`include_transfer_pricing`	No	Whether to analyze transfer pricing risks

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`tax_treaties`	No
`efficiency_score`	No
`target_tax_rates`	No
`acquirer_tax_rates`	No
`transfer_pricing_risk`	No
`permanent_establishment_risk`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnly, idempotent, openWorld. Description adds external data sources (World Bank, IMF, SEC) and the need for async to avoid timeout. It does not contradict annotations and discloses timeout risk.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, front-loaded with target audience, and efficiently covers purpose, inputs, outputs, data sources, and a critical usage note (async). No redundant sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and presence of output schema, the description covers audience, function, inputs, outputs, data sources, and async requirement sufficiently. It lacks prerequisites but is otherwise complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description lists inputs but does not add significant detail beyond the schema; it repeats parameter names without additional semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes tax efficiency for cross-border M&A deals by mapping withholding tax rates, transfer pricing, and PE risks. It uses specific verbs (analyzes, maps) and resource (tax efficiency), distinguishing it from sibling tools like ma_deal_screener or tax_optimization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description targets CFOs evaluating cross-border M&A, providing clear context. It mentions the async requirement but does not explicitly exclude alternatives or state when not to use, though the specificity implies appropriate use cases.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

meddic_scoringC

Read-only

Inspect

Scoring MEDDIC du pipeline — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Pipeline 8 deals · €2.1M · MEDDIC score moyen 62/100 · 3 deals at-risk. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deals`	Yes
`company`	Yes
`product`	Yes
`salesCycle`	No
`targetWinRate`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true (no mutation) and openWorldHint; the description adds that it returns an 'audited deliverable' and validates inputs server-side. It does not mention the async parameter behavior (the schema has an async boolean) or any side effects. The description provides modest additional behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3 sentences) and front-loaded with the purpose. It uses a reference case for context. However, it could benefit from structured bullet points or parameter explanations to improve scannability.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, 6 parameters, no output schema) and low schema coverage, the description is incomplete. It lacks output format details, score interpretation, and async behavior guidance. The reference case provides some context but is insufficient for an agent to fully understand the tool's behavior and expected inputs/outputs.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only async has a description). The description does not explain any of the six parameters or their sub-fields (e.g., deals with required nested fields). It merely says 'send the documented case fields' without elaboration, failing to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it scores a MEDDIC pipeline and returns a structured deliverable. It mentions a specific use case (Gapup Hub) and distinguishes the tool as focusing on MEDDIC scoring among many sibling tools. However, it does not define MEDDIC or specify the output format beyond 'structured, audited deliverable'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus its many siblings (e.g., deal_coach, sales_pipeline_forecast). It lacks prerequisites, alternatives, or context-based selection criteria. The only mention is that inputs are validated server-side, which is not a usage guideline.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

model_behavior_drift_monitorA

Read-onlyIdempotent

Inspect

Monitors AI model output drift by comparing current model responses against MLCommons safety benchmarks. Designed for risk and compliance personas to detect behavioral deviations that may indicate safety or alignment issues. Accepts model outputs or identifiers and returns structured drift metrics with statistical significance. Sources data from MLCommons public benchmark APIs.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`threshold`	No	Drift threshold for alerting
`currentOutputs`	No	Recent model outputs to analyze for drift
`baselineMetrics`	No
`modelIdentifier`	Yes	Unique identifier for the model being monitored

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`driftMetrics`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, open-world, and idempotent behavior. The description adds that it sources data from external APIs and returns structured metrics with statistical significance, complementing annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three front-loaded sentences, covering purpose, users, inputs, and outputs without redundancy. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and comprehensive annotations, the description adequately covers the tool's behavior. It could optionally mention statistical tests or significance thresholds, but remains fairly complete for drift monitoring.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 80% schema description coverage, the tool description adds minimal parameter detail beyond listing model outputs/identifiers. The threshold and baselineMetrics parameters are not elaborated, so the description provides only modest added meaning.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool monitors AI model output drift against MLCommons safety benchmarks, specifies target personas (risk/compliance), and distinguishes from siblings like model_safety_certification_checker by focusing on drift detection rather than certification checks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for detecting behavioral deviations, but lacks explicit 'when not to use' or comparison with siblings. However, it clearly identifies the data source and purpose, providing sufficient guidance for appropriate selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

model_safety_certification_checkerA

Read-onlyIdempotent

Inspect

Verifies AI model safety certifications against MLCommons and IEEE 7000 standards. Designed for risk management personas to assess model compliance with established safety benchmarks. Accepts model identifiers or certification IDs and returns structured verification results with source references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`model_id`	Yes	Unique identifier for the AI model
`standard`	No	Safety standard to check against
`certification_id`	No	Specific certification ID to verify

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`compliance`	No
`last_verified`	No

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, idempotentHint, and openWorldHint, which indicate safe, idempotent behavior. The description adds that the tool accepts model identifiers or certification IDs and returns structured verification results with source references. This adds context beyond annotations but does not disclose potential failure modes, rate limits, or behavior when certifications are not found.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences long, with the verb and purpose in the first sentence. Each sentence adds value: purpose, target audience, and input/output summary. There is no redundancy or filler, making it highly efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters (1 required), 100% schema documentation, a parameter with enums, and an output schema, the description covers the essential aspects: purpose, input types, and output nature. It does not detail the output schema explicitly, but since an output schema exists, the agent can infer that from structured data. The description is complete enough for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema already documents all parameters. The description mentions 'model identifiers or certification IDs', but that simply paraphrases the schema parameters (model_id and certification_id). It does not add meaning beyond the schema, such as format constraints or usage examples, resulting in minimal added value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool verifies AI model safety certifications against MLCommons and IEEE 7000 standards. It specifies the verb 'verifies' and the resource 'model safety certifications', providing a clear purpose. However, it does not explicitly distinguish from sibling tools like 'safety_guardrail_breach_analyzer' or 'bias_amplification_tracker', though the specific standards named offer some differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions 'designed for risk management personas', which implies the target user and context. However, it does not provide explicit when-to-use or when-not-to-use guidance, nor does it mention alternatives among the many sibling tools. This leaves the agent with only implied usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

monte_carlo_portfolioA

Read-only

Inspect

Pure-compute Monte Carlo portfolio simulation using Geometric Brownian Motion (GBM). Models a multi-asset portfolio across time with contributions, withdrawals, and annual rebalancing. Returns full probability distribution of terminal wealth, percentile paths, drawdown stats, and Sharpe ratio. Modes: simulate (full Monte Carlo) | glide_path (lifecycle 110-age target-date allocation) | stress_test (4 historical crises: 2008 GFC / 2000 dotcom / 1970s stagflation / 2020 COVID). No external data needed — all computed from asset assumptions. Ticker defaults built-in: SPY/VOO/VTI 7%/15%, QQQ 9%/20%, TLT/BND 3%/6%, GLD 5%/18%, BTC 30%/70%. ICP: asset managers, family offices, retail wealth advisors, robo-advisor agents, retirement planners. 10k simulations × 30 years runs in <3s on V8 JIT.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	simulate = full Monte Carlo GBM \| glide_path = lifecycle target-date allocation \| stress_test = 4 historical crisis scenarios
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`assets`	Yes	Portfolio assets. Weights must sum to 1.0 (auto-normalized if not).
`simulations`	No	Number of Monte Carlo simulations (1000-100000). Default 10000.
`horizon_years`	Yes	Investment horizon in years (1-50).
`target_value_eur`	No	Target terminal portfolio value in EUR. Used to compute probability_target_achieved.
`confidence_intervals`	No	Percentiles to compute in the output distribution. Default [5, 25, 50, 75, 95].
`initial_investment_eur`	Yes	Initial capital in EUR (e.g. 100000 for €100k).
`withdrawals_annual_eur`	No	Annual withdrawal amount in EUR for decumulation phase (e.g. 50000 for €50k/yr).
`contributions_annual_eur`	No	Annual contribution in EUR (e.g. 12000 for €1000/month).

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds performance metrics and the fact that no external data is needed, which contradicts the openWorldHint annotation. While it discloses behavior beyond annotations, the contradiction partially undermines transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense and front-loaded, covering purpose, modes, performance, and audience. It is efficient but slightly verbose in listing ticker defaults and mode specifics.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite no output schema, the description fully explains what is returned (distribution, stats). It also covers constraints and performance. No major gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are well-documented in the schema. The description does not add additional semantic details beyond the schema, leading to a baseline score of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a Monte Carlo portfolio simulation using GBM, with specific verbs and resource. It distinguishes from siblings by being a unique financial simulation tool, and lists modes and typical use cases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies ideal customer profiles and explains three modes, providing context on when to use each. However, it lacks explicit exclusions or alternatives, though siblings are not directly similar.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

mttr_breakdown_analyzerA

Read-onlyIdempotent

Inspect

As a CTO, analyze your team's incident response efficiency by breaking down Mean Time To Recovery (MTTR) into root causes: code defects, infrastructure failures, or process bottlenecks. This tool ingests GitHub issue and pull request data alongside Snyk vulnerability reports to provide a detailed breakdown of MTTR components, helping you identify systemic weaknesses in your incident resolution pipeline. Input your GitHub repository details and time range to receive a structured analysis of MTTR contributors with actionable insights.

ParametersJSON Schema

Name	Required	Description
`repo`	Yes	Full GitHub repository name (owner/repo)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`since`	Yes	Start date for analysis (ISO 8601)
`until`	Yes	End date for analysis (ISO 8601)
`snykToken`	No	Snyk API token for vulnerability data (optional)
`githubToken`	Yes	GitHub personal access token for API access

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`breakdown`	No
`topContributors`	No

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true and idempotentHint=true, so the description does not need to reiterate safety. It adds that the tool ingests GitHub and Snyk data, which provides context, but does not disclose additional behavioral traits like performance or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: two sentences that front-load the purpose, state the inputs, and summarize the output. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given schema coverage is 100% and an output schema exists, the description is complete enough to understand what the tool does. It could briefly mention how it differs from sibling tools (e.g., change_failure_root_cause_classifier) but is still adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters described. The description mentions 'repository details and time range' but does not add new meaning beyond the schema. The 'async' parameter is not highlighted in the description, but it is documented in the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: analyzing incident response efficiency by breaking down MTTR into root causes. It specifies the resource (team's incident response efficiency) and the verb (analyze). It distinguishes from siblings like 'change_failure_root_cause_classifier' by focusing on MTTR and using GitHub and Snyk data.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for MTTR breakdown analysis but does not explicitly state when to use this tool versus alternatives. No guidance on when not to use it or what prerequisites exist beyond input parameters.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

nis2_supply_chain_dependency_mapA

Read-onlyIdempotent

Inspect

Generates a visual dependency map of supply chain relationships under the NIS2 Directive, scoring criticality based on regulatory sources like EUR-Lex and CNIL decisions. Designed for legal and compliance teams to identify high-risk third-party dependencies. Inputs include organization identifiers and optional scope filters. Outputs structured dependency data with criticality scores and regulatory references.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Dependency chain depth to analyze
`scope`	No	Analysis scope: full supply chain or critical dependencies only
`sector`	No	NIS2 sector classification (e.g., 'energy', 'transport')
`organizationId`	Yes	Unique identifier for the organization (e.g., VAT number or LEI)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`dependencies`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, so the description does not need to repeat that. It adds useful behavioral context: uses regulatory sources like EUR-Lex and CNIL, outputs structured data with criticality scores and references. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences with no fluff. First sentence states core function, second targets audience and purpose, third lists inputs and outputs. Front-loaded structure ensures quick comprehension.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a 5-param tool with an output schema, the description covers all essential aspects: purpose, audience, inputs, and output nature. The existence of an output schema relieves the need to detail return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds minimal extra meaning. It mentions 'organization identifiers' and 'scope filters' which map to parameters, but doesn't detail async, depth, or sector beyond what the schema provides. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates a visual dependency map of supply chain relationships under NIS2, scoring criticality based on regulatory sources. This specific verb+resource+domain combination distinguishes it from siblings like generic supply chain tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly targets legal and compliance teams for identifying high-risk third-party dependencies and lists required inputs (organization identifier) and optional scope filters. It does not mention when not to use or alternatives, but the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

observability_log_pattern_minerA

Read-onlyIdempotent

Inspect

As a CTO, extract anomalous log patterns from public breach reports (e.g., Verizon DBIR) and MITRE ATT&CK techniques to optimize SIEM rules and observability pipelines. Inputs include threat actor groups, MITRE tactics (e.g., 'TA0005'), or log sources (e.g., 'AWS CloudTrail'). Outputs structured patterns with MITRE mappings, prevalence scores, and detection recommendations. Ideal for reducing false positives and improving breach detection coverage. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`tactic`	Yes	MITRE ATT&CK tactic ID (e.g., 'TA0005')
`technique`	No	MITRE ATT&CK technique ID (e.g., 'T1059')
`log_source`	No	Log source type (e.g., 'AWS CloudTrail', 'Windows Event Log')
`max_results`	No
`threat_actor`	No	Threat actor group name (e.g., 'APT29')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`metadata`	No
`patterns`	Yes
`warnings`	Yes

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint, openWorldHint, and idempotentHint, already conveying safety. The description adds useful behavioral context: mentions async capability to avoid timeout, and describes output structure (structured patterns with MITRE mappings, prevalence scores, detection recommendations). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loading the purpose. The opening 'As a CTO...' is slightly verbose but not detrimental. Each sentence adds value: purpose, inputs, outputs, ideal use, async hint. Could be slightly more concise, but overall efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and annotations, the description covers key aspects: purpose, input types, output structure, async behavior, and ideal use case. It does not mention errors or rate limits, but the output schema likely handles return structure. Reasonably complete for its complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 83%, so most parameters are well-documented. The description adds examples (e.g., 'TA0005', 'AWS CloudTrail') and mentions the async parameter, but this does not significantly enhance understanding beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'extract anomalous log patterns from public breach reports and MITRE ATT&CK techniques'. It specifies the verb, resource, and unique focus on log patterns from breach reports, distinguishing it from siblings like 'observability_metric_anomaly_detector' which deals with metrics.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a usage hint: 'Pass async:true to avoid timeout' and states it's 'Ideal for reducing false positives'. However, it lacks explicit when-to-use vs alternatives guidance, which is important given the large sibling list. No direct comparison or exclusion conditions are provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

observability_metric_anomaly_detectorA

Read-onlyIdempotent

Inspect

As a CTO, quickly identify anomalous cloud metrics (CPU, latency, memory) by comparing your infrastructure against AWS public benchmarks and CVE-linked hardware risks. Input your observed metrics (e.g., CPU utilization, request latency) and receive a risk assessment with potential root causes. Ideal for performance troubleshooting, security hardening, and capacity planning. Keywords: cloud observability, anomaly detection, CVE hardware risks, AWS benchmark comparison.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	No
`metricType`	Yes
`instanceType`	No
`observedValue`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`cveRisks`	No
`warnings`	No
`anomalyScore`	No
`benchmarkValue`	No
`deviationPercent`	No

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and openWorldHint=true. The description adds valuable context about comparing against AWS benchmarks and CVE-linked hardware risks, explaining the 'open world' aspect. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of 4 sentences, front-loading the core purpose. It is efficient and avoids redundancy, though it could be slightly more concise by removing the keyword line.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description gives a good overall picture but lacks details on parameters like region and instanceType, and does not explain the async parameter's behavior. Given the output schema exists, return values are not needed, but parameter relationships could be clearer.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only 'async' described). The description partially compensates by explaining metricType (CPU, latency, memory) and observedValue, and hints at instanceType/region via 'infrastructure', but does not fully describe all 5 parameters or their formats.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool identifies anomalous cloud metrics (CPU, latency, memory) by comparing against AWS benchmarks and CVE risks. The verb 'identify' and resource 'anomalous cloud metrics' are specific, and it distinguishes from sibling tools like observability_log_pattern_miner by focusing on metrics rather than logs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists ideal use cases (performance troubleshooting, security hardening, capacity planning) but does not explicitly state when not to use the tool or name alternative tools. Usage is implied but lacks explicit guidance for decision-making.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

onboarding_salariesC

Read-only

Inspect

Onboarding opérationnel des salariés — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Pennylane (FR fintech SaaS, ~250 FTE) — 5 parcours 30/60/90 jours · Engineering / Sales / CS / Design / People Ops. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`roles`	Yes
`company`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and openWorldHint=true, so the description need not repeat safety or mutation behavior. It appropriately states that the tool returns a structured deliverable, which aligns with the read-only hint. No contradictions. Limited additional behavioral context, but adequate given annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, but the first sentence mixes French and jargon, reducing clarity. The second and third sentences are clear. It is not overly long, but the front-loading is compromised by jargon.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the nested objects in the schema, absence of output schema, and moderate complexity, the description fails to explain input structure (roles, company fields) or output format. The reference case gives some context but does not substitute for a complete specification.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 25%: only the 'async' parameter has a description. The description does not explain the purpose or format of 'focus', 'roles', or 'company' fields. It merely says to send documented case fields, which is vague. Agent cannot infer parameter semantics from description alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is for operational employee onboarding and mentions a structured deliverable, with a reference case that illustrates its scope. However, the jargon 'Gapup agent-payable C-suite expertise (COO)' can confuse agents. The purpose is understandable but could be more direct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool compared to siblings or alternatives. It only mentions that inputs are validated server-side and to send documented case fields, but fails to clarify prerequisites, exclusions, or scenarios where another tool would be better.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

operational_dashboardsC

Read-only

Inspect

Dashboards opérationnels — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Qonto (5 départements · 12 KPIs) — 4 dashboards live en 3 semaines · time-to-décision -55%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`techStack`	Yes
`departments`	Yes
`kpiRequests`	Yes
`primaryDashboardTool`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint; the description adds that inputs are validated server-side and that the deliverable is audited. This provides some context beyond annotations but does not disclose all behavioral traits (e.g., auth needs, rate limits).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is somewhat verbose, including marketing claims and a reference case. It front-loads the title but could be more concise. Every sentence adds some value, but the overall length is not optimal.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having nested objects and no output schema, the description does not explain return format, error handling, or async usage beyond the async parameter. The reference case provides some context, but completeness is low for an agent to confidently use the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only async documented). The description mentions fields like company, departments, kpiRequests, techStack but provides no additional meaning, format, or constraints beyond what the schema specifies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it creates operational dashboards for C-suite (COO) and returns a structured, audited deliverable, with a reference case. However, it lacks a precise verb and does not distinguish from sibling tools that may also generate reports or dashboards.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description only mentions input validation, but does not specify when it is appropriate to invoke, prerequisites, or limitations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

oss_dependency_velocity_trackerA

Read-onlyIdempotent

Inspect

As a CTO, track the update velocity of your project's open-source dependencies to assess their impact on DORA metrics like deployment frequency and lead time. This tool fetches release history and version adoption data from npm registry and libraries.io, providing insights into dependency freshness, update frequency, and potential risks. Input a list of package names and optional version ranges to analyze. Outputs structured dependency velocity metrics and warnings about stale or rapidly changing packages.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`packages`	Yes
`lookbackDays`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`metrics`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only, idempotent, and open-world hints. The description adds value by specifying it fetches from npm registry and libraries.io, and outputs warnings about stale or rapidly changing packages. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences, front-loaded with purpose, and each sentence adds new information without redundancy. It is concise and well-structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool fetches external data and an output schema exists, the description covers input, data sources, and output type (metrics and warnings). It does not detail error handling or rate limits, but the output schema likely covers return structure, making it sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 33% (async parameter described). The description explains that 'packages' is a list of names with optional versions, adding meaning beyond the schema for that field. However, it does not mention 'lookbackDays', leaving a gap for parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool tracks update velocity of open-source dependencies and assesses impact on DORA metrics. It distinguishes from siblings like 'dependency_vulnerability_scan' by focusing on velocity and freshness, not security.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains the input (list of package names and optional version ranges) and the use case (tracking dependency velocity for DORA metrics). However, it doesn't explicitly state when not to use it or provide alternatives, so a small gap remains.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ossf_scorecard_trend_analyzerA

Read-onlyIdempotent

Inspect

As a CTO, analyze OSSF Scorecard trends for your top 10-50 dependencies to identify security regressions or deteriorating project health. Input GitHub repository names (owner/repo), get structured trend data including score deltas, check failures, and risk flags. Uses OSSF Scorecard API and GitHub Archive for historical context. Ideal for proactive dependency management and risk assessment.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`lookbackDays`	No	Number of days to analyze trends for
`repositories`	Yes	List of GitHub repositories in owner/repo format
`minScoreThreshold`	No	Minimum acceptable score to flag as risky

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`results`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, openWorldHint. The description adds value by revealing it uses external APIs (OSSF Scorecard API and GitHub Archive) and outputs score deltas, check failures, and risk flags. It does not contradict annotations and provides useful behavioral context beyond what annotations offer.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two well-structured sentences covering audience, action, input, output, data sources, and use case. No redundant words; every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given existence of output schema and annotations, description covers purpose, input, output, data sources, and use case. It does not mention async behavior (async parameter exists but not described) or rate limits, but is otherwise complete for a read-only, idempotent tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameter descriptions in schema. The description reinforces 'repositories' input format but adds no new semantic meaning beyond schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's specific verb 'analyze' and resource 'OSSF Scorecard trends'. It explicitly mentions input format (owner/repo), output (structured trend data including score deltas, check failures, risk flags), and distinguishes from siblings by focusing on Scorecard trends specifically, not general vulnerability scanning or dependency velocity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context: 'for your top 10-50 dependencies' and 'Ideal for proactive dependency management and risk assessment.' It implies when to use (for security regression detection) but does not explicitly list when not to use or name alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

outbound_sequencerD

Read-only

Inspect

Séquences outbound — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub → CFO + CRO B2B SaaS France — Séquence 6 touches multi-canal · Taux réponse +180%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`icp`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`offer`	Yes
`excludedAngles`	No
`targetAccounts`	No

Tool Definition Quality

D1.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true and openWorldHint=true. Description adds that it returns a structured deliverable and inputs are server-validated, but no additional behavioral traits like permission requirements or output format. Minimal addition.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description includes a detailed but irrelevant reference case, making it less concise. It front-loads jargon rather than essential usage information. Could be restructured to be more helpful.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 params, nested objects, no output schema), the description is severely incomplete. It does not describe the return structure, parameter semantics, or required input format beyond a cryptic reference.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not explain any parameters beyond a vague 'send the documented case fields.' With only 20% schema description coverage, the description fails to add meaning for the icp, offer, excludedAngles, or targetAccounts fields.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it generates outbound sequences for C-suite expertise, but the purpose is obscured by jargon ('Gapup agent-payable CRO') and lacks clear differentiation from many sibling tools like sales_enablement_architect or battle_plan.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines1/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The reference case is an example, not a usage guideline. No explicit when-to-use or when-not-to-use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

paid_ads_optimizerC

Read-only

Inspect

Optimiseur de publicités payantes — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Spendesk (Google + LinkedIn · €45k/mo) — €9k/mo gaspillés identifiés · ROAS LinkedIn ×2.4. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`campaigns`	Yes
`targetMetric`	Yes
`audienceDescription`	Yes
`totalMonthlyBudgetEur`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint, and the description aligns by stating it returns a deliverable without modifying anything. However, the description adds little beyond the annotations, merely repeating the output type and input validation.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a case study that is not essential for tool invocation. It is front-loaded with the title but could be trimmed to focus on critical usage instructions.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has no output schema and multiple required parameters, the description is incomplete. It does not detail the expected deliverable structure or provide enough context for the agent to use the tool effectively without additional information.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, yet the description fails to explain the meaning or usage of the six parameters (e.g., company, campaigns, targetMetric). It only generically says 'send the documented case fields', which does not compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it is a paid advertising optimizer ('Optimiseur de publicités payantes') that returns a structured audited deliverable, with a reference case. This clearly identifies the tool's purpose and verb+resource, though it does not explicitly differentiate from sibling marketing tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description provides a case study but no conditions, prerequisites, or exclusions, leaving the agent to infer usage context from the tool name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

partnership_synergiesA

Read-onlyIdempotent

Inspect

Identify and rank strategic partnership opportunities for a company. Returns 5-12 high-fit partnership targets, each scored on revenue lift, time-to-impact, integration complexity and regulatory risk, with a rationale and a recommended first-step outreach playbook. When to use this tool: the user wants business-development or alliance ideas, or M&A target screening before deeper due diligence. Inputs: the user's own company and the strategic axis to unlock through partnership (e.g. enter a new market via distribution, add AI infrastructure without rebuilding). Delivered by Antoine, the AI CSO of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`constraints`	No
`selfCompany`	Yes
`strategicAxis`	Yes	What strategic axis to unlock through partnership (e.g. 'enter US market via distribution', 'leverage AI infra without rebuild')
`currentPartnerships`	No	Existing alliances to factor in

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles
`sources`	No
`recommendations`	No	Prioritised next steps
`executiveSummary`	Yes	Board-ready partnership opportunity overview
`partnershipTargets`	Yes	5-12 ranked partnership targets

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, etc. The description adds the output format (scored list with rationale and playbook) and the persona ('Delivered by Antoine...'), enhancing transparency beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured, with a clear statement of purpose, output details, usage guidance, and input summary. Every sentence adds value, though it could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, output schema exists), the description covers the core purpose and usage but lacks details on optional parameters (focus, constraints) and how to interpret the output scores. The presence of an output schema partially mitigates this.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description explains the meaning and purpose of the two main parameters (selfCompany, strategicAxis) with examples, complementing the schema's 50% coverage. It does not elaborate on optional parameters like constraints, focus, or currentPartnerships, but these have schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool identifies and ranks strategic partnership opportunities, specifies the output (5-12 targets scored on revenue lift, time-to-impact, etc.), and distinguishes from siblings by targeting business-development/alliance ideas and M&A pre-due-diligence screening.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use this tool ('user wants business-development or alliance ideas, or M&A target screening before deeper due diligence'), providing clear context. However, it does not explicitly state when not to use or name specific alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_landscapeA

Read-only

Inspect

Search, analyze and map patent landscapes across major jurisdictions (US, EP, WO, CN, JP, KR). Three modes: (1) search — find patents by keywords, company name or inventor name; (2) landscape — aggregate distributions: top assignees, top inventors, CPC class breakdown, filings by year, citation leaders, white-space innovation opportunities; (3) lookup — retrieve a specific patent by number (e.g. US10000000B2, EP3456789A1, WO2023/123456). Primary source: WIPO PatentScope (WO PCT, keyless). Optional sources: USPTO PatentsView (US, env PATENTSVIEW_API_KEY), EPO OPS (EP/WO, env EPO_OPS_CONSUMER_KEY + EPO_OPS_CONSUMER_SECRET), Lens.org (global, env LENS_API_TOKEN). Use cases: freedom-to-operate (FTO) analysis, R&D gap identification, VC due diligence IP audit, competitor patent portfolio mapping, inventor network analysis. SLA: <=24s p95 (parallel fetches, 8s per source). Cache: 24h TTL (patent data stable). Quality score: 30 pts per retrieved source (max 90), +10 if >=10 patents, +10 bonus for landscape mode with non-empty top_assignees.

ParametersJSON Schema

Name	Required	Description
`mode`	No	search: keyword/inventor/assignee search; landscape: aggregate distributions; lookup: fetch by patent number. Default: "search"
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keywords, company/inventor name, or patent number (e.g. "machine learning", "Tesla Inc", "US10000000B2")
`date_to`	No	ISO date YYYY-MM-DD — latest filing date
`date_from`	No	ISO date YYYY-MM-DD — earliest filing date
`max_results`	No	Max patents to return (5-50). Default: 20
`jurisdictions`	No	Jurisdictions to include. Default: ["US","EP","WO"]

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`status`	Yes
`patents`	Yes
`sources`	Yes
`landscape`	No
`quality_score`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only, open-world, non-destructive behavior. The description goes beyond by detailing SLA (<=24s p95, parallel fetches), cache (24h TTL), quality scoring (30 pts per source, bonuses), and optional source configurations with environment variables. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: starts with core purpose, then modes, sources, use cases, and concludes with performance metrics. Every sentence adds value, no repetition or fluff. At ~10 sentences, it is appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, multiple modes, optional sources, async support, performance guarantees), the description covers all essential aspects: modes, sources, env vars, use cases, SLA, cache, quality scoring. Output schema exists so return details are implicit. The async parameter handling is also explained.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description enriches parameter understanding: query provides examples, mode explains the three modes with default, jurisdictions lists defaults, max_results specifies range. The description adds context beyond schema (e.g., SLA, cache) but not all parameters are elaborated (date_from, date_to lack format hints beyond ISO).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description precisely states the tool's purpose: patent landscape analysis across major jurisdictions with three distinct modes (search, landscape, lookup). It lists concrete use cases (FTO, R&D gaps, VC due diligence) and distinguishes from sibling async/result tools via the async parameter explanation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description clearly explains when to use each mode (search for keywords, landscape for aggregates, lookup for specific patents) and provides use case scenarios. It mentions the async option for slow queries to avoid timeouts. However, it does not explicitly state when not to use this tool (e.g., if data volume is too high), and sibling tools like patent_landscape_async are not directly compared.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_landscape_asyncA

Read-only

Inspect

Async extended variant of patent_landscape. Supports max_results up to 200 (vs 50 in sync mode) and an optional include_citation_graph flag that enriches each patent with its 2-level citation graph (parent patents that cite this one + child patents cited by this one). Returns immediately (<300ms) with a job_id. Poll the result with patent_landscape_result(job_id) after eta_seconds (~180s). Use for deep R&D white-space analysis, freedom-to-operate (FTO) audits, VC due diligence IP mapping, or large-scale competitor portfolio analysis. Async tool — register a webhook via webhooks_manage(register, url, [job.completed]) to receive callbacks instead of polling. Faster + lighter.

ParametersJSON Schema

Name	Required	Description
`mode`	No	search / landscape / lookup. Default: "search"
`query`	Yes	Keywords, company/inventor name, or patent number (e.g. "machine learning", "Tesla Inc")
`date_to`	No	ISO date YYYY-MM-DD — latest filing date
`date_from`	No	ISO date YYYY-MM-DD — earliest filing date
`max_results`	No	Max patents to return (5-200). Default: 20
`jurisdictions`	No	Jurisdictions to include. Default: ["US","EP","WO"]
`include_citation_graph`	No	If true, enriches each patent with a 2-level citation graph (parents + children). Adds significant processing time — use for deep analysis only. Default: false.

Output Schema

ParametersJSON Schema

Name	Required	Description
`job_id`	Yes	Unique job identifier — pass to patent_landscape_result
`status`	Yes
`eta_seconds`	Yes
`submitted_at`	Yes

Tool Definition Quality

A4.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false. The description adds async behavior (immediate job_id, ~180s ETA), cites polling/webhook, and notes additional processing time for citation graph. No contradictions; a few more details on failure modes or rate limits would elevate it.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: it starts with the core async distinction, then key parameters, timing, use cases, and webhook option. Every sentence adds value; no redundancy. Appropriate length for the complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (async, 7 params, citation graph, result retrieval), the description covers all critical aspects: purpose, parameter extensions, timing, result polling/webhook, and use cases. Output schema exists for return structure, so no gap there.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so each parameter has a basic description. The description adds context beyond schema: max_results range difference from sync, citation graph effect, and default jurisdictions. This provides meaningful additional understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an async variant of patent_landscape, explicitly distinguishing it with larger max_results (200 vs 50) and optional citation graph. It lists specific use cases (R&D white-space, FTO audits, VC due diligence, competitor analysis), providing a clear verb+resource+scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives explicit guidance on when to use this async variant (deep analysis, large-scale needs) and how to retrieve results (polling or webhooks via webhooks_manage). It also mentions the sync variant for smaller queries, effectively framing alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_landscape_resultA

Read-onlyIdempotent

Inspect

Poll the result of a patent_landscape_async job. Returns status=pending while running, status=completed with the full patent landscape report once done, status=failed on error, or status=not_found if the job_id is unknown or expired (TTL 24h). Call this after the eta_seconds hint returned by patent_landscape_async (~180s).

ParametersJSON Schema

Name	Required	Description	Default
`job_id`	Yes	The job_id returned by patent_landscape_async (prefix: patl_)

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds value beyond annotations by detailing the statuses returned (pending, completed, failed, not_found) and the TTL of 24 hours. Annotations already indicate readOnly, idempotent, and non-destructive, so this is additive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two succinct sentences that front-load the purpose and provide essential details. Every sentence is informative, with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a polling tool with an output schema (not required to explain return values in description), it covers statuses, TTL, and optimal calling time. It is complete given the context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with a clear description of job_id. The description only adds the prefix patl_, which is already in the schema. No significant additional meaning is provided.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it polls the result of a patent_landscape_async job, using a specific verb and resource. It distinguishes itself from siblings like patent_landscape_async (which initiates the job) by focusing on polling.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly instructs to call after the eta_seconds hint (~180s) from patent_landscape_async, providing good timing context. However, it does not explicitly mention when not to use it or alternatives for different scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

patent_ownership_auditA

Read-onlyIdempotent

Inspect

Audits patent ownership for employees or contractors, identifying gaps where inventors may not have properly assigned patent rights to the company. Designed for CHROs to ensure IP compliance and mitigate legal risks. Inputs: employee/contractor names or IDs, optional date range. Outputs: list of patents, ownership status, flagged gaps, and assignment details. Sources: USPTO PatFT and EPO Espacenet public records. Keywords: patent audit, IP compliance, employee inventions, contractor agreements, CHRO.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`dateRange`	No	Optional date range for patent filings
`employeeIds`	No	List of employee or contractor IDs (optional if names provided)
`employeeNames`	Yes	List of employee or contractor full names to audit

Output Schema

ParametersJSON Schema

Name	Required	Description
`gaps`	No
`status`	Yes
`patents`	No
`sources`	No
`warnings`	No

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint=true, openWorldHint=true, idempotentHint=true. The description adds behavioral context: it audits, identifies gaps, and outputs specific details. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with purpose, then inputs/outputs, then sources. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 params, nested objects, output schema), the description covers inputs, outputs, sources, and keywords. It is sufficient for an agent to understand and invoke correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, placing baseline at 3. The description adds meaning by explaining the purpose of parameters (names/IDs, optional date range) and their roles in the audit workflow.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it audits patent ownership for employees/contractors, identifies gaps in assignment rights, and specifies inputs/outputs and target users (CHROs). It distinguishes itself from siblings like patent_landscape by focusing on ownership gaps rather than broader landscape analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly targets CHROs for IP compliance and risk mitigation, and lists inputs and outputs. It does not explicitly contrast with alternatives like ip_employee_invention_tracker, but the context is clear enough for selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

payment_rails_cost_analyzerA

Read-onlyIdempotent

Inspect

As a CFO, compare cross-border payment rail costs (SWIFT, SEPA, local ACH, stablecoins) with FX conversion fees and settlement times. Input source/destination countries and amount, receive cost breakdown, FX rates, and settlement time estimates. Uses ECB FX rates and World Bank remittance price data for accurate cost analysis. Ideal for optimizing international payment strategies and reducing transaction expenses.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`amount`	Yes	Transaction amount in source currency
`source_country`	Yes	ISO 3166-1 alpha-2 country code of payment origin
`source_currency`	No	ISO 4217 currency code of source amount
`destination_country`	Yes	ISO 3166-1 alpha-2 country code of payment destination
`destination_currency`	No	ISO 4217 currency code of destination amount

Output Schema

ParametersJSON Schema

Name	Required	Description
`amount`	No
`status`	Yes
`fx_rate`	No
`sources`	No
`warnings`	No
`total_cost`	No
`source_country`	No
`settlement_time`	No
`source_currency`	No
`destination_country`	No
`destination_currency`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. The description adds context about data sources (ECB FX rates, World Bank data) and outputs (cost breakdown, FX rates, settlement times), enhancing behavioral transparency without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (four sentences) and front-loads the primary purpose. No redundant information, every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists, description covers key aspects: inputs, outputs, data sources, and use case. Minor omission: async parameter not mentioned, but overall complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description adds limited extra meaning. It mentions 'source/destination countries and amount' but does not describe all parameters (e.g., async, currencies) beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it compares cross-border payment rail costs including FX fees and settlement times, with specific inputs and outputs. It distinguishes from sibling tools by focusing on a unique niche (payment rail cost analysis).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for CFOs optimizing international payments but does not explicitly specify when to use vs alternatives or when not to use it. No exclusions or alternative tools are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pentest_scope_estimatorB

Read-only

Inspect

Estimateur de scope pentest — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: For a pentest on with assets, what is the effort and cost estimate? · How much should I budget for a web application + API penetration test for SOC 2 Type II compliance? · What is the standard engagement plan (PTES phases + deliverables) for a pentest? · Which engagement type (black-box/grey-box/white-box/red-team) is recommended for my context? · What are the prerequisites and risks for a pentest engagement on my cloud infrastructure? Reference case: Acme SaaS Inc — Fintech B2B EU · web-app + API REST · 12 microservices Node.js AWS · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`scope_type`	Yes
`tech_stack`	Yes
`asset_count`	No
`target_geos`	No
`engagement_type`	No
`retest_included`	No
`business_context`	Yes
`compliance_frameworks`	No

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as readOnly and openWorld, indicating it is safe and may use external data. The description adds that inputs are validated server-side and that the tool returns an 'audited deliverable,' aligning with annotations. No contradictions, but behavioral details beyond annotations are minimal.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy and includes a mix of French and English, a block of example questions, and a reference case. It is not front-loaded with the essential purpose and lacks clear structure, making it harder to parse quickly.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite having annotations and a detailed input schema, the description omits output format details (e.g., what the 'structured, audited deliverable' contains), error handling, rate limits, or authentication requirements. OpenWorld behavior is implied but not explained, leaving gaps for an agent to fully understand the tool's usage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers all 9 parameters with descriptions, resulting in 11% coverage for additional context. The description provides example usage for some parameters (scope_type, engagement_type, compliance_frameworks) but does not systematically explain each one, so added value beyond the schema is modest.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool estimates penetration testing scope, effort, and cost, providing a structured deliverable. It answers specific user questions and references a case study. The purpose is specific and distinguishes from sibling tools like 'attack_surface_monitor' or 'vuln_exploitability_forecast'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lists example questions that indicate when to use the tool (budgeting, engagement planning, compliance). However, it does not explicitly state when not to use it or provide alternatives among siblings, leaving the agent to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pitch_deck_storylineA

Read-onlyIdempotent

Inspect

Build a complete investor pitch-deck storyline for a company. Returns an 8-20 slide narrative tailored to the target audience (seed-vc / series-a-vc / growth-vc / strategic / bank / grant) — each slide carrying a title, key points, a speaker note and a visual hint — plus a Q&A bank of 10-15 likely board questions and traps to avoid. Output is deck JSON ready to export to Google Slides, Notion or Pitch.com. When to use this tool: the user is preparing a fundraise, a board meeting, or an investor presentation. Inputs: the company profile and the target audience type. Delivered by Sarah, the AI Fundraising lead of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`audience`	Yes	Target audience — adapts tone + emphasis + Q&A bank
`keyFacts`	Yes	Hard facts to weave into the deck (traction numbers, milestones, awards)
`slideCount`	Yes	12 = standard VC deck, 15 = bank-friendly with annexes, 20 = growth/strategic

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles surfaced from keyFacts
`slides`	Yes	8-20 slide objects ready to export to Google Slides / Notion / Pitch.com
`qaBanks`	Yes	10-15 anticipated investor questions with recommended answers
`recommendations`	No	Fundraising preparation actions
`executiveSummary`	Yes	One-paragraph elevator pitch distilled from the deck

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly and idempotent, so the tool is safe. The description adds behavioral details like output structure (slides with components, Q&A bank) and export readiness to Google Slides, Notion, or Pitch.com. However, it does not explain the async parameter's effect on return behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured, front-loading the core action and then detailing output and usage context. It is concise with only a minor extraneous branding sentence at the end.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested object, output schema), the description covers the main use case and output. It could mention async behavior or slideCount selection guidelines, but with schema and annotations, it is sufficient for correct invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has high coverage (80%) with clear descriptions for parameters like audience and slideCount. The description reiterates main inputs (company profile and target audience) but adds little new semantic value beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool builds a complete investor pitch-deck storyline, specifying the output structure (8-20 slides with titles, key points, speaker notes, visual hints, and Q&A bank) and target audience types. It differentiates from sibling tools like deal_coach or investor_shortlist by focusing specifically on slide creation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool: when the user is preparing a fundraise, board meeting, or investor presentation. It provides clear context for invocation, though it does not mention when not to use or list alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

positioning_strategistC

Read-only

Inspect

Stratège de positionnement — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Gapup Hub vs Tableau/Pigment/Looker — Angle de différenciation + 5 piliers messaging + battle plan. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`market`	Yes
`company`	Yes
`product`	Yes
`aspirations`	No
`competitors`	Yes
`customerPains`	Yes
`currentWeaknesses`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, so the description adds limited behavioral context. It mentions the tool returns a structured deliverable and is agent-payable, but this does not significantly extend beyond what annotations imply. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise (three sentences) and includes key output details, but it lacks structure such as bullet points or clear separation of sections. It front-loads the tool name but does not prioritize critical parameter guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 8 parameters, no output schema), the description is incomplete. It does not explain the return format, data constraints, or how to handle optional fields, leaving significant gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, and the description adds no parameter-level information beyond a vague directive to 'send the documented case fields.' With 8 parameters including nested objects, the description fails to explain their meanings or usage, making it hard for the agent to fill inputs correctly.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it generates a positioning strategy deliverable with differentiation angle, messaging pillars, and battle plan, referencing a concrete case (Gapup Hub vs Tableau/Pigment/Looker). However, it does not explicitly distinguish itself from sibling strategist tools like pricing_strategist or market_entry_strategist, leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It mentions a reference case but lacks explicit when-to-use or when-not-to-use instructions, leaving the agent to infer applicability.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

press_influencerC

Read-only

Inspect

Presse & influenceurs — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Agicap (levée Série C €70M) — CP + 12 contacts presse Tier-1 · plan de diffusion 14 jours. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`budget`	No
`company`	Yes
`targetMedia`	Yes
`announcement`	Yes
`targetAudience`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint) indicate no state modification and external data access. The description adds that it returns a structured, audited deliverable and references a case with contacts, which is consistent. However, it does not disclose payment requirements or response format beyond 'deliverable'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and includes a relevant reference case. It is front-loaded with the core purpose. However, the mix of French and English and the inclusion of a specific case example may slightly reduce clarity for some agents.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (nested objects, multiple required fields) and the lack of an output schema, the description is insufficient. It does not explain what the deliverable looks like, how to handle the async parameter, or what the budget field means. The tool seems to require more context to be used correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, and the description does not explain the parameters. It merely says 'send the documented case fields' without detailing the inputs. Most parameters (company, announcement, targetAudience, targetMedia) lack descriptions, leaving the agent uninformed.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool is about press and influencer relations ('Presse & influenceurs') for C-suite expertise and returns a structured deliverable. A reference case is provided, making the purpose clear, but it could be more explicit about the deliverable's contents.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is given on when to use this tool versus alternatives. The description only mentions server-side validation and to send documented case fields, but does not compare with sibling tools or specify prerequisites.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pricing_in_dealC

Read-only

Inspect

Pricing en Deal — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Agicap × Groupe Rocher — Deal €38k · stade négociation · contre-offre -30% · 3 scénarios pricing · ROI 12×. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`deal`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`redLines`	Yes
`negotiationContext`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark readOnlyHint=true. Description adds that it returns a deliverable and validates inputs server-side, which is useful but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is short but includes a verbose header and reference case. Could be more concise, but front-loaded with key info.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested schema (5 params, no output schema), the description provides insufficient context about return format or behavior beyond 'structured, audited deliverable.'

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 20%, and the description does not explain parameters. Only the async parameter has a description in schema; others rely on names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a 'structured, audited deliverable' for pricing deals, with a concrete reference case. The purpose is clear but lacks differentiation from sibling tools like pricing_strategist or deal_coach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Only mentions input validation, not context or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

pricing_strategistC

Read-only

Inspect

Stratège de pricing — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: Vercel Pricing 2026 — 4 tiers + usage metering · 3 scenarios pricing chiffrés · ARPU +28% target. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`competitors`	Yes
`currentPricing`	Yes
`valueProposition`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true, indicating no state mutation, and openWorldHint=true. The description adds that inputs are validated server-side and it returns a deliverable, but does not elaborate on side effects, authentication, or rate limits. The description is consistent with annotations, adding modest value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (three sentences) and front-loaded with the tool's role. The inclusion of a reference case adds context without excessive length. However, the marketing-like phrasing ('Gapup agent-payable C-suite expertise') could be trimmed for clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 6 parameters, no output schema), the description is insufficient. It does not explain the return format, how to interpret the deliverable, or what the 'structured' output looks like. Critical details for proper usage are missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, meaning most parameters lack descriptions in the schema. The description does not compensate; it only generically mentions 'send the documented case fields' and lists a few required fields in the reference case. It adds no specific parameter semantics beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as a pricing strategist that returns a structured, audited deliverable. It gives a concrete reference case (Vercel Pricing 2026) illustrating its output. However, it does not explicitly differentiate from sibling tools like 'pricing_in_deal' or 'competitor_pricing_radar', leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It does not mention prerequisites, context, or exclusions. The only hint is 'Inputs are validated server-side', which is not usage guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

privacy_compliance_auditC

Read-only

Inspect

Audit conformité vie privée — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Lemlist SAS — SaaS outreach B2B, transferts UE→US Schrems II, RGPD + CCPA + LGPD + UK GDPR. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`presenterScript`	No
`targetFrameworks`	Yes
`processingActivities`	Yes

Tool Definition Quality

C2.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and the tool returns a deliverable, which is consistent. It does not contradict annotations and provides mild additional context about behavior, though it lacks detail on authentication or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is cluttered with marketing jargon ('Gapup agent-payable C-suite expertise (RISK)') and mixed languages, making it less concise. It could be more streamlined and front-loaded with essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema), the description is insufficient. It does not describe the return value format or any post-invocation behavior, leaving the agent to guess the output structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, so the description should compensate, but it merely says 'send the documented case fields' without explaining any parameter meanings. It adds no value beyond the schema's properties.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a privacy compliance audit that returns a structured deliverable. It distinguishes from siblings by hinting at a specific reference case (Lemlist SAS) and mentioning multiple frameworks (RGPD, CCPA, etc.), though it doesn't explicitly compare with other audit tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no explicit guidance on when to use this tool versus alternative tools in the sibling list. It mentions a reference case but does not clarify scenarios where this audit is appropriate or inappropriate.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

process_mappingC

Read-only

Inspect

Mapping des process opérationnels — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Decathlon France — process Retour produit en magasin · 1700 magasins · 200 retours/j/magasin · -30 à -50% temps cible. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`processes`	Yes
`presenterScript`	No

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds 'Returns a structured, audited deliverable' which aligns but does not disclose additional behavioral traits such as potential long runtimes (despite the async parameter in schema) or side effects. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is one paragraph that mixes French and English and includes an example with specific numbers. It is moderately concise but could be trimmed by removing extraneous details (like the full reference case) to focus on core functionality. Front-loads the main phrase but loses clarity with additional text.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 5 parameters, no output schema), the description is incomplete. It does not describe the return format, any constraints on inputs beyond validation, or the structure of the deliverable. The async parameter is present in schema but ignored in description, leaving gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 20% schema description coverage (only async has a description), the description fails to explain the semantic meaning of parameters like 'focus', 'company', 'processes', or 'presenterScript'. It only says 'send the documented case fields', which is vague. The description does not compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool maps operational processes and returns a structured, audited deliverable, with a specific reference case (Decathlon product return process). This gives a clear verb+resource, but the mix of French and English and the term 'Gapup' may obscure the purpose for non-French speakers. It distinguishes from sibling 'process_mining' but not explicitly.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives. It only mentions server-side validation and to send documented case fields, but does not specify prerequisites, when-not-to-use, or refer to sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

process_miningC

Read-only

Inspect

Mining des process — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Gapup Hub — 4 process · €320k gaspillage identifié · 3 quick wins · 5 automations. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`objectives`	Yes
`companyName`	Yes
`mainSystems`	Yes
`topProcesses`	Yes
`employeeCount`	Yes
`revenueLostEstimateEur`	No

Tool Definition Quality

C2.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, and the description aligns by stating it returns a deliverable. The description adds minimal behavioral context beyond annotations, such as no mention of rate limits or error handling.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short but includes confusing jargon and a reference case that may not be universally relevant. Mixed languages reduce clarity. Could be more concise and better structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 7 parameters, no output schema, and no parameter descriptions, the description is severely lacking. It does not explain the deliverable structure, processing time, or any critical constraints, making it insufficient for correct tool invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14%, and the description adds no parameter explanations. It merely says 'send the documented case fields' without defining them, leaving the agent to infer from names and schema types.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description suggests a process mining tool that returns a structured deliverable with waste identification and automation opportunities, but uses mixed languages and ambiguous references ('Gapup agent-payable C-suite expertise'). It does not clearly differentiate from siblings like 'process_mapping' or 'procurement_six_sigma_waste_hunter'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Lacks context on prerequisites or exclusions. Only mentions server-side validation, which is not a usage guideline.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

procurement_okr_esg_alignerA

Read-onlyIdempotent

Inspect

Aligns procurement OKRs with ESG targets for COOs using GRI standards and EU TED procurement benchmarks. Inputs include procurement objectives and ESG focus areas (e.g., carbon reduction, supplier diversity). Outputs structured alignment scores, gap analysis, and actionable recommendations. Essential for COOs integrating sustainability into procurement strategy. Keywords: procurement, ESG, GRI, EU TED, OKR alignment, sustainability metrics.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`esgFocusAreas`	Yes
`industrySector`	No
`procurementObjectives`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`alignmentScores`	No
`recommendations`	No
`benchmarkComparison`	No
`overallAlignmentScore`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint. The description adds behavioral context: it uses GRI standards and EU TED benchmarks, and outputs structured alignment scores, gap analysis, and actionable recommendations. This is valuable beyond annotations, though it doesn't detail performance or side effects (unnecessary for read-only).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: first states purpose and method, second lists inputs and outputs, third states use case and keywords. No fluff, all sentences contribute value. Front-loaded with the core purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters (including nested objects), output schema exists, and annotations are rich (readOnly, idempotent, openWorld), the description is complete enough. It covers purpose, inputs, outputs, standards, and target audience. The output schema handles return details, and annotations cover behavior. No gaps are apparent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (25%), so the description should compensate. It mentions 'procurement objectives and ESG focus areas' but does not explain the structure of the procurementObjectives object or the async and industrySector parameters. The enum values for esgFocusAreas are partly echoed, but no new meaning is added. The mention of GRI and EU TED provides some context not in the schema, but overall it's insufficient to fully bridge the coverage gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Aligns procurement OKRs with ESG targets for COOs using GRI standards and EU TED procurement benchmarks.' It specifies the verb (aligns), resources (procurement OKRs and ESG targets), audience (COOs), and methodology (GRI, EU TED). This distinguishes it from siblings like 'supplier_esg_audit' and 'esg_audit_multi' by focusing on procurement OKR alignment rather than supplier-level audit or broad ESG assessment.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description states it is 'Essential for COOs integrating sustainability into procurement strategy,' providing clear usage context. However, it does not explicitly compare to sibling tools or state when not to use it. The sibling list includes many ESG tools, but no exclusion guidance is given, lowering slightly from a 5.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

procurement_six_sigma_waste_hunterA

Read-onlyIdempotent

Inspect

Analyzes procurement waste for COOs using Six Sigma DMAIC framework and EU TED tender data. Identifies non-value-added activities, overprocessing, and inefficiencies in procurement workflows. Inputs include procurement category, time period, and organizational unit. Outputs waste classification, cost impact estimates, and process improvement recommendations. — pass async:true REQUIRED to avoid x402 timeout.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`time_period`	Yes	Time period for analysis (e.g., '2023-01-01/2023-12-31')
`six_sigma_tool`	No		DMAIC
`include_ted_data`	No
`organizational_unit`	No	Specific business unit or department (e.g., 'EMEA', 'Global Operations')
`procurement_category`	Yes	Specific procurement category to analyze (e.g., 'IT hardware', 'facilities')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`ted_data_coverage`	No
`cost_impact_estimate`	No
`waste_classification`	No
`process_improvement_recommendations`	No

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, and openWorld behavior. The description adds the critical note about passing async:true to avoid x402 timeouts, but does not disclose other behavioral aspects like potential data latency from EU TED or authentication requirements.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with three sentences plus a separate note. It front-loads the primary purpose and covers key elements without unnecessary fluff. The async note could be integrated more seamlessly, but overall structure is efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and the existence of an output schema, the description adequately covers purpose, inputs, outputs, and the Six Sigma framework. It mentions waste classification, cost impact, and recommendations. However, it could briefly explain DMAIC phases or data source limitations for completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 67% (4/6 parameters described). The description repeats input names (procurement category, time period, organizational unit) but adds limited new semantic meaning beyond the schema. It emphasizes the async requirement, which is already documented in the schema parameter description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool's purpose: analyzing procurement waste for COOs using Six Sigma DMAIC and EU TED data. It specifies the framework, target users, and outputs (waste classification, cost impact, recommendations), differentiating it from sibling tools like procurement_spend_optim.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description lacks explicit guidance on when to use this tool versus alternatives. While it implies suitability for procurement waste analysis, it does not mention when not to use it or contrast with similar sibling tools such as procurement_okr_esg_aligner or procurement_spend_optim.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

procurement_spend_optimB

Read-only

Inspect

Optimisation des achats / Spend strategy — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Tech SaaS €60M ARR — 200 fournisseurs analysés · 20 leviers chiffrés · -€2.4M opex/an target. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`topSuppliers`	Yes
`spendCategories`	Yes

Tool Definition Quality

B3.4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint) are present but description adds minimal behavioral details beyond 'returns a structured, audited deliverable' and 'inputs validated server-side'. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is relatively concise and front-loaded with purpose, but includes a somewhat lengthy reference case example that could be shortened without losing essential meaning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, no output schema), the description is incomplete. It lacks details on how to use parameters, what the deliverable contains, and expected output format. The reference case provides a hint but is insufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only the async parameter has a description). The tool description does not compensate for the many undocumented parameters (company, topSuppliers, spendCategories, focus). It only says 'send the documented case fields' without explaining parameter usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool's purpose: procurement spend optimization, returning a structured deliverable for CFO-level expertise. It distinguishes from sibling tools like procurement_okr_esg_aligner or procurement_six_sigma_waste_hunter by focusing on spend strategy.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description implies use for CFO-level spend strategy but does not explicitly state when to use this tool vs alternatives or provide exclusion criteria. The reference case offers context but no direct guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

programmatic_attribution_calibratorA

Read-onlyIdempotent

Inspect

For ad_revenue_ops persona: calibrates marketing mix models (MMM) by ingesting OpenRTB impression-level data from FreeWheel Marketplace and other programmatic sources. Accepts model parameters, date ranges, and impression IDs as input, returning structured calibration metrics and attribution adjustments. Useful for improving model accuracy with real-time bidding data and validating revenue attribution across programmatic channels.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	Yes	End date for impression data (ISO 8601)
`modelId`	Yes	Identifier of the MMM model to calibrate
`startDate`	Yes	Start date for impression data (ISO 8601)
`impressionIds`	No	List of OpenRTB impression IDs to include in calibration
`confidenceThreshold`	No	Confidence threshold for calibration metrics

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`calibrationMetrics`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and idempotentHint, so the description does not need to repeat that. The description adds context about the data sources and the output being metrics and adjustments, which aligns with read-only behavior. No contradictions are present. The description earns a high score for supplementing annotations without redundancy.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences with no wasted words. It front-loads the core action and then states the value proposition. It is appropriately sized for the tool's complexity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers the main purpose, inputs, and outputs. An output schema exists, so return values are not required. However, it omits mention of the async parameter and polling behavior, which could be important for usage. Overall, it is sufficiently complete for a calibration tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description reiterates the types of parameters (model parameters, date ranges, impression IDs) but does not add new meaning beyond what the schema already provides. It does not clarify the meaning of the 'confidenceThreshold' or 'async' parameters further. Hence, a score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the persona (ad_revenue_ops), the specific action (calibrates MMM), data sources (OpenRTB from FreeWheel Marketplace), and outputs (structured calibration metrics and attribution adjustments). It distinguishes itself from siblings by its focused purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes a useful statement about improving model accuracy and validating revenue attribution, which implies when to use it. However, it does not explicitly say when not to use it or mention alternative tools among the many siblings. More guidance on alternatives or prerequisites would improve clarity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

programmatic_brand_safety_auditorA

Read-onlyIdempotent

Inspect

Evaluates programmatic ad inventory for brand safety risks using IAB Tech Lab's standards and GDPR-compliant tracking methods. Designed for ad revenue operations teams to assess inventory quality before bidding. Inputs include domain, page URL, and optional contextual signals. Outputs a structured brand safety score with risk categorization and compliance warnings.

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Full page URL being evaluated
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domain`	Yes	Root domain of the inventory (e.g., 'example.com')
`categories`	No	Optional IAB content categories for contextual analysis
`gdprConsent`	No	GDPR consent string (TCF v2.0)

Output Schema

ParametersJSON Schema

Name	Required	Description
`flags`	No
`score`	No	Brand safety score (0-100)
`status`	Yes
`sources`	No
`warnings`	No
`riskLevel`	No
`gdprCompliant`	No

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, indicating safe, idempotent operation. The description adds that output is a structured brand safety score with risk categorization and compliance warnings, but does not disclose behavioral traits like the async option or performance characteristics. The description adds moderate value beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences, front-loading the core purpose and immediately following with users and inputs. No wasted words, and every sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema and rich parameter schema, the description adequately covers purpose, users, inputs, and output nature. However, it does not mention the async parameter, though it is detailed in the schema. The description is complete enough for a well-documented tool with strong annotations and schema coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear parameter descriptions. The description adds limited value by grouping parameters as 'domain, page URL, and optional contextual signals', but does not enhance understanding beyond the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates programmatic ad inventory for brand safety using IAB Tech Lab standards and GDPR-compliant tracking. It specifies target users (ad revenue operations teams) and inputs (domain, page URL, optional contextual signals). This distinguishes it from siblings like 'ugc_moderation_classifier', which focuses on user-generated content.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states the tool is designed for pre-bid inventory quality assessment by ad revenue operations teams, providing clear when-to-use context. It does not explicitly list when not to use or alternatives, but the specificity implies distinct usage from related tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

proposal_generatorC

Read-only

Inspect

Générateur de propositions commerciales — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk × Gapup Hub — Proposition 7 sections · ROI 3Y €1.8M · Payback 4 mois. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`offer`	Yes
`company`	Yes
`prospect`	Yes
`dealContext`	No

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and openWorldHint, but the description adds server-side validation and a case study. It does not mention the async parameter or that the tool can return a job_id, and the readOnlyHint might be inconsistent with generation (though likely read-only).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences but includes a reference case that may be unnecessary. It could be more concise by focusing on essential usage guidance.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (5 nested parameters, no output schema), the description omits key details like return format, async behavior, and parameter mapping. It is insufficient for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema coverage is only 20% (only async has description). The description does not explain any of the 5 parameters or their nested fields, leaving significant ambiguity for the agent.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Générateur de propositions commerciales' (business proposal generator) and specifies a structured, audited deliverable. However, it does not differentiate from sibling tools like battle_plan or deal_coach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. It simply instructs to send case fields without context on prerequisites or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

qa_pre_flightC

Read-only

Inspect

Préparation Q&A investisseurs — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Agicap Série C €70M — 30 Q&A stratégiques · 8 questions pièges · Plan de préparation 21 jours. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`round`	Yes
`company`	Yes
`founderContext`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, indicating no mutation. The description adds that inputs are validated server-side and that the tool can return a job_id via the async parameter for slow execution, which is useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise with 4 sentences, front-loading the purpose. The reference case example adds context but could be shorter. Overall efficient but not perfectly crisp.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, 4 parameters, no output schema), the description lacks details about the deliverable structure, input formatting expectations, and how to use the async feature properly. It is not sufficiently complete for an unfamiliar user.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%, and the description does not explain individual parameters or their structured fields. It merely says 'send the documented case fields', adding no semantic meaning beyond the schema's bare property descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool provides 'Préparation Q&A investisseurs' (investor Q&A preparation) and returns a structured deliverable, clearly indicating the domain and output. However, the verb is implicit ('Préparation') and the description mixes languages, reducing clarity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The sibling list includes many fundraising-related tools, but the description does not differentiate usage contexts or provide when-not-to-use conditions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

qbr_autoB

Read-only

Inspect

QBR automatique CSM — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub × Alan — QBR Q1 2026 · Health score 82/100 · Upsell €18k détecté · Renewal low risk. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`wins`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`period`	Yes
`company`	Yes
`metrics`	Yes
`customer`	Yes
`challenges`	Yes
`nextQuarterGoals`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and the tool returns a deliverable, which provides some behavioral context beyond annotations. However, it does not disclose auth needs, rate limits, or potential side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences plus a reference example) with no wasted words. It front-loads the core action. However, it could benefit from a more structured breakdown of inputs and outputs.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With 8 parameters (7 required), nested objects, no output schema, and only a 13% schema description coverage, the description fails to provide sufficient context. Important aspects like the async parameter (in schema) and deliverable structure are omitted, leaving the agent underinformed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, but the description does not explain any parameters or their intended use. It only says 'send the documented case fields', which adds no value. For a tool with 8 complex parameters (including nested objects), this is a critical gap.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates an automatic CSM QBR deliverable, with a concrete reference case (Gapup Hub × Alan). The verb 'returns' and resource 'structured, audited deliverable' are specific, and the tool name 'qbr_auto' aligns well with its purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus other tools, nor does it mention alternatives or exclusions. The reference case implies a use case but lacks explicit when/when-not instructions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

real_estate_intelA

Read-onlyIdempotent

Inspect

Real estate intelligence aggregator with a best-in-class French dataset (DVF — Demandes de Valeurs Foncières — 100% of FR transactions since 2019, public, keyless) plus UK Land Registry Price Paid (all UK transactions 1995+). Four modes: (1) property — full transaction history for a specific address; (2) comparables — median/std price/m² within a radius (default 500m); (3) market — annual price series, YoY change, volume, trend by commune; (4) valuation — two-method estimate (comparables median + hedonic regression if n≥30) with confidence scoring (high/medium/low). All sources are free and require no API key. ICP: PropTech agents, REITs, fund managers, family offices, insurance. SLA: ≤25s p95 (sources fetched in parallel, 8s budget each). Cache: 24h TTL (DVF data is stable). Quality score: 30 pts DVF retrieved, 20 pts geocoding, 20 pts UK LR retrieved, 15 pts if comparables count ≥10, 15 pts if method quality achieved. Status: failed/<60/≥60 → failed/partial/final. No env vars required.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	property: transactions at an address \| comparables: sample around a point \| market: commune/neighbourhood market stats \| valuation: price estimate for a given surface
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_to`	No	ISO date YYYY-MM-DD — latest transaction date
`location`	Yes	Location descriptor. One of: {address, city?, country?} \| {lat, lon, radius_m?} \| {insee_code} for FR communes.
`date_from`	No	ISO date YYYY-MM-DD — earliest transaction date
`max_results`	No	Maximum number of results to return (5–50, default 20)
`surface_max`	No	Maximum surface in m² (±20% tolerance applied for comparables)
`surface_min`	No	Minimum surface in m² (±20% tolerance applied for comparables)
`property_type`	No	Filter by property type (default: all)

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`market`	No	mode=market — commune-level market stats
`status`	Yes
`sources`	Yes
`property`	No	mode=property — transactions at the location
`valuation`	No	mode=valuation — price estimate
`comparables`	No	mode=comparables — aggregated comp stats
`quality_score`	Yes
`location_resolved`	Yes

Tool Definition Quality

A4.3/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnly, idempotent, non-destructive), the description details SLA (≤25s p95, parallel fetches with 8s budget), cache TTL (24h), quality scoring (30+20+20+15+15 points), and status values (failed/partial/final). This provides rich behavioral context that annotations alone do not convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is comprehensive but somewhat verbose, mixing purpose, usage, and technical details (SLA, cache, quality scoring) in a single block. It is front-loaded with the core purpose but could be tightened by separating operational details into distinct sections or trimming redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, nested objects, four modes), the description covers data sources, mode behaviors, performance, caching, quality scoring, and status. An output schema exists, so return values are handled. No significant gaps remain for an agent to understand and use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema already describes all 9 parameters with 100% coverage, so the description adds no parameter-specific semantics. It does provide high-level context for the four modes, but the parameters themselves are adequately documented in the schema. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a real estate intelligence aggregator with specific datasets (DVF and UK Land Registry) and explicitly lists four distinct modes (property, comparables, market, valuation). This specificity distinguishes it from sibling tools, which cover diverse domains like security, finance, HR, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicates when to use each mode: property for transaction history, comparables for radius-based statistics, market for commune trends, valuation for price estimates. It also mentions target users (PropTech agents, REITs, etc.) and key characteristics (free, no API key). However, it does not explicitly contrast with alternatives or state when not to use the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

realtime_data_streamsA

Read-only

Inspect

High-frequency real-time market data for trading agents, market-making bots and fintech analysts. Returns FX ticks (bid/ask/spread), intraday OHLCV candles, crypto orderbook snapshots (depth 5-50), recent trades with VWAP, and sovereign bond yields. All sources are keyless public REST APIs (Binance, Coinbase, Kraken, OKX, open FX feeds, worldgovernmentbonds.com). Ultra-short cache: 10s for ticks/trades, 60s for orderbook. Use when an agent needs live market data as precise numeric inputs for trading logic, arbitrage detection, or portfolio valuation.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Data stream type: fx_tick (latest FX bid/ask/mid/spread), fx_history_intraday (OHLCV candles), crypto_orderbook (order book snapshot), crypto_trades_recent (last 50 trades + VWAP), bond_yields (sovereign yield %)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Orderbook depth (levels each side) for crypto_orderbook mode (default: 20)
`period`	No	Candle period for fx_history_intraday mode (default: 5m)
`symbol`	Yes	Market symbol. FX: EURUSD, GBPUSD, USDJPY. Crypto: BTCUSDT, ETHUSDT, BTC-USD. Bonds: US10Y, US2Y, DE10Y, FR10Y, UK10Y, JP10Y, IT10Y

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`symbol`	Yes
`fx_tick`	No
`sources`	Yes
`fx_history`	No
`bond_yields`	No
`crypto_trades`	No
`quality_score`	Yes
`crypto_orderbook`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds unique behavioral context: ultra-short cache times (10s for ticks/trades, 60s for orderbook), keyless public REST APIs from named exchanges, and the fact that it returns specific data structures. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact (4 sentences) and front-loaded with purpose. Every sentence adds essential information: data types, sources, cache behavior, and usage case. No repetition or filler.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 modes, multiple data types) and the presence of an output schema, the description adequately covers what the tool does, when to use it, key parameters, and behavioral traits (caching). The output schema handles return values.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the description does not need to document parameters from scratch. However, it adds some value by grouping symbol examples (FX: EURUSD, crypto: BTCUSDT, bonds: US10Y) and clarifying the purpose of each mode beyond what the schema enum provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides 'high-frequency real-time market data' and enumerates specific data types (FX ticks, OHLCV candles, crypto orderbook, trades, bond yields), using strong verbs like 'returns'. It distinguishes itself from siblings by focusing on live/real-time data vs historical or rate tools like historical_price_series and fx_rate.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use when an agent needs live market data as precise numeric inputs for trading logic, arbitrage detection, or portfolio valuation.' It also mentions keyless public sources and cache durations, but does not explicitly tell when NOT to use or name specific sibling alternatives (e.g., 'For historical data, use historical_price_series').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

recruiting_architectA

Read-only

Inspect

Architecte du recrutement — Gapup agent-payable C-suite expertise (CHRO). Returns a structured, audited deliverable. Reference case: Stripe France — 12 postes Q3 2026 · sourcing multi-canaux + employer brand + frameworks d'entretien + parcours candidat · time-to-hire -45%. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`roles`	Yes
`budget`	Yes
`company`	Yes
`preferences`	Yes

Tool Definition Quality

A3.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The annotations already declare readOnlyHint=true and openWorldHint=true, so the description's role is lighter. The description adds value by noting 'Inputs are validated server-side' and that output is a structured, audited deliverable, which supplements the behavioral context beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is front-loaded with purpose but includes a lengthy reference case that may not be essential for all agents. It is somewhat verbose, though not excessively long. A more concise version could omit the example without losing core meaning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema), the description is insufficiently complete. It does not explain the deliverable's structure, how to interpret results, or what happens on server-side validation failure. Critical context for an AI agent to use the tool correctly is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, yet the description does not elaborate on parameter meanings. It merely says 'send the documented case fields', which adds no semantic detail. The complex nested schema (company, roles, budget, preferences) remains largely unexplained, leaving the agent to infer from field names alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states the tool is for recruitment architecture ('Architecte du recrutement'), targets C-suite CHRO expertise, and provides a concrete reference case (Stripe France). It clearly distinguishes itself from sibling tools, which span diverse domains like sales, finance, and legal, by focusing on a specific HR function.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for recruitment planning and deliverable generation ('Returns a structured, audited deliverable'), but it does not explicitly state when to use this tool over alternatives like candidate_screening_ranking or talent_intelligence. No when-not-to-use or alternative tool guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

re_deal_screenerA

Read-only

Inspect

Screener deal immobilier (EU) — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Screen this real estate deal: , , asking € — give me cap rate vs market, location score, risk flags, and deal recommendation. · Should I pursue this hotel investment at for € with keys? Run an EU deal screener with DVF comparables and Géorisques risk data. · What is the real estate market valuation for a at based on recent French DVF transactions? · Run a due diligence deal screen on this property: , €, sqm — flood risk, cap rate, price vs comparables. · Evaluate this commercial real estate deal for an investment committee: at , €, NOI €. Reference case: Hôtel boutique 45 keys · 12 rue de la Paix 75002 Paris · €12.5M · €277k/key · comp DVF €250-380k/key · location 92/100 · score 72 · pursue-with-conditions. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`address`	Yes
`deal_type`	Yes
`country_iso2`	Yes		FR
`units_or_keys`	No
`gross_area_sqm`	No
`current_noi_eur`	No
`asking_price_eur`	Yes
`investment_thesis`	No

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint: true, so the description's behavioral information is additive but not critical. It mentions that returns are structured and audited, and that inputs are validated server-side, which provides context. However, it does not discuss rate limits, authentication, or any potential side effects beyond the read-only nature.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose and contains multiple repetitive example queries and a reference case. While it front-loads the tool's purpose, it could be more concise and structured. The inclusion of examples adds value but at the cost of brevity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (9 parameters, no output schema), the description provides a good overview with concrete examples covering main use cases. However, it lacks a formal description of return value structure, which is partly compensated by the examples. It is adequate but not exhaustive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 11% (only 'async' has a description), so the description carries the burden. It partially compensates through examples that illustrate usage of address, deal_type, price, units_or_keys, gross_area_sqm, and current_noi_eur. However, not all parameters (e.g., country_iso2, investment_thesis) are explained, and the mapping from examples to schema parameters is implicit.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a 'Screener deal immobilier (EU)' and provides multiple example queries that illustrate the tool's specific function: screening real estate deals by location, type, and price to return cap rate, location score, risk flags, and recommendation. It distinguishes itself from sibling tools like 'ma_deal_screener' by its focus on real estate, not M&A.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description includes example use cases but does not explicitly state when to use this tool versus alternatives, nor does it provide guidance on prerequisites or scenarios where it should not be used. The 'Gapup agent-payable C-suite expertise (CFO)' phrase is vague and offers little actionable direction.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

renewal_optimizerC

Read-only

Inspect

Optimiseur de renouvellements — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Renewals 10 comptes · €89k ARR à 90j · 3 comptes at-risk · Playbook 6 scénarios. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`horizon`	No
`product`	Yes
`accounts`	Yes
`targetRenewalRatePct`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true, consistent with returning a deliverable. The description adds that inputs are validated server-side, but does not detail other behavioral traits like performance, error handling, or output structure beyond 'structured, audited deliverable'.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise at about 50 words, but includes a reference case that may be unnecessary for the agent. It is front-loaded with the name but could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 parameters, nested objects) and lack of output schema, the description omits important details about the deliverable contents, possible errors, and performance. It only mentions server-side validation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17%, with only the async parameter described. The description does not explain the meaning of key parameters like accounts, horizon, or targetRenewalRatePct, leaving the agent to infer from schema constraints alone.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a renewal optimizer that returns a structured, audited deliverable, and includes a reference case. However, it does not distinguish itself from sibling tools that may overlap, such as churn_defender or upsell_hunter.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The only usage hint is 'send the documented case fields', which is about input format, not contextual selection.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

repo_rate_arbitrage_scannerA

Read-onlyIdempotent

Inspect

Scans for arbitrage opportunities between repo rates (ECB) and short-term funding markets (Treasury Direct). Designed for CFOs to identify cost-effective funding strategies. Inputs include optional date ranges and currency filters. Outputs structured arbitrage opportunities with rate differentials and confidence scores.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	No
`currency`	No
`startDate`	No
`minDifferential`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`opportunities`	No

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already set readOnlyHint, openWorldHint, and idempotentHint. The description adds that it outputs 'structured arbitrage opportunities with rate differentials and confidence scores,' which is helpful but does not disclose other behaviors like rate limits, authentication needs, or potential variability in results. With strong annotations, the description adds marginal value.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences with no wasted words. It first states the core function, then the target user and goal, then lists inputs and outputs. Each sentence earns its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity with optional filters and an output schema (not detailed here but exists), the description covers the essential aspects. It mentions outputs (arbitrage opportunities, rate differentials, confidence scores). It does not discuss pagination or result size, but with readOnly and openWorld hints, this is acceptable. Slightly above adequate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only async parameter described). The description mentions 'optional date ranges and currency filters,' adding meaning to startDate, endDate, and currency beyond the schema. However, it does not cover the minDifferential parameter or the async parameter beyond what schema states. Given low coverage, the description partially compensates.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool 'Scans for arbitrage opportunities between repo rates (ECB) and short-term funding markets (Treasury Direct)' with a specific verb and resource. It also identifies the target user (CFOs) and distinguishes it from siblings like 'tariff_arbitrage_finder' which targets different markets.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear context by stating it is 'Designed for CFOs to identify cost-effective funding strategies,' implying when it should be used. However, it does not explicitly mention when not to use it or provide direct alternatives, leaving some ambiguity against closely related tools like 'treasury_optimizer' or 'working_capital'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

reputation_engineC

Read-only

Inspect

Moteur de réputation — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Reference case: PayShield SaaS — Monitoring réputation Q2 2026. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`brand`	Yes
`channels`	Yes
`industry`	Yes
`keywords`	Yes
`historicalCrises`	No

Tool Definition Quality

C2.1/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true. The description adds only that inputs are validated server-side and a deliverable is returned. It fails to describe behavioral traits like execution time, cost implications, or what 'audited' means. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is not concise; it includes unnecessary jargon and a reference case that does not aid immediate understanding. Key information is buried, and the structure lacks front-loading of critical details. Every sentence does not earn its place.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, no output schema), the description is insufficient. It does not explain the deliverable's structure, how parameters affect results, or what the output looks like. Agents lack enough context to confidently invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 6 parameters with only 17% schema description coverage, yet the description does not explain any parameter meaning or usage. It merely says 'send the documented case fields' without clarifying which fields or their roles, providing no additional value over the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description indicates a reputation engine that returns a structured deliverable, but it uses jargon ('Gapup agent-payable C-suite expertise (CMO)') and does not clearly state the core action. The purpose is vaguely implied (reputation monitoring) but not explicitly defined, making it ambiguous compared to sibling tools like 'brand_builder' or 'sentiment_news_pulse'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternatives. The reference case (PayShield SaaS) serves as an example but does not explain selection criteria, prerequisites, or exclusions. There is no mention of when not to use it or how it differs from similar tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

research_paper_qaB

Read-only

Inspect

Synthèse littérature scientifique (PaperQA2) — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: Conduct a literature review on — what does the evidence show across recent papers? · Evaluate the current hypothesis that — supporting and contradicting evidence with citations. · Map contradictions in the literature on — which camps exist, how many papers per side? · What is the state-of-the-art understanding of as of ? · Perform an interdisciplinary synthesis on — findings from and . Reference case: Gut-brain axis · Cognitive performance in healthy adults · OpenAlex+SemanticScholar+CORE · Evidence synthesis · DOI-verified citations · Contradictions + gaps mapped. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`max_papers`	Yes
`year_range`	No
`focus_domain`	Yes		all
`include_preprints`	Yes
`research_question`	Yes
`evidence_grade_required`	Yes		standard

Tool Definition Quality

B3.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds valuable context: server-side validation, async capability (via 'async' parameter), sources used (OpenAlex, Semantic Scholar, CORE), and output features (DOI-verified citations, contradiction mapping). This goes beyond the annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is overly verbose with a mix of English and French, includes a reference case ('Gut-brain axis') and internal notes (e.g., 'Gapup agent-payable C-suite expertise'). While front-loaded with purpose, the list of example questions and extra content makes it unstructured and less concise for quick agent interpretation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, 5 required, nested objects, no output schema), the description lacks sufficient detail for correct invocation. It does not explain the output format beyond 'structured audited deliverable', nor provide guidance on using async polling. Without an output schema, agents need a clearer description of what to expect.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (14%). The description provides some context for parameters like research_question (via examples), focus_domain (mentioned), and max_papers (default stated). However, it fails to explain key enums like evidence_grade_required (strict/standard/broad), year_range format, or meaning of 'all' domain. This is insufficient to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs scientific literature synthesis (PaperQA2) and lists specific use cases like literature review, hypothesis evaluation, contradiction mapping, and state-of-the-art synthesis. However, it does not explicitly differentiate from the sibling tool 'sci_literature_search', which likely offers a simpler paper search.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example questions that imply when to use the tool (e.g., for evidence synthesis, hypothesis testing). However, it lacks explicit guidance on when not to use it or mention of alternatives, leaving the agent to infer usage context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

retail_media_attribution_bridgeA

Read-onlyIdempotent

Inspect

Provides unified attribution insights for retail media and programmatic campaigns by analyzing MMM signals from FreeWheel Marketplace and Common Crawl. Designed for ad revenue operations teams to bridge cross-channel performance gaps. Accepts campaign IDs, date ranges, and channel filters as input. Returns structured attribution data with source provenance and confidence scores.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`endDate`	Yes	End date for attribution window (YYYY-MM-DD)
`channels`	No	Channels to include in analysis
`startDate`	Yes	Start date for attribution window (YYYY-MM-DD)
`campaignIds`	Yes	List of campaign identifiers to analyze
`confidenceThreshold`	No	Minimum confidence score for included signals

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`attribution`	No

Tool Definition Quality

A3.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint) already indicate safe reads; the description adds data sources and output framing. However, it fails to mention async behavior despite the async parameter in the schema, which is a notable omission for behavioral understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loading purpose and summarizing inputs/outputs without waste. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, target users, inputs, outputs, and data sources, which is quite complete given the tool has an output schema. However, it misses mentioning async behavior, a minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema already documents all parameters. The description lists campaign IDs, date ranges, and channel filters but does not add meaning beyond the schema, meeting the baseline of 3.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides unified attribution insights for retail media and programmatic campaigns, naming specific data sources (FreeWheel Marketplace, Common Crawl) and target users (ad revenue operations teams). It distinguishes itself from siblings like programmatic_attribution_calibrator and retail_media_esg_compliance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for cross-channel attribution analysis but does not explicitly state when not to use it or mention alternatives. Given many sibling tools in the same domain, this is a gap, though the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

retail_media_esg_complianceA

Read-onlyIdempotent

Inspect

Audits retail media networks for ESG compliance by analyzing ad placements, tracking cookies, and verifying ethical advertising standards. Designed for ad_revenue_ops teams to ensure GDPR and sustainability compliance across digital retail platforms. Accepts domain lists or network identifiers as input and returns structured compliance reports with warnings and source references. Requires async:true to avoid timeout errors.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domains`	No	List of retail media network domains to audit
`checkESG`	No	Enable ESG advertising standards compliance check
`checkGDPR`	No	Enable GDPR cookie tracking compliance check
`networkIds`	No	List of retail media network identifiers

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`summary`	No
`warnings`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses key behavioral traits: the tool is read-only (consistent with readOnlyHint=true), returns structured reports with warnings and source references, and crucially requires async:true to avoid timeouts. This adds value beyond annotations, although it does not detail side effects (none expected due to idempotentHint).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four sentences with no redundancy. It front-loads the purpose, then adds audience, inputs/outputs overview, and the critical async note. Every sentence adds value, and the structure is optimal for quick understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (5 optional params, output schema exists, rich annotations), the description covers all essential aspects: purpose, target audience, input types, output summary, and a key usage requirement. The output schema exists, so detailed return format explanation is unnecessary.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with all parameters having descriptions. The description briefly mentions accepting 'domain lists or network identifiers' which maps to domains and networkIds, but does not add new meaning beyond the schema. The boolean parameters are self-explanatory from schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool audits retail media networks for ESG compliance by analyzing ad placements, tracking cookies, and verifying ethical advertising standards. It specifies the exact resource (retail media networks) and action (audits), distinguishing it from sibling tools like esg_audit_multi which cover broader ESG audits.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies the target audience (ad_revenue_ops teams) and provides a critical usage note about requiring async:true to avoid timeout. However, it does not explicitly mention when to use this tool over alternatives like esg_audit_multi or supplier_esg_audit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

revops_architectB

Read-only

Inspect

Architecte RevOps — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Qonto — ARR €200M · 200 reps · forecast ±35% · fuite €4,2M/an identifiée · plan RevOps 12 semaines. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`keyMetrics`	Yes
`objectives`	Yes
`revenueTeam`	Yes
`currentStack`	Yes
`horizonMonths`	Yes
`currentPainPoints`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, making the tool's behavioral safety clear. The description adds little beyond stating it returns a deliverable, which is consistent with read-only behavior. No additional context on side effects or permissions is provided.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and front-loaded with the main purpose. However, the reference case adds some extraneous detail. Overall, it is efficient and avoids unnecessary fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool is complex with many nested parameters and no output schema, yet the description provides minimal guidance. It fails to explain the structure of the deliverable, how inputs map to the RevOps case, or any usage flow, leaving significant gaps for the agent.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 13% schema description coverage, the description fails to compensate for missing parameter details. It merely says 'send the documented case fields' without explaining the specific inputs, their format, or their semantics. The nested objects have no descriptive aid.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is a RevOps architect that returns a structured audited deliverable. The purpose is well-defined, but it does not explicitly differentiate from sibling tools like 'ld_architect' or 'abm_architect', which are similar in nature.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies use for CRO-level RevOps planning through the reference case, but it lacks explicit guidance on when to use this tool versus alternatives. No 'when-not-to-use' or alternative tools are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rfp_tender_architectC

Read-only

Inspect

Architecte d'appels d'offres — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: AO DINUM — Plateforme IA souveraine. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`rfpType`	Yes
`rfpScope`	Yes
`budgetRange`	Yes
`deadlineISO`	Yes
`clientCompany`	Yes
`ourPositioning`	Yes
`compliancePoints`	No
`competitorsLikely`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already set readOnlyHint=true. The description adds that inputs are validated server-side, which is helpful but does not disclose additional behavioral traits beyond safety.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes jargon and a reference case that may not be immediately clear. It could be more efficiently structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given 9 parameters, no output schema, and many siblings, the description lacks output details, parameter documentation, and usage context, making it incomplete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 11% (only 'async' has a description). The description fails to explain any of the 9 parameters, merely stating 'send the documented case fields' without elaboration.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for RFP tender architecture, returning a structured, audited deliverable. However, it does not distinguish itself from siblings like 'proposal_generator' or 'battle_cards_live', missing differentiation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use this tool versus alternatives. The description provides a reference case but lacks when-not or alternative tool references.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

rse_policy_builderC

Read-only

Inspect

Architecte de politique RSE — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: TechCorp SAS — Politique RSE 2025-2028 (500 FTE, €60M CA, SaaS B2B France). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`values`	Yes
`company`	Yes
`ambitions`	Yes
`targetLabels`	No
`currentInitiatives`	No
`targetStakeholders`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that inputs are validated server-side and returns a structured deliverable, which provides some context but does not disclose performance, cost, or other behavioral traits. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is 3 sentences but includes unclear phrasing ('Gapup agent-payable C-suite expertise') that wastes space. The reference case is helpful but not essential. Overall, it is acceptable but not optimally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (8 parameters, nested objects, no output schema, low schema coverage), the description is incomplete. It lacks details on output format, expected use cases, and does not leverage the reference case to explain parameters or expected behavior.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, yet the description does not explain any parameter meanings beyond a vague reference to 'documented case fields'. With 8 parameters, the description fails to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as an architecte de politique RSE (CSR policy architect) and states it returns a structured, audited deliverable. However, it does not explicitly differentiate from sibling tools like action_plan_esg or sustainability_report.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives, no prerequisites, and no exclusions. It only gives a reference case example without explaining contexts or when-not-to-use scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sabbatical_policy_comparatorA

Read-onlyIdempotent

Inspect

Enables CHROs to benchmark their company's sabbatical policies against peer organizations using data from SHRM, Payscale, and Mercer. Inputs include company size, industry, and current policy details. Outputs structured comparison with cost impact analysis, eligibility criteria, and duration benchmarks. Ideal for strategic HR planning and policy optimization.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	Yes	Industry classification code (NAICS)
`peerGroup`	No	List of peer company names for direct comparison
`companySize`	Yes	Number of employees in the company
`currentPolicy`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`benchmark`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. The description adds value by detailing data sources (SHRM, Payscale, Mercer) and output structure (cost impact, eligibility, duration), providing useful behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the purpose and key details. Every sentence adds value, and it is structured effectively for quick understanding.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description sufficiently explains what the tool does and what it returns (structured comparison with cost impact, eligibility, duration). It does not cover performance or limitations, but the description is adequate for a benchmarking tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is high (80%), so baseline is 3. The description adds context by mentioning inputs (company size, industry, current policy) but does not elaborate on parameters beyond the schema. It provides enough context for understanding the tool's purpose.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool enables CHROs to benchmark sabbatical policies against peers, specifies inputs (company size, industry, current policy), and outputs (structured comparison with cost impact, eligibility, duration). It distinguishes from sibling tools by being specific to sabbatical policy benchmarking.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates it is for strategic HR planning and policy optimization, and implies use cases for benchmarking. It lacks explicit when-not-to-use or alternative tool references, but the specificity provides clear context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

safety_guardrail_breach_analyzerA

Read-onlyIdempotent

Inspect

Analyzes potential LLM guardrail breaches against IEEE 7000 ethical compliance standards. Designed for risk persona to evaluate safety violations in AI outputs. Accepts raw LLM responses or structured breach reports, returns compliance analysis with severity scoring and mitigation recommendations.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`context`	No	Contextual information about the prompt or conversation
`llmOutput`	Yes	Raw text output from LLM to analyze for guardrail breaches
`severityThreshold`	No	Minimum severity score to report (0-10 scale)
`includeMitigations`	No	Whether to include mitigation recommendations

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`breaches`	No
`warnings`	No
`complianceScore`	No

Tool Definition Quality

A4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint, idempotentHint) declare the tool as safe and idempotent. The description adds valuable behavioral context: it accepts raw LLM outputs or structured reports and returns compliance analysis with severity scoring and mitigations. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is only three sentences, each serving a clear purpose: first sentence defines the core action and standard, second identifies the target user, third lists inputs and outputs. No superfluous content, highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description correctly avoids detailing return values. It covers inputs (raw or structured), outputs (analysis with scoring and mitigations), and the ethical standard. Minor gap: does not mention that it can return job_id for async (the schema includes an async parameter), but that's a parameter detail.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are fully described in the schema. The description provides minimal additional parameter-specific detail beyond the schema, matching the baseline expectation for a scenario with high schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the specific action ('analyzes'), the resource ('potential LLM guardrail breaches'), and the standard ('IEEE 7000 ethical compliance standards'). It effectively differentiates from siblings like 'safety_violation_incident_logger' and 'jailbreak_attempt_detector' by focusing on compliance analysis against a specific ethical framework.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description identifies the target persona ('risk persona') and context ('evaluate safety violations in AI outputs'), providing clear situational guidance. However, it lacks explicit instructions on when not to use the tool or when to prefer alternatives, which limits its completeness for decision-making.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

safety_violation_incident_loggerB

Read-onlyIdempotent

Inspect

Logs AI safety violations for compliance reporting, targeting risk management personas. Accepts incident details such as violation type, severity, description, and timestamp. Returns structured data with compliance categorization based on NIST AI RMF guidelines. Ideal for automated incident tracking and regulatory reporting workflows.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`metadata`	No
`severity`	Yes
`timestamp`	Yes
`description`	Yes
`violationType`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`incidentId`	No
`nistReference`	No
`complianceCategory`	No

Tool Definition Quality

B3.1/5.0

Behavior1/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description claims the tool 'Logs' violations, implying a write operation, yet annotations declare readOnlyHint: true. This is a clear contradiction. No additional behavioral traits (e.g., idempotency implications, persistence) are disclosed beyond the contradictory annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, each with distinct purpose: purpose/audience, inputs/output, and ideal use cases. No redundancy, well front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, enums, nested object, output schema), the description lacks details on output structure, metadata handling, and the NIST categorization's significance. It does not address the contradiction with annotations, leaving agent uncertain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description should compensate but merely lists parameter names (violation type, severity, description, timestamp) without adding meaning about formats, constraints, or impact on output. The optional 'async' and 'metadata' parameters are not mentioned.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool logs AI safety violations, specifies input fields (violation type, severity, description, timestamp), and mentions compliance categorization based on NIST AI RMF guidelines. It distinguishes from sibling tools like 'safety_guardrail_breach_analyzer' by focusing on logging for compliance reporting.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for automated incident tracking and regulatory reporting workflows, targeting risk management personas. However, it does not explicitly differentiate from alternative tools like 'ai_act_incident_response' or provide when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sales_enablement_architectB

Read-only

Inspect

Architecte Sales Enablement — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Spendesk — 45 reps · attainment 67% · ramp 5 mois → 3 mois · programme 8 modules · +€2,1M ARR. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`gaps`	Yes
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`salesTeam`	Yes
`objectives`	Yes
`currentEnablement`	Yes

Tool Definition Quality

B3.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds that it returns a structured audited deliverable and that inputs are validated server-side. This provides sufficient behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise with only two sentences plus a reference case. It is front-loaded with the purpose. The reference case adds useful context but could be trimmed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the nested input schema, high number of required parameters, and no output schema, the description lacks details on what the deliverable contains, how results are structured, or expected runtime (async option exists in schema but not mentioned).

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With schema coverage at only 17%, the description does not compensate. It only says 'send the documented case fields' without explaining individual parameters or their constraints. The nested objects are not described.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a structured, audited deliverable for sales enablement, targeting C-suite (CRO). The reference case provides concrete context. However, it does not explicitly differentiate from similar sibling tools like revops_architect or growth_path_architect.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for sales enablement auditing but lacks explicit when-to-use or when-not-to-use guidance. It mentions server-side validation but no alternatives or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sales_pipeline_forecastC

Read-only

Inspect

Prévision de pipeline commercial — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Doctolib Enterprise — pipeline Q2 2026 · 50 deals enterprise/mid-market · forecast confidence par deal + commit/best-case/worst-case. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`pipeline`	Yes
`historicalConversionByStage`	No

Tool Definition Quality

C2.3/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds 'inputs are validated server-side' and 'returns structured audited deliverable', but annotations already declare readOnlyHint=true. There is no disclosure of what happens if inputs are invalid, rate limits, or other behavioral traits. The description provides marginal additional context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness2/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single rambling sentence mixing French and English. It could be split into clear bullet points. The jargon ('Gapup agent-payable C-suite expertise') wastes space without adding clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness1/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, no output schema), the description is severely lacking. It does not describe the return format, error handling, pagination for large pipelines, or how to interpret the forecast. The reference case provides some context but is insufficient for complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20% (only async has a description). The description does not explain any parameter meaning or usage. It merely says 'send the documented case fields' without elaborating on the complex nested objects (company, pipeline, historicalConversionByStage). This fails to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a structured, audited deliverable for sales pipeline forecasting. The reference case gives concrete context. However, the phrase 'Gapup agent-payable C-suite expertise (CRO)' is jargon that may confuse. Still, the core purpose is discernible.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit when-to-use or when-not-to-use guidelines are provided. The description implies it is for forecasting but does not differentiate from sibling tools like deal_coach or pipeline-related tools. There is no guidance on prerequisites or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sanctions_screener_multiA

Read-only

Inspect

Screening Sanctions Multi-listes — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Answers: For , run full OFAC + EU + UK HMT + UN + SECO + Canada SEMA + PEP + adverse media screening with composite risk score and evidence trail. · Is <company/individual> on any major international sanctions list? · What is the composite AML risk score for across all major watchlists? · Screen this M&A target / supplier / LP against all major sanctions lists and give me a compliance recommendation. · Is a PEP or associated with a PEP? What Enhanced Due Diligence is required? Reference case: Veridian Trading Co. LLC (Cyprus) — 7 listes · PEP check · adverse media 2 ans · composite 52/100 · escalate-to-compliance → EDD requis. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`address`	No
`aliases`	No
`entity_name`	Yes
`entity_type`	Yes
`context_note`	No
`date_of_birth`	No
`jurisdiction_focus`	Yes		all
`country_of_registration`	No
`adverse_media_lookback_days`	Yes

Tool Definition Quality

A3.8/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds value by specifying the data sources, output as a structured deliverable with evidence trail, and validated inputs. It mentions an async parameter in the schema but not in the description. Behavioral traits like auth needs or rate limits are not covered, but the description does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose and includes a marketing-style phrase 'Gapup agent-payable C-suite expertise (RISK)' and a reference case example that adds length without essential guidance. It lacks clear structure; important information is mixed with promotional language. Could be more concise and organized.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 10 parameters, no output schema, and moderate complexity, the description provides a high-level overview of functionality and example questions. However, it does not explain the output structure in detail (what fields are in the deliverable), how to interpret risk scores, or how the async mode works (only described in schema). This leaves the agent underinformed for invocation and result handling.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (10%), with only the 'async' parameter having a description. The tool description does not define individual parameters beyond mentioning required fields in context. Example questions imply some semantics, but the agent lacks detailed guidance on fields like 'address', 'aliases', or 'context_note'. This leaves significant gaps for correct usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs multi-list sanctions screening, listing specific watchlists (OFAC, EU, UK HMT, UN, SECO, Canada SEMA, PEP, adverse media) and returns a composite risk score and evidence trail. It addresses specific questions an agent might ask, distinguishing it from siblings like kyc_screener or sharia_compliance_screener which focus on single types of screening.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage context through example queries and mentions validated server-side inputs. However, it does not explicitly state when NOT to use this tool or compare it to sibling tools beyond implied scope. The agent can infer usage from the listed questions, but explicit alternatives are absent.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

save_playsC

Read-only

Inspect

Plans de sauvetage clients — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Kyriba — Plan sauvetage 30j · ARR €11.988 · Champion parti · Script 6 actions · 3 concessions. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`account`	Yes
`company`	Yes
`product`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true, suggesting the tool does not modify state and may accept unexpected inputs. The description adds that inputs are validated server-side and returns a deliverable, which aligns with read-only behavior. However, it does not detail the output format, side effects, or potential limitations beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively short but includes a lengthy reference case example that could be abbreviated. The structure is somewhat front-loaded with the purpose, but the example takes up space that could be used for more explicit instructions. It is adequate but not optimally concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (nested objects, required fields, async parameter) and lack of output schema, the description is incomplete. It does not explain the return value in detail, the async parameter is not mentioned, and the inputs are only described by reference case. The description leaves many gaps for the agent to infer.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema coverage is only 25%, and the description does not explain the parameters beyond mentioning 'documented case fields.' The reference case implies certain fields (company, account, product, async), but no additional meaning is provided. The description fails to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: saving client rescue plans ("plans de sauvetage clients") and returning a structured, audited deliverable. It mentions the target audience (C-suite, CRO) and provides a reference case. However, it does not differentiate from sibling tools, many of which are unrelated, so the purpose is clear but not distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

There is no explicit guidance on when to use this tool versus alternatives. The description provides a reference case (Kyriba) but does not explain under what circumstances the tool should be invoked or when other tools might be more appropriate. No exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sci_literature_searchA

Read-only

Inspect

Recherche bibliographique multi-sources sur la litterature scientifique. Sources : OpenAlex (200M+ works) · Semantic Scholar · arXiv · PubMed · CrossRef. Modes : search | meta_analysis | citation_network | expert_finder. Keyless / free tier. Cache LRU 12h.

ParametersJSON Schema

Name	Required	Description
`mode`	No	Mode de recherche. Defaut: search
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Keywords, titre, auteur, DOI (ex: 10.xxxx/xxxx accepte)
`domain`	No	Domaine scientifique. Defaut: all
`date_to`	No	Date ISO fin (YYYY-MM-DD)
`date_from`	No	Date ISO debut (YYYY-MM-DD)
`max_results`	No	5-50. Defaut: 20
`min_citations`	No	Nombre minimal de citations

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`query`	Yes
`papers`	Yes
`status`	Yes
`experts`	No
`sources`	Yes
`meta_analysis`	No
`quality_score`	Yes
`citation_network`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnlyHint, openWorldHint), the description adds key behavioral traits: keyless/free tier, LRU cache with 12-hour lifetime, and multi-source aggregation. This provides useful context not available in annotations alone.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is highly concise: four short lines encapsulating all essential information (sources, modes, key characteristics). No wasted words. Front-loaded with clear purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite multiple sources and modes, the description covers all critical contextual aspects: sources, modes, authentication (keyless), caching behavior, and performance. With a complete input schema and output schema present, no major gaps remain.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are well-documented in the schema. The description adds no additional parameter-level semantics beyond what the schema already provides. Baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states multi-source literature search, lists specific sources (OpenAlex, Semantic Scholar, arXiv, PubMed, CrossRef), and enumerates modes (search, meta_analysis, citation_network, expert_finder). This distinguishes it from sibling tools like research_paper_qa.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives, no exclusions or prerequisites mentioned. Sibling tools include research_paper_qa and others that may also deal with scientific literature, but no comparison is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sec_filing_decoderB

Read-only

Inspect

Décodeur de filing SEC — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Answers: Read the 10-K of and give me the material red flags, KPI movements, and a board-ready executive summary. · What has materially changed in 's risk profile in its latest annual filing? Flag any going-concern or auditor-change signals. · Is there any M&A signal or strategic review hint in 's most recent SEC filings? What's the evidence? · Prepare a due-diligence SEC filing brief for : financial snapshot, red flags, governance changes, and recommended next actions. · What is the sentiment of 's latest 10-K compared to its most recent 10-Q — bullish, neutral, or bearish? Reference case: SHOP · 10-K FY2024 · 4 red flags (1 critical: merchant concentration) · Revenue +24.7% YoY · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`cik`	No
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	Yes		all
`ticker`	No
`filing_types`	Yes
`lookback_months`	Yes

Tool Definition Quality

B3.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true. The description adds that it returns a 'structured, audited deliverable' and mentions async functionality from the schema. This provides some behavioral context but does not go beyond what annotations already convey regarding safety.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately lengthy and includes a list of example questions that are somewhat repetitive. It front-loads the purpose but contains redundant information, reducing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description provides useful context with examples and a reference case, but it does not detail the output format (no output schema) or clarify how this tool differs from many SEC-related siblings. It is adequate but not fully comprehensive.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description does not compensate by explaining key parameters like cik, ticker, filing_types, or focus. While it gives example queries that hint at parameter usage, it lacks explicit semantic guidance for the 6 parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool decodes SEC filings and provides analysis like red flags, KPI movements, executive summary. It includes example use cases and a reference case (SHOP). However, it does not explicitly differentiate from sibling tools like earnings_reviewer or competitive_deep_dive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example questions that imply usage contexts (e.g., 'Read the 10-K of <ticker> and give me the material red flags'), but it does not explicitly state when to use this tool vs alternatives or provide when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sentiment_news_pulseC

Read-only

Inspect

Pulse Média & Sentiment — Gapup agent-payable C-suite expertise (CMO). Returns a structured, audited deliverable. Answers: What is the current PR / brand sentiment for over the last 7 days? Show top headlines, trend signals, and recommended actions. · Is there a crisis building for ? Detect early-warning signals in press coverage and flag emerging negative narratives. · Track launch media coverage for — what is the press sentiment and which topics dominate the conversation? · Compare media sentiment between and its competitors over the past week. · What should our communications director prioritize in the next 48h based on current press coverage of ? Reference case: Velora Payments — Pulse média 7j · sentiment neutre (score +5) · crise émergente détectée · . Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`entity_name`	Yes
`entity_type`	Yes		company
`sentiment_lens`	Yes		reputation
`date_range_days`	Yes
`language_filter`	Yes		en
`include_competitors`	No

Tool Definition Quality

C2.8/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true. The description says 'Returns a structured, audited deliverable' and 'Inputs are validated server-side', but adds little beyond annotations. It does not explain what 'audited' entails, authorization needs, rate limits, or the implications of openWorldHint.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is verbose, containing multiple full-sentence questions and a reference case. While it provides useful context, it could be more concise and better structured for quick parsing.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (7 parameters, no output schema, 14% coverage), the description lacks critical details such as the format of the deliverable, interpretation of sentiment scores, and behavior related to openWorldHint. It is insufficient for an agent to fully understand the tool's behavior without additional documentation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 14% (only 'async' has a description). The description mentions entity_name, entity_type, date_range_days, language_filter, and sentiment_lens through example questions, but does not provide detailed parameter semantics. It fails to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool returns a structured, audited deliverable for media sentiment analysis and lists specific use cases (PR sentiment, crisis detection, launch monitoring, competitor comparison). However, it does not differentiate from sibling tools like reputation_engine or press_influencer, which may overlap in purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides example questions that imply when to use the tool (e.g., 'What is the current PR / brand sentiment...', 'Is there a crisis building...'). However, it does not specify when not to use it or compare to alternatives, leaving usage guidance implicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

seo_cro_auditA

Read-only

Inspect

Full SEO + CRO audit of any public URL. Analyses technical SEO (HTTP status, HTTPS, title/meta/canonical/robots, H1-H2, JSON-LD structured data, sitemap, robots.txt, OG/Twitter cards), content SEO (word count, keyword density top-10, readability estimate, image alt coverage, internal/external links), performance signals (page size, estimated render time, inline scripts/styles, unoptimised images), and CRO (CTA detection, above-fold CTAs, forms, social proof, trust signals, pricing visibility). Optionally compares up to 5 competitor URLs. Returns 0-100 scores per dimension plus a prioritised (P0/P1/P2) recommendation list. ICP: marketing managers, SEO/CRO consultants, e-commerce ops, agency teams. Budget: 8s per URL. Cache TTL: 1h.

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Fully-qualified URL to audit (e.g. https://stripe.com/pricing)
`mode`	No	Audit scope — defaults to 'full'
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`compare_competitors`	No	Optional list of competitor URLs to compare (max 5)

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	Yes
`status`	Yes
`sources`	Yes
`audit_modes`	Yes
`content_seo`	Yes
`cro_signals`	Yes
`quality_score`	Yes
`technical_seo`	Yes
`overall_scores`	Yes
`recommendations`	Yes
`performance_signals`	Yes
`competitor_comparison`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds valuable behavioral information: expected execution time ('Budget: 8s per URL'), cache behavior ('Cache TTL: 1h'), and return format (0-100 scores with prioritised recommendations). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph but efficiently packs all key details: purpose, audit categories, competitor option, output format, target audience, and performance characteristics. It is front-loaded with the most important information and each sentence adds value, though it could be slightly more structured (e.g., bullet points).

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the tool and the presence of an output schema (which covers return structure), the description adequately explains the audit scope, optional parameters, output format, target users, and performance characteristics. It is complete and requires no additional clarification.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are already well-documented in the schema. The description adds context about the audit scope and competitor comparison, but does not provide additional details per parameter beyond what the schema already contains. Baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Full SEO + CRO audit of any public URL', specifying the exact scope and components (technical, content, performance, CRO). It also mentions optional competitor comparison, which distinguishes it from sibling tools like seo_keyword_research or competitor_deep_dive.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context by listing the ideal customer profile (ICP: marketing managers, SEO/CRO consultants, etc.) and mentions optional competitor comparison. However, it does not explicitly state when to use this tool versus alternatives or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

seo_keyword_researchA

Read-only

Inspect

SEO keyword research from a seed keyword or topic. Uses Google Suggest (public, keyless) to discover related queries at 2 expansion levels, then clusters them by intent: informational / commercial / transactional / navigational — via heuristic pattern matching. Search volume is bucketed (very_high / high / medium / low / very_low) and clearly labelled as ESTIMATED — no fabricated precise numbers. Returns all keywords, intent clusters, quality scores (0-100), and top 10 opportunities. Supports country (gl) and language (hl) targeting. 100% keyless. Cache TTL 6h. ICP: SEO managers, content strategists, SaaS founders, agency teams.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`country`	No	ISO 3166-1 alpha-2 country code for Google Suggest (e.g. 'US', 'FR', 'DE'). Defaults to 'US'.
`language`	No	BCP-47 language code for suggestions (e.g. 'en', 'fr', 'de', 'es'). Defaults to 'en'.
`seed_keyword`	Yes	The seed keyword or topic to research (e.g. 'invoice software', 'project management tool')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`country`	Yes
`clusters`	Yes
`language`	Yes
`warnings`	Yes
`all_keywords`	Yes
`seed_keyword`	Yes
`quality_score`	Yes
`total_keywords`	Yes
`top_opportunities`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses numerous behavioral traits beyond annotations: uses Google Suggest (public, keyless), two expansion levels, heuristic intent clustering, estimated volume buckets, quality scores, country/language targeting, and 6-hour cache. Annotations (readOnlyHint, openWorldHint) are consistent.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is dense but efficient, covering all key features in a single paragraph. It front-loads the main action and follows with important details. Could be slightly more structured with bullet points, but remains concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (expansion, clustering, bucketed volumes, quality scores, targeting) and the presence of an output schema, the description fully covers what the tool does, how it works, and its limitations (estimated volumes, cache). It leaves no significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameters with descriptions. The description adds context (e.g., country defaults to US, language to en, async behavior for slow calls) and examples, enriching understanding beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it performs SEO keyword research from a seed keyword using Google Suggest, with expansion levels and intent clustering. It distinguishes itself from sibling tools (e.g., seo_cro_audit) by focusing on keyword discovery and intent analysis.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains when to use it (for keyword research, intent clustering) and provides context like 100% keyless, cache TTL, and target audience. It doesn't explicitly state when not to use it, but the context implies it's for broad discovery, not precise volume data.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sharia_compliance_screenerA

Read-only

Inspect

Sharia compliance screening engine for Islamic banks, Sukuk issuers, Gulf sovereign funds, halal investment managers and MENA family offices. Zero competing MCP on this vertical.

Standards supported: AAOIFI (default) | MSCI_Islamic | S&P_Sharia | DJIM

Four modes: • company — Full Sharia screen of a listed company: business activity (halal/haram/mixed) + AAOIFI financial ratios (debt/market-cap <30%, interest-assets <30%, non-compliant revenue <5%) • instrument — Sukuk / halal fund classification by ISIN or name. Maps to known Sharia boards. • sector_screen — Industry classification (halal/haram/mixed) with rationale + examples. Static AAOIFI-based map covering 40+ sectors. • financial_ratios — AAOIFI ratio computation on fetched or provided financials.

Prohibited activities screened: alcohol, gambling, pork, weapons, pornography, tobacco, conventional banking (riba), conventional insurance, adult entertainment, embryonic stem cells.

Output includes compliance_status (halal/haram/doubtful_mixed/purification_required), purification_pct when applicable, P0/P1/P2 signals, quality_score, and sources.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Screening mode. company=full listed company screen, instrument=Sukuk/fund classification, sector_screen=industry halal/haram classification, financial_ratios=AAOIFI ratio check.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Entity to screen. Company name, ticker or ISIN (e.g. "Aramco", "AAPL", "tobacco", "XS1234567890").
`standard`	No	Sharia standard to apply. Default "AAOIFI" (most conservative, widely accepted by Islamic banks).

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`company`	No
`signals`	Yes
`sources`	Yes
`instrument`	No
`quality_score`	Yes
`sector_screen`	No
`standard_used`	Yes

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses behavioral traits such as the async mode for non-blocking execution, the default standard (AAOIFI), and the output fields including compliance status and purification percentage. It aligns with annotations (readOnlyHint=true, destructiveHint=false) and adds valuable context beyond them.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured and front-loaded with the purpose. While it is lengthy, every section (modes, standards, output) adds necessary context. It could be slightly more concise, but it remains clear and informative without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 4 parameters, enums, and an output schema, the description provides all necessary context: standards, modes, prohibited activities, and output fields. It covers complex behaviors like async mode and the four screening modes comprehensively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All parameters have schema descriptions (100% coverage). The tool description adds significant meaning by explaining each mode's purpose, providing examples for the query parameter (e.g., 'Aramco', 'AAPL'), and listing the four standards with their contexts. This goes well beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states that the tool is a 'Sharia compliance screening engine' targeting specific entities like Islamic banks and Sukuk issuers. It details four distinct screening modes, making the purpose very clear. It also claims 'Zero competing MCP on this vertical,' distinguishing it from all sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear contexts for each of the four modes and lists supported standards. It does not explicitly state when not to use the tool, but its niche focus implies limited applicability. The claim of no competing tools on the same server further clarifies when it should be used.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

social_engagement_velocity_trackerA

Read-onlyIdempotent

Inspect

Tracks hourly social engagement velocity (likes, shares, comments) across Twitter, LinkedIn, and Reddit for CMOs. Inputs include platform handles/subreddits and time range. Outputs engagement metrics, velocity trends, and platform-specific insights. Ideal for real-time marketing performance monitoring and competitive benchmarking. Keywords: social media analytics, engagement tracking, marketing KPIs, CMO dashboard.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hours`	No
`platforms`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`trends`	No
`sources`	No
`warnings`	No
`engagement`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint. The description adds that outputs include engagement metrics, velocity trends, and platform-specific insights, aligning with annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two effective sentences plus keywords, front-loaded with the main purpose, and contains no unnecessary fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema, the description covers purpose, inputs, outputs, and use cases, but omits mention of the async parameter and any prerequisites or error conditions, leaving gaps for a complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 33%, but the tool description explains two parameters (platforms and hours) beyond the schema, adding context for platform handles/subreddits and time range. The async parameter is not explained.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'tracks', the resource 'hourly social engagement velocity', and the scope across Twitter, LinkedIn, and Reddit for CMOs, distinguishing it from sibling tools like fake follower detectors.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions ideal use cases ('real-time marketing performance monitoring and competitive benchmarking') but does not provide explicit when-to-use vs alternatives or when-not-to-use guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

social_influencer_fake_follower_detectorA

Read-onlyIdempotent

Inspect

Analyzes up to 10 social media influencers for fake followers by checking engagement velocity patterns (Trends24) and RSS feed anomalies. Returns authenticity scores, follower growth spikes, and suspicious activity flags. Optimized for CMOs evaluating influencer partnerships. Includes keywords: influencer marketing, fake follower detection, engagement analysis, social media audit.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`platform`	Yes	Social media platform of the influencers
`influencerHandles`	Yes	Array of up to 10 social media handles (e.g., ['@influencer1', 'user2'])

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`results`	Yes
`sources`	Yes
`summary`	No
`warnings`	Yes

Tool Definition Quality

A3.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, openWorld, and idempotent; the description adds that it analyzes up to 10 influencers and uses specific data sources, but does not disclose additional behavioral traits beyond the async parameter.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is reasonably concise but includes a keyword list that adds bulk without functional benefit; could be leaner.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description covers purpose, parameters, target user, and async behavior adequately for a read-only tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3; the description reiterates the max of 10 influencers but adds no new meaning beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes up to 10 influencers for fake followers by checking engagement velocity and RSS anomalies, and it distinguishes from siblings like social_engagement_velocity_tracker by focusing on fake follower detection.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It mentions optimization for CMOs evaluating influencer partnerships, which implies target users, but does not provide explicit guidance on when to use vs alternative tools or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sovereign_data_breach_impactA

Read-onlyIdempotent

Inspect

Estimates financial impact of a data breach across three jurisdictions (US, EU, UK) for CFO strategic planning. Inputs include breach size, industry sector, and affected jurisdictions. Outputs include direct costs, regulatory fines, reputational damage, and cyber insurance premium adjustments. Ideal for cross-border risk assessment, financial contingency planning, and board-level reporting. Keywords: data breach cost, regulatory fines, cyber insurance, financial risk, cross-jurisdiction impact.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	No	Industry sector of the affected organization
`records_lost`	Yes	Number of records compromised in the breach
`jurisdictions`	Yes	Jurisdictions where the breach has legal or financial impact
`detection_time_days`	No	Time in days to detect the breach

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`total_cost_usd`	No	Estimated total financial impact in USD
`cost_per_record_usd`	No	Cost per compromised record in USD
`regulatory_fines_usd`	No
`cyber_insurance_impact`	No
`reputational_damage_usd`	No	Estimated reputational damage cost in USD

Tool Definition Quality

A4/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. Description adds context (audience, outputs) but no behavioral details beyond what annotations convey. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise (4 sentences) and front-loaded with purpose. Each sentence adds value: purpose, inputs/outputs, use cases, keywords. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers main inputs and outputs, but omits mention of optional parameter detection_time_days. Output schema exists so return values need not be detailed. Overall fairly complete for a tool with 5 params.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. Description names key inputs (breach size, industry, jurisdictions) matching schema, but adds no extra detail beyond parameter names.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool estimates financial impact of data breaches across three jurisdictions for CFO strategic planning. It specifies inputs, outputs, and use cases, and has no directly competing siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description explicitly lists ideal use cases (cross-border risk assessment, financial contingency planning, board-level reporting). While it doesn't mention when not to use, the context is clear given sibling diversity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sre_slo_breach_predictorA

Read-onlyIdempotent

Inspect

As a CTO, predict potential SLO breaches 24 hours in advance by analyzing public incident reports and MITRE ATT&CK techniques. Input your service's critical components and reliability thresholds to receive breach probability scores, top contributing TTPs, and recommended mitigations. Uses MITRE ATT&CK, GitHub Advisories, and Cloudflare Radar data. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`time_window_hours`	No
`service_components`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`incident_reports`	No
`breach_probability`	No
`recommended_actions`	No
`top_ttp_contributors`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, openWorld, idempotent. Description adds that tool uses MITRE ATT&CK, GitHub Advisories, Cloudflare Radar, and can be slow (async hint). No contradictions. Adds useful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with main purpose. No fluff. Efficiently covers key aspects: user, input, output, data sources, async note.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Description covers inputs, outputs (breach probability scores, TTPs, mitigations), data sources, async behavior, and target user. With output schema existing, return values are outlined. Slightly lacking on time_window_hours parameter explanation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is low (33%). Description explains service_components as 'critical components and reliability thresholds' and async as timeout avoidance. Does not explain time_window_hours which has default but no description. Partially compensates.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool predicts SLO breaches 24 hours in advance for CTOs, using specific data sources. It specifies the verb 'predict', resource 'SLO breaches', and scope '24 hours'. This distinguishes it from many sibling tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides context: target user (CTO), scenario (predicting failures), and async option to avoid timeout. Mentions data sources. Lacks explicit when-not-to-use or alternative comparisons, but usage context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

strategic_options_analyzerC

Read-only

Inspect

Analyseur d'options stratégiques — Gapup agent-payable C-suite expertise (CSO). Returns a structured, audited deliverable. Reference case: Aircall — 5 options stratégiques post-Série D (2023-2024). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`optionHypotheses`	Yes
`strategicContext`	Yes
`founderConstraints`	Yes

Tool Definition Quality

C2.9/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and openWorldHint=true, so the description adds limited behavioral context. It states inputs are validated server-side and returns a deliverable, but does not explain what happens on validation failure, response structure, or any side effects. Without annotations this would be a 1, but annotations carry some burden.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, fitting essential info: purpose, deliverable type, reference case, and input validation. It is concise and front-loaded with the tool's function. However, it could be slightly more structured with explicit sections.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite the tool having 5 parameters with nested objects and no output schema, the description does not explain the deliverable structure, how to handle the async parameter, or how results relate to inputs. The reference case provides some context but is insufficient for full understanding of tool behavior and return format.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, meaning parameter descriptions are sparse. The tool description does not add meaningful semantic information for the complex nested parameters (company, optionHypotheses, strategicContext, founderConstraints). It mentions 'documented case fields' but does not elaborate on their meaning or usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool is an 'Analyseur d'options stratégiques' for C-suite expertise, returning a structured deliverable. The verb 'analyze' is implied. It references a specific case (Aircall) to clarify scope. However, it does not explicitly differentiate from sibling strategic analysis tools like 'market_entry_strategist' or 'capital_strategy', which may cover similar ground.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions inputs are validated server-side and to send the documented case fields, but provides no guidance on when to use this tool versus alternatives. It gives a reference case but no explicit when-not or comparative context. Usage guidance is minimal.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

supplier_esg_auditC

Read-only

Inspect

Audit ESG des fournisseurs — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: TechCorp — Audit ESG fournisseurs 2025 (5 fournisseurs, €1.37M spend). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`suppliers`	Yes
`targetScore`	No
`auditCriteria`	Yes

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description indicates it returns a 'structured, audited deliverable' and that inputs are validated server-side. Annotations provide readOnlyHint=true and openWorldHint=true, which are consistent. However, no further behavioral details (e.g., whether it calls external APIs, data retention, or performance characteristics) are disclosed beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short (3 sentences plus reference case) and minimally verbose. However, it could be better front-loaded: the purpose is stated first, but the target audience phrase is ambiguous. Still, it avoids unnecessary fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, no output schema), the description is insufficient. It does not describe the return format, error handling, when to use async parameter, or the structure of the deliverable. The annotations and reference case only partially compensate.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 17%, meaning most parameters (company, suppliers, auditCriteria) lack descriptions. The description does not explain these parameters' semantics; it only provides a reference case (TechCorp) with example values but no formal parameter meaning. The mention 'send the documented case fields' is too generic.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states 'Audit ESG des fournisseurs' (audit ESG of suppliers) and mentions it returns an audited deliverable. A reference case is provided. However, it does not differentiate from sibling tools like 'esg_audit_multi' which may have overlapping scope.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description does not specify prerequisites, exclusions, or context for using this tool over siblings such as 'vendor_esg_diversity_scanner' or 'carbon_footprint_calculator'. The mention of 'Gapup agent-payable C-suite expertise' is vague and does not clarify usage decisions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

supply_chain_fx_exposure_dashboardA

Read-onlyIdempotent

Inspect

Provides real-time foreign exchange exposure dashboard for supply chain monitoring. Designed for COO persona to track currency risk across suppliers and regions. Inputs include supplier IDs, base currency, and target currencies. Outputs structured FX exposure data with risk indicators, exchange rates, and supplier impact analysis sourced from World Bank LPI and live FX rate APIs.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`supplierIds`	No	List of supplier identifiers to analyze
`baseCurrency`	Yes	Base currency code (ISO 4217) for exposure calculation
`riskThreshold`	No	Percentage threshold for high-risk exposure flagging
`targetCurrencies`	Yes	Target currency codes (ISO 4217) to compare against base

Output Schema

ParametersJSON Schema

Name	Required	Description
`data`	No
`status`	Yes
`sources`	No
`warnings`	No

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds that it provides 'real-time' data sourced from 'World Bank LPI and live FX rate APIs', but this only slightly augments the behavioral context without revealing new traits beyond what annotations cover.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences front-load the purpose and high-level scope. Every word contributes meaning with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

While output schema exists, the description omits the 'async' and 'riskThreshold' parameters and does not differentiate from closely related siblings. It provides adequate high-level context but has clear gaps in parameter coverage and usage guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% parameter description coverage, so the baseline is 3. The description mentions three of five parameters (supplierIds, baseCurrency, targetCurrencies) but omits async and riskThreshold, providing no additional meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides a 'real-time foreign exchange exposure dashboard for supply chain monitoring' with a specific persona (COO) and resource (dashboard). It distinguishes from siblings like supplier_esg_audit and working_capital_fx_hedge_optimizer by focusing on monitoring rather than auditing or optimizing.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for COO tracking currency risk, but does not explicitly state when not to use it or mention alternative tools. No exclusions or alternatives are provided, leaving the agent to infer context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sustainability_reportC

Read-only

Inspect

Rapport de durabilité — Gapup agent-payable C-suite expertise (SUSTAINABILITY). Returns a structured, audited deliverable. Reference case: GreenLoop Solutions — rapport durabilité B-Corp 2025 (95 FTE, €18M CA). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`pillars`	Yes
`stakeholders`	Yes
`targetLabels`	No
`existingLabels`	No
`audienceProfile`	Yes

Tool Definition Quality

C2.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint. The description adds 'audited deliverable' but does not elaborate on behavioral traits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (two sentences plus a reference case). It is front-loaded with purpose. The reference case adds some value but could be trimmed.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has complex nested parameters and an async option, but the description does not explain return format, processing time, or how async works. It is incomplete for a tool of this complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 13%, yet the description adds no parameter-level details. It only says 'send the documented case fields', which provides no semantic meaning for individual parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states it returns a structured, audited deliverable for sustainability reporting, which is clear. However, it does not distinguish from siblings like 'sustainability_reporting_pilot' which may have a similar purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. Only says 'send the documented case fields', but no context on prerequisites or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

sustainability_reporting_pilotB

Read-only

Inspect

Pilote de reporting durabilité — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: AlphaTech Industries SAS — premier rapport CSRD wave 2 (exercice 2025). Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No
`company`	Yes
`dataInputs`	Yes
`materiality`	Yes
`targetFrameworks`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint (read operation) and openWorldHint (non-deterministic). The description adds that it returns a 'structured, audited deliverable' and mentions a reference case, which provides some context but does not significantly expand on the behavioral traits beyond what annotations already convey.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is brief (two sentences plus a reference case) but could be better structured. It front-loads the core purpose but misses opportunities to include key details like async usage or parameter hints. The reference case adds value but the overall structure is average.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the high complexity (nested objects, 6 parameters, no output schema) and low schema coverage, the description is insufficient. It does not explain what the deliverable contains, how to structure materiality, or what target frameworks entail. The tool needs more context to be fully usable.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The description does not explain any parameter semantics. With only 17% schema description coverage (only the async parameter has a schema description), the description should add meaning but fails to do so. The required parameters like company, dataInputs, materiality are complex nested objects with no additional guidance.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a sustainability reporting pilot that returns a structured, audited deliverable, with a specific reference to CSRD wave 2. However, it does not explicitly differentiate from sibling tools like sustainability_report or esg_audit_multi, which may have overlapping purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for sustainability reporting, but lacks explicit guidance on when to use this tool instead of alternatives. It mentions inputs are validated server-side but does not provide criteria for selection or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

syndicated_loan_covenant_breach_alertA

Read-onlyIdempotent

Inspect

Monitors syndicated loan covenants for potential breaches by analyzing Tradeweb market data. Designed for CFOs to proactively identify financial compliance risks in loan agreements. Accepts loan identifiers, covenant thresholds, and reporting period as inputs. Returns structured breach alerts with market context and severity indicators.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`loanId`	Yes	Unique identifier for the syndicated loan
`currency`	No	ISO currency code for financial values
`reportingPeriod`	Yes	Time period for covenant compliance check
`covenantThresholds`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`breaches`	No
`warnings`	No

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds value beyond annotations by specifying the data source (Tradeweb), output format (structured alerts with market context and severity indicators), and no contradictions with annotation hints.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences deliver the core purpose, target user, and output; front-loaded and efficient with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity and the existence of an output schema, the description covers purpose, inputs, output, and user target comprehensively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 80%, and the description summarizes inputs (loan identifiers, thresholds, period) but adds little beyond the schema's details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool monitors syndicated loan covenants for potential breaches using Tradeweb market data, distinguishing it from siblings like bond covenant tools.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description targets CFOs for proactive compliance risk identification, implying usage context, but lacks explicit when-not-to-use or alternative references.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

syndicated_loan_pricing_benchmarkA

Read-onlyIdempotent

Inspect

Provides CFOs with peer benchmarking for syndicated loan pricing by comparing current loan terms against market data from Tradeweb and FRED. Inputs include loan amount, tenor, credit rating, and currency. Outputs structured pricing benchmarks with spread, yield, and fee comparisons. Ideal for quick validation of loan competitiveness or negotiation preparation.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`tenor`	Yes	Loan tenor (e.g., '5Y', '3Y')
`region`	No	Region for benchmarking (e.g., 'US', 'EU')
`currency`	Yes	Currency code (e.g., 'USD', 'EUR')
`loanAmount`	Yes	Loan amount in millions
`creditRating`	Yes	Borrower credit rating (e.g., 'BBB', 'BB+')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`benchmarks`	No

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark it as readOnly, idempotent, and openWorld, so no contradiction. The description adds value by naming data sources (Tradeweb, FRED) and indicating output structure (spread, yield, fee comparisons), which complements the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences are concise and front-loaded: purpose and data sources first, then inputs, outputs, and use case. No unnecessary words or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers purpose, data sources, inputs, outputs, and use case. With a complete output schema and clear annotations, it provides sufficient context for an agent to decide when and how to invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are well-documented. The description reiterates the key required inputs (loan amount, tenor, credit rating, currency) and explains output structure, adding context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides peer benchmarking for syndicated loan pricing, comparing terms against specific market data sources (Tradeweb, FRED). It distinguishes from siblings like syndicated_loan_covenant_breach_alert by focusing on pricing competitiveness rather than compliance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the ideal use case ('quick validation of loan competitiveness or negotiation preparation') but does not explicitly contrast with sibling tools or state when not to use it. While clear, it lacks explicit alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_contract_risk_mapperA

Read-onlyIdempotent

Inspect

For CHROs: analyzes employee contracts for non-compete, IP assignment, and confidentiality clauses, comparing against state labor laws and jurisdiction-specific precedents. Returns risk levels, conflicting statutes, and suggested revisions. Uses USPTO PatFT, CourtListener, and EUR-Lex for legal cross-referencing. Ideal for contract reviews, compliance audits, or policy updates.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`jurisdiction`	Yes	State or country jurisdiction (e.g., 'California', 'Germany')
`contract_text`	Yes	Full text of the employee contract or clause section to analyze
`employee_role`	No	Job title or role classification (e.g., 'Software Engineer', 'Executive')
`effective_date`	No	Contract effective date (YYYY-MM-DD)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`risk_summary`	No
`suggested_revisions`	No
`conflicting_statutes`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, idempotentHint. The description adds that the tool uses external legal databases (USPTO, CourtListener, EUR-Lex) and returns risk levels, conflicting statutes, and suggested revisions. This provides valuable context beyond annotations without contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two efficient sentences plus a use-case line. It is front-loaded with the target audience and purpose, and every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the existence of an output schema, the description sufficiently covers the tool's functionality: analyzing contracts, comparing laws, and outputting risk levels and suggested revisions. The mention of external sources adds completeness. Minor gap: could clarify supported jurisdictions, but overall complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so baseline is 3. The description adds that jurisdiction is used for comparing state labor laws and contract_text for analyzing specific clauses, but does not elaborate on async, employee_role, or effective_date beyond schema definitions. Adequate but not enhancing.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes employee contracts for specific clauses (non-compete, IP assignment, confidentiality) against state laws, targeting CHROs. It distinguishes from siblings like 'contract_risk_scanner' by specifying the HR/employee contract focus and the use of legal databases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists use cases: contract reviews, compliance audits, policy updates. It does not mention when not to use or provide alternatives, but the context is clear enough for an agent to infer appropriate scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_intelligenceA

Read-only

Inspect

HR tech intelligence for CHROs, recruiters, VC teams, comp & benefits leads and workforce planners. Four modes powered by ESCO, O*NET, BLS OES and crowd-sourced salary data:

• salary_benchmark — cash-only salary medians (p25/median/p75) for 54+ roles across US/EU/Asia. Covers tech, finance, compliance, healthcare, marketing, ops and C-suite. Data from BLS OES, Levels.fyi and StackOverflow Developer Survey 2024. • skills_taxonomy — maps a skill to its ESCO URI, O*NET codes, skill type (hard/soft/knowledge/cert), 8 related skills with similarity scores and typical roles. • job_market_trends — YoY growth %, open positions estimate, top employers and leading skills per job category × country. Static 2024 data with BLS baseline fallback. • adjacent_roles — up to 6 roles adjacent to a source role with ESCO taxonomy adjacency: similarity score, salary delta % and skills overlap %.

All salary data is cash-only (excludes equity/RSU/bonus). Cache TTL: 24h (stable labour market data). Optional env ONET_API_KEY for authenticated O*NET lookups (free registration at onetcenter.org).

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analysis mode: salary_benchmark=compensation data, skills_taxonomy=ESCO/O*NET mapping, job_market_trends=market growth and demand, adjacent_roles=career path recommendations.
`role`	No	Job title (required for salary_benchmark, job_market_trends, adjacent_roles). Examples: "Senior Software Engineer", "Compliance Officer", "Data Scientist", "CFO".
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`skill`	No	Skill to classify (required for skills_taxonomy mode). Examples: "Python", "transformer architecture", "GDPR", "Kubernetes", "leadership".
`country`	No	ISO 2-letter country code. Default: US. Examples: US, FR, DE, GB, SG.
`seniority`	No	Seniority level. Default: senior. Affects salary benchmark ranges.

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`quality_score`	Yes
`adjacent_roles`	No
`skills_taxonomy`	No
`salary_benchmark`	No
`job_market_trends`	No

Tool Definition Quality

A4.5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description adds significant behavioral context beyond annotations: data sources (ESCO, O*NET, BLS), cash-only salary, cache TTL (24h), optional API key for ONET, and async capability. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with bullet points but is somewhat lengthy. It front-loads the purpose but could be more concise while retaining key details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description covers all necessary aspects: modes, parameters, data sources, caching, authentication, and async behavior. Given the complexity (multiple modes, output schema exists), it is complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description enriches parameters with examples (e.g., role examples, skill examples) and explains mode enum values in more detail, adding value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it provides HR tech intelligence with four specific modes (salary_benchmark, skills_taxonomy, job_market_trends, adjacent_roles), each with a concrete purpose. It distinguishes itself from sibling tools by focusing on talent/HR data exclusively.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description gives detailed guidance on when to use each mode but does not explicitly compare to alternative sibling tools. It implies usage through mode-specific use cases and parameter requirements.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_legal_dashboardA

Read-onlyIdempotent

Inspect

Generates a real-time legal risk dashboard for CHROs, covering contracts, intellectual property, and labor law compliance. Inputs include jurisdiction, employee count, and risk thresholds; outputs include risk scores, actionable alerts, and source citations. Ideal for proactive legal risk management and compliance monitoring. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`includeIP`	No
`jurisdiction`	Yes
`employeeCount`	Yes
`riskThreshold`	No
`includeLaborLaw`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`alerts`	No
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No
`lastUpdated`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnly, openWorld, and idempotent, which already clarify safety. The description adds value by warning about potential timeout with 'Pass async:true to avoid timeout', which is critical behavioral context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences that efficiently cover purpose, inputs/outputs, and a usage tip. It is front-loaded with the main action and has no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (6 params, output schema exists), the description outlines key inputs and outputs (risk scores, alerts, citations). The async tip addresses potential delays. However, it does not explain the async polling mechanism (e.g., using job_result), leaving a minor gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description should compensate but only mentions jurisdiction, employee count, and risk thresholds without adding detail beyond the schema. Parameters like includeIP and includeLaborLaw remain unexplained, providing minimal added understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states that the tool generates a real-time legal risk dashboard for CHROs, covering contracts, IP, and labor law. It specifies inputs (jurisdiction, employee count, risk thresholds) and outputs (risk scores, alerts, citations), making the purpose very specific and actionable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions that it is 'ideal for proactive legal risk management and compliance monitoring,' which implies a use case. However, it does not explicitly state when to avoid using it or compare it to sibling tools like talent_contract_risk_mapper or talent_litigation_exposure, so guidance is only implied.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_litigation_exposureA

Read-onlyIdempotent

Inspect

Estimates litigation exposure risk for CHROs by analyzing past employee lawsuits, settlement amounts, and industry benchmarks. Inputs include company location, industry code, and employee count range. Returns exposure score, average settlement amounts, lawsuit frequency trends, and risk factors. Ideal for legal risk assessment, HR strategy planning, and board-level reporting. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry_code`	Yes	NAICS industry code (e.g., '541511' for IT services)
`employee_count`	No	Current number of employees
`lookback_years`	No	Number of years to analyze
`company_location`	Yes	State or region where company operates (e.g., 'CA', 'New York')

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`avg_settlement`	No	Average settlement amount in USD
`exposure_score`	Yes	Normalized risk score (0-100)
`historical_trend`	No
`top_risk_factors`	No
`lawsuit_frequency`	No	Lawsuits per 1000 employees per year
`industry_benchmark`	No	Industry average exposure score

Tool Definition Quality

A4.5/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint, openWorldHint, and idempotentHint. The description adds the async capability and timeout guidance, which are behavioral traits not covered by annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences: purpose, inputs/outputs, and usage cases with async tip. No redundant information; every sentence serves a purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema and all parameters are described in the schema, the description covers purpose, inputs, outputs summary, usage cases, and async behavior. It is complete for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds context by explaining the tool analyzes past lawsuits, settlements, and benchmarks, giving meaning to parameters like industry_code and company_location. This elevates the score to 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool estimates litigation exposure risk from past lawsuits, settlements, and benchmarks. It lists inputs and outputs, and distinguishes from sibling tools like talent_contract_risk_mapper and talent_legal_dashboard by focusing specifically on litigation exposure.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage scenarios such as legal risk assessment, HR strategy planning, and board-level reporting, and mentions async option to avoid timeout. However, it does not explicitly state when not to use this tool or suggest alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

talent_poaching_riskA

Read-onlyIdempotent

Inspect

Analyzes employee poaching risk for CHROs by evaluating LinkedIn profile activity (job searches, profile views) and comparing compensation against BLS benchmarks. Returns a ranked list of high-risk employees with risk scores and suggested retention actions. Ideal for proactive talent retention strategies. Keywords: employee retention, poaching risk, compensation benchmark, LinkedIn activity, CHRO analytics.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`location`	No	Geographic location filter (e.g., 'San Francisco, CA')
`department`	Yes	Department filter (e.g., 'Engineering', 'Sales')
`min_tenure_months`	No	Minimum tenure in months to include in analysis
`benchmark_job_title`	No	Specific job title for compensation benchmarking

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`risk_assessment`	No
`department_avg_risk`	No
`benchmark_comparison`	No

Tool Definition Quality

A4.4/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds behavioral context beyond read-only and idempotent annotations: specifies data sources (LinkedIn activity, BLS benchmarks) and output (ranked list with scores and actions). No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences front-loading core functionality, followed by use case and keywords. No superfluous content.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Covers core functionality, output format, and use case. Does not mention async polling behavior, but output schema exists and async is a common pattern across tools, so not a critical gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3. Description adds high-level purpose (e.g., benchmarking) but does not elaborate on individual parameters beyond what schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states it analyzes employee poaching risk for CHROs using LinkedIn activity and compensation benchmarks, returning a ranked list. Distinguishes from siblings like talent_intelligence or talent_contract_risk_mapper.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Identifies target user (CHROs) and use case (proactive retention strategies). Lacks explicit when-not-to-use or alternative tools, but context and sibling list provide implicit differentiation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tariff_arbitrage_finderA

Read-onlyIdempotent

Inspect

As a COO, identify tariff reclassification opportunities to reduce import costs. Analyzes product HS codes against WTO TFA and USA Trade Online data to find lower-duty classifications. Inputs: product description, current HS code, country of origin, and annual import volume. Outputs: potential duty savings, alternative HS codes, and compliance considerations.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`annualVolume`	No
`currentHsCode`	Yes
`countryOfOrigin`	Yes
`currentDutyRate`	No
`productDescription`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`opportunities`	No

Tool Definition Quality

A3.7/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description reveals that the tool analyzes external data sources (WTO TFA, USA Trade Online), which is valuable behavioral context. Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint, so the description adds complementary info. There is no contradiction with annotations. It could be improved by mentioning any rate limits or data freshness, but overall it provides sufficient transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a lead sentence specifying the tool's role, followed by a brief explanation of methodology, then a clear listing of inputs and outputs. It is concise with no redundant information. The opening phrase 'As a COO' subtly targets the user, which is acceptable. Minor improvement could be to reorder for maximum clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (context signal indicates true), the description does not need to detail return values, but it provides a summary of outputs anyway. It covers the core functionality and required parameters. Lacking is information on processing time, authentication requirements, or how compliance considerations are derived. Nonetheless, for a tariff analysis tool with good annotations, completeness is above average.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage, the description partially compensates by explaining the purpose of four key parameters (product description, current HS code, country of origin, annual volume). However, it omits two parameters (async, currentDutyRate) and does not add formatting or constraint details beyond the schema. The added value is moderate but incomplete.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: identifying tariff reclassification opportunities to reduce import costs using HS code analysis against specific data sources (WTO TFA, USA Trade Online). It specifies the verb 'identify' and resource 'tariff reclassification opportunities', making the purpose clear. However, it does not explicitly differentiate from sibling tools like tariff_impact_simulator, which could cause confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions the target user ('As a COO') and lists inputs and outputs, giving implied usage context. However, it does not provide explicit guidance on when to use this tool versus alternatives, nor does it state any exclusions or prerequisites. The context is clear but lacks depth for optimal agent decision-making.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tariff_impact_simulatorA

Read-onlyIdempotent

Inspect

As a COO, model how proposed tariff changes affect landed costs for imported goods. Inputs: HS code, current tariff rate, proposed tariff rate, product value, shipping cost, and country of origin. Outputs: detailed cost breakdown including new duties, taxes, and total landed cost impact. Sources include WTO TFA and US Census trade data.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`hsCode`	Yes
`productValue`	Yes
`shippingCost`	No
`countryOfOrigin`	Yes
`currentTariffRate`	Yes
`proposedTariffRate`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`costImpact`	No
`currentDuty`	No
`proposedDuty`	No
`dutyDifference`	No
`currentLandedCost`	No
`proposedLandedCost`	No
`costImpactPercentage`	No

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already provide readOnlyHint, openWorldHint, and idempotentHint. The description adds context by mentioning data sources (WTO TFA and US Census trade data), which enhances transparency. However, it fails to mention the async parameter behavior, which is a notable omission given that the input schema includes an 'async' flag.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, consisting of three sentences that front-load the purpose. Every sentence adds value: purpose, input/output summary, and data sources. No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (7 parameters, low schema coverage) and the presence of an output schema, the description covers the main use case but misses the async behavior, which is relevant for tool invocation. It also does not differentiate from siblings strongly. The data source mention adds value, but the omission of async is a gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (14%), so the description must compensate. It lists 6 of 7 parameters and explains their role (e.g., HS code, tariff rates, product value, etc.), providing high-level meaning. However, it does not detail specific constraints or formatting beyond what the schema already provides, and it omits the 'async' parameter entirely.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: modeling how proposed tariff changes affect landed costs. It specifies a specific verb ('model'), resource ('tariff changes'), and context ('for imported goods'). It lists inputs and outputs, and distinguishes from sibling tools like 'tariff_arbitrage_finder' by focusing on simulation rather than arbitrage.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage for tariff impact simulation but does not explicitly state when to use or not use this tool versus alternatives. No mention of prerequisites or exclusions. The sibling list includes related tools, but no comparative guidance is provided.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tax_compliance_multiA

Read-onlyIdempotent

Inspect

Multi-jurisdiction tax compliance data for international SaaS, cross-border marketplaces and expat services. Five modes: (1) vat_lookup — validate EU VAT numbers live via VIES SOAP (27 EU countries) or UK VRN via HMRC; (2) sales_tax — US state sales tax rates, nexus thresholds (post-Wayfair 2018), digital goods taxability for all 50 states + DC; (3) gst — APAC GST/SST/consumption-tax rates for IN, SG, AU, NZ, MY, JP, KR, TH, ID, PH, VN with reduced rates and registration thresholds; (4) oss_ioss_eligibility — EU One-Stop-Shop and Import-OSS eligibility analysis (EUR 10k OSS threshold, EUR 150 IOSS per-consignment); (5) transfer_pricing_benchmark — OECD/JTPF operating-margin benchmarks by industry and country (20+ sectors, country-specific adjustments). Returns P0/P1/P2 compliance signals: P0=invalid VAT used for zero-rating, P1=taxable digital goods detected/audit risk, P2=filing deadlines/nexus alerts. Keyless — no API key required. Optional env: HMRC_VAT_API_KEY for UK VAT live validation. Cache TTL 24h.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Tax mode to invoke.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Mode-specific query: vat_lookup -> VAT number with country prefix (e.g. 'FR40303265045'); sales_tax -> US state code or name (e.g. 'CA', 'California'); gst -> ISO country code (e.g. 'SG', 'IN', 'AU'); oss_ioss_eligibility -> annual EU B2C revenue in EUR or keyword (e.g. '5000', 'below'); transfer_pricing_benchmark -> industry name (e.g. 'manufacturing', 'saas', 'r&d').
`country`	No	ISO 3166-1 alpha-2 country code. Required for gst when query is ambiguous. Used in transfer_pricing_benchmark for country-specific OECD adjustments.
`transaction_type`	No	Transaction type for signal generation. 'digital' triggers GST/sales-tax digital goods warnings.

Output Schema

ParametersJSON Schema

Name	Required	Description
`gst`	No
`mode`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`oss_ioss`	No
`sales_tax`	No
`vat_lookup`	No
`quality_score`	Yes
`transfer_pricing`	No

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, idempotentHint=true, destructiveHint=false. The description adds valuable behavioral traits: keyless access, optional HMRC API key, 24h cache TTL, and P0/P1/P2 compliance signals, enriching agent understanding.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is lengthy but well-structured with numbered modes and clear sentence flow. It front-loads the main purpose and provides essential details without excessive verbosity for the complexity involved.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's five modes, complex input schema, and output availability, the description fully covers all aspects: purpose, modes, parameters, return signals, authentication, and caching. It leaves no major gaps in understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, but the description significantly augments parameter meaning by explaining mode-specific query formats and the role of optional parameters like country and transaction_type, adding context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides 'multi-jurisdiction tax compliance data' for specific business types, and enumerates five distinct modes with explicit purposes. This specificity differentiates it from sibling tools like tax_optimization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description details each mode and its query format, giving clear context for use. However, it does not explicitly state when not to use this tool or mention alternatives among siblings, though the mode breakdown provides implicit guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tax_optimizationC

Read-only

Inspect

Optimisation fiscale — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Pennylane — Fiscalité optimisée · CIR €1.2M · IP Box France 10% · Économie totale €2.4M/an. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`ipAssets`	No
`activities`	Yes
`financials`	Yes
`jurisdictions`	Yes
`currentTaxOptimizations`	No

Tool Definition Quality

C2.4/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true. The description adds minimal behavioral context: it mentions server-side validation and that inputs are 'documented case fields,' but doesn't discuss rejection behavior, rate limits, or how the open-world hint manifests (e.g., external data sources). No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and includes a concrete reference case, which adds context but also some noise. It could be more focused on essential purpose and usage without the promotional language ('optimisation fiscale — Gapup agent-payable C-suite expertise'). The structure is adequate but not exemplary.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 7 parameters, nested objects, no output schema, and moderate complexity. The description does not explain what the deliverable contains, how to interpret results, or edge cases (e.g., missing financials). Annotations provide some context (readOnlyHint, openWorldHint) but are insufficient alone.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters1/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14% (only the async parameter is described). The tool description does not explain any of the 12+ undocumented fields (e.g., company, financials, jurisdictions, activities). It only includes a vague reference to 'send the documented case fields,' leaving the agent to guess parameter meanings and constraints.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description references 'Optimisation fiscale' and a case study with CIR, IP Box, and savings, making the tool's purpose clear: generating tax optimization recommendations. However, it could be more explicit about the output being a structured deliverable with actionable strategies, and the phrase 'Gapup agent-payable C-suite expertise (CFO)' is somewhat opaque.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool vs. alternatives, such as tax_compliance_multi or other tax-related siblings. It only hints at a CFO audience but does not specify scenarios, prerequisites, or when not to use it.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

term_sheet_negotiationB

Read-only

Inspect

Négociation term sheet — Gapup agent-payable C-suite expertise (FUNDRAISING). Returns a structured, audited deliverable. Reference case: Agicap Série C €50M — 8 clauses analysées · 3 rouges · Score fondateur 62/100 → plan pour 81. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`round`	Yes
`company`	Yes
`termSheetClauses`	Yes

Tool Definition Quality

B3/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations include readOnlyHint=true and openWorldHint=true. The description adds that inputs are validated server-side and returns a deliverable, consistent with a read-only analysis. No contradiction, but no significant additional behavioral disclosure beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is relatively concise but includes a verbose reference case that may be distracting. It could be more front-loaded with the primary purpose and clearer structure.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (4 parameters, nested objects, no output schema), the description does not fully explain the return format or how the deliverable helps the user. It mentions a reference case but lacks completeness for an agent to understand output structure.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25% (async param described). The description says 'send the documented case fields' but does not elaborate on the required objects (round, company, termSheetClauses) or their fields. This fails to add meaningful context beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for term sheet negotiation in fundraising, and mentions it returns a structured audited deliverable. It distinguishes from a broad set of siblings by its specific focus, though it does not explicitly differentiate from similar tools like deal_coach.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides implicit context (for fundraising term sheet negotiation) but no explicit guidance on when to use versus alternatives. It mentions server-side validation but lacks when-not-to-use or exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

tool_recommendA

Read-only

Inspect

Cross-tool recommendation system: given a free-text intent, returns the most appropriate tools from the 170+ Gapup MCP catalogue, ranked by confidence, with pre-filled input suggestions and an optimal multi-tool chain when applicable. Use this first when you are unsure which tool to call — it navigates the full catalogue for you. Supports 15+ static pre-designed chains for frequent intents (M&A due diligence, sanctions screening, ESG 360, AI Act compliance, FTO patent clearance, crypto wallet tracking, etc.). Domains: compliance | finance | intel | legal | content | data | trade | infra. Pure compute — $0.01/call, no external fetch. Ideal as a first call in any multi-step agent workflow.

ParametersJSON Schema

Name	Required	Description
`lang`	No	Optional ISO 639-1 language hint (fr, en, de, zh, es …). Used for language-aware boosting.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`domain`	No	Optional domain hint to boost tools in this category.
`intent`	Yes	Free-text description of what you want to accomplish. E.g. 'Run a full M&A due diligence on Acme Corp' or 'Je veux vérifier qu'un fournisseur n'est pas sous sanctions OFAC'. FR/EN/DE/ZH supported.
`max_results`	No	Max number of recommendations returned (1-10). Default 5.
`include_chain`	No	Whether to include a suggested_chain of tools in the optimal sequence. Default true. Chain is always included for well-known intents (M&A, compliance, ESG, etc.).

Output Schema

ParametersJSON Schema

Name	Required	Description
`intent`	Yes
`status`	Yes
`sources`	No
`not_covered`	No
`quality_score`	Yes
`recommendations`	Yes
`suggested_chain`	No
`alternative_paths`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=false. The description adds that it's pure compute with no external fetch, costs $0.01/call, and explains the async parameter behavior. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately long but well-structured, with key information front-loaded. Every sentence serves a purpose, though it could be slightly more concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the tool (recommending from 170+ tools with chains and domains), the description covers essential aspects. It mentions static chains, domains, and pricing. With an output schema available, the description is sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so parameters are already documented. The description adds useful context such as language support ('FR/EN/DE/ZH'), domain hints, and the behavior of include_chain. This adds value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: a cross-tool recommendation system that returns the most appropriate tools from a catalogue based on free-text intent. It distinguishes itself from siblings by being the first call when unsure, which is reinforced by the explicit instruction to use it first.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says 'Use this first when you are unsure which tool to call' and provides context about domains and chains. It does not explicitly state when not to use it, but the usage context is clear enough for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

trade_finance_eligibilityB

Read-onlyIdempotent

Inspect

Evaluates trade finance eligibility for CFOs by analyzing counterparty risk and jurisdiction using World Bank and BIS data. Inputs include counterparty country code (ISO 3166-1 alpha-3) and industry sector. Returns risk scores, eligibility flags, and financing terms. Ideal for assessing letters of credit, export credit agency guarantees, and other trade finance instruments. Keywords: trade finance, counterparty risk, jurisdiction risk, letters of credit, ECA guarantees.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industrySector`	Yes
`annualTradeVolumeUSD`	No
`counterpartyCountryCode`	Yes
`counterpartyCreditRating`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`eligibility`	No
`financingTerms`	No
`countryRiskScore`	No
`maxFinancingAmountUSD`	No
`recommendedInstruments`	No

Tool Definition Quality

B3.2/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and idempotentHint=true, so the description adds moderate value by specifying data sources (World Bank, BIS) and output details (risk scores, eligibility flags, terms). No contradictions or additional behavioral traits are disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (four sentences plus keywords) and front-loaded with the core purpose. It is well-structured, moving from purpose to inputs/outputs to use cases. No redundant information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description provides a reasonable overview. However, it omits explanation of optional parameters and their impact, and does not detail the risk evaluation logic. It is adequate but not fully complete for a tool with 5 parameters.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, yet the description only elaborates on two of five parameters (counterpartyCountryCode and industrySector) with format hints. It ignores optional parameters like annualTradeVolumeUSD and counterpartyCreditRating, which could be critical for accuracy. This is insufficient compensation for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool evaluates trade finance eligibility for CFOs, using specific data sources (World Bank, BIS). It mentions counterparty risk and jurisdiction analysis, making the purpose distinct. However, it does not explicitly differentiate from sibling tools like 'africa_trade_finance_esg_rater', leaving some ambiguity.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides implied usage context by stating it is 'ideal for assessing letters of credit, export credit agency guarantees, and other trade finance instruments.' However, it gives no explicit guidance on when not to use this tool or what alternatives exist among siblings, which is a gap.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

transcribe_chapterize_mediaA

Read-onlyIdempotent

Inspect

Transcription and chapterization of long-form media (YouTube, podcasts, direct audio/video) for content marketing teams, podcast publishers, edu tech, journalists and accessibility/compliance.

Pipeline: • YouTube → timedtext captions (keyless) + oEmbed metadata + native timecode chapters from description • Podcast RSS → episode description + duration + timecodes if embedded in show notes • Direct media → partial (requires Whisper API via OPENAI_API_KEY + force_whisper:true) • Chapters: native YouTube timecodes preferred; heuristic TF-IDF segmentation as fallback • Summary: extractive TF-IDF top-sentences (no LLM required) • Language detection: character-set heuristic (CJK→zh, kana→ja, hangul→ko, accents→fr/de/es)

Output formats: json (full structured object) | text (plain transcript) | srt | vtt

SLA: ≤15s budget total. Cache: 24h TTL.

ParametersJSON Schema

Name	Required	Description
`url`	Yes	YouTube URL, podcast RSS feed URL, or direct MP3/MP4 URL. Example: "https://www.youtube.com/watch?v=jNQXAC9IVRw"
`lang`	No	ISO 639-1 language hint (e.g. "en", "fr", "de"). Default "auto".
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`chapters_max`	No	Maximum number of chapters. Default 8.
`output_format`	No	Transcript format. Default "json".
`include_summary`	No	Include extractive summary. Default true.

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	Yes
`status`	Yes
`signals`	Yes
`sources`	Yes
`summary`	No
`chapters`	Yes
`segments`	Yes
`key_topics`	Yes
`transcript`	Yes
`source_type`	Yes
`lang_detected`	Yes
`quality_score`	Yes
`duration_seconds`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint, idempotentHint, and destructiveHint; the description adds rich behavioral details: pipeline steps, caching (24h TTL), SLA (≤15s), language detection heuristics, chapter fallback (native vs TF-IDF), and summary method (extractive TF-IDF). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is well-structured with a lead sentence followed by bullet points. Each section adds essential information. Slightly longer than necessary but no fluff; efficient in conveying complex pipeline details.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple sources, async support, output formats, caching, SLA, language detection, chapterization fallback), the description covers all critical aspects. Output schema exists but not needed; description is self-sufficient for agent decision-making.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

All 6 parameters have schema descriptions (100% coverage). The description adds context beyond schema: explains default for output_format (json), chapters_max default 8, and the force_whisper:true nuance for direct media, which is not in schema. Provides meaningful added value.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool transcribes and chapterizes long-form media from YouTube, podcasts, and direct audio/video. It specifies output formats and pipeline steps, making it distinct from any sibling tools (none of which are transcription-focused).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists target users (content marketing teams, podcast publishers, etc.) and explains when to use async mode for slow requests. It implies when to use force_whisper for direct media but does not explicitly exclude other scenarios or mention alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

treasury_optimizerC

Read-only

Inspect

Optimiseur de trésorerie — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Alan — Trésorerie €380M post-Série F · Allocation optimale 4 instruments · Yield +145bp · +€5.5M/an. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`horizon`	No
`constraints`	Yes
`cashPosition`	Yes

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, so the tool is read-only. The description adds that inputs are validated server-side, consistent with readOnly. No contradiction, but no additional behavioral details like authentication needs or rate limits.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short with a front-loaded purpose. However, the reference case example adds marketing fluff rather than essential information, slightly reducing conciseness.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, no output schema), the description is insufficient. It does not explain the deliverable's content, result interpretation, or prerequisites, leaving gaps for proper use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is low (20%), and the description does not explain individual parameters. It only refers to 'documented case fields' without elaboration, failing to compensate for the schema's lack of parameter descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states the tool is a cash optimizer ('Optimiseur de trésorerie') for CFOs and returns a structured deliverable. It provides a concrete reference case, making the purpose clear. However, it does not differentiate from sibling tools like working_capital.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description only says 'send the documented case fields' but lacks explicit usage context or exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

trend_watcherA

Read-onlyIdempotent

Inspect

Monitor emerging trends, regulatory shifts and adoption signals for a given market sector. Returns 5-12 trend cards, each with a momentum score (rising/stable/declining), a 3-month and 12-month outlook, opportunity windows, and recommended actions. When to use this tool: the user asks what is heating up in a market, wants to time a product roadmap or content calendar, or needs an early read on a sector. Inputs: a sector to monitor and 3-8 keywords defining the watch perimeter. Delivered by Manue, the AI CMO of the Gapup portfolio.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`focus`	No	Optional context (geography, language target, comparator window, etc.)
`sector`	Yes	Sector to monitor (e.g. 'B2B SaaS productivity', 'EU fintech', 'climate-tech hardware')
`keywords`	Yes	3-8 keywords describing the watch perimeter

Output Schema

ParametersJSON Schema

Name	Required	Description
`kpis`	No	3-5 headline KPI bubbles
`trends`	Yes	5-12 trend cards for the sector
`recommendations`	No	Prioritised strategic recommendations
`executiveSummary`	Yes	Board-ready sector overview prose

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds detail on output structure (momentum scores, outlooks, opportunity windows) and notes it is delivered by Manue, providing context beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences: first covers purpose and output, second gives usage scenarios, third lists inputs. Front-loaded, no wasted words, efficient and clear.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists (not shown), the description covers input, output format, and usage context. It is complete enough for an agent to decide to use it, though it could mention data freshness or limitations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description summarizes key inputs (sector and keywords) but does not explain 'async' or 'focus' parameters, which are only in the schema. It adds little extra meaning beyond what the schema already provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool monitors emerging trends, regulatory shifts, and adoption signals for a market sector, with a specific output of 5-12 trend cards containing momentum scores and outlooks. This distinguishes it from sibling tools like market_research_brief or competitor_intel.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly lists when to use: when the user asks what is heating up, wants to time roadmap or content calendar, or needs an early read. It provides clear scenarios but does not mention alternatives or when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

ugc_moderation_classifierA

Read-onlyIdempotent

Inspect

Multi-language UGC content moderation for marketplaces, social platforms and comment systems. Detects policy violations in text content across 9 policies and 12 languages without external API calls.

Policies checked: • hate — hate speech, slurs, dehumanization (50+ terms × 12 languages) • sexual — explicit sexual content, pornography references, nudity solicitation • violence — threats, weapon references, graphic violence • self_harm — suicidal ideation, self-injury, eating disorder promotion • harassment — doxxing, stalking, cyberbullying, blackmail • scam — phishing, investment fraud, romance scam, lottery fraud • spam — bots, keyword stuffing, excessive caps, emoji storms, suspicious URLs • copyright — piracy, leaked content, serial keys, streaming fraud • minor_safety — grooming signals, CSAM references, minor + adult content combos

Languages: en / fr / de / es / it / pt / nl / zh / ja / ko / ar / ru (auto-detected)

Output includes severity (low/medium/high/severe), confidence (0-100), matched patterns, excerpt, recommended action, age appropriateness (adult/teen/child), and signals.

No API key required. Stateless — no content is stored or logged.

ParametersJSON Schema

Name	Required	Description
`lang`	No	Language override. If omitted, language is auto-detected.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`content`	Yes	Text content to moderate (comment, review, post, chat message).
`policies`	No	Policies to check. Default: all 9 policies.
`content_type`	No	Type of content. Affects recommended_action heuristic. Default: comment.

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`signals`	Yes
`sources`	Yes
`violations`	Yes
`lang_detected`	Yes
`quality_score`	Yes
`age_appropriate`	Yes
`content_preview`	Yes
`policies_checked`	Yes
`recommended_action`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description reveals statelessness, no content storage, and no API key requirement, adding significant context beyond annotations. It also explains the async parameter and output fields (severity, confidence, etc.). Annotations are consistent (readOnlyHint, idempotentHint, destructiveHint). No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a header, bulleted policy list, and clear sections for languages and output. It is informative and front-loaded, though slightly long. Every sentence adds value, but the bullet list could be condensed without losing clarity.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (9 policies, 12 languages, 5 parameters, output schema present), the description is thorough. It covers input, processes, output fields, statelessness, and async behavior. The existence of an output schema ensures return values are documented, so completeness is high.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by explaining auto-detection for language, that 'policies' defaults to all 9, and that 'content_type' affects the recommended_action heuristic. It also describes the async parameter's purpose. This extra context justifies a 4.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it is a multi-language UGC content moderation tool that detects policy violations across 9 policies and 12 languages. It provides a detailed list of policies and languages, making its purpose unmistakable and distinguishing it from other tools on the server.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states that no external API calls are needed and that the tool is stateless, giving clear context for usage. However, it does not explicitly state when not to use this tool or mention alternatives, which slightly limits guidance. The specificity of policies and languages makes usage clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

upsell_hunterC

Read-only

Inspect

Chasseur d'upsell — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Upsell 8 comptes · €127k potentiel · Top 3 : Alan+Qonto+Pennylane · Playbook 5 étapes. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`horizon`	No
`product`	Yes
`accounts`	Yes
`targetUpsellEur`	No

Tool Definition Quality

C2.6/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint and openWorldHint, so the description's mention of 'validated server-side' adds little. The description does not contradict annotations, but also does not elaborate on behavioral traits such as cost implications ('agent-payable') or side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise but includes a specific reference case that may not be universally understood. It could be clearer by stating the tool's function in general terms without relying on internal jargon.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of the input schema (nested objects, many optional fields) and no output schema, the description is insufficient. It does not describe what the deliverable contains or how results are presented, leaving the agent with limited understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is very low (17%). The description does not explain any of the 6 parameters, including the required nested objects. It simply says 'send the documented case fields', adding no semantic value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose3/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description states 'Chasseur d'upsell' and mentions returning a structured deliverable, but does not explicitly state what the tool does in plain terms. The reference case provides context, but the purpose is somewhat vague for an AI agent unfamiliar with 'Gapup' and 'CRO'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance is provided on when to use this tool versus alternative tools like cross_sell_reco or account_expansion_mapper. The description lacks any cues for selection among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

usdc_x402_payments_intelA

Read-only

Inspect

Real-time analytics on x402 protocol USDC micropayments for MCP endpoints on Base network. Unique competitive advantage: aggregates internal production telemetry (our own traffic data) with on-chain USDC Transfer events and Bazaar marketplace listings — data no external competitor can access. Four modes: (1) facilitator_stats — Coinbase x402 facilitator settlement statistics (volume, count, top payees/payers). Uses Coinbase CDP API if COINBASE_X402_API_KEY is set; falls back to Base mainnet RPC scan of USDC transfers to known facilitator addresses. (2) endpoint_intel — Per-MCP-endpoint analytics: tx count, USDC volume, unique callers, success rate, catalog size. For gapup-mcp.io endpoints: reads internal JSONL telemetry (richest data source, unique). (3) agent_caller_profile — Anonymous profile of a calling agent wallet: tx count, USDC spent, top endpoints, inferred persona (depth-seeker / bulk-scanner / generalist / researcher / explorer). Wallet anonymised via SHA-256. (4) price_radar — USDC price distribution by tool category (data_lookup / synthesis / compliance / competitive) from Bazaar + internal catalog. Returns median, P25, P75. Network: Base mainnet. USDC contract: 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913. Cache: 30 min LRU. Timeout per source: 8s. Optional env: COINBASE_X402_API_KEY (higher-fidelity facilitator stats).

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	Analytics mode: facilitator_stats=network-wide settlements \| endpoint_intel=per-URL analytics \| agent_caller_profile=per-wallet analytics \| price_radar=price distribution by category
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`category`	No	Tool category for price_radar mode. Defaults to all.
`period_days`	No	Lookback window in days (5-90, default 30)
`endpoint_url`	No	MCP endpoint URL for endpoint_intel mode (e.g. https://mcp.gapup.io/mcp)
`wallet_address`	No	EVM wallet address for agent_caller_profile mode (0x...)

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`price_radar`	No
`quality_score`	Yes
`endpoint_intel`	No
`facilitator_stats`	No
`agent_caller_profile`	No

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and destructiveHint=false, so the tool is safe. The description adds significant behavioral details: cache duration (30 min LRU), timeout per source (8s), data sources (internal telemetry, on-chain USDC transfers, Bazaar listings), fallback mechanism, and wallet anonymization via SHA-256. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with a clear intro, competitive advantage, and four bulleted modes. It is front-loaded with the main purpose. However, it is somewhat verbose; some details (e.g., exact cache/timeout values) could be streamlined. Still, every sentence adds value for selection and invocation.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (four modes, six parameters, seven siblings, and an output schema), the description covers all necessary aspects: mode selection, parameter usage, data sources, caching, timeouts, fallback, and environment setup. The existence of an output schema means return values do not need to be explained. It is fully complete for an agent to select and invoke the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% parameter description coverage with enums and defaults. The description adds meaning beyond schema: e.g., 'For gapup-mcp.io endpoints: reads internal JSONL telemetry (richest data source, unique)' for endpoint_url, and 'Wallet anonymised via SHA-256' for wallet_address. It also explains the behavior of the 'async' parameter and how to poll results.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides 'real-time analytics on x402 protocol USDC micropayments for MCP endpoints on Base network.' It lists four specific modes (facilitator_stats, endpoint_intel, agent_caller_profile, price_radar), each with a clear verb and resource. The tool is uniquely positioned among siblings as a specialized analytics tool for x402 micropayments, with no overlapping purpose.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly explains when to use each mode, including specific use cases and fallback behavior (e.g., 'Uses Coinbase CDP API if COINBASE_X402_API_KEY is set; falls back to Base mainnet RPC scan'). It also provides guidance on optional environment variable. However, it does not explicitly state when NOT to use this tool in favor of alternatives, though the narrow niche makes it clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vendor_esg_blacklist_monitorA

Read-onlyIdempotent

Inspect

As a COO, quickly check if a vendor is blacklisted for ESG non-compliance using CDP and GRI data. Input the vendor's legal name or identifier to receive their ESG risk score, blacklist status, and compliance violations. Returns structured data including CDP disclosure score, GRI alignment, and any regulatory flags. Ideal for vendor due diligence, risk assessment, and sustainability reporting. Keywords: ESG, vendor risk, compliance, CDP, GRI, sustainability, blacklist.

ParametersJSON Schema

Name	Required	Description
`year`	No	Reporting year (default: current year)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`vendorId`	No	Optional identifier (e.g., LEI, DUNS)
`vendorName`	Yes	Legal name of the vendor to check

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`vendorId`	No
`warnings`	Yes
`griAligned`	No
`vendorName`	Yes
`violations`	No
`blacklisted`	Yes
`esgRiskScore`	No
`cdpDisclosureScore`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, idempotentHint. The description adds value by detailing the structured output (ESG risk score, blacklist status, CDP score, GRI alignment, regulatory flags) and the input format (vendor legal name or identifier). No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that efficiently states purpose, input, output, and use cases. The inclusion of keywords at the end aids discoverability. No unnecessary sentences.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of output schema (not shown but referenced), rich annotations, and 100% parameter coverage, the description adequately covers what the tool does, what it returns, and when to use it. It does not need to explain return values as those are in the output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% so baseline is 3. Description mentions vendorName as legal name and async as a way to get job_id, but does not add significant extra meaning beyond the schema definitions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb (check) and resource (vendor blacklist) with clear scope (ESG non-compliance using CDP and GRI data). It distinguishes from siblings like vendor_risk_assessor by focusing on blacklist status and specific frameworks.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states the target user (COO) and use cases (vendor due diligence, risk assessment, sustainability reporting). While it doesn't mention when to avoid this tool or name alternatives, the context is clear and appropriate for the tool's role.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vendor_esg_diversity_scannerA

Read-onlyIdempotent

Inspect

For COOs: scans vendor ESG reports to identify suppliers lacking diversity disclosures in GRI or CDP filings. Input a supplier name or identifier to receive a structured assessment of gender, ethnicity, and board diversity metrics. Returns compliance gaps, missing data flags, and source references from CDP open data and GRI standards. Ideal for vendor risk assessment and ESG compliance tracking.

ParametersJSON Schema

Name	Required	Description
`year`	No	Reporting year to check (default: current year)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`supplierId`	No	CDP or GRI identifier for the supplier (e.g., CDP company ID)
`supplierName`	Yes	Exact or partial name of the supplier to scan

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`reportLinks`	No	URLs to relevant ESG reports
`supplierName`	Yes
`complianceScore`	Yes	Percentage compliance with diversity disclosure standards
`diversityDisclosures`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, openWorldHint, and idempotentHint. The description adds context on data sources (CDP open data, GRI standards) and output contents (compliance gaps, missing data flags), which is useful beyond the annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences that front-load the primary purpose and build logically: target user, action, input, output, and ideal use. Every sentence adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool complexity (4 params, provides output schema), the description covers purpose, inputs, outputs, and use cases. It does not detail output structure (handled by output schema) or edge cases, but remains adequate for a scanning tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% with adequate descriptions for all 4 parameters. The tool description does not add significant new semantic details beyond what the schema already provides, so the baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description starts with a clear target user (COOs) and a specific action ('scans vendor ESG reports to identify suppliers lacking diversity disclosures'). It names the data sources (GRI or CDP) and metrics (gender, ethnicity, board diversity), distinguishing it from sibling tools like 'supplier_esg_audit' which likely cover broader ESG aspects.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description specifies inputs ('Input a supplier name or identifier') and output ('structured assessment... compliance gaps, missing data flags'). It states ideal use cases ('vendor risk assessment and ESG compliance tracking'), but does not explicitly mention when to avoid or alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vendor_managementC

Read-only

Inspect

Gestion des fournisseurs — Gapup agent-payable C-suite expertise (COO). Returns a structured, audited deliverable. Reference case: Qonto (12 fournisseurs · €2.4M/an) — €290k économies identifiées · 4 renegociations prioritaires. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`vendors`	Yes
`objectives`	Yes

Tool Definition Quality

C2.6/5.0

Behavior2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations provide readOnlyHint=true, indicating no mutation. The description adds server-side validation but does not elaborate on behavioral traits beyond that. Does not contradict annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is compact but includes a reference case that, while illustrative, adds extra length without core operational instruction. Could be more streamlined.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with nested objects and no output schema, the description fails to explain the deliverable structure, return format, or how to interpret results. Incomplete for effective use.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 25%, and the description provides no details about parameters (company, vendors, objectives). It merely says 'send the documented case fields', which is insufficient.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description implies the tool analyzes vendor data to identify savings and prioritization (reference case shows savings and renegotiations), but does not explicitly state the core purpose. Distinguishes from sibling vendor tools via case example, but could be clearer.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives like vendor_risk_assessor or procurement_spend_optim. The description lacks context on prerequisites or decision criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vendor_risk_assessorC

Read-only

Inspect

Évaluateur de risque fournisseurs — Gapup agent-payable C-suite expertise (RISK). Returns a structured, audited deliverable. Reference case: Gapup Hub — 15 fournisseurs · €1.8M spend · 3 critiques · Heatmap + plan de remédiation. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`vendors`	Yes
`riskFramework`	No
`assessmentPurpose`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=true and openWorldHint=true, and the description adds that it returns a structured, audited deliverable with server-side validation. This aligns with annotations and adds some context (e.g., deliverable nature), but does not disclose the external access implied by openWorldHint or detail what 'audited' entails. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is moderately concise at three sentences, but includes a dense reference case that may be superfluous. The structure could be improved by front-loading the core purpose and omitting the example. It earns its keep but has some waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 parameters, nested objects, no output schema), the description is incomplete. It does not explain what the deliverable contains, how to interpret results, or the roles of riskFramework and assessmentPurpose. The agent lacks sufficient information to use the tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 20%, meaning most parameters lack descriptions. The description does not elaborate on the parameters beyond 'send the documented case fields.' For a complex nested schema (company, vendors, riskFramework, assessmentPurpose), the agent receives minimal guidance on how to populate fields like vendors[].criticality or knownIssues. The description fails to compensate for low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a vendor risk assessor ('Évaluateur de risque fournisseurs') and mentions it returns a structured, audited deliverable. The purpose is clear, though the French language may reduce clarity for English-dominant AI agents. The included reference case helps illustrate the output but doesn't differentiate from sibling tools like vendor_management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides no guidance on when to use this tool versus alternatives (e.g., vendor_esg_blacklist_monitor). It only states that inputs are validated server-side, but does not explain context, prerequisites, or exclusions. The agent is left to infer usage from the name alone.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vertical_ai_agent_governanceB

Read-onlyIdempotent

Inspect

Generates a comprehensive vertical AI agent workforce integration plan for CHROs, including governance frameworks, human-AI collaboration metrics, and upskilling recommendations. Inputs: industry vertical, workforce size, and current AI adoption level. Outputs: role-specific AI integration roadmaps, skill gap analysis, and performance benchmarks. Uses O*NET skill taxonomies and Gartner AI adoption trends. For best results with large datasets, pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`industry`	Yes
`target_roles`	No
`workforce_size`	Yes
`ai_adoption_level`	No
`include_benchmarks`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`skill_gap_analysis`	No
`integration_roadmap`	No
`collaboration_metrics`	No
`governance_recommendations`	No

Tool Definition Quality

B3.4/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and openWorldHint=true. The description adds context by stating it uses O*NET taxonomies and Gartner trends, and explains the async behavior. It does not contradict annotations and provides useful behavioral details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at 3-4 sentences, with a logical structure: purpose, inputs/outputs, sources, and optimization tip. It is front-loaded with the core function. Minor improvement could include clearer separation of sections.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (context confirms it exists), the description appropriately does not detail return values. However, it only covers three of six parameters and does not explain how to interpret the results. The tool is moderately complex, and the description leaves gaps for a complete understanding.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 17% (only async parameter has a description). The description mentions only three of the six parameters (industry, workforce_size, ai_adoption_level) and omits target_roles and include_benchmarks. It does not adequately compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool generates a vertical AI agent workforce integration plan for CHROs, specifying inputs and outputs. However, it does not distinguish this tool from its many siblings, such as ai_governance_full_report_async or ai_governance_pilot, which could cause confusion.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides a specific optimization tip about using async:true for large datasets to avoid timeouts. It implies when to use the tool based on inputs like industry and workforce size, but does not mention when not to use it or suggest alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vuln_exploitability_forecastA

Read-onlyIdempotent

Inspect

As a CTO, assess the exploitability risk of CVEs using EPSS scores and cloud asset exposure data. Input a CVE ID (e.g., CVE-2021-44228) to receive exploitability likelihood, affected cloud services, and threat intelligence context. Returns structured risk metrics for prioritization. Sources: CVE NVD, OpenCVE, GitHub Advisories. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`cveId`	Yes
`cloudProvider`	No
`includeDetails`	No

Output Schema

ParametersJSON Schema

Name	Required	Description
`cveId`	Yes
`status`	Yes
`sources`	Yes
`warnings`	Yes
`epssScore`	No
`lastUpdated`	No
`cloudExposure`	No
`epssPercentile`	No

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, open world, idempotent. Description adds data sources (CVE NVD, OpenCVE, GitHub Advisories) and async behavior, providing useful context beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with purpose, concise and free of fluff. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of output schema, the description covers inputs, outputs, sources, and async option adequately for the tool's complexity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Description explains cveId with example and async usage, but does not clarify cloudProvider (enum) or includeDetails, leaving some parameters underdocumented. Schema covers async description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool assesses exploitability risk of CVEs using EPSS scores and cloud asset exposure data, specifying outputs and data sources. It distinguishes from siblings like cve_security_lookup by focusing on exploitability and cloud context.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description indicates usage for vulnerability prioritization and mentions async option, but does not explicitly exclude alternatives or specify when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

vuln_patch_priority_engineA

Read-onlyIdempotent

Inspect

As a CTO, quickly prioritize unpatched CVEs by combining exploitability scores (EPSS) with cloud asset criticality. Input a list of CVE IDs and your AWS service types (e.g., EC2, RDS) to receive a ranked patching order with risk scores and estimated cloud impact. Uses public NVD, OpenCVE, and AWS pricing data. Ideal for vulnerability management and cloud security posture improvement.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`cveIds`	Yes	List of CVE identifiers to analyze (e.g., ["CVE-2021-44228", "CVE-2023-3824"])
`maxResults`	No	Maximum number of prioritized CVEs to return (default: 10)
`awsServices`	No	AWS service types affected by these CVEs (e.g., ["EC2", "RDS", "Lambda"])

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`prioritizedCves`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, idempotentHint, and openWorldHint, reducing the burden. The description adds that the tool uses public NVD, OpenCVE, and AWS pricing data, which informs the agent about external dependencies. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph that efficiently conveys the core purpose and usage. It is front-loaded with the main action. Slightly more structure (e.g., bullet points) could improve scannability, but it is already concise.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown but mentioned) and annotations, the description covers the essential context: input, output, data sources, and use case. It does not mention async behavior, but that is documented in the input schema. Overall, complete for a prioritization tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, and the description merely paraphrases the parameters (e.g., 'list of CVE IDs and your AWS service types'). It does not add meaningful semantics beyond what the schema already provides, so the baseline of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: to prioritize unpatched CVEs by combining exploitability scores (EPSS) with cloud asset criticality. It specifies the input (CVE IDs, AWS services) and output (ranked patching order with risk scores and cloud impact), distinguishing it from siblings like cve_security_lookup or vuln_exploitability_forecast.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context by targeting CTOs and vulnerability management use cases. It implies when to use (to prioritize CVEs with cloud context) but does not explicitly state when not to use or name alternative tools. This is clear but lacks exclusion criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

weather_climate_intelA

Read-only

Inspect

Physical climate intelligence for insurance underwriting, agritech, logistics, energy trading and ESG/climate risk disclosure. Three modes: (1) forecast — 14-day daily weather forecast with temperature, precipitation, wind and humidity; (2) historical — daily records and monthly aggregates for any date range since 1940, with anomaly detection (P90/P95 heat events, extreme precipitation days); (3) climate_risk — long-term physical risk scoring combining CMIP6 ensemble projections (2020-2050), altitude, FEMA flood zones (US) and historical baselines. Risk dimensions: flood, heat (days >35°C/year), drought (SPI), wildfire, sea-level. Overall score 0-100 (100 = severe). Location: city string or lat/lon coordinates. Sources: Open-Meteo (keyless, global, 1940→2050), Open-Elevation, FEMA NFHL (US), NOAA CDO (optional NOAA_API_KEY env var for US+global station data). SLA: ≤25s p95. Cache: 1h forecast / 24h historical / 7d climate_risk.

ParametersJSON Schema

Name	Required	Description
`mode`	Yes	'forecast' (14 days), 'historical' (date range since 1940), 'climate_risk' (long-term physical risk score)
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`date_to`	No	ISO date YYYY-MM-DD — end of date range (required for historical/climate_risk)
`metrics`	No	Weather metrics to include. Default: all metrics.
`location`	Yes	Geographic location. Provide either {city, country?} or {lat, lon}.
`date_from`	No	ISO date YYYY-MM-DD — start of date range (required for historical/climate_risk)

Output Schema

ParametersJSON Schema

Name	Required	Description
`mode`	Yes
`status`	Yes
`sources`	Yes
`forecast`	No
`location`	Yes
`historical`	No
`climate_risk`	No
`quality_score`	Yes

Tool Definition Quality

A4.6/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint and non-destructive behavior. The description adds significant behavioral context: SLA ≤25s p95, cache durations, data sources for different modes, and the async polling pattern. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with numbered modes and clear sections. It is somewhat long, but every sentence provides necessary information for a complex tool. No redundancy; front-loaded with the overall purpose.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested object, output schema exists), the description covers all aspects: modes, parameters, data sources, SLA, cache, location format, metrics, and async behavior. The existence of an output schema reduces the need to describe return values. The description is fully adequate for agent selection and invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds value by elaborating on the meaning of each mode's output, explaining location options (city vs lat/lon), and clarifying default metrics. This goes beyond the schema's minimal descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool provides physical climate intelligence for various sectors, with three distinct modes (forecast, historical, climate_risk). It distinguishes itself from siblings by specifying the data sources and risk dimensions, making its purpose unmistakable.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly explains when to use each mode, provides context for data sources and SLA, and mentions the async option for slow requests. However, it does not explicitly state when not to use the tool or list alternatives among siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

webhooks_manageAInspect

Manage HTTP webhook callbacks for async tools (T5/T6 batch flagships). Instead of polling every 5s, register a callback URL — Gapup posts the job result to your endpoint the moment it completes. Supported events: job.completed | job.failed | monitoring.alert | quota.threshold. Modes: register (add endpoint), list (view active webhooks), revoke (soft-delete), test (fire a test payload to verify your receiver), history (last 20 fires). Security: every delivery is signed with HMAC-SHA256 on the body — verify the X-Gapup-Signature header against sha256(secret, body).

ParametersJSON Schema

Name	Required	Description
`url`	No	(register) HTTPS/HTTP endpoint that will receive POST callbacks. Must return 2xx within 10s.
`mode`	Yes	register — add a webhook endpoint. list — view your active webhooks. revoke — soft-delete a webhook by webhook_id. test — fire a test payload to verify the receiver is alive. history — last 20 delivery attempts for a webhook.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`events`	No	(register, optional) Events to subscribe to. Defaults to all events if omitted.
`secret`	No	(register, optional) A secret string used to sign deliveries with HMAC-SHA256. Store it safely — verify X-Gapup-Signature header on your receiver.
`webhook_id`	No	(revoke / test / history) The webhook_id returned from register.
`caller_hash`	No	Optional caller identity override. If omitted, uses the internal session hash.

Output Schema

ParametersJSON Schema

Name	Required	Description
No output parameters

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Description adds important behavioral details beyond annotations: HMAC-SHA256 signing, mode behaviors (register, list, revoke, test, history), and security guidance for verifying signatures. Annotations are non-contradictory.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and front-loaded, covering all key aspects in one paragraph. Could be slightly more structured with bullet points, but it's efficient and informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema, the description satisfactorily covers events, modes, security, and usage. It is complete enough for an agent to understand and use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with detailed descriptions for each parameter. The description adds overall context but doesn't significantly enhance individual parameter semantics beyond what the schema provides.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool manages HTTP webhook callbacks for async tools (T5/T6 batch flagships), specifying supported modes and events. It is distinct from siblings by focusing on webhook management rather than other async result retrieval mechanisms.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly advises using webhooks instead of polling every 5s, and describes each mode's purpose. However, it doesn't mention alternative tools or when not to use webhooks, though the context is clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

web_search_multilangA

Read-only

Inspect

Multi-language, multi-source web search that goes beyond Anglo-centric results. Supports 15 languages (fr/de/es/it/pt/nl/ja/zh/ko/ar/ru/sv/pl/tr/en) with automatic detection. Aggregates results from Mojeek (independent search engine, multilang) and Wikipedia (native multilang API), with DDG and HN as English-language complements. Returns deduplicated results ranked by cross-engine consensus. Use when you need non-English search results, when DDG fails, or for geographically-biased queries. Phase 2 #7 of the geo/lang expansion plan. Note: Brave/Bing/Searx are blocked from DO IPs — configure AICI_RESEARCH_PROXY_URL for residential proxy.

ParametersJSON Schema

Name	Required	Description
`lang`	No	2-letter language code. If omitted, auto-detected from query characters and lexical markers.
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`query`	Yes	Search query in any language
`country`	No	ISO-3166-1 alpha-2 country code for geographic bias (e.g. FR, DE, JP, BR). Optional.
`max_results`	No	Maximum number of results to return (default 10).

Output Schema

ParametersJSON Schema

Name	Required	Description
`query`	Yes
`status`	Yes
`results`	Yes
`sources`	Yes
`by_engine`	Yes
`lang_used`	Yes
`country_used`	No
`quality_score`	Yes
`total_unique_results`	Yes

Tool Definition Quality

A4.1/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true and openWorldHint=true, covering safety. The description adds that results are 'deduplicated' and 'ranked by cross-engine consensus,' and mentions the async polling pattern. This provides some behavioral context but not exhaustive details like rate limits or pagination.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured with the core purpose first, followed by supported languages, sources, deduplication behavior, and usage cases. It is slightly lengthy but every sentence adds value. The note about proxy configuration and Phase 2 #7 is useful context without being redundant.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With an output schema present, the description doesn't need to detail return values. It explains deduplication, cross-engine ranking, async behavior, and source aggregation. The proxy configuration note addresses a key operational constraint. No major gaps remain for an agent to use this tool effectively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds significant meaning: lang auto-detection ('If omitted, auto-detected from query characters and lexical markers'), country with examples (FR, DE, etc.), max_results default 10, and async polling mechanism. This goes beyond the schema's basic descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool performs 'multi-language, multi-source web search' with a specific focus on non-Anglo-centric results. It lists supported languages and sources (Mojeek, Wikipedia, DDG, HN), distinguishing it from typical English-only searches and from siblings like 'web_search' or 'research_paper_qa'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicit usage guidance: 'Use when you need non-English search results, when DDG fails, or for geographically-biased queries.' This provides clear positive scenarios. It lacks explicit 'when not to use' instructions, but the context is sufficient for an agent to decide.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

win_loss_decoderC

Read-only

Inspect

Analyse Win/Loss deals — Gapup agent-payable C-suite expertise (CRO). Returns a structured, audited deliverable. Reference case: Gapup Hub — Win/Loss 32 deals Q1 2026 · Win rate 41% → 68% potentiel · Playbook 8 actions CRO. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`deals`	Yes
`company`	Yes
`product`	Yes
`topCompetitors`	No
`primaryChallenge`	No
`salesCycleTargetDays`	No

Tool Definition Quality

C2.9/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true and openWorldHint=true, so the description's mention of 'validated server-side' and 'returns a structured, audited deliverable' adds minor context but is not essential. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is short and front-loaded with the main purpose. However, the reference case example adds some fluff that could be omitted without losing core meaning.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complex nested input schema (7 parameters, required fields) and no output schema, the description is insufficient. It does not explain how to structure the deals, company, or product objects, leaving the agent with vague guidance.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14% (only async described). The description says 'send the documented case fields' but does not explain any parameters or their roles, failing to compensate for the low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes win/loss deals and returns a structured deliverable, with a reference case adding context. However, it does not explicitly differentiate from sibling tools like deal_coach or pricing_in_deal, so it's clear but not fully distinguished.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description mentions 'Gapup agent-payable' but does not explain when it's appropriate or when not to use it, leaving the agent without decision support.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

workflow_orchestratorA

Read-only

Inspect

Meta-tool that CHAINS multiple MCP tools sequentially into a named workflow — delivering a composite output in a single call. 10 predefined workflows: compliance_full_audit (6 steps: KYC+sanctions+AI_gov+privacy+ESRS+CSRD), deal_due_diligence (7 steps: deep_dive+registry+court+patents+KYC+financials+M&A), market_entry_brief (6 steps: country_study+regulations+procurement+tax+AGOA+market_brief), competitor_intelligence_pack (5 steps: deep_dive+intel+patents+earnings+pitch_deck), esg_360 (5 steps: ESG_audit+carbon+CSRD+ESRS+supplier_esg), ip_freedom_to_operate (4 steps: patent_search+async_deep+IP_audit+competitive), climate_property_assessment (3 steps: climate_risk+real_estate+geo), pharma_target_screen (4 steps: trials+adverse_events+patents+meta_analysis), sanctions_360 (5 steps: KYC+Russian_sec+registry+crypto_wallet+court_filings), talent_market_brief (4 steps: salary+trends+adjacent_roles+skills_taxonomy). Returns steps_executed, consolidated P0/P1/P2 signals, overall_status, estimated_cost_usd, and raw outputs per step. Cache: 1h LRU per (workflow, target). Budget: 60s global timeout → partial if exceeded. Use when an agent needs a composite liverable without orchestrating tools manually.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`params`	No	Optional overrides passed to sub-tools. Keys depend on workflow (e.g., country, sector, role, drug, technology, wallet_address, acquirer).
`target`	Yes	The entity to analyze. A company name for most workflows; location for climate_property_assessment; role+country for talent_market_brief.
`workflow`	Yes	Named workflow to execute. Each workflow chains 3-7 tools sequentially.
`skip_failed_steps`	No	Default true: continue on step failure. Set false to abort on first error.

Output Schema

ParametersJSON Schema

Name	Required	Description
`target`	Yes
`outputs`	Yes
`summary`	Yes
`workflow`	Yes
`overall_status`	Yes
`steps_executed`	Yes
`total_duration_ms`	Yes
`estimated_cost_usd`	Yes
`consolidated_signals`	Yes

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses caching (1h LRU), global timeout (60s with partial results), async behavior, and the ability to skip failed steps. While annotations include readOnlyHint=true and openWorldHint=true, the description adds valuable behavioral context beyond these, with no contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is informative but somewhat verbose, enumerating all 10 workflows with step details. While front-loaded with the core purpose, the list could be condensed or moved to a separate reference. Still, it remains clear and structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity and the presence of an output schema, the description covers return values (steps_executed, signals, status, cost, raw outputs), caching, timeout, and async behavior. It could address error handling more, but overall it is sufficiently complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, but the description adds value by clarifying the 'target' parameter (company, location, or role+country depending on workflow) and providing examples of keys for the 'params' object. This goes beyond the schema's description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly identifies the tool as a meta-tool that chains multiple MCP tools into named workflows for composite output. It lists all 10 predefined workflows with step counts, distinguishing it from sibling tools that execute single steps.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use the tool ('when an agent needs a composite deliverable without orchestrating tools manually') and provides context on async, caching, timeout, and partial results. However, it does not explicitly state when not to use it (e.g., for single-step tasks), leaving room for ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

working_capitalC

Read-only

Inspect

Optimiseur du BFR — Gapup agent-payable C-suite expertise (CFO). Returns a structured, audited deliverable. Reference case: Agicap — BFR optimisation · DSO 52→38j · Cash libéré +€2.8M · 3 quick wins immédiats. Inputs are validated server-side — send the documented case fields.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`company`	Yes
`industry`	No
`challenges`	Yes
`financials`	Yes
`topCustomers`	No
`topSuppliers`	No

Tool Definition Quality

C2.8/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint and openWorldHint. The description adds that it returns an audited deliverable and that inputs are validated server-side. It does not disclose authentication needs or rate limits, but it aligns with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise but includes a lengthy reference case (DSO 52→38j, cash liberated) that may distract from the core purpose. The mixed French/English may reduce readability. It is front-loaded but not optimally structured.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has 7 parameters, nested objects, and no output schema, but the description does not explain the return format, how to handle async results, or what the deliverable contains beyond 'structured and audited'. It lacks completeness for an AI agent to understand usage end-to-end.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 14% (only async parameter has description). The description does not explain individual parameters like company, financials, or challenges, relying on schema property names. With low coverage, the description should provide more context but fails to do so.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool is for optimizing working capital (BFR) and targets CFO-level users. It mentions returning a structured deliverable and provides a reference case. However, the French jargon may reduce clarity for non-French speakers, and it does not explicitly differentiate from siblings like working_capital_esg_impact_rater.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No explicit guidance on when to use or when not to use this tool. Siblings exist for ESG and FX hedging, but the description does not reference alternatives or set selection criteria.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

working_capital_esg_impact_raterA

Read-onlyIdempotent

Inspect

As a CFO, assess how ESG factors (Environmental, Social, Governance) influence working capital efficiency using IMF SDR and BIS data. Inputs include company sector, geographic exposure, and ESG risk scores. Outputs provide a quantitative impact rating on working capital metrics like days sales outstanding (DSO) and inventory turnover, alongside IMF SDR-aligned liquidity risk indicators.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`region`	Yes	Primary geographic exposure (e.g., 'EU', 'APAC')
`sector`	Yes	Industry sector (e.g., 'manufacturing', 'energy')
`currency`	No	Reporting currency (ISO 4217 code, e.g., 'USD', 'EUR')
`esgRiskScore`	Yes	Aggregate ESG risk score (0-100)
`workingCapitalRatio`	No	Current working capital ratio (current assets / current liabilities)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`impactRating`	No	ESG impact on working capital efficiency (-100 to +100)
`esgFactorBreakdown`	No
`liquidityRiskIndicator`	No	IMF SDR-aligned liquidity risk score (0-1)
`workingCapitalAdjustment`	No	Projected adjustment to working capital ratio (%)

Tool Definition Quality

A3.9/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, idempotentHint, openWorldHint) are complemented by the description, which adds behavioral details: reliance on IMF SDR and BIS data, and outputs including liquidity risk indicators. No contradictions. However, it lacks explicit discussion of rate limits or authentication requirements, which are less critical given the read-only nature.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single, clear sentence that efficiently conveys the tool's purpose, inputs, and outputs. It is well front-loaded and avoids redundancy. Minor room for improvement in structuring as separate points.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the presence of an output schema (not shown) and thorough annotations, the description provides adequate context: data sources, input types, and output metrics. It sufficiently differentiates from sibling tools. The term 'IMF SDR-aligned liquidity risk indicators' might require domain knowledge but is acceptable for the intended audience.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the baseline is 3. The description reinforces parameter purpose (sector, region, ESG risk scores) but does not add new parameter-level semantics beyond what is already in the schema. It focuses on the tool's overall function and output.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's function: assessing how ESG factors influence working capital efficiency using specific data sources (IMF SDR, BIS). It specifies inputs (sector, region, ESG risk scores) and outputs (quantitative impact rating on DSO, inventory turnover, liquidity indicators). This distinguishes it from sibling tools like 'working_capital' (general) and 'esg_audit_multi' (broader ESG audit).

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage (when you need ESG-integrated working capital analysis) but provides no explicit guidance on when to use this tool vs. alternatives, nor any conditions or exclusions. The agent must infer from context.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

working_capital_fx_hedge_optimizerA

Read-onlyIdempotent

Inspect

For CFOs managing multinational working capital, this tool analyzes real-time ECB and FRED foreign exchange rates to recommend optimal hedging strategies. Input base currency, target currencies, and working capital amounts to receive forward contract suggestions, natural hedge opportunities, and cost-benefit analysis of various hedging instruments (forwards, options, swaps). Outputs include hedge ratios, estimated cost savings, and risk reduction metrics.

ParametersJSON Schema

Name	Required	Description	Default
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`baseCurrency`	Yes	ISO 4217 code of the company's functional currency (e.g., 'USD', 'EUR')
`riskAppetite`	No	Company's risk tolerance for currency fluctuations	balanced
`timeHorizonDays`	No	Planning horizon in days (default: 90)
`targetCurrencies`	Yes	ISO 4217 codes of currencies to hedge against (e.g., ['EUR', 'GBP', 'JPY'])
`workingCapitalAmounts`	Yes	Working capital amounts in each target currency (e.g., { EUR: 5000000, GBP: 3000000 })

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	No
`warnings`	No
`recommendations`	No

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate the tool is read-only, idempotent, and open-world. The description adds behavioral context by specifying data sources (ECB, FRED) and output types (forward suggestions, natural hedge opportunities, cost-benefit analysis), enhancing transparency without contradicting annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise (3-4 sentences), front-loaded with target audience, and covers purpose, inputs, and outputs without fluff. Every sentence adds value, making it efficient and easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (6 parameters, nested objects, output schema), the description adequately covers its purpose, inputs, outputs, and intended use. It explains what kind of recommendations and metrics are provided, which is complete enough with the output schema available.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the baseline is 3. The description mentions key parameters (baseCurrency, targetCurrencies, workingCapitalAmounts) but does not add significant meaning beyond what the schema provides. It contextualizes their role in hedging, which is sufficient but not exceptional.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: analyzing real-time FX rates to recommend optimal hedging strategies for CFOs managing multinational working capital. It specifies inputs, outputs, and distinguishes itself from generic FX tools or working capital tools by focusing on hedging optimization.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides usage context (CFOs, multinational working capital) but does not explicitly state when to use this tool over siblings like 'fx_rate' for simple rates or 'working_capital' for general needs. No exclusions or alternatives are mentioned, leaving usage guidance implied rather than explicit.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x402_liquidity_monitorA

Read-onlyIdempotent

Inspect

Monitors real-time x402-USDC liquidity depth across 12 decentralized and centralized exchanges, providing slippage alerts and depth analysis for CFO liquidity risk assessment. Inputs include slippage thresholds and exchange selection; outputs liquidity depth, price impact estimates, and warning flags. Essential for optimizing trade execution and managing liquidity exposure. Keywords: liquidity monitoring, slippage analysis, DEX/CEX depth, x402-USDC pair, CFO financial tooling.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`exchanges`	No	List of exchanges to monitor (defaults to all 12 if empty)
`depthLevels`	No	Liquidity depth levels to analyze (percentage from mid-price)
`slippageThreshold`	Yes	Maximum acceptable slippage percentage (0-100)

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`midPrice`	No	Current x402-USDC mid-price
`warnings`	Yes
`priceImpact`	No
`liquidityDepth`	Yes
`slippageAlerts`	Yes

Tool Definition Quality

A3.7/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint, openWorldHint, and idempotentHint. The description adds context about real-time monitoring and CFO risk assessment but does not disclose additional behavioral traits beyond what annotations provide.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences, front-loaded with the core function. It is concise but includes keyword tags and a brief list of outputs, earning its place without unnecessary repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's moderate complexity (4 params, simple enums) and the presence of an output schema, the description covers purpose, inputs, outputs, and context adequately. It mentions specific outputs like liquidity depth and warning flags.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema has 100% coverage with detailed descriptions for all 4 parameters. The description adds generic mentions of 'slippage thresholds and exchange selection' but does not provide significant new information beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool monitors real-time x402-USDC liquidity depth across 12 exchanges, providing slippage alerts and depth analysis for CFO liquidity risk assessment. It distinguishes from sibling tools like x402_payment_flow_analyzer by focusing on liquidity monitoring rather than payment flows.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it is 'Essential for optimizing trade execution and managing liquidity exposure' but does not explicitly state when to use it over alternatives or when not to use it. Usage context is implied but not directly addressed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x402_payment_flow_analyzerA

Read-onlyIdempotent

Inspect

As a CTO, analyze USDC payment flows involving x402 addresses to assess counterparty risk, trace transaction paths, and evaluate regulatory exposure. Input a wallet address or transaction hash to receive risk scores, flow diagrams, and compliance flags from Chainalysis and TRM Labs public APIs. Ideal for due diligence, fraud detection, and compliance reporting. Pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`depth`	No	Hops to trace in payment flow
`txHash`	No	USDC transaction hash to trace
`address`	Yes	Ethereum wallet address to analyze
`includeRiskScore`	No	Include counterparty risk scoring

Output Schema

ParametersJSON Schema

Name	Required	Description
`flowId`	No	Unique identifier for this payment flow analysis
`status`	Yes
`sources`	No
`warnings`	No
`riskScore`	No	Counterparty risk score (0-100)
`complianceFlags`	No
`exposureSummary`	No
`transactionPath`	No

Tool Definition Quality

A4.2/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint, idempotentHint) already indicate the tool is safe and side-effect-free. The description adds behavioral context: it uses external APIs (Chainalysis, TRM Labs), implying potential slowness, and details the async mode (returns job_id quickly). No contradictions. Could mention data freshness or rate limits, but sufficient.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three sentences. First sentence states purpose and persona (CTO). Second details inputs and outputs. Third gives use cases and async tip. No filler, front-loaded with main action. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (external APIs, async mode, multiple parameters) and the presence of an output schema (not shown), the description covers: what it does, inputs, outputs, use cases, and async mechanism. It does not explain depth or risk score details, but those are in schema. Overall very complete for the context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline is 3. The description adds context by mentioning 'Input a wallet address or transaction hash' and the async parameter tip, but does not significantly elaborate beyond the schema descriptions. The depth and includeRiskScore parameters are not mentioned, but schema covers them.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool analyzes USDC payment flows involving x402 addresses for counterparty risk, transaction tracing, and regulatory exposure. It specifies the resource (USDC payment flows with x402 addresses) and actions (analyze, assess, trace, evaluate), and lists outputs (risk scores, flow diagrams, compliance flags). It distinguishes itself from sibling tools like 'usdc_x402_payments_intel' and 'x402_payment_fraud_detector' by focusing on flow analysis for risk assessment and compliance.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides clear use cases: 'Ideal for due diligence, fraud detection, and compliance reporting.' It also gives a usage tip: 'Pass async:true to avoid timeout.' However, it does not explicitly state when not to use this tool or mention alternatives among sibling tools, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

x402_payment_fraud_detectorA

Read-onlyIdempotent

Inspect

Risk-focused tool that analyzes x402-USDC payment transactions for fraud patterns using on-chain forensics. Takes a transaction hash or wallet address as input and returns risk scores, suspicious indicators, and historical patterns. Designed for risk management teams to quickly assess payment legitimacy. Includes keywords: fraud detection, USDC risk, blockchain forensics, transaction monitoring. pass async:true to avoid timeout.

ParametersJSON Schema

Name	Required	Description
`async`	No	If true, returns a job_id immediately (<200ms) instead of waiting for the result. Poll the result with job_result(job_id). Use for slow tools to avoid client timeouts.
`walletAddress`	No
`includeHistory`	No
`amountThreshold`	No
`transactionHash`	Yes

Output Schema

ParametersJSON Schema

Name	Required	Description
`status`	Yes
`sources`	Yes
`warnings`	Yes
`riskScore`	Yes
`isSuspicious`	Yes
`sanctionsMatch`	No
`fraudIndicators`	No
`transactionHistory`	No

Tool Definition Quality

A3.5/5.0

Behavior3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, openWorldHint=true, and idempotentHint=true, indicating safe, read-only, and idempotent behavior. The description adds that it uses on-chain forensics and returns risk scores, but does not disclose additional behavioral traits beyond what annotations provide. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description front-loads the main purpose and includes relevant details, but the list of keywords at the end is unnecessary and adds clutter. It could be more concise without losing essential information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description gives a basic understanding of inputs and outputs but does not detail the output schema or distinguish from similar sibling tools like x402_liquidity_monitor or x402_payment_flow_analyzer. With 5 parameters and low schema coverage, more details on output structure and usage context would improve completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is only 20% (only async has a description). The description partially covers transactionHash and walletAddress by mentioning 'transaction hash or wallet address as input', and mentions async. However, includeHistory and amountThreshold are not explained. The description adds some value but does not fully compensate for low schema coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it analyzes x402-USDC payment transactions for fraud patterns, specifying inputs (transaction hash or wallet address) and outputs (risk scores, suspicious indicators, historical patterns). It also targets risk management teams, distinguishing it from generic fraud detectors or informational tools like usdc_x402_payments_intel.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description mentions it is for risk management teams and advises using async:true to avoid timeout, but does not explicitly state when to use this tool versus alternatives like fraud_detector or x402_payment_flow_analyzer. Usage context is implied but not clear.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.