Ground Truth - First Tool Call

Server Details

✅ Try check_endpoint with url=https://example.com. Paid plan adds monitors and reports.

Status: Healthy
Last Tested:
Transport: Streamable HTTP
URL:
Repository: anish632/ground-truth-mcp
GitHub Stars: 0
Server Listing: ground-truth-mcp

Tool Descriptions: A

Average 4.5/5 across 15 of 15 tools scored. Lowest: 3.5/5.

Server Coherence: A

Disambiguation: 4/5

Most tools have clearly distinct purposes, e.g., check_endpoint vs inspect_security_headers. However, test_hypothesis and verify_claim have overlapping functionality, and compare_competitors and estimate_market both deal with package registries, though one is for specific packages and the other for category breadth.

Naming Consistency: 5/5

All tool names follow a consistent verb_noun pattern in snake_case (e.g., check_endpoint, create_monitor, verify_claim). No mixed conventions or ambiguous verbs.

Tool Count: 5/5

15 tools is well-scoped for a monitoring and verification server. Each tool covers a specific aspect without redundancy, and the count is within the typical 3-15 range for good coverage.

Completeness: 4/5

The tool set covers monitoring lifecycle (create, list, run, get, delete), verification (endpoint, pricing, compliance, security, packages), and hypothesis testing. Minor gaps exist (e.g., no tool to update a monitor directly, but delete+create works).

Available Tools

16 tools
assess_compliance_posture (Compliance Signal Scan): A
Read-only, Idempotent

Scan a public security, trust, compliance, or legal page for common enterprise buying signals before you claim a vendor supports a particular compliance posture. It looks for public references to SOC 2, ISO 27001, GDPR, HIPAA, DPA terms, subprocessors, SSO, SCIM, encryption, and data residency language. This is a signal scanner, not proof of certification or legal sufficiency.

Parameters
Name | Required | Description | Default
url | Yes | Public trust, security, compliance, or policy URL to scan. | (none)

Output Schema
Name | Required | Description
url | Yes | Compliance or trust page that was analyzed.
error | No | Fetch or parsing error when the page could not be analyzed.
cached | No | True when the page body came from the 5-minute cache.
signals | No | Boolean scan results for common enterprise compliance and security signals.
pageLength | No | Size of the fetched page body in characters.
matchedSignals | No | Signal names that were detected on the page.
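
As a rough illustration of how a page-level scan could produce the signals and matchedSignals fields above, here is a minimal Python sketch. The pattern table and the scan_signals function are hypothetical and are not taken from the server's implementation; a match only shows that a term appears in the page text, not that a certification exists.

```python
import re

# Hypothetical keyword patterns; the server's real patterns are not documented here.
SIGNAL_PATTERNS = {
    "soc2": r"\bSOC\s*2\b",
    "iso27001": r"\bISO(?:/IEC)?\s*27001\b",
    "gdpr": r"\bGDPR\b",
    "hipaa": r"\bHIPAA\b",
    "dpa": r"\bdata processing (?:agreement|addendum)\b|\bDPA\b",
    "sso": r"\bSSO\b|\bsingle sign-on\b",
}


def scan_signals(url, page_text):
    """Return a result shaped like the output schema above (subset of fields)."""
    signals = {
        name: bool(re.search(pattern, page_text, re.IGNORECASE))
        for name, pattern in SIGNAL_PATTERNS.items()
    }
    return {
        "url": url,
        "signals": signals,
        # Only the names whose boolean is True, in pattern-table order.
        "matchedSignals": [name for name, hit in signals.items() if hit],
        "pageLength": len(page_text),
    }
```

Calling scan_signals on already-fetched page text keeps the sketch free of network access; fetching and caching are left out on purpose.
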
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate read-only, idempotent, and non-destructive behavior. The description adds valuable context by specifying what signals are scanned and clarifying that it is a signal scanner, not certification proof. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise: three sentences that front-load the purpose and include the necessary caveats. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 parameter, output schema exists), the description covers purpose, usage context, limitations, and behavioral traits thoroughly. It is complete for effective tool selection.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the single parameter (url) is well-described. The description does not add additional parameter-level detail beyond the schema, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it scans a public page for compliance signals, using specific verbs and listing common indicators (SOC 2, ISO 27001, etc.). It distinguishes from sibling tools like check_endpoint or verify_claim by focusing on compliance posture scanning.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides context for when to use ('before you claim a vendor supports a particular compliance posture') and includes a warning about misuse ('not proof of certification'). However, it does not explicitly contrast with sibling tools or state when not to use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_endpoint (Endpoint Reachability Check): A
Read-only, Idempotent

Perform one live, unauthenticated fetch against a public URL or API endpoint before you recommend it, document it, or build on top of it. Use this when the question is simply whether an endpoint currently responds and what kind of response it returns. It reports HTTP status, content type, elapsed time, likely auth/rate-limit signals, and a short response sample. A successful result only proves basic reachability at fetch time. Do not use it to validate authenticated flows, POST side effects, JavaScript execution, or deeper business logic.

Parameters
Name | Required | Description | Default
url | Yes | Public http(s) URL or bare domain to probe. Bare domains like google.com are accepted and normalized to https:// automatically. | (none)

Output Schema
Name | Required | Description
url | Yes | Normalized URL that was actually fetched.
error | No | Validation or network error when the request could not be completed.
status | No | HTTP status code returned by the endpoint, when a response was received.
inputUrl | No | Original user input when normalization changed it, for example when https:// was added.
accessible | Yes | True when the endpoint returned a 2xx HTTP status.
contentType | No | Response Content-Type header, if present.
rateLimited | No | True when the server responded with 429 Too Many Requests.
authRequired | No | True when the server responded with 401 or 403, which usually means credentials are required.
responseTimeMs | No | Elapsed request time in milliseconds.
sampleResponse | No | First 1,000 characters of the response body for quick inspection. Use this as a debugging hint only; it may be truncated and should not be treated as a complete page capture.
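
The output fields above map onto HTTP status codes fairly mechanically. The Python sketch below assumes exactly the mapping described in the schema (2xx for accessible, 401 or 403 for authRequired, 429 for rateLimited, the first 1,000 characters for sampleResponse). The helper names normalize_url and summarize_response are hypothetical, and no network request is made.

```python
from urllib.parse import urlparse


def normalize_url(input_url):
    """Prepend https:// to bare domains; leave http(s) URLs unchanged."""
    candidate = input_url.strip()
    if not candidate.lower().startswith(("http://", "https://")):
        candidate = "https://" + candidate
    if not urlparse(candidate).netloc:
        raise ValueError(f"not a usable URL: {input_url!r}")
    return candidate


def summarize_response(input_url, status, body, content_type=None):
    """Build a result dict from an already-received status code and body."""
    url = normalize_url(input_url)
    result = {
        "url": url,
        "status": status,
        "accessible": 200 <= status < 300,
        "authRequired": status in (401, 403),
        "rateLimited": status == 429,
        # Truncated sample only; not a complete page capture.
        "sampleResponse": body[:1000],
    }
    if content_type is not None:
        result["contentType"] = content_type
    if url != input_url:
        # Reported only when normalization changed the input.
        result["inputUrl"] = input_url
    return result
```

For example, summarize_response("google.com", 200, body) would report the normalized https:// URL in url and the original string in inputUrl.
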
Behavior: 5/5

Beyond annotations (readOnlyHint, idempotentHint, destructiveHint), the description adds that the tool is unauthenticated, reports specific fields (HTTP status, content type, etc.), and warns that success only proves basic reachability. No contradictions with annotations.

Conciseness: 5/5

Five sentences, each earning its place, that together cover purpose and usage, what is reported, and limitations and caveats. Front-loaded, no wasted words.

Completeness: 5/5

With one simple parameter, output schema mentioned, and annotations covering safety, the description is complete. It covers main use case, output contents, and limitations. Adequate for an agent to decide when to use.

Parameters: 3/5

Schema coverage is 100% with a detailed description of the url parameter (public http(s) URL, normalization). The tool description adds usage context but no new parameter semantics beyond the schema. Baseline 3 is appropriate.

Purpose: 5/5

The description clearly states the tool performs a live, unauthenticated fetch against a public URL/API endpoint. It uses specific verbs and resources, and the context distinguishes it from siblings like inspect_security_headers or verify_claim.

Usage Guidelines: 5/5

Explicitly states when to use ('simply whether an endpoint currently responds') and when not to use (authenticated flows, POST side effects, JavaScript execution, deeper business logic). Provides clear guidance on appropriate use cases.

check_pricing (Pricing Page Scan): A
Read-only, Idempotent

Fetch a public pricing page and extract first-pass pricing signals before you quote plan costs, free tiers, or plan names. Use this when you already have a likely pricing URL and need a quick live scan of visible page text. It returns price-like strings, heuristic plan labels, free or free-trial signals, and cache information. It does not map prices to exact plans, normalize currencies, execute checkout flows, or guarantee that a price applies to a specific region or customer type. JavaScript-rendered, logged-in, or heavily obfuscated pricing details can be missed. Results are cached for 5 minutes.

Parameters
Name | Required | Description | Default
url | Yes | Public pricing or plans URL to analyze. Prefer the specific pricing page, for example https://stripe.com/pricing, rather than a generic homepage. | (none)

Output Schema
Name | Required | Description
url | Yes | Pricing page that was analyzed.
error | No | Fetch or parsing error when the pricing page could not be analyzed.
cached | No | True when the page body came from the 5-minute cache instead of a new fetch.
pageLength | No | Size of the fetched page body in characters.
pricesFound | No | Distinct price-like strings extracted from the page text. These are not linked back to specific plans or billing conditions.
hasFreeTrial | No | True when the page contains signals that a free trial exists somewhere on the page.
hasFreeOption | No | True when the page contains signals that a free plan or $0 option exists somewhere on the page. This is a page-level signal, not proof that the offer is currently self-serve or globally available.
plansDetected | No | Lowercased heuristic plan labels detected from the page text. They are useful hints, not authoritative plan identifiers.
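
To make "price-like strings" and "heuristic plan labels" concrete, here is a small Python sketch of one way such a first-pass scan could work on visible page text. The regular expression, the PLAN_WORDS list, and the scan_pricing_text function are assumptions made for illustration; the server's actual heuristics are not documented on this page.

```python
import re

# Assumed pattern: a currency symbol followed by a number with optional
# thousands separators and up to two decimal places.
PRICE_RE = re.compile(r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d{1,2})?")

# Hypothetical plan vocabulary; real pages use many other labels.
PLAN_WORDS = ("free", "starter", "pro", "team", "business", "enterprise")


def scan_pricing_text(text):
    """Extract page-level pricing signals from plain text."""
    lowered = text.lower()
    prices = []
    for match in PRICE_RE.findall(text):
        if match not in prices:  # keep distinct strings, in page order
            prices.append(match)
    return {
        "pricesFound": prices,
        "plansDetected": [
            word for word in PLAN_WORDS if re.search(rf"\b{word}\b", lowered)
        ],
        "hasFreeTrial": "free trial" in lowered,
        "hasFreeOption": bool(re.search(r"\bfree (?:plan|tier)\b|\$0\b", lowered)),
    }
```

Note that nothing here links a price to a plan, which mirrors the limitation stated in the tool description.
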
Behavior: 5/5

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds context: caching for 5 minutes, 'first-pass' scan, and specific limitations (e.g., missed JS-rendered content). No contradiction with annotations.

Conciseness: 5/5

The description is concise, front-loading the purpose and use case. Every sentence adds critical information without redundancy, making it easy to parse.

Completeness: 5/5

Given the simple tool (one required param, output schema present), the description is fully adequate. It explains inputs, outputs, limitations, and caching, leaving no significant gaps.

Parameters: 4/5

Schema coverage is 100% for the single `url` parameter. The description adds value by advising to prefer a specific pricing page over a homepage, which aids correct usage.

Purpose: 5/5

The description clearly states the tool's action: 'Fetch a public pricing page and extract first-pass pricing signals before you quote plan costs, free tiers, or plan names.' It specifies the resource (pricing page) and outputs (price strings, plan labels, free signals), and distinguishes it from siblings like compare_pricing_pages.

Usage Guidelines: 5/5

The description explicitly states when to use: 'Use this when you already have a likely pricing URL and need a quick live scan.' It also details exclusions: does not map prices to plans, normalize currencies, etc., and notes that JS-rendered/logged-in pages can be missed, guiding the agent to alternatives.

compare_competitors (Named Package Comparison): A
Read-only, Idempotent

Compare two or more exact package names side by side using live npm or PyPI metadata. Use this when you already know the candidate packages and need evidence for claims such as 'tool A is newer', 'tool B is still maintained', or 'these packages use different licenses'. It returns per-package registry metadata in input order, with field availability varying by registry. Missing or unpublished packages return found=false. Do not use it to discover unknown alternatives, estimate market size, or compare packages across different registries. Registry responses are cached for 5 minutes.

Parameters
Name | Required | Description | Default
packages | Yes | Two to ten exact package names from the same registry, for example ['react', 'vue']. Use exact registry names, not search phrases or categories. | (none)
registry | No | Registry that all package names belong to. All compared packages must come from the same registry, and returned metadata fields differ slightly between npm and PyPI. | npm

Output Schema
Name | Required | Description
packages | Yes | Package names that were requested for comparison.
registry | Yes | Registry used for all comparisons.
comparisons | Yes | Per-package lookup results returned in the same order as the input package list. Some fields only exist for npm or only for PyPI, so consumers should treat absent fields as normal.
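
MCP clients invoke tools with a JSON-RPC tools/call request. The sketch below builds such a request for compare_competitors and checks the two-to-ten constraint before sending anything. The helper names are hypothetical, the registry value 'pypi' is assumed by analogy with the other tools on this page, and the per-package name field is an assumption (only found=false is mentioned in the description).

```python
import json


def build_compare_request(packages, registry="npm", request_id=1):
    """Build a JSON-RPC 2.0 tools/call payload for compare_competitors."""
    if not 2 <= len(packages) <= 10:
        raise ValueError("packages must contain two to ten exact names")
    if registry not in ("npm", "pypi"):  # assumed enum values
        raise ValueError("registry must be 'npm' or 'pypi'")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "compare_competitors",
            "arguments": {"packages": list(packages), "registry": registry},
        },
    }


def missing_packages(comparisons):
    """List entries the registry did not find (found is false or absent)."""
    # 'name' is an assumed field; absent fields are treated as normal.
    return [entry.get("name") for entry in comparisons if not entry.get("found")]


if __name__ == "__main__":
    print(json.dumps(build_compare_request(["react", "vue"]), indent=2))
```

Validating on the client side avoids spending a call on input the schema would reject anyway.
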
Behavior: 5/5

Disclosures beyond annotations: returns per-package metadata in input order, field availability varies, missing packages return found=false, registry caching for 5 minutes. No contradictions with annotations.

Conciseness: 5/5

Six sentences, front-loaded with the core action, and each sentence adds value. No filler or redundancy.

Completeness: 5/5

Given output schema exists, description covers purpose, usage guidelines, behavioral details (caching, error state), and edge cases. No missing context for effective invocation.

Parameters: 3/5

Schema coverage is 100%, so baseline 3 is appropriate. The description reinforces 'exact package names' and 'same registry', but adds marginal new meaning beyond schema descriptions.

Purpose: 5/5

The description clearly states the tool compares exact package names using live npm or PyPI metadata. It distinguishes from sibling tools like estimate_market by explicitly contrasting use cases.

Usage Guidelines: 5/5

Explicitly specifies when to use (known candidate packages needing evidence like maintenance, license) and when not to (discovering unknown alternatives, market size, cross-registry). Implicitly identifies alternatives.

compare_pricing_pages (Pricing Page Comparison): A
Read-only, Idempotent

Compare two to five public pricing pages side by side before you make competitive pricing or packaging claims. Use this when you want a quick, live comparison of visible prices, free-plan signals, and plan-name hints across vendors. The output is heuristic and page-level: it does not map every price to every plan or normalize regional billing differences.

Parameters
Name | Required | Description | Default
pages | Yes | Two to five named pricing pages to compare side by side. | (none)

Output Schema
Name | Required | Description
pages | Yes | Per-page pricing signals returned in input order.
summary | Yes | Aggregate counts across all compared pricing pages.
Behavior: 5/5

Annotations already declare readOnlyHint=true and destructiveHint=false. The description adds valuable transparency beyond annotations by noting that output is 'heuristic and page-level' and describing what it does not do (e.g., mapping prices to plans). This sets proper expectations for the agent.

Conciseness: 5/5

The description is three sentences long, front-loaded with the main purpose, and each sentence earns its place: purpose, usage context, and limitations. No waste or redundancy.

Completeness: 5/5

Given the simple parameter (one array fully described in schema), comprehensive annotations, and presence of an output schema, the description is complete. It sets expectations about heuristic output and limitations, which suffices for an agent to invoke the tool correctly.

Parameters: 3/5

Schema coverage is 100% with detailed descriptions for each property (url, name). The description adds no additional parameter semantics beyond the schema, but it reinforces that pages should be 'public pricing pages' and 'named.' Baseline score of 3 is appropriate given the high schema coverage.

Purpose: 5/5

The description clearly states the tool's purpose: 'Compare two to five public pricing pages side by side before you make competitive pricing or packaging claims.' It uses a specific verb ('compare') and resource ('pricing pages'), and implicitly distinguishes from siblings like 'compare_competitors' and 'check_pricing' by focusing on live, page-level heuristic comparison.

Usage Guidelines: 5/5

The description explicitly states when to use the tool ('before you make competitive pricing or packaging claims') and what it captures ('visible prices, free-plan signals, and plan-name hints'). It also provides clear limitations: 'does not map every price to every plan or normalize regional billing differences,' guiding the agent on appropriate use cases.

create_monitor (Create Monitor): A

Create a persistent monitor that tracks a URL, pricing page, package version, endpoint status, vendor claim, or custom keyword pattern over time. Monitors run automatically on their configured schedule (hourly/daily/weekly) via the Cloudflare cron trigger, or on demand with run_monitor_now. Results are stored in the Durable Object SQLite database. Requires a team API key.

Parameters
Name | Required | Description | Default
name | Yes | Human-readable name for this monitor. | (none)
schedule | No | How often the monitor runs automatically. manual means only via run_monitor_now. | daily
target_type | Yes | What to monitor. url/endpoint: HTTP reachability and status. pricing_page: pricing signals (prices, plans, free tier). package: package version on npm or pypi (target_value as 'npm:pkg-name' or 'pypi:pkg-name'). vendor_claim: keyword presence at a URL (target_value=claim text, instructions=URL to check). custom_prompt: comma-separated keywords checked against a URL (target_value=URL, instructions=keywords). | (none)
instructions | No | Supplementary instructions. For vendor_claim: the URL to check. For custom_prompt: comma-separated keywords. Optional for other types. | (none)
target_value | Yes | Primary target. For url/endpoint/pricing_page/custom_prompt: a public https URL. For package: 'npm:package-name' or 'pypi:package-name'. For vendor_claim: the claim text to search for. | (none)
notification_destination | No | Optional destination for change alerts (email or webhook URL). Stored for future use. | (none)
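
The meaning of target_value and instructions changes with target_type, which is easy to get wrong. The following Python sketch encodes the relationships listed in the table as client-side checks. The function name build_monitor_args is hypothetical, and the strictness of the checks (for example requiring an https:// prefix) is an assumption drawn from the parameter descriptions, not from the server's validation code.

```python
TARGET_TYPES = {
    "url", "endpoint", "pricing_page", "package", "vendor_claim", "custom_prompt",
}
SCHEDULES = {"hourly", "daily", "weekly", "manual"}


def build_monitor_args(name, target_type, target_value,
                       schedule="daily", instructions=None):
    """Return an arguments dict for create_monitor after basic checks."""
    if target_type not in TARGET_TYPES:
        raise ValueError(f"unknown target_type: {target_type}")
    if schedule not in SCHEDULES:
        raise ValueError(f"unknown schedule: {schedule}")

    if target_type == "package":
        # Packages are addressed as 'npm:<name>' or 'pypi:<name>'.
        if not target_value.startswith(("npm:", "pypi:")):
            raise ValueError("package target_value needs an npm: or pypi: prefix")
    elif target_type == "vendor_claim":
        # Here target_value is the claim text and instructions holds the URL.
        if not instructions:
            raise ValueError("vendor_claim needs the URL to check in instructions")
    else:
        if not target_value.startswith("https://"):
            raise ValueError(f"{target_type} target_value must be a public https URL")
        if target_type == "custom_prompt" and not instructions:
            raise ValueError("custom_prompt needs comma-separated keywords in instructions")

    args = {
        "name": name,
        "target_type": target_type,
        "target_value": target_value,
        "schedule": schedule,
    }
    if instructions:
        args["instructions"] = instructions
    return args
```

For instance, build_monitor_args("React version", "package", "npm:react") yields a daily package monitor, while a vendor_claim without instructions is rejected before any call is made.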

Output Schema
Name | Required | Description
id | Yes | Unique monitor ID.
name | Yes | Monitor name.
error | No | Error message if creation failed.
schedule | Yes | Monitor schedule.
created_at | Yes | Creation timestamp ISO 8601.
target_type | Yes | Monitor target type.
target_value | Yes | Monitor target value.
Behavior: 4/5

The description discloses persistence, scheduled runs via Cloudflare cron, storage in Durable Object SQLite, and the requirement for a team API key. Annotations are all false, so description carries the burden; it provides substantial context but could mention idempotency or conflict behavior for duplicate names.

Conciseness: 5/5

The description is four sentences, front-loaded with the main purpose, then elaborates on behavior and requirements. Every sentence adds value without redundancy.

Completeness: 4/5

Given 6 parameters and existence of an output schema, the description covers purpose, behavior, parameter details, and usage. It mentions the API key requirement but could optionally include an example or mention of idempotency. Overall sufficient.

Parameters: 4/5

Schema coverage is 100%, so baseline is 3. The description adds significant value by explaining each target_type in detail (e.g., pricing page signals, package version format) and clarifying the instructions field context. This goes beyond the schema's short descriptions.

Purpose: 5/5

The description clearly states the tool creates a persistent monitor with specific verb 'create' and resource 'monitor'. It lists various target types (URL, pricing page, etc.) and distinguishes from siblings by mentioning cron trigger and run_monitor_now, providing clear differentiation.

Usage Guidelines: 4/5

The description explains when to use (to track various targets over time) and mentions schedule options including manual vs automated via cron. It implicitly contrasts with one-off check tools (siblings like check_endpoint, verify_claim) but does not explicitly state when not to use this tool.

delete_monitor (Delete Monitor): A
Destructive, Idempotent

Permanently delete a monitor and all its stored results. This action cannot be undone. Requires a team API key.

Parameters
Name | Required | Description | Default
monitor_id | Yes | The monitor ID to delete. | (none)

Output Schema
Name | Required | Description
error | No | (no description)
deleted | Yes | (no description)
monitor_id | Yes | (no description)
results_deleted | Yes | Number of result records also deleted.
Behavior: 5/5

Annotations already mark destructiveHint=true and idempotentHint=true. The description adds crucial context: deletion is permanent, cannot be undone, and deletes all stored results. It also confirms the auth requirement ('team API key'), going beyond annotations to ensure the agent understands the consequences.

Conciseness: 5/5

The description is three sentences with no filler. The first states the core action and scope, the second states the consequence, and the third states the auth requirement. Every word serves a purpose.

Completeness: 5/5

For a simple destructive operation with one parameter and an output schema, the description sufficiently covers the tool's purpose, permanence, and auth requirement. No additional details are necessary for correct invocation.

Parameters: 3/5

Schema coverage is 100% with a single parameter 'monitor_id' described as 'The monitor ID to delete.' The description does not add additional meaning or formatting hints, so it meets the baseline for high coverage without extra value.

Purpose: 5/5

The description clearly states the verb 'delete' and the resource 'monitor', along with the scope 'all its stored results'. It effectively distinguishes this tool from siblings like 'create_monitor' or 'list_monitors' by emphasizing permanence and irreversibility.

Usage Guidelines: 4/5

The description notes that a team API key is required, providing a usage prerequisite. It implies this tool should be used when permanent deletion is intended, but does not explicitly state when not to use it or offer comparisons to alternative tools like 'list_monitors' or 'get_monitor_result'.

estimate_market (Package Market Search): A
Read-only, Idempotent

Search npm or PyPI to estimate how crowded a package category is before you claim that a market is empty, niche, or competitive. Use this when you have a category or search phrase such as 'edge orm' and want live result counts plus representative matches. Do not use it to compare exact known package names or to infer adoption from downloads; it reflects search results, not market share. Registry responses are cached for 5 minutes.

Parameters
Name | Required | Description | Default
query | Yes | Short registry search phrase to evaluate, for example 'mcp memory server' or 'edge orm'. | -
registry | No | Registry to search. Use 'npm' for JavaScript ecosystems and 'pypi' for Python ecosystems. | npm
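
As a rough illustration of how a client might invoke this tool, the sketch below builds a JSON-RPC 2.0 `tools/call` request of the kind MCP clients send. The helper name `build_tool_call` and the request `id` are hypothetical; the tool name, parameter names, and the 'edge orm' example come from the listing above. Transport details (session setup, headers, authentication) are omitted.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request for the MCP tools/call method.

    build_tool_call is an illustrative helper, not part of the server.
    """
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# 'registry' is optional and defaults to 'npm' per the parameter table;
# it is passed explicitly here only for clarity.
request = build_tool_call(
    "estimate_market",
    {"query": "edge orm", "registry": "npm"},
)
print(json.dumps(request, indent=2))
```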

Output Schema

Name | Required | Description
query | Yes | Search phrase that was evaluated.
registry | Yes | Registry that was searched.
topResults | Yes | Representative top search matches that help interpret the market count.
totalResults | Yes | Total number of matching packages reported by the registry search.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate safe, read-only, idempotent behavior. Description adds beyond that: 'Registry responses are cached for 5 minutes' and explains the tool's limitation to search results, not market share. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, concise, front-loaded with purpose, no extraneous words. Efficiently conveys all necessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple two-parameter tool, rich annotations, and output schema existence, the description covers purpose, usage, behavior, and limitations comprehensively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds value by mentioning 'live result counts plus representative matches', providing extra context beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches npm or PyPI to estimate category crowdedness, using specific verbs 'Search' and 'estimate'. It distinguishes from siblings which focus on compliance, pricing, security headers, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use (when you have a category or search phrase) and when not to (not for comparing exact package names or inferring adoption from downloads). Provides clear context on what the tool reflects.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

generate_change_report: Generate Change Report (A)
Read-only, Idempotent

Generate a summary report of monitor activity for a time window. Shows monitors run, changes detected, failures, risk levels, and recommended follow-up actions. Requires a team API key.

Parameters
Name | Required | Description | Default
period | No | Report period. daily covers the past 24 hours, weekly covers the past 7 days. | daily
include_unchanged | No | When true also lists monitors with no detected changes. | -
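
To make the `period` values concrete, here is a minimal sketch of how the two windows could be computed. It assumes the window ends at the moment of the call and is expressed in UTC; the listing does not say how the server anchors or rounds the `from` and `to` values, so treat `report_window` as a hypothetical helper rather than the server's logic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Window lengths taken from the parameter description above.
PERIOD_LENGTHS = {
    "daily": timedelta(hours=24),
    "weekly": timedelta(days=7),
}


def report_window(
    period: str = "daily", now: Optional[datetime] = None
) -> Tuple[datetime, datetime]:
    """Return (from, to) for a report period, ending at `now` (assumed UTC)."""
    if period not in PERIOD_LENGTHS:
        raise ValueError(f"unknown period: {period!r}")
    end = now if now is not None else datetime.now(timezone.utc)
    return end - PERIOD_LENGTHS[period], end


start, end = report_window("weekly")
print(start.isoformat(), "to", end.isoformat())
```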

Output Schema

Name | Required | Description
to | Yes | -
from | Yes | -
error | No | -
period | Yes | -
changes | Yes | -
summary | Yes | -
failures | Yes | -
recommended_actions | Yes | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds context about report contents and the API key requirement, but does not describe any additional behavioral traits beyond annotations. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences, front-loaded with purpose, no redundant information. Every sentence adds value.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness3/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the output schema exists and annotations are present, the description covers the core purpose and output contents. However, it refers only to "a time window" without naming the period parameter that sets it, and it does not say that the report aggregates data across monitors, both of which are important for context.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with clear descriptions and defaults. The description does not add any parameter-level details beyond what the schema provides, so it meets the baseline for high coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states it generates a summary report of monitor activity, listing specific contents like monitors run and changes detected. While it is distinct from sibling tools like get_monitor_result or list_monitors, it does not explicitly differentiate itself.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

No guidance on when to use this tool versus alternatives. The description mentions a requirement for a team API key but provides no context for when-to-use or when-not-to-use.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_monitor_result: Get Monitor Results (A)
Read-only, Idempotent

Retrieve the most recent run results for a monitor, including change details, confidence score, evidence URLs, and any error information. Requires a team API key.

Parameters
Name | Required | Description | Default
limit | No | Maximum number of results to return, newest first. | -
monitor_id | Yes | The monitor ID to retrieve results for. | -

Output Schema

Name | Required | Description
error | No | -
total | Yes | -
results | Yes | -
monitor_id | Yes | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, idempotent, non-destructive. Description adds behavioral info: requires a team API key (authorization) and lists specific return fields. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two short sentences with clear front-loading of the core action. Every word adds value; no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given an output schema exists, description appropriately mentions auth and key return fields. Minor gap: does not clarify behavior when no runs exist. Otherwise complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema covers 100% of parameter descriptions. The description does not add additional meaning beyond the schema for the two parameters (monitor_id, limit).

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action (Retrieve), target (most recent run results for a monitor), and specific output elements (change details, confidence score, etc.). Distinguishes from sibling tools like run_monitor_now or list_monitors.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Description mentions a prerequisite (team API key) but does not specify when to use this tool versus alternatives like run_monitor_now or generate_change_report. No explicit when-not-to-use or exclusion guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

inspect_security_headers: Security Header Inspection (A)
Read-only, Idempotent

Fetch a public URL and inspect security-relevant response headers before you claim that a product or endpoint has a strong browser-facing security baseline. Use this for quick due diligence on public apps and docs sites. It checks for common headers such as HSTS, CSP, X-Frame-Options, Referrer-Policy, Permissions-Policy, and X-Content-Type-Options. It does not replace a real security review, authenticated testing, or vulnerability scanning.

Parameters
Name | Required | Description | Default
url | Yes | Public http(s) URL or bare domain to inspect. Bare domains are normalized to https:// automatically. | -

Output Schema

Name | Required | Description
url | Yes | Normalized URL that was fetched.
error | No | Validation or network error when the request could not be completed.
https | Yes | True when the normalized URL used https.
score | No | Heuristic security-header score based on how many tracked headers were present.
status | No | HTTP status code returned by the endpoint.
headers | No | Tracked response headers and their raw values when present.
inputUrl | No | Original user input when normalization changed it.
accessible | Yes | True when the endpoint returned an HTTP response.
presentCount | No | Number of tracked security headers that were present.
missingRecommended | No | Tracked headers that were not present on the response.
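
The normalization and header bookkeeping described above can be sketched as follows. This is an illustration under assumptions, not the server's code: the tracked list contains only the six headers the description names (the real list may be longer), `normalize_url` and `summarize_headers` are hypothetical names, and the `score` formula is left out because the listing does not define it.

```python
# Headers named in the tool description; the server's full list may differ.
TRACKED_HEADERS = [
    "Strict-Transport-Security",  # HSTS
    "Content-Security-Policy",    # CSP
    "X-Frame-Options",
    "Referrer-Policy",
    "Permissions-Policy",
    "X-Content-Type-Options",
]


def normalize_url(raw: str) -> str:
    """Prefix bare domains with https://, leaving explicit http(s) URLs alone."""
    raw = raw.strip()
    if raw.startswith(("http://", "https://")):
        return raw
    return "https://" + raw


def summarize_headers(response_headers: dict) -> dict:
    """Report which tracked headers are present; header names are case-insensitive."""
    lowered = {name.lower(): value for name, value in response_headers.items()}
    present = {
        name: lowered[name.lower()]
        for name in TRACKED_HEADERS
        if name.lower() in lowered
    }
    missing = [name for name in TRACKED_HEADERS if name.lower() not in lowered]
    return {
        "headers": present,
        "presentCount": len(present),
        "missingRecommended": missing,
    }


summary = summarize_headers(
    {"strict-transport-security": "max-age=63072000", "X-Frame-Options": "DENY"}
)
print(normalize_url("example.com"), summary["presentCount"])
```
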
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds context beyond annotations by listing specific headers checked (HSTS, CSP, etc.) and reiterating non-replacement of full review. No contradiction with readOnlyHint, idempotentHint.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four concise sentences front-load the purpose, then add useful details and limitations.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with one parameter and an output schema, the description covers purpose, usage, headers checked, and limitations completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, and the parameter description documents the automatic normalization of bare domains to https://.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches a URL and inspects security headers, and distinguishes it from sibling tools like assess_compliance_posture or check_endpoint.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use it (before claiming strong browser security) and what it does not replace (real security review, authenticated testing, vulnerability scanning).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_monitors: List Monitors (A)
Read-only, Idempotent

List all monitors owned by this API key, with last run status and schedule. Requires a team API key.

Parameters
Name | Required | Description | Default
active_only | No | When true returns only active monitors. Set false to include paused monitors. | -

Output Schema

Name | Required | Description
error | No | -
total | Yes | Total number of monitors returned.
monitors | Yes | List of monitors belonging to this API key.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only and idempotent behavior. The description adds return details and an access requirement, providing useful context beyond structured fields.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two concise sentences with no superfluous information. Efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given an output schema exists and annotations are rich, the description covers the key aspects: what is listed and auth requirement. Could mention pagination if relevant, but likely complete for this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description covers the single parameter fully (100%), so the description adds no additional semantics. Baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists monitors owned by the API key, with specific return fields (last run status, schedule), and distinguishes it from mutation tools like create_monitor or delete_monitor.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

It specifies a prerequisite (team API key) but does not explicitly contrast with sibling tools such as search or check endpoints. This is acceptable given the context but could be improved.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_resources: Server Resource Discovery (A)
Read-only, Idempotent

List all available Ground Truth tools and their access tiers. Zero-cost schema discovery. Call this to explore what verification tools are available before making a tool call. No quota consumption, no API key required.

Parameters
No parameters

Output Schema

Name | Required | Description
freeTools | Yes | Tools available in the free tier with no API key required.
paidTools | Yes | Tools requiring team API key or agentic payment.
monitorTools | Yes | Monitor management tools requiring team API key.
serverVersion | Yes | Current server version.
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, idempotentHint=true, and destructiveHint=false. The description adds behavioral traits: 'zero-cost schema discovery', 'no quota consumption', and 'no API key required', which go beyond annotations by specifying operational characteristics.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four short sentences, each serving a purpose: the first defines the function, the second notes the zero cost, the third advises when to use it, and the fourth adds behavioral context. No redundant information, front-loaded with key action.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given zero parameters and an existing output schema, the description sufficiently explains the tool's purpose and usage. It covers discovery need, cost implications, and prerequisites (none). Complete for a resource discovery tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has zero parameters with 100% coverage. The description implicitly confirms no parameters are needed by stating it lists resources. Baseline 4 is appropriate as no parameter info is required.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool lists all available Ground Truth tools and their access tiers, with 'zero-cost schema discovery'. It distinguishes itself as an exploration tool, contrasting with sibling tools that are for specific actions like compliance assessment or monitor management.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises calling this tool 'before making a tool call' to explore available verification tools, providing clear context. It also mentions no quota consumption and no API key required, implying it's safe to use freely. However, it does not explicitly exclude scenarios or compare with siblings.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

run_monitor_now: Run Monitor Now (A)

Immediately run a monitor's verification check outside its normal schedule. Records the result and returns whether the observed value changed since the last run. Counts against your monthly quota. Requires a team API key.

Parameters
Name | Required | Description | Default
monitor_id | Yes | The monitor ID returned by create_monitor. | -

Output Schema

Name | Required | Description
error | No | -
run_at | Yes | -
status | Yes | -
changed | Yes | -
evidence | Yes | -
new_value | Yes | -
old_value | Yes | -
result_id | Yes | -
confidence | Yes | -
monitor_id | Yes | -
Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations indicate readOnlyHint=false (writes) and destructiveHint=false. The description adds: 'Records the result', 'returns whether the observed value changed', and 'Counts against your monthly quota.' This provides useful behavioral context beyond the annotations, though it could mention that it updates the monitor's last-run state.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is four concise sentences that front-load the primary purpose. Every sentence adds value: action, result, quota cost, auth requirement. No redundancy or wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (one parameter) and the presence of an output schema, the description covers the core action, side effects, quota, and auth requirement. An agent can confidently decide when and how to use the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with the parameter 'monitor_id' described as 'The monitor ID returned by create_monitor.' The description adds no additional semantic meaning beyond what the schema already provides, so baseline 3 applies.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Immediately run a monitor's verification check outside its normal schedule' and specifies what it returns (whether value changed). This distinguishes it from siblings like list_monitors, get_monitor_result, and create_monitor.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit context: 'Counts against your monthly quota' and 'Requires a team API key.' It indicates when to use (for immediate check) but does not explicitly state when not to use or mention alternatives like get_monitor_result for viewing past results.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

test_hypothesis: Multi-step Hypothesis Test (A)
Read-only, Idempotent

Run a small verification plan made of concrete live checks and summarize whether a hypothesis is supported. Use this when one conclusion depends on multiple simple checks such as endpoint reachability, npm search counts, or whether a page contains an exact substring. This is a coordination tool, not an open-ended research agent: every test must be explicitly defined in advance, and tests run in order with no branching or early exit. The final verdict is mechanical: all tests passing => SUPPORTED, zero passing => REFUTED, otherwise PARTIALLY SUPPORTED. Use verify_claim when you already have evidence URLs, estimate_market for category sizing, and compare_competitors when you already know exact package names.
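
The mechanical verdict rule stated above is simple enough to sketch directly. The function below is an illustration of that rule only; the name `verdict` and the choice to reject an empty list are assumptions (the parameter table requires one to ten tests, so an empty plan should not occur).

```python
from typing import List


def verdict(passed: List[bool]) -> str:
    """Map per-test pass/fail flags to the verdict described for test_hypothesis."""
    if not passed:
        # Assumption: the tool requires at least one test, so treat empty as an error.
        raise ValueError("at least one test result is required")
    passing = sum(passed)
    if passing == len(passed):
        return "SUPPORTED"
    if passing == 0:
        return "REFUTED"
    return "PARTIALLY SUPPORTED"


print(verdict([True, False, True]))  # PARTIALLY SUPPORTED
```

Because the rule only counts passes, a single failing check out of ten yields the same verdict as nine failing checks; callers who care about the difference need to read the per-test results.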

Parameters
Name | Required | Description | Default
tests | Yes | Ordered list of one to ten checks to run. Each test object uses only the fields required by its type. | -
hypothesis | Yes | Claim to test, for example 'there are fewer than 50 MCP email servers on npm'. | -

Output Schema

Name | Required | Description
tests | Yes | Per-test execution results in input order.
verdict | Yes | High-level verdict for the hypothesis.
hypothesis | Yes | Hypothesis that was evaluated.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint, idempotentHint, and non-destructive behavior. The description adds valuable behavioral details: 'tests run in order with no branching or early exit,' the mechanical verdict logic (SUPPORTED/REFUTED/PARTIALLY SUPPORTED), and that response_contains does not parse DOM or execute JavaScript. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is well-structured: purpose, usage, behavior, alternatives. Every sentence contributes value, though it could be slightly shorter. It front-loads the key action and context.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of a multi-step hypothesis test tool with an output schema, the description covers all necessary aspects: when to use, how tests work, verdict logic, and limitations. It provides a complete picture for an AI agent to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with each field described in the schema. The description adds minimal additional parameter semantics, mainly reiterating that tests must be predefined and ordered. It does not provide new meaning for individual parameters beyond what the schema already offers. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Run a small verification plan made of concrete live checks and summarize whether a hypothesis is supported.' It specifies the verb (run verification plan) and resource (hypothesis test), and explicitly distinguishes from siblings like verify_claim, estimate_market, and compare_competitors.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit guidance on when to use this tool: 'Use this when one conclusion depends on multiple simple checks such as endpoint reachability, npm search counts, or whether a page contains an exact substring.' It also states when not to use it: 'This is a coordination tool, not an open-ended research agent,' and names alternative tools for other scenarios.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

verify_claim: Claim Support Check (A)
Read-only, Idempotent

Check whether a factual claim is supported by a specific set of public evidence URLs that you already have. For each source, the tool performs a case-insensitive keyword match over the fetched page body, then marks that source as supporting the claim when at least half of the supplied keywords appear. Use this for evidence-backed claim checks on known pages, not for open-ended search, semantic reasoning, or contradiction extraction. The aggregate verdict is driven only by the per-page keyword support ratio. Fetched pages are cached for 5 minutes.
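
The per-source rule described above (case-insensitive substring matching, with support declared when at least half of the keywords appear) might look like the following sketch. `source_supports` is a hypothetical name, fetching and caching are left out, and the aggregate verdict is not modeled because the listing does not spell out how the per-page ratios are combined.

```python
from typing import List


def source_supports(page_body: str, keywords: List[str]) -> bool:
    """Return True when at least half of the keywords occur in the page body.

    Matching is case-insensitive substring matching, as the description states.
    """
    if not keywords:
        raise ValueError("at least one keyword is required")
    body = page_body.lower()
    hits = sum(1 for keyword in keywords if keyword.lower() in body)
    # "At least half" is written as hits * 2 >= total to avoid float division.
    return hits * 2 >= len(keywords)


page = "Business support includes 24/7 Phone Support for all accounts."
print(source_supports(page, ["24/7", "phone support", "chat"]))  # True (2 of 3)
```

Since this is plain substring matching, a page that says "does not include 24/7 phone support" would still count as supporting, which is why the description rules out semantic reasoning and contradiction extraction.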

Parameters
Name | Required | Description | Default
claim | Yes | Plain-language claim to verify, for example 'AWS Business support includes 24/7 phone support'. | -
keywords | Yes | Keywords or short phrases that should appear on supporting pages. Matching is case-insensitive substring matching, so choose phrases that are likely to appear verbatim. | -
evidence_urls | Yes | One to ten public documentation, pricing, policy, or support URLs that are likely to contain direct evidence for the claim. | -

Output Schema

Name | Required | Description
claim | Yes | Claim that was evaluated.
sources | Yes | Per-source evidence results.
verdict | Yes | Aggregate verdict across all supplied sources.
Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses matching logic (case-insensitive keyword match, half-keywords threshold), aggregate verdict derivation, and caching behavior. Annotations already indicate readOnly, openWorld, and idempotent behavior, but the description adds valuable detail beyond that. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Five sentences, front-loaded with core purpose. Each sentence adds essential information without repetition or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists and schema coverage is 100%, the description covers matching logic, caching, and usage boundaries comprehensively. No missing information for an agent to correctly invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good descriptions. Description adds context about the keyword matching mechanism (substring, case-insensitive, half threshold) that is not in the schema, enhancing understanding of how parameters are used.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb ('Check'), resource ('factual claim'), and scope ('specific set of public evidence URLs'). Differentiates from siblings like check_endpoint and check_pricing by focusing on claim verification with known URLs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (evidence-backed claim checks on known pages) and when not to use (open-ended search, semantic reasoning, contradiction extraction). Caching behavior (5 minutes) also guides usage expectations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
