Ground Truth

Name: Ground Truth
Author: anish632

by io.github.anish632

Server Details

Live fact-checks for AI agents: endpoints, security headers, pricing, claims, compliance, markets.

Status: Healthy
Last Tested: 2026-05-25 14:33
Transport: Streamable HTTP
URL
Repository: anish632/ground-truth-mcp
GitHub Stars: 0
Server Listing: ground-truth-mcp

Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

MCP client

Glama

MCP server

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A4.7/5.0

Tool DescriptionsA

Average 4.7/5 across 9 of 9 tools scored. Lowest: 4.1/5.

Server CoherenceA

Disambiguation5/5

Each tool has a well-defined, non-overlapping purpose: compliance scanning, endpoint reachability, pricing extraction, package comparison, market estimation, header inspection, hypothesis testing, and claim verification. Even the two pricing tools differ in scope (single page vs. comparison).

Naming Consistency5/5

All tool names use snake_case with a consistent verb_noun pattern (e.g., assess_compliance_posture, check_endpoint, compare_competitors). The verbs are descriptive and the naming is predictable.

Tool Count5/5

With 9 tools, the server is neither sparse nor bloated. Every tool serves a distinct function necessary for verifying claims via public sources, making the count well-scoped for its purpose.

Completeness5/5

The tool surface covers the full lifecycle of verification: scanning compliance, checking endpoints, extracting pricing, comparing packages, estimating market size, inspecting headers, testing hypotheses, and verifying claims. No obvious gaps exist for the stated domain.

Available Tools

9 tools

assess_compliance_postureCompliance Signal ScanA

Read-onlyIdempotent

Inspect

Scan a public security, trust, compliance, or legal page for common enterprise buying signals before you claim a vendor supports a particular compliance posture. It looks for public references to SOC 2, ISO 27001, GDPR, HIPAA, DPA terms, subprocessors, SSO, SCIM, encryption, and data residency language. This is a signal scanner, not proof of certification or legal sufficiency.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	Public trust, security, compliance, or policy URL to scan.

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Compliance or trust page that was analyzed.
`error`	No	Fetch or parsing error when the page could not be analyzed.
`cached`	No	True when the page body came from the 5-minute cache.
`signals`	No	Boolean scan results for common enterprise compliance and security signals.
`pageLength`	No	Size of the fetched page body in characters.
`matchedSignals`	No	Signal names that were detected on the page.

Tool Definition Quality

A4.3/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnly, openWorld, and idempotent hints. The description adds value by disclosing that it is a 'signal scanner' and not authoritative proof, which further clarifies behavioral traits beyond annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two concise sentences, front-loading the purpose and usage context without any filler. Every sentence serves a clear function.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity (1 parameter, clear annotations, existing output schema), the description is fully adequate. It covers the tool's scope, limitations, and typical signals, leaving no critical gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage with a clear description for the single 'url' parameter. The tool description does not add new semantics beyond the schema's description, so a baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Scan') and resource ('public security, trust, compliance, or legal page') and explicitly lists example signals (SOC 2, ISO 27001, etc.), clearly differentiating from sibling tools like 'verify_claim' which is about verification, not scanning.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use ('before you claim a vendor supports a particular compliance posture') and clarifies what it is not ('not proof of certification or legal sufficiency'), providing clear context for usage, though no direct alternatives are named.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_endpointEndpoint Reachability CheckA

Read-onlyIdempotent

Inspect

Perform one live, unauthenticated fetch against a public URL or API endpoint before you recommend it, document it, or build on top of it. Use this when the question is simply whether an endpoint currently responds and what kind of response it returns. It reports HTTP status, content type, elapsed time, likely auth/rate-limit signals, and a short response sample. A successful result only proves basic reachability at fetch time. Do not use it to validate authenticated flows, POST side effects, JavaScript execution, or deeper business logic.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	Public http(s) URL or bare domain to probe. Bare domains like google.com are accepted and normalized to https:// automatically.

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Normalized URL that was actually fetched.
`error`	No	Validation or network error when the request could not be completed.
`status`	No	HTTP status code returned by the endpoint, when a response was received.
`inputUrl`	No	Original user input when normalization changed it, for example when https:// was added.
`accessible`	Yes	True when the endpoint returned a 2xx HTTP status.
`contentType`	No	Response Content-Type header, if present.
`rateLimited`	No	True when the server responded with 429 Too Many Requests.
`authRequired`	No	True when the server responded with 401 or 403, which usually means credentials are required.
`responseTimeMs`	No	Elapsed request time in milliseconds.
`sampleResponse`	No	First 1,000 characters of the response body for quick inspection. Use this as a debugging hint only; it may be truncated and should not be treated as a complete page capture.

Tool Definition Quality

A5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Beyond annotations (readOnly, idempotent, non-destructive), description adds that fetch is unauthenticated, reports HTTP status, content type, elapsed time, auth/rate-limit signals, and a response sample. Also warns that success only proves reachability at fetch time. No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Description is concise, two sentences. Front-loaded with purpose, then usage and output details. Every sentence is necessary and informative.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's simplicity, the description covers purpose, usage guidelines, behavioral traits, and output details (even though output schema exists). No gaps identified.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema has 100% coverage on the single required parameter 'url'. Description adds value by clarifying that bare domains are accepted and normalized to https:// automatically, which goes beyond schema description.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the action (perform live unauthenticated fetch), the resource (public URL/API endpoint), and the context (before recommending, documenting, or building). It distinguishes from sibling tools by focusing on basic reachability.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (when question is whether endpoint responds and what response it returns) and when not to use (not for authenticated flows, POST side effects, JS execution, deeper business logic). Provides clear guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_pricingPricing Page ScanA

Read-onlyIdempotent

Inspect

Fetch a public pricing page and extract first-pass pricing signals before you quote plan costs, free tiers, or plan names. Use this when you already have a likely pricing URL and need a quick live scan of visible page text. It returns price-like strings, heuristic plan labels, free or free-trial signals, and cache information. It does not map prices to exact plans, normalize currencies, execute checkout flows, or guarantee that a price applies to a specific region or customer type. JavaScript-rendered, logged-in, or heavily obfuscated pricing details can be missed. Results are cached for 5 minutes.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	Public pricing or plans URL to analyze. Prefer the specific pricing page, for example https://stripe.com/pricing, rather than a generic homepage.

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Pricing page that was analyzed.
`error`	No	Fetch or parsing error when the pricing page could not be analyzed.
`cached`	No	True when the page body came from the 5-minute cache instead of a new fetch.
`pageLength`	No	Size of the fetched page body in characters.
`pricesFound`	No	Distinct price-like strings extracted from the page text. These are not linked back to specific plans or billing conditions.
`hasFreeTrial`	No	True when the page contains signals that a free trial exists somewhere on the page.
`hasFreeOption`	No	True when the page contains signals that a free plan or $0 option exists somewhere on the page. This is a page-level signal, not proof that the offer is currently self-serve or globally available.
`plansDetected`	No	Lowercased heuristic plan labels detected from the page text. They are useful hints, not authoritative plan identifiers.

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate read-only, idempotent, non-destructive behavior. The description adds context: caching for 5 minutes, 'first-pass' scan, and specific limitations (e.g., missed JS-rendered content). No contradiction with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise, front-loading the purpose and use case. Every sentence adds critical information without redundancy, making it easy to parse.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple tool (one required param, output schema present), the description is fully adequate. It explains inputs, outputs, limitations, and caching, leaving no significant gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% for the single `url` parameter. The description adds value by advising to prefer a specific pricing page over a homepage, which aids correct usage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's action: 'Fetch a public pricing page and extract first-pass pricing signals before you quote plan costs, free tiers, or plan names.' It specifies the resource (pricing page) and outputs (price strings, plan labels, free signals), and distinguishes it from siblings like compare_pricing_pages.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly states when to use: 'Use this when you already have a likely pricing URL and need a quick live scan.' It also details exclusions: does not map prices to plans, normalize currencies, etc., and notes that JS-rendered/logged-in pages can be missed, guiding the agent to alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

compare_competitorsNamed Package ComparisonA

Read-onlyIdempotent

Inspect

Compare two or more exact package names side by side using live npm or PyPI metadata. Use this when you already know the candidate packages and need evidence for claims such as 'tool A is newer', 'tool B is still maintained', or 'these packages use different licenses'. It returns per-package registry metadata in input order, with field availability varying by registry. Missing or unpublished packages return found=false. Do not use it to discover unknown alternatives, estimate market size, or compare packages across different registries. Registry responses are cached for 5 minutes.

ParametersJSON Schema

Name	Required	Description	Default
`packages`	Yes	Two to ten exact package names from the same registry, for example ['react', 'vue']. Use exact registry names, not search phrases or categories.
`registry`	No	Registry that all package names belong to. All compared packages must come from the same registry, and returned metadata fields differ slightly between npm and PyPI.	npm

Output Schema

ParametersJSON Schema

Name	Required	Description
`packages`	Yes	Package names that were requested for comparison.
`registry`	Yes	Registry used for all comparisons.
`comparisons`	Yes	Per-package lookup results returned in the same order as the input package list. Some fields only exist for npm or only for PyPI, so consumers should treat absent fields as normal.

Tool Definition Quality

A4.7/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Disclosures beyond annotations: returns per-package metadata in input order, field availability varies, missing packages return found=false, registry caching for 5 minutes. No contradictions with annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, front-loaded with core action, each sentence adds value. No filler or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists, description covers purpose, usage guidelines, behavioral details (caching, error state), and edge cases. No missing context for effective invocation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so baseline 3 is appropriate. The description reinforces 'exact package names' and 'same registry', but adds marginal new meaning beyond schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool compares exact package names using live npm or PyPI metadata. It distinguishes from sibling tools like estimate_market by explicitly contrasting use cases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly specifies when to use (known candidate packages needing evidence like maintenance, license) and when not to (discovering unknown alternatives, market size, cross-registry). Implicitly identifies alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

compare_pricing_pagesPricing Page ComparisonA

Read-onlyIdempotent

Inspect

Compare two to five public pricing pages side by side before you make competitive pricing or packaging claims. Use this when you want a quick, live comparison of visible prices, free-plan signals, and plan-name hints across vendors. The output is heuristic and page-level: it does not map every price to every plan or normalize regional billing differences.

ParametersJSON Schema

Name	Required	Description	Default
`pages`	Yes	Two to five named pricing pages to compare side by side.

Output Schema

ParametersJSON Schema

Name	Required	Description
`pages`	Yes	Per-page pricing signals returned in input order.
`summary`	Yes	Aggregate counts across all compared pricing pages.

Tool Definition Quality

A4.1/5.0

Behavior4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations (readOnlyHint, openWorldHint, idempotentHint, destructiveHint=false) already inform safety and idempotency. Description adds heuristic and page-level nature and what it does not do, providing context beyond annotations. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, each adding value: purpose, usage guidance, behavioral constraints. Front-loaded with the key action. No redundant or vague phrases.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has an output schema (not shown), the description doesn't need to cover return values. It covers purpose, usage, and limitations thoroughly for a comparison tool with good annotations and full schema coverage.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so the schema documents parameters fully. Description adds no extra details about individual parameters beyond the range and purpose. Baseline 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'compare' and resource 'pricing pages', with scope 'two to five' and 'side by side'. It does not explicitly differentiate from siblings like 'check_pricing' or 'compare_competitors', but the specificity of pricing pages is distinct.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides a clear use case: 'when you want a quick, live comparison of visible prices, free-plan signals, and plan-name hints'. Also states limitations: 'does not map every price to every plan or normalize regional billing differences'. Missing explicit when-not-to-use or alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

estimate_marketPackage Market SearchA

Read-onlyIdempotent

Inspect

Search npm or PyPI to estimate how crowded a package category is before you claim that a market is empty, niche, or competitive. Use this when you have a category or search phrase such as 'edge orm' and want live result counts plus representative matches. Do not use it to compare exact known package names or to infer adoption from downloads; it reflects search results, not market share. Registry responses are cached for 5 minutes.

ParametersJSON Schema

Name	Required	Description	Default
`query`	Yes	Short registry search phrase to evaluate, for example 'mcp memory server' or 'edge orm'.
`registry`	No	Registry to search. Use 'npm' for JavaScript ecosystems and 'pypi' for Python ecosystems.	npm

Output Schema

ParametersJSON Schema

Name	Required	Description
`query`	Yes	Search phrase that was evaluated.
`registry`	Yes	Registry that was searched.
`topResults`	Yes	Representative top search matches that help interpret the market count.
`totalResults`	Yes	Total number of matching packages reported by the registry search.

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate safe, read-only, idempotent behavior. Description adds beyond that: 'Registry responses are cached for 5 minutes' and explains the tool's limitation to search results, not market share. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences, concise, front-loaded with purpose, no extraneous words. Efficiently conveys all necessary information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the simple two-parameter tool, rich annotations, and output schema existence, the description covers purpose, usage, behavior, and limitations comprehensively.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for both parameters. The description adds value by mentioning 'live result counts plus representative matches', providing extra context beyond schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool searches npm or PyPI to estimate category crowdedness, using specific verbs 'Search' and 'estimate'. It distinguishes from siblings which focus on compliance, pricing, security headers, etc.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says when to use (when you have a category or search phrase) and when not to (not for comparing exact package names or inferring adoption from downloads). Provides clear context on what the tool reflects.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

inspect_security_headersSecurity Header InspectionA

Read-onlyIdempotent

Inspect

Fetch a public URL and inspect security-relevant response headers before you claim that a product or endpoint has a strong browser-facing security baseline. Use this for quick due diligence on public apps and docs sites. It checks for common headers such as HSTS, CSP, X-Frame-Options, Referrer-Policy, Permissions-Policy, and X-Content-Type-Options. It does not replace a real security review, authenticated testing, or vulnerability scanning.

ParametersJSON Schema

Name	Required	Description	Default
`url`	Yes	Public http(s) URL or bare domain to inspect. Bare domains are normalized to https:// automatically.

Output Schema

ParametersJSON Schema

Name	Required	Description
`url`	Yes	Normalized URL that was fetched.
`error`	No	Validation or network error when the request could not be completed.
`https`	Yes	True when the normalized URL used https.
`score`	No	Heuristic security-header score based on how many tracked headers were present.
`status`	No	HTTP status code returned by the endpoint.
`headers`	No	Tracked response headers and their raw values when present.
`inputUrl`	No	Original user input when normalization changed it.
`accessible`	Yes	True when the endpoint returned an HTTP response.
`presentCount`	No	Number of tracked security headers that were present.
`missingRecommended`	No	Tracked headers that were not present on the response.

Tool Definition Quality

A5/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Adds context beyond annotations by listing specific headers checked (HSTS, CSP, etc.) and reiterating non-replacement of full review. No contradiction with readOnlyHint, idempotentHint.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three concise sentences front-load the purpose, then add useful details and limitations.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with one parameter and an output schema, the description covers purpose, usage, headers checked, and limitations completely.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% and the description adds meaning about automatic normalization of bare domains to HTTPS.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool fetches a URL and inspects security headers, and distinguishes it from sibling tools like assess_compliance_posture or check_endpoint.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use it (before claiming strong browser security) and what it does not replace (real security review, authenticated testing, vulnerability scanning).

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

test_hypothesisMulti-step Hypothesis TestA

Read-onlyIdempotent

Inspect

Run a small verification plan made of concrete live checks and summarize whether a hypothesis is supported. Use this when one conclusion depends on multiple simple checks such as endpoint reachability, npm search counts, or whether a page contains an exact substring. This is a coordination tool, not an open-ended research agent: every test must be explicitly defined in advance, and tests run in order with no branching or early exit. The final verdict is mechanical: all tests passing => SUPPORTED, zero passing => REFUTED, otherwise PARTIALLY SUPPORTED. Use verify_claim when you already have evidence URLs, estimate_market for category sizing, and compare_competitors when you already know exact package names.

ParametersJSON Schema

Name	Required	Description	Default
`tests`	Yes	Ordered list of one to ten checks to run. Each test object uses only the fields required by its type.
`hypothesis`	Yes	Claim to test, for example 'there are fewer than 50 MCP email servers on npm'.

Output Schema

ParametersJSON Schema

Name	Required	Description
`tests`	Yes	Per-test execution results in input order.
`verdict`	Yes	High-level verdict for the hypothesis.
`hypothesis`	Yes	Hypothesis that was evaluated.

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description reveals important behaviors beyond annotations: tests run sequentially with no branching or early exit, verdict is mechanical (SUPPORTED/REFUTED/PARTIALLY SUPPORTED), and it is not an open-ended research agent. Annotations already indicate safe behavior, but description adds coordination details.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is approximately 5 sentences, front-loaded with purpose, followed by usage guidelines and behavioral details. Every sentence adds value with no redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity of multi-step checks with ordered execution and mechanical verdict, the description covers purpose, usage, behavior, parameters, and alternatives. Output schema exists, so return value explanation is not needed.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, baseline 3. Description adds meaning by explaining the hypothesis parameter and detailing allowed test types (endpoint_exists, npm_count_above, etc.) and their semantics, as well as constraints like explicit definition and ordered execution.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool runs a small verification plan of concrete live checks and summarizes support for a hypothesis. It distinguishes from siblings like verify_claim, estimate_market, and compare_competitors by specifying use cases.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says when to use (multiple simple checks) and when not to (not open-ended research), and lists alternative tools for specific scenarios, providing excellent guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

verify_claimClaim Support CheckA

Read-onlyIdempotent

Inspect

Check whether a factual claim is supported by a specific set of public evidence URLs that you already have. For each source, the tool performs a case-insensitive keyword match over the fetched page body, then marks that source as supporting the claim when at least half of the supplied keywords appear. Use this for evidence-backed claim checks on known pages, not for open-ended search, semantic reasoning, or contradiction extraction. The aggregate verdict is driven only by the per-page keyword support ratio. Fetched pages are cached for 5 minutes.

ParametersJSON Schema

Name	Required	Description
`claim`	Yes	Plain-language claim to verify, for example 'AWS Business support includes 24/7 phone support'.
`keywords`	Yes	Keywords or short phrases that should appear on supporting pages. Matching is case-insensitive substring matching, so choose phrases that are likely to appear verbatim.
`evidence_urls`	Yes	One to ten public documentation, pricing, policy, or support URLs that are likely to contain direct evidence for the claim.

Output Schema

ParametersJSON Schema

Name	Required	Description
`claim`	Yes	Claim that was evaluated.
`sources`	Yes	Per-source evidence results.
`verdict`	Yes	Aggregate verdict across all supplied sources.

Tool Definition Quality

A4.9/5.0

Behavior5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Discloses matching logic (case-insensitive keyword match, half-keywords threshold), aggregate verdict derivation, and caching behavior. Annotations already indicate readOnly/ openWorld/ idempotent, but description adds valuable detail beyond that. No contradiction.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Only 4-5 sentences, front-loaded with core purpose. Each sentence adds essential information without repetition or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given output schema exists and schema coverage is 100%, the description covers matching logic, caching, and usage boundaries comprehensively. No missing information for an agent to correctly invoke the tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with good descriptions. Description adds context about the keyword matching mechanism (substring, case-insensitive, half threshold) that is not in the schema, enhancing understanding of how parameters are used.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states verb ('Check'), resource ('factual claim'), and scope ('specific set of public evidence URLs'). Differentiates from siblings like check_endpoint and check_pricing by focusing on claim verification with known URLs.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use (evidence-backed claim checks on known pages) and when not to use (open-ended search, semantic reasoning, contradiction extraction). Caching behavior (5 minutes) also guides usage expectations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.

Ground Truth

Server Details

Tool Definition Quality

Available Tools

Output Schema

Tool Definition Quality

Output Schema

Tool Definition Quality

Output Schema

Tool Definition Quality

Output Schema

Tool Definition Quality

Output Schema

Tool Definition Quality

Output Schema

Tool Definition Quality

Output Schema

Tool Definition Quality

Output Schema

Tool Definition Quality

Output Schema

Tool Definition Quality

Discussions

Your Connectors

Resources