Reality Graph Verification Tools
Server Details
Free read-only AI coding verification tools: verification-debt calculator, task-spec lint, search.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.5/5 across 7 of 7 tools scored.
Each tool has a distinct purpose: compute metrics, fetch documents, retrieve templates, lint tasks, search, and validate contracts. No overlap; even the two template tools are clearly differentiated by their target artifact.
All tool names follow a consistent lowercase_with_underscores pattern, predominantly verb_noun structure (e.g., check_verification_debt, validate_task_contract). Exceptions like 'fetch' and 'search' are single verbs but fit the pattern and are not confusing.
7 tools is an ideal size for this domain. Each tool addresses a specific need in the verification workflow without redundancy. The scope is narrow enough to avoid feature bloat but broad enough to be useful.
The set covers the full verification lifecycle: retrieving templates, linting task specs, validating contracts, computing verification debt, and accessing the knowledge base. There are no obvious gaps; the tools feel complete for their stated purpose.
Available Tools
7 tools
check_verification_debt: Check verification debt (A, Read-only)
Estimate a software team's verification debt from team parameters. Computes the four published metrics (generation-to-verification ratio, review depth, unverified-merge rate, two-week churn) and an annual cost estimate, with the full calculation path, labeled assumptions, thresholds, and sources (GitClear, Sonar, Faros, Veracode). Deterministic arithmetic from published models — no benchmark claims. Only team_size is required; every additional parameter refines the estimate. Set lang='de' for a German report.
| Name | Required | Description | Default |
|---|---|---|---|
| lang | No | Report language (default: en) | |
| team_size | Yes | Number of developers on the team (required) | |
| prs_per_month | No | Total merged PRs per month (default: derived from team size) | |
| hourly_rate_eur | No | Loaded cost per engineer hour in EUR (default: 75, assumption) | |
| ai_share_percent | No | Share of merges that are AI-assisted, in percent (default: 60, assumption) | |
| ai_merges_per_month | No | AI-assisted merges per month (enables the unverified-merge rate) | |
| merged_loc_per_week | No | Merged changed lines of code per week (enables the GVR and review-depth metrics) | |
| two_week_churn_percent | No | Share of new lines revised or reverted within 14 days, in percent (default: published GitClear trend delta as assumption) | |
| reviewer_hours_per_week | No | Reviewer hours actually spent per week (enables the GVR metric) | |
| hours_per_reworked_change | No | Average hours per reworked change (default: 6, assumption) | |
| incident_allowance_eur_per_year | No | Annual incident allowance in EUR (default: 20000, widest error bar) | |
| ai_merges_with_evidence_per_month | No | AI-assisted merges per month with recorded validation evidence (enables the unverified-merge rate) | |
| review_reconstruction_hours_per_pr | No | Average reviewer hours spent reconstructing intent per AI-assisted PR (default: 0.5, assumption) | |
| substantive_review_comments_per_week | No | Substantive review comments per week, excluding bots and nitpicks (enables the review-depth metric) | |
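As a rough illustration of how the parameters above fit together, the sketch below builds an MCP `tools/call` request for this tool. The JSON-RPC envelope follows the MCP specification; the helper name `build_debt_request` and all sample values are assumptions made for the example, and actually sending the request (transport, session setup) is left out.

```python
import json


def build_debt_request(team_size: int, request_id: int = 1, **optional) -> dict:
    """Build a JSON-RPC 2.0 `tools/call` payload for check_verification_debt.

    Only team_size is required. Optional keyword arguments are passed through
    unchanged; None values are dropped so the server can apply its defaults.
    """
    if team_size < 1:
        raise ValueError("team_size must be a positive integer")
    arguments = {"team_size": team_size}
    arguments.update({k: v for k, v in optional.items() if v is not None})
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "check_verification_debt", "arguments": arguments},
    }


# Sample values are illustrative only, not recommendations.
request = build_debt_request(
    team_size=12,
    ai_merges_per_month=80,
    ai_merges_with_evidence_per_month=35,
    prs_per_month=None,  # omitted from the payload
    lang="de",
)
print(json.dumps(request, indent=2))
```

Passing both `ai_merges_per_month` and `ai_merges_with_evidence_per_month` matters here because, per the table, the unverified-merge rate is only enabled when those inputs are supplied.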
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations declare readOnlyHint=true, and the description adds significant behavioral context: 'Deterministic arithmetic from published models — no benchmark claims,' and mentions the full calculation path, labeled assumptions, thresholds, and sources. This goes well beyond the annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise and well-structured. It starts with the main action, lists outputs, clarifies determinism, and ends with usage guidance. Every sentence adds value, with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, the description explains what the output includes (four metrics and an annual cost estimate) and what accompanies it (the full calculation path and labeled assumptions). It does not specify the exact format (e.g., JSON vs. textual report), but the mention of a 'report' and a language option implies a human-readable format, which may be sufficient.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100%, so parameters are well-documented in the schema. The description adds overarching context ('every additional parameter refines the estimate') and the lang parameter hint, but no further semantic enrichment for individual parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Estimate a software team's verification debt from team parameters.' It lists the four metrics computed and the annual cost estimate, distinguishing it from siblings like 'get_verification_report_template', which likely provides a template rather than a computation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explains that only team_size is required and additional parameters refine the estimate, guiding usage. It also mentions setting lang='de' for a German report. However, it does not explicitly contrast with alternatives or state when not to use the tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
fetch: Fetch a knowledge base document (A, Read-only)
Fetch a document from the Reality Graph knowledge base by id (as returned by search, e.g. '/verification-debt') or by full realitygraph.dev URL. Returns the document's summary, definitions, key facts, FAQ, and sources as text, plus the canonical URL.
| Name | Required | Description | Default |
|---|---|---|---|
| id | Yes | Document id from search results, or a realitygraph.dev URL | |
Output Schema
| Name | Required | Description |
|---|---|---|
| id | Yes | |
| url | Yes | |
| text | Yes | |
| title | Yes | |
| metadata | No | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false. Description adds value by detailing the returned content (summary, definitions, key facts, FAQ, sources, canonical URL).
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, no wasted words, front-loaded with verb and resource.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
An output schema covers the return values, and the description explains what is returned. Complete for a retrieval tool with good annotations and a single parameter.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% for the single parameter, but the description adds an example of the id format ('/verification-debt') and clarifies that a full URL is also accepted, adding meaning beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states 'Fetch a document from the Reality Graph knowledge base by id or by full URL' with example format, distinguishing it from search and other siblings.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Implies usage after search (by id as returned by search) but does not explicitly state when not to use or provide alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_task_contract_template: Get the verifiable task contract template (A, Read-only)
Returns Reality Graph's free fill-in template (v0) for a verifiable task contract: goal, non-goals, boundaries (may change / must not change / forbidden), 3-7 yes/no acceptance criteria, validation plan, expected evidence, assumptions, open questions — with a filled example and fill-in guidance. Write the contract before an AI agent runs; verify the result against it after. format='json' returns a machine-fillable JSON structure; default is a compact markdown skeleton. Set lang='de' for German. Static content, nothing stored.
| Name | Required | Description | Default |
|---|---|---|---|
| lang | No | Language (default: en) | |
| format | No | Template format (default: markdown) | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds that content is static and nothing is stored, aligning with annotations and providing extra context about the tool's non-destructive, read-only behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is concise: five sentences with no redundant information. It front-loads the main purpose, and each sentence contributes useful information.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, the description details the template contents (goal, non-goals, boundaries, etc.) and mentions version (v0). It is sufficiently complete for a template retrieval tool with clear annotations.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with parameter descriptions for lang and format. The description adds value by explaining the effect of each parameter (e.g., 'format='json' returns a machine-fillable JSON structure; default is a compact markdown skeleton') and using them in context.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states it returns a fill-in template for a verifiable task contract, listing its contents (goal, non-goals, etc.) and distinguishing it from sibling tools like validate_task_contract. It uses specific verbs and resources.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides clear usage guidance: 'Write the contract before an AI agent runs; verify the result against it after.' It explains format and language options, though it does not explicitly exclude alternative tools.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
get_verification_report_template: Get the verification report template (A, Read-only)
Returns the free fill-in template (v0) for a verification report — the artifact you write right after an AI-assisted run: task recap, files changed AND files confirmed untouched, validation results per acceptance criterion (not authored by the generating model), what was skipped, limitations, and the explicit decision. format='json' for a machine-fillable structure; default is a compact markdown file. Static content, nothing stored. lang='de' for German.
| Name | Required | Description | Default |
|---|---|---|---|
| lang | No | Language (default: en) | |
| format | No | Template format (default: markdown) | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already declare readOnlyHint=true, destructiveHint=false. Description adds 'Static content, nothing stored' reinforcing safe behavior, and explains format options. No contradictions.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Four sentences: the first front-loads the purpose and contents, and the rest cover format, storage, and language. Every sentence adds value. No redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a simple static template retrieval with 2 params and no output schema, description fully explains what is returned, usage context, and parameter options. No gaps.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema provides enum values with basic descriptions. Description adds meaning: 'format='json' for a machine-fillable structure; default is a compact markdown file' and 'lang='de' for German', enriching parameter understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states 'Returns the free fill-in template (v0) for a verification report' with specific contents. Distinguishes from sibling tool 'get_task_contract_template' which is for contracts.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Describes usage context ('right after an AI-assisted run') and template content, but does not explicitly state when not to use or mention alternative tools like 'check_verification_debt'.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
lint_task_spec: Lint a task specification (A, Read-only)
Check whether a free-text work order for an AI coding agent is verifiable BEFORE handing it over. Heuristic, deterministic lint of the task's form against the four building blocks of a checkable task (goal, boundaries, acceptance criteria, validation plan) plus rule checks (vague adjectives without numbers, unnamed unhappy paths, missing file anchors). Returns a status table with evidence, the concrete questions that close each gap, and a fill-in skeleton. It checks form, not content — no LLM, nothing stored. Set lang='de' for a German report.
| Name | Required | Description | Default |
|---|---|---|---|
| lang | No | Report language (default: en) | |
| task | Yes | The work order / task text you intend to give an AI coding agent (English or German) | |
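To make the rule "vague adjectives without numbers" more concrete, here is a minimal local sketch of that kind of form check. It is not the server's implementation: the word list, the sentence splitting, and the function name `vague_sentences` are all assumptions chosen for illustration.

```python
import re

# Hypothetical word list; the real tool's vocabulary is not published here.
VAGUE_WORDS = {"fast", "robust", "scalable", "simple", "clean", "efficient"}


def vague_sentences(task: str) -> list:
    """Return sentences that use a vague adjective but contain no digit."""
    sentences = re.split(r"(?<=[.!?])\s+", task.strip())
    flagged = []
    for sentence in sentences:
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        has_vague_word = bool(words & VAGUE_WORDS)
        has_number = re.search(r"\d", sentence) is not None
        if has_vague_word and not has_number:
            flagged.append(sentence)
    return flagged


task_text = "Make the API fast. Keep p95 latency under 200 ms so it stays fast."
flagged = vague_sentences(task_text)
print(flagged)
```

The first sentence is flagged because "fast" appears with no measurable target; the second passes because it pairs the same adjective with a number. Like the tool itself, a check of this kind looks at form only and says nothing about whether the target is the right one.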
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true, and the description adds value by disclosing that the tool is deterministic, uses no LLM, and stores nothing. This goes beyond the annotations, providing full transparency about behavior and side effects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single paragraph of about 100 words, front-loading the core purpose. Each sentence adds essential information: the function, what it checks, what it returns, and key qualifiers (no LLM, nothing stored). No wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The tool has no output schema, but the description explicitly details the return value: 'a status table with evidence, the concrete questions that close each gap, and a fill-in skeleton.' This compensates well. The sibling tools are related but distinct, and the description leaves no major gaps for a lint operation.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 100% description coverage, setting a baseline of 3. The description adds meaning by specifying that 'task' is the work order for an AI agent and that lang='de' produces a German report. This provides helpful context beyond the schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Check whether a free-text work order for an AI coding agent is verifiable BEFORE handing it over.' It uses specific verbs ('Check') and resources ('task specification'), and distinguishes itself from sibling tools by emphasizing form-checking vs. content validation or debt checking.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly says to use the tool 'BEFORE handing [the task] over' to an AI agent, providing clear context. It also notes what it checks (form, not content) and that it's deterministic with no LLM, implicitly excluding scenarios requiring content verification. However, it does not explicitly name alternatives or conditions to avoid using this tool.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
search: Search the Reality Graph knowledge base (A, Read-only)
Full-text search over the Reality Graph knowledge base on AI coding verification: 40+ glossary definitions, 700+ FAQ answers, sourced statistics, and article summaries on verification debt, AI code review, spec-vs-implementation checking, EU compliance (EU AI Act, GDPR, NIS2), and AI coding governance — in English and German. Returns matching documents with title, URL, and snippet. Use fetch to read a result.
| Name | Required | Description | Default |
|---|---|---|---|
| lang | No | Restrict results to one language (default: both) | |
| query | Yes | Search query (English or German) | |
Output Schema
| Name | Required | Description |
|---|---|---|
| results | Yes | |
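The description pairs search with fetch, so a hedged sketch of chaining the two may help. The helpers below only prepare argument dictionaries; the key names inside a search result (`id`, `url`) are assumptions, since the output schema above lists only `results` without describing its items.

```python
from typing import Optional


def search_args(query: str, lang: Optional[str] = None) -> dict:
    """Arguments for the search tool; lang is omitted to search both languages."""
    args = {"query": query}
    if lang is not None:
        args["lang"] = lang
    return args


def fetch_args(result: dict) -> dict:
    """Derive fetch arguments from one search result.

    fetch accepts either a document id or a full realitygraph.dev URL, so the
    sketch prefers an 'id' key and falls back to 'url' (both assumed names).
    """
    doc_id = result.get("id") or result.get("url")
    if not doc_id:
        raise ValueError("search result has neither 'id' nor 'url'")
    return {"id": doc_id}


# Hypothetical search result, using the id format quoted in the fetch description.
sample_result = {"id": "/verification-debt", "title": "Verification debt", "snippet": "..."}
print(search_args("verification debt", lang="de"))
print(fetch_args(sample_result))
```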
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses return format (documents with title, URL, snippet) beyond annotations (readOnlyHint, destructiveHint). No contradictions with annotations. Adds useful behavioral context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three focused sentences. The first states purpose and scope, the second the return format, and the third gives a usage hint. No wasted words; effectively front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given high schema coverage and presence of an output schema, the description fully covers purpose, behavior, and return format. No gaps for an agent to operate.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100% with detailed descriptions for both parameters. The description adds the language context ('English and German') but does not significantly enhance semantic understanding beyond schema defaults.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states 'Full-text search over the Reality Graph knowledge base' and lists specific content (40+ glossary definitions, 700+ FAQ answers), clearly distinguishing it from sibling tools like fetch and check_verification_debt.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Provides explicit guidance to 'Use fetch to read a result', indicating post-search action. However, it does not explicitly state when not to use search or mention alternatives beyond fetch.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
validate_task_contract: Validate a filled task contract (A, Read-only)
Deterministically validates a FILLED task contract (the JSON structure from get_task_contract_template): completeness of goal/non-goals/boundaries, decidability of each acceptance criterion (vague words, missing measurable markers), automated checks in the validation plan, expected evidence, and leftover placeholders. Returns a verdict (PASS / PASS WITH WARNINGS / FAIL), four dimension scores, and a concrete fix per finding. Validates form and completeness, not correctness. No LLM, nothing stored. lang='de' for German.
| Name | Required | Description | Default |
|---|---|---|---|
| lang | No | Report language (default: en) | |
| contract | Yes | The filled task contract as a JSON string (structure from get_task_contract_template, format='json') | |
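One detail that is easy to miss in the table above is that `contract` is a JSON string, not a nested object. The sketch below serializes a filled contract before building the tool arguments. The key names in the dictionary are hypothetical (the authoritative structure comes from get_task_contract_template with format='json'), and the 3 to 7 criteria pre-check simply mirrors the range stated in that template's description.

```python
import json

# Hypothetical key names and sample content, for illustration only.
contract = {
    "goal": "Add rate limiting to the login endpoint",
    "non_goals": ["Changing the session format"],
    "boundaries": {
        "may_change": ["auth/login.py"],
        "must_not_change": ["auth/session.py"],
        "forbidden": ["Disabling existing tests"],
    },
    "acceptance_criteria": [
        "Does the 6th login attempt within 60 seconds return HTTP 429?",
        "Do all existing auth tests still pass?",
        "Is auth/session.py unchanged in the diff?",
    ],
    "validation_plan": ["Run the auth test suite", "Inspect the diff"],
    "expected_evidence": ["Test log", "Diff summary"],
    "assumptions": [],
    "open_questions": [],
}


def validate_args(contract: dict, lang: str = "en") -> dict:
    """Build arguments for validate_task_contract from a filled contract."""
    criteria = contract.get("acceptance_criteria", [])
    if not 3 <= len(criteria) <= 7:
        raise ValueError("expected 3-7 acceptance criteria")
    # The tool expects the contract serialized as a JSON string.
    return {"contract": json.dumps(contract, ensure_ascii=False), "lang": lang}


args = validate_args(contract)
print(type(args["contract"]).__name__)
```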
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Disclosures beyond annotations: deterministic, no LLM, nothing stored. Also describes return values (verdict, scores, fix). No contradictions with readOnlyHint=true.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single paragraph with all key information, but could be slightly more structured for quick scanning. Still concise and no fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Describes return values despite no output schema, covers all aspects of tool behavior. No missing context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema covers 100% of params, but the description adds meaning: the contract must follow the structure from get_task_contract_template with format='json', and lang='de' selects German. Adds value beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Clearly states it validates a filled task contract, specifies what it checks (goal/non-goals, acceptance criteria, etc.), and distinguishes it from siblings such as lint_task_spec, which lints free-text task specs rather than filled contracts.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly says it validates a FILLED contract, mentions it validates form and completeness not correctness, but does not explicitly state when to use vs alternatives or scenarios to avoid.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
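For servers that deploy static files from a directory, a small script can generate the claim file. This is only a sketch: the helper name `write_glama_json` and the idea of a local web root directory are assumptions, and the email is the placeholder from the structure shown above.

```python
import json
import tempfile
from pathlib import Path


def write_glama_json(web_root: Path, email: str) -> Path:
    """Write .well-known/glama.json under web_root and return its path."""
    target = web_root / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    payload = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": email}],
    }
    target.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return target


# Demonstration against a temporary directory instead of a real web root.
with tempfile.TemporaryDirectory() as tmp:
    path = write_glama_json(Path(tmp), "your-email@example.com")
    written = json.loads(path.read_text(encoding="utf-8"))
print(written["maintainers"])
```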
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.