Reality Graph Verification Tools

by dev.realitygraph

Server Details

Read-only AI coding tools for change verification, release readiness, capacity, and guidance.

Status: Healthy
Last Tested: 2026-07-24 13:26
Transport: Streamable HTTP
Glama MCP Gateway

Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.

Full call logging

Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.

Tool access control

Enable or disable individual tools per connector, so you decide what your agents can and cannot do.

Managed credentials

Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.

Usage analytics

See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.

100% free. Your data is private.

Tool Definition Quality

A (4.2/5.0)

Tool Descriptions: A

Average 4.3/5 across 10 of 10 tools scored. Lowest: 3.4/5.

Server Coherence: A

Disambiguation: 5/5

Each tool has a distinct purpose: capacity calculation, readiness check, debt estimation, knowledge base access, template retrieval, specification linting, verification planning, search, and contract validation. No two tools overlap in functionality.

Naming Consistency: 5/5

All tool names use consistent snake_case and follow a verb_noun pattern (e.g., calculate_verification_capacity, check_release_readiness, get_task_contract_template). Even simpler names like fetch and search fit the pattern.

Tool Count: 5/5

With 10 tools, the server is well-scoped for its domain. Each tool addresses a specific verification need without unnecessary redundancy or gaps.

Completeness: 5/5

The tool set covers the full lifecycle of verification: planning, specification linting, contract validation, readiness checking, debt estimation, capacity calculation, and knowledge base access. No obvious missing operations for the stated purpose.

Available Tools

10 tools

calculate_verification_capacity: Calculate verification capacity (grade B)

Read-only

Calculate weekly review demand, utilization, capacity gap, supported change throughput, and changes lacking evidence from measured team inputs. No cost model, benchmark, or hidden industry assumption is applied; the output shows the arithmetic and a concrete balancing action.

Parameters

Name	Required	Description
`lang`	No	Response language (default: en)
`ai_changes_per_week`	Yes
`two_week_churn_percent`	No
`evidence_coverage_percent`	Yes
`available_reviewer_hours_per_week`	Yes
`average_review_minutes_per_change`	Yes
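
The arithmetic behind these outputs is not published on this page, so the following is only a minimal sketch of one plausible reading of the four required inputs. The formulas, the function name `capacity_summary`, and the output keys are assumptions for illustration, not the server's implementation; the optional `two_week_churn_percent` input is left out because its role is not described.

```python
from dataclasses import dataclass


@dataclass
class CapacityInputs:
    """Required inputs, named after the tool's parameters."""
    ai_changes_per_week: float
    evidence_coverage_percent: float
    available_reviewer_hours_per_week: float
    average_review_minutes_per_change: float


def capacity_summary(inp: CapacityInputs) -> dict:
    """Assumed arithmetic; the real tool may define these metrics differently."""
    if inp.available_reviewer_hours_per_week <= 0:
        raise ValueError("available_reviewer_hours_per_week must be positive")
    if inp.average_review_minutes_per_change <= 0:
        raise ValueError("average_review_minutes_per_change must be positive")

    # Hours of review that the week's changes would need.
    demand_hours = inp.ai_changes_per_week * inp.average_review_minutes_per_change / 60
    # Share of the available reviewer time that this demand would consume.
    utilization = demand_hours / inp.available_reviewer_hours_per_week
    # A positive gap means more review is needed than reviewers can supply.
    gap_hours = demand_hours - inp.available_reviewer_hours_per_week
    # Changes per week that the available hours cover at the measured pace.
    supported = inp.available_reviewer_hours_per_week * 60 / inp.average_review_minutes_per_change
    # Changes per week without recorded evidence.
    lacking = inp.ai_changes_per_week * (1 - inp.evidence_coverage_percent / 100)

    return {
        "review_demand_hours": demand_hours,
        "utilization": utilization,
        "capacity_gap_hours": gap_hours,
        "supported_changes_per_week": supported,
        "changes_lacking_evidence": lacking,
    }


# Hypothetical team: 120 changes/week, 30 minutes each, 40 reviewer hours, 75% coverage.
example = capacity_summary(CapacityInputs(
    ai_changes_per_week=120,
    evidence_coverage_percent=75,
    available_reviewer_hours_per_week=40,
    average_review_minutes_per_change=30,
))
print(example)
```

With these made-up inputs the sketch reports 60 hours of demand against 40 available hours (utilization 1.5, a 20-hour gap), 80 supported changes per week, and 30 changes lacking evidence.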

Tool Definition Quality

B (3.4/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already mark the tool as read-only and non-destructive. The description adds value by stating that no cost model, benchmark, or hidden assumption is applied, and that the output shows arithmetic and a concrete balancing action, increasing transparency.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, front-loaded with the outputs it calculates. Every sentence provides essential information without redundancy, making it highly efficient.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool has 6 parameters and no output schema, the description is incomplete. It does not explain the return format, how each output metric is derived, or the meaning of parameters beyond the generic 'measured team inputs.'

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With only 17% schema description coverage (only 'lang' is described), the description fails to clarify the meaning of the other 5 parameters, such as ai_changes_per_week or two_week_churn_percent. Although it mentions 'measured team inputs,' it does not explain individual parameter semantics.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
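
The 17% figure is consistent with one described parameter out of six. As a minimal sketch, assuming coverage means "properties with a non-empty description divided by all properties" (the scorer's actual method is not documented here), it can be reproduced from a JSON Schema like this; the function name `description_coverage` is hypothetical, and the property bodies are left empty because their types are not shown on this page.

```python
def description_coverage(schema: dict) -> float:
    """Percentage of top-level properties that carry a non-empty description."""
    props = schema.get("properties", {})
    if not props:
        return 0.0
    described = sum(1 for prop in props.values() if prop.get("description"))
    return 100 * described / len(props)


# Parameter names from the table above; only `lang` has a description.
capacity_schema = {
    "type": "object",
    "properties": {
        "lang": {"description": "Response language (default: en)"},
        "ai_changes_per_week": {},
        "two_week_churn_percent": {},
        "evidence_coverage_percent": {},
        "available_reviewer_hours_per_week": {},
        "average_review_minutes_per_change": {},
    },
}

print(round(description_coverage(capacity_schema)))  # 17
```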

Purpose: 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: calculating weekly review demand, utilization, capacity gap, etc. It identifies specific outputs and inputs, but does not explicitly differentiate from sibling tools like check_verification_debt or plan_change_verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies it should be used with measured team inputs and notes that no hidden assumptions are applied. However, it lacks explicit guidance on when not to use it or how it compares to alternative tools, leaving room for misinterpretation.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_release_readiness: Check release readiness (grade A)

Read-only

Return GO, CONDITIONAL, or NO_GO from supplied acceptance-criterion results, check evidence, rollback, monitoring, limitations, and independent review. The verdict is deliberately based only on supplied evidence; this tool does not inspect code, CI, or a deployment.

Parameters

Name	Required	Description
`lang`	No	Response language (default: en)
`checks`	Yes
`rollback`	Yes	Current rollback or recovery state
`blast_radius`	Yes	Largest expected impact boundary
`change_types`	Yes	Technical and risk-relevant change types
`change_summary`	Yes	Plain-language summary of the change
`rollback_ready`	Yes
`monitoring_ready`	Yes
`independent_review`	Yes
`acceptance_criteria_failed`	Yes
`acceptance_criteria_passed`	Yes
`known_limitations_recorded`	Yes
`acceptance_criteria_not_run`	Yes

Tool Definition Quality

A (3.9/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, and the description confirms that no external inspection takes place. The description adds that the verdict is deliberately based only on supplied evidence, which is useful. There are no contradictions, but the description could also disclose error handling or edge cases.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences that efficiently convey the core purpose and a key constraint. No unnecessary words or repetition.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 2/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Despite 13 parameters and no output schema, the description omits how inputs combine to produce the verdict. There is no explanation of the GO/CONDITIONAL/NO_GO logic, thresholds, or potential failure modes. For a decision tool, this is a significant gap.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 2/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is only 38%, and the description only groups parameters into categories (acceptance criteria, checks, rollback, etc.) without explaining individual parameters like 'blast_radius' or 'change_types'. The description does not sufficiently compensate for the low coverage.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description clearly states the tool returns a verdict (GO, CONDITIONAL, NO_GO) based on supplied acceptance criteria and other evidence. It distinguishes from siblings by explicitly noting it does not inspect code, CI, or deployment, which sets it apart from tools like check_verification_debt or plan_change_verification.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides guidance on when to use this tool: when the user has the evidence already, as it does not inspect external systems. However, it does not explicitly state when not to use it or suggest alternatives, leaving some ambiguity.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

check_verification_debt: Check verification debt (grade A)

Read-only

Estimate a software team's verification debt from team parameters. Computes the four published metrics (generation-to-verification ratio, review depth, unverified-merge rate, two-week churn) and an annual cost estimate, with the full calculation path, labeled assumptions, thresholds, and sources (GitClear, Sonar, Faros, Veracode). Deterministic arithmetic from published models — no benchmark claims. Only team_size is required; every additional parameter refines the estimate. Set lang='de' for a German report.

Parameters

Name	Required	Description
`lang`	No	Report language (default: en)
`team_size`	Yes	Number of developers on the team (required)
`prs_per_month`	No	Total merged PRs per month (default: derived from team size)
`hourly_rate_eur`	No	Loaded cost per engineer hour in EUR (default: 75, assumption)
`ai_share_percent`	No	Share of merges that are AI-assisted, in percent (default: 60, assumption)
`ai_merges_per_month`	No	AI-assisted merges per month (enables the unverified-merge rate)
`merged_loc_per_week`	No	Merged changed lines of code per week (enables the GVR and review-depth metrics)
`two_week_churn_percent`	No	Share of new lines revised or reverted within 14 days, in percent (default: published GitClear trend delta as assumption)
`reviewer_hours_per_week`	No	Reviewer hours actually spent per week (enables the GVR metric)
`hours_per_reworked_change`	No	Average hours per reworked change (default: 6, assumption)
`incident_allowance_eur_per_year`	No	Annual incident allowance in EUR (default: 20000, widest error bar)
`ai_merges_with_evidence_per_month`	No	AI-assisted merges per month with recorded validation evidence (enables the unverified-merge rate)
`review_reconstruction_hours_per_pr`	No	Average reviewer hours spent reconstructing intent per AI-assisted PR (default: 0.5, assumption)
`substantive_review_comments_per_week`	No	Substantive review comments per week, excluding bots and nitpicks (enables the review-depth metric)
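
Because only `team_size` is required, the smallest valid call carries a single argument, and each extra parameter refines the estimate. The sketch below builds the JSON-RPC 2.0 `tools/call` request body that MCP clients send; the helper name `build_tool_call` and all argument values are hypothetical, and transport details (the Streamable HTTP endpoint, session setup, headers) are deliberately left out.

```python
import json


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> dict:
    """Return an MCP `tools/call` request as a plain dict."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Minimal call: every other parameter falls back to its documented default.
minimal = build_tool_call("check_verification_debt", {"team_size": 8})

# Refined call: the two merge counts enable the unverified-merge rate,
# and lang='de' requests a German report. Values are made up.
refined = build_tool_call(
    "check_verification_debt",
    {
        "team_size": 8,
        "ai_merges_per_month": 40,
        "ai_merges_with_evidence_per_month": 25,
        "lang": "de",
    },
    request_id=2,
)

print(json.dumps(minimal, indent=2))
```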

Tool Definition Quality

A (4.4/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, and the description adds significant behavioral context: 'Deterministic arithmetic from published models — no benchmark claims,' and mentions the full calculation path, labeled assumptions, thresholds, and sources. This goes well beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise and well-structured. It starts with the main action, lists outputs, clarifies determinism, and ends with usage guidance. Every sentence adds value, with no wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description explains what the output includes (metrics and a cost estimate) and how it is presented (full calculation path, labeled assumptions). It does not specify the exact format (e.g., JSON vs. textual report), but the mention of a 'report' and a language option implies a readable format, which may be sufficient.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100%, so parameters are well-documented in the schema. The description adds overarching context ('every additional parameter refines the estimate') and the lang parameter hint, but no further semantic enrichment for individual parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Estimate a software team's verification debt from team parameters.' It lists the four metrics computed and the annual cost estimate, distinguishing it from siblings like 'get_verification_report_template', which likely provides a template rather than a computation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explains that only team_size is required and additional parameters refine the estimate, guiding usage. It also mentions setting lang='de' for a German report. However, it does not explicitly contrast with alternatives or state when not to use the tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

fetch: Fetch a knowledge base document (grade A)

Read-only

Fetch a document from the Reality Graph knowledge base by id (as returned by search, e.g. '/verification-debt') or by full realitygraph.dev URL. Returns the document's summary, definitions, key facts, FAQ, and sources as text, plus the canonical URL.

Parameters

Name	Required	Description
`id`	Yes	Document id from search results, or a realitygraph.dev URL
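
A client may want to reject obviously malformed values before spending a tool call. The following is a minimal sketch of such a pre-check, assuming that ids are path-style strings beginning with '/' (inferred from the single '/verification-debt' example) and that URLs must be on realitygraph.dev or a subdomain; the function name `looks_like_fetch_id` is hypothetical, and the server's own validation rules are not documented here.

```python
from urllib.parse import urlparse


def looks_like_fetch_id(value: str) -> bool:
    """Heuristic client-side check for the `id` argument of fetch."""
    parsed = urlparse(value)
    if parsed.scheme in ("http", "https"):
        # Full URL form: accept only the knowledge base's own domain.
        host = parsed.hostname or ""
        return host == "realitygraph.dev" or host.endswith(".realitygraph.dev")
    # Path-style id form (assumed): a leading slash, no scheme, no host.
    return (
        parsed.scheme == ""
        and parsed.netloc == ""
        and value.startswith("/")
        and len(value) > 1
    )


print(looks_like_fetch_id("/verification-debt"))
print(looks_like_fetch_id("https://example.com/verification-debt"))
```

The first call prints True and the second prints False.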

Output Schema

Name	Required	Description
`id`	Yes
`url`	Yes
`text`	Yes
`title`	Yes
`metadata`	No

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false. Description adds value by detailing the returned content (summary, definitions, key facts, FAQ, sources, canonical URL).

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no wasted words, front-loaded with verb and resource.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

An output schema exists for return values, and the description explains what is returned. Complete for a retrieval tool with good annotations and a single parameter.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for the single parameter, but the description adds an example of the id format ('/verification-debt') and clarifies that a full URL is also accepted, adding meaning beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Fetch a document from the Reality Graph knowledge base by id or by full URL' with example format, distinguishing it from search and other siblings.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Implies usage after search (by id as returned by search) but does not explicitly state when not to use or provide alternatives.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_task_contract_template: Get the verifiable task contract template (grade A)

Read-only

Returns Reality Graph's free fill-in template (v0) for a verifiable task contract: goal, non-goals, boundaries (may change / must not change / forbidden), 3-7 yes/no acceptance criteria, validation plan, expected evidence, assumptions, open questions — with a filled example and fill-in guidance. Write the contract before an AI agent runs; verify the result against it after. format='json' returns a machine-fillable JSON structure; default is a compact markdown skeleton. Set lang='de' for German. Static content, nothing stored.

Parameters

Name	Required	Description
`lang`	No	Language (default: en)
`format`	No	Template format (default: markdown)
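
The shape of the JSON template is not shown on this page, so the following is only a sketch of how a filled-in contract might be held and sanity-checked on the client side. The class `TaskContract`, its field names, and the `problems` method are hypothetical; the one rule taken from the description is the range of 3-7 acceptance criteria.

```python
from dataclasses import dataclass, field


@dataclass
class TaskContract:
    """Hypothetical container mirroring the sections the template lists."""
    goal: str
    non_goals: list
    may_change: list
    must_not_change: list
    forbidden: list
    acceptance_criteria: list  # yes/no statements, 3-7 per the description
    validation_plan: str
    expected_evidence: list
    assumptions: list = field(default_factory=list)
    open_questions: list = field(default_factory=list)

    def problems(self) -> list:
        """Return human-readable gaps; an empty list means none were found."""
        issues = []
        if not self.goal.strip():
            issues.append("goal is empty")
        count = len(self.acceptance_criteria)
        if not 3 <= count <= 7:
            issues.append(f"expected 3-7 acceptance criteria, found {count}")
        if not self.validation_plan.strip():
            issues.append("validation plan is empty")
        return issues


# Made-up contract with too few acceptance criteria.
draft = TaskContract(
    goal="Add rate limiting to the login endpoint",
    non_goals=["Changing the session store"],
    may_change=["auth/login.py"],
    must_not_change=["auth/session.py"],
    forbidden=["Disabling existing tests"],
    acceptance_criteria=[
        "Does the sixth login attempt within one minute return HTTP 429?",
        "Do all existing auth tests still pass?",
    ],
    validation_plan="Run the auth test suite and one manual burst test",
    expected_evidence=["Test log", "Burst test output"],
)
print(draft.problems())
```

A check of this kind only covers the form of the contract; whether the criteria are the right ones remains a human judgment.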

Tool Definition Quality

A (4.7/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true and destructiveHint=false. The description adds that content is static and nothing is stored, aligning with annotations and providing extra context about the tool's non-destructive, read-only behavior.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is concise at five sentences. It front-loads the main purpose, and each sentence contributes useful information.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given no output schema, the description details the template contents (goal, non-goals, boundaries, etc.) and mentions version (v0). It is sufficiently complete for a template retrieval tool with clear annotations.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with parameter descriptions for lang and format. The description adds value by explaining the effect of each parameter (e.g., 'format='json' returns a machine-fillable JSON structure; default is a compact markdown skeleton') and using them in context.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states it returns a fill-in template for a verifiable task contract, listing its contents (goal, non-goals, etc.) and distinguishing it from sibling tools like validate_task_contract. It uses specific verbs and resources.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides clear usage guidance: 'Write the contract before an AI agent runs; verify the result against it after.' It explains format and language options, though it does not explicitly exclude alternative tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

get_verification_report_template: Get the verification report template (grade A)

Read-only

Returns the free fill-in template (v0) for a verification report — the artifact you write right after an AI-assisted run: task recap, files changed AND files confirmed untouched, validation results per acceptance criterion (not authored by the generating model), what was skipped, limitations, and the explicit decision. format='json' for a machine-fillable structure; default is a compact markdown file. Static content, nothing stored. lang='de' for German.

Parameters

Name	Required	Description
`lang`	No	Language (default: en)
`format`	No	Template format (default: markdown)

Tool Definition Quality

A (4.3/5.0)

Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already declare readOnlyHint=true, destructiveHint=false. Description adds 'Static content, nothing stored' reinforcing safe behavior, and explains format options. No contradictions.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Four sentences front-load the purpose, then add detail. Every sentence adds value: purpose and contents, format, storage behavior, language. No redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a simple static template retrieval with 2 params and no output schema, description fully explains what is returned, usage context, and parameter options. No gaps.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema provides enum values with basic descriptions. Description adds meaning: 'format='json' for a machine-fillable structure; default is a compact markdown file' and 'lang='de' for German', enriching parameter understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states 'Returns the free fill-in template (v0) for a verification report' with specific contents. Distinguishes from sibling tool 'get_task_contract_template' which is for contracts.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Describes usage context ('right after an AI-assisted run') and template content, but does not explicitly state when not to use or mention alternative tools like 'check_verification_debt'.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

lint_task_spec: Lint a task specification (grade A)

Read-only

Check whether a free-text work order for an AI coding agent is verifiable BEFORE handing it over. Heuristic, deterministic lint of the task's form against the four building blocks of a checkable task (goal, boundaries, acceptance criteria, validation plan) plus rule checks (vague adjectives without numbers, unnamed unhappy paths, missing file anchors). Returns a status table with evidence, the concrete questions that close each gap, and a fill-in skeleton. It checks form, not content — no LLM, nothing stored. Set lang='de' for a German report.

Parameters

Name	Required	Description
`lang`	No	Report language (default: en)
`task`	Yes	The work order / task text you intend to give an AI coding agent (English or German)
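
To make the idea of a deterministic, LLM-free form check concrete, here is a minimal sketch of two of the rule checks the description names: vague adjectives without numbers and missing file anchors. The word list, the regular expressions, and the function name `lint_task_text` are assumptions for illustration; the server's actual rules are not published on this page and are likely more extensive.

```python
import re

# Hypothetical list of adjectives that need a number to be checkable.
VAGUE_WORDS = {"fast", "robust", "scalable", "simple", "clean", "better"}

# Rough pattern for a file reference such as src/login.py or README.md.
FILE_ANCHOR = re.compile(r"\b[\w/-]+\.[A-Za-z]{1,5}\b")


def lint_task_text(task: str) -> list:
    """Return findings about the task's form; an empty list means no hits."""
    findings = []
    # Split into sentences at ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", task.strip()):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        vague = sorted(words & VAGUE_WORDS)
        # A vague adjective is only flagged when the sentence has no digit.
        if vague and not re.search(r"\d", sentence):
            findings.append("vague adjective without a number: " + ", ".join(vague))
    if not FILE_ANCHOR.search(task):
        findings.append("no file anchor found")
    return findings


print(lint_task_text("Make the login page fast."))
print(lint_task_text("Keep p95 latency of src/login.py under 200 ms."))
```

The first task yields two findings (a vague adjective and a missing file anchor), while the second yields none. Like the tool it illustrates, a check of this kind looks at form only and cannot tell whether the stated numbers or files are the right ones.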

Tool Definition Quality

A (4.6/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations already indicate readOnlyHint=true, and the description adds value by disclosing that the tool is deterministic, uses no LLM, and stores nothing. This goes beyond the annotations, providing full transparency about behavior and side effects.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is a single paragraph of about 100 words, front-loading the core purpose. Each sentence adds essential information: the function, what it checks, what it returns, and key qualifiers (no LLM, nothing stored). No wasted words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The tool has no output schema, but the description explicitly details the return value: 'a status table with evidence, the concrete questions that close each gap, and a fill-in skeleton.' This compensates well. The sibling tools are related but distinct, and the description leaves no major gaps for a lint operation.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 100% description coverage, setting a baseline of 3. The description adds extra meaning by specifying that 'task' is the work order for an AI agent and that setting lang='en' or lang='de' controls the report language. This provides helpful context beyond the schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'Check whether a free-text work order for an AI coding agent is verifiable BEFORE handing it over.' It uses specific verbs ('Check') and resources ('task specification'), and distinguishes itself from sibling tools by emphasizing form-checking vs. content validation or debt checking.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description explicitly says to use the tool 'BEFORE handing [the task] over' to an AI agent, providing clear context. It also notes what it checks (form, not content) and that it's deterministic with no LLM, implicitly excluding scenarios requiring content verification. However, it does not explicitly name alternatives or conditions to avoid using this tool.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

plan_change_verification: Plan verification for a change (grade A)

Read-only

Turn explicit change characteristics into a risk tier, required automated checks, manual scenarios, evidence, release blockers, role handoff, and canonical Reality Graph guidance. Use before implementation or review. It does not inspect code and never invents a confidence score.

Parameters

Name	Required	Description
`lang`	No	Response language (default: en)
`rollback`	Yes	Current rollback or recovery state
`blast_radius`	Yes	Largest expected impact boundary
`change_types`	Yes	Technical and risk-relevant change types
`change_summary`	Yes	Plain-language summary of the change

Tool Definition Quality

A (4.6/5.0)

Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations declare readOnlyHint=true, destructiveHint=false, and the description reinforces this by stating it does not inspect code or invent confidence scores. It adds behavioral details (no code inspection, no confidence score) beyond the annotations.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is three short, well-structured sentences that front-load the core purpose and then state when to use the tool and what it does not do. Every part adds value without redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (5 params, no output schema), the description covers purpose, usage, and limitations. It could mention output format briefly, but the listed outputs (risk tier, checks, etc.) provide adequate completeness.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, so the schema fully documents parameters. The description adds context by explaining that parameters (change characteristics) are used to produce outputs, but does not repeat schema details. This is appropriate for high-coverage schemas.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description uses a specific verb ('Turn') and resource ('change characteristics') and lists concrete outputs (risk tier, checks, scenarios, etc.). It explicitly excludes code inspection and confidence scoring, distinguishing it from siblings like lint_task_spec.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description says 'Use before implementation or review' and clarifies what the tool does not do, providing context. However, it lacks explicit when-not-to-use guidance or direct comparisons to sibling tools.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

search: Search the Reality Graph knowledge base (A)

Read-only

Inspect

Full-text search over the Reality Graph knowledge base on AI coding verification: 40+ glossary definitions, 700+ FAQ answers, sourced statistics, and article summaries on verification debt, AI code review, spec-vs-implementation checking, EU compliance (EU AI Act, GDPR, NIS2), and AI coding governance — in English and German. Returns matching documents with title, URL, and snippet. Use fetch to read a result.

Parameters

Name	Required	Description	Default
`lang`	No	Restrict results to one language	both
`query`	Yes	Search query (English or German)

Output Schema

Name	Required	Description
`results`	Yes
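
As an illustration of the parameter table above, here is a minimal sketch of how a client might build the request for `search`. It assumes the standard MCP JSON-RPC `tools/call` envelope; the helper name `build_search_call` is hypothetical, and the value `"en"` for `lang` is an assumption (this page only confirms `de` as a language code, and only for another tool). Sending the request over Streamable HTTP is left out.

```python
import json


def build_search_call(query, lang=None, request_id=1):
    """Build a JSON-RPC 2.0 `tools/call` request for the `search` tool.

    `query` is required; `lang` is optional and, when omitted, the server
    searches both languages according to the parameter table.
    """
    if not query or not query.strip():
        raise ValueError("query is required")
    arguments = {"query": query}
    if lang is not None:
        # Assumption: language codes look like "en" / "de".
        arguments["lang"] = lang
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "search", "arguments": arguments},
    }


request = build_search_call("verification debt", lang="en")
print(json.dumps(request, indent=2))
```

Leaving `lang` out of `arguments` entirely, rather than sending an empty value, keeps the server-side default (both languages) in effect.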

Tool Definition Quality

A 4.3/5.0

Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

The description discloses the return format (documents with title, URL, and snippet), which the annotations (readOnlyHint, destructiveHint) do not cover. It does not contradict the annotations and adds useful behavioral context.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two focused sentences. The first states purpose and scope; the second gives a usage hint. No wasted words, and the key information is front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given high schema coverage and the presence of an output schema, the description fully covers purpose, behavior, and return format. Nothing an agent needs to operate the tool is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100%, with detailed descriptions for both parameters. The description adds the language context ('English and German') but otherwise says little about the parameters beyond what the schema already states.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description explicitly states 'Full-text search over the Reality Graph knowledge base' and lists specific content (40+ glossary definitions, 700+ FAQ answers), clearly distinguishing it from sibling tools like fetch and check_verification_debt.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Provides the explicit guidance 'Use fetch to read a result', which tells the agent what to do after a search. However, it does not state when not to use search, and it mentions no alternatives beyond fetch.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

validate_task_contract: Validate a filled task contract (A)

Read-only

Inspect

Deterministically validates a FILLED task contract (the JSON structure from get_task_contract_template): completeness of goal/non-goals/boundaries, decidability of each acceptance criterion (vague words, missing measurable markers), automated checks in the validation plan, expected evidence, and leftover placeholders. Returns a verdict (PASS / PASS WITH WARNINGS / FAIL), four dimension scores, and a concrete fix per finding. Validates form and completeness, not correctness. No LLM, nothing stored. lang='de' for German.

Parameters

Name	Required	Description	Default
`lang`	No	Report language	en
`contract`	Yes	The filled task contract as a JSON string (structure from get_task_contract_template, format='json')
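
The `contract` parameter is a JSON string, not a nested object, so the contract has to be serialized before it is placed in the tool arguments. The sketch below shows that double encoding, assuming the standard MCP JSON-RPC `tools/call` envelope. The helper name `build_validate_call` and every key inside `draft_contract` are hypothetical placeholders; the real structure comes from get_task_contract_template with format='json'.

```python
import json


def build_validate_call(contract, lang="en", request_id=1):
    """Build a `tools/call` request for `validate_task_contract`.

    `contract` is a dict here; the tool expects it as a JSON *string*,
    so it is serialized once before the whole request is serialized.
    """
    contract_json = json.dumps(contract, ensure_ascii=False)
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": "validate_task_contract",
            "arguments": {"contract": contract_json, "lang": lang},
        },
    }


# Hypothetical contract shape, for illustration only.
draft_contract = {
    "goal": "Return HTTP 429 after 100 requests per minute per API key",
    "non_goals": ["Changing the authentication flow"],
    "acceptance_criteria": ["Request 101 within 60 s receives status 429"],
}

request = build_validate_call(draft_contract, lang="de")
print(json.dumps(request, indent=2))
```

Passing the dict directly instead of a string would not match the documented parameter type, which is why the helper serializes it first.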

Tool Definition Quality

A 4.6/5.0

Behavior 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Disclosures beyond annotations: deterministic, no LLM, nothing stored. Also describes return values (verdict, scores, fix). No contradictions with readOnlyHint=true.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

A single paragraph carries all key information. It could be slightly more structured for quick scanning, but it stays concise with no fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Describes the return values even though there is no output schema, and covers all aspects of tool behavior. No context is missing.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The schema covers 100% of the parameters, but the description adds meaning: the contract must follow the structure from get_task_contract_template (format='json'), and lang='de' selects German. This adds value beyond the schema.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Clearly states that it validates a filled task contract and specifies what it checks (goal/non-goals, acceptance criteria, etc.), which sets it apart from siblings such as lint_task_spec.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly says that it validates a FILLED contract and that it checks form and completeness, not correctness. However, it does not say when to use this tool rather than an alternative, or which scenarios to avoid.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:

{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
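
For server owners who want to generate the file rather than write it by hand, here is a small sketch that produces the structure shown above. The helper name `build_glama_json` is hypothetical, and the checks (at least one maintainer, an `@` in each address) are assumptions added for illustration, not rules stated by Glama.

```python
import json

SCHEMA_URL = "https://glama.ai/mcp/schemas/connector.json"


def build_glama_json(emails):
    """Return the text of a /.well-known/glama.json file."""
    if not emails:
        # Assumption: a claim needs at least one maintainer entry.
        raise ValueError("at least one maintainer email is needed")
    for email in emails:
        if "@" not in email:
            raise ValueError(f"not an email address: {email!r}")
    document = {
        "$schema": SCHEMA_URL,
        "maintainers": [{"email": email} for email in emails],
    }
    return json.dumps(document, indent=2)


print(build_glama_json(["your-email@example.com"]))
```

The output would then be served at `/.well-known/glama.json` on the server's domain, with the address replaced by the one tied to the Glama account.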
