Glama

Server Details

Post-tx outcome verification via LLM-as-judge. Zero Core Verify.

Status: Healthy
Transport: Streamable HTTP
Repository: meltingpixelsai/harvey-verify
GitHub Stars: 0

Tool Descriptions: A

Average 4/5 across 5 of 5 tools scored.

Server Coherence: A

Disambiguation: 5/5

Each tool performs a distinct function: health checks server status, list_tools discovers other tools, get_service_quality retrieves aggregated metrics, and report_outcome/verify_outcome record results at different granularities. Descriptions clearly differentiate between simple and detailed outcome reporting.

Naming Consistency: 4/5

Four tools follow a consistent verb_noun pattern (get_service_quality, list_tools, report_outcome, verify_outcome), but 'health' is a bare noun that breaks the pattern. The inconsistency is minor and the names remain readable.

Tool Count: 5/5

With 5 tools, the server is well-scoped for verification tasks: system health check, tool discovery, aggregated quality retrieval, and two outcome recording methods. Each tool has a clear purpose without redundancy.

Completeness: 4/5

The tool set covers the core verification workflow: recording outcomes and retrieving aggregated quality. However, there is no tool to retrieve individual verification records or to list known services, leaving minor gaps that agents might have to work around.

Available Tools

5 tools
get_service_quality: A

Get aggregated quality scores for a service based on all past verifications. Returns average completeness, accuracy, pass rate, format compliance rate, SLA compliance rate, and quality trend.

Parameters (JSON Schema)
  service_id (required): Service identifier to query (e.g. 'harvey-tools/scrape_url')
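Over the Streamable HTTP transport, a call to this tool travels as a JSON-RPC `tools/call` request. The sketch below builds that payload in Python; the `service_id` value is just the example from the schema, not a real service:

```python
import json

# JSON-RPC 2.0 envelope for an MCP "tools/call" request.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_service_quality",
        "arguments": {
            # Example identifier taken from the parameter description.
            "service_id": "harvey-tools/scrape_url",
        },
    },
}

print(json.dumps(request, indent=2))
```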
Behavior: 2/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are empty, so the description carries full burden. However, it only states what is returned, not behavioral traits like whether it's read-only, idempotent, or requires specific permissions. No side effects or rate limits are disclosed.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Two sentences, no redundant wording. Every part adds value: action, resource, and return values.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

With no output schema, the description adequately lists return metrics. However, it lacks behavioral context (e.g., if it's read-only). For a simple one-parameter tool, it is mostly complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema description coverage is 100% for the single parameter, and the tool description does not add significant meaning beyond what the schema already provides (which includes an example value). The baseline score of 3 is appropriate.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action ('Get aggregated quality scores') and the resource ('for a service based on all past verifications'), and lists the specific metrics returned. It distinguishes itself from siblings like 'report_outcome' and 'verify_outcome' by focusing on quality scores aggregation.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage when needing quality scores, but provides no explicit guidance on when to use this tool versus alternatives (e.g., when to use 'verify_outcome' instead). No exclusions or prerequisites are mentioned.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

health: A

Check Harvey Verify server status, uptime, and payment network configuration.

Parameters (JSON Schema)

No parameters

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations provided, so description carries full burden. 'Check' implies read-only, but does not explicitly state non-destructive behavior, authentication needs, or response format. Adequate for a simple health check but lacks explicit safety disclosures.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Single sentence conveying core purpose and scope. Front-loaded with verb and resource, no unnecessary words.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a zero-parameter health check without output schema, the description provides sufficient context: what aspects are checked. Missing details like response type (e.g., JSON object with fields) but acceptable given simplicity.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

No parameters, so baseline 4. Description adds meaning by specifying what the tool checks (status, uptime, payment configuration), which effectively informs the agent of the tool's scope without needing parameter details.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

Description states a specific verb 'Check' and resource 'Harvey Verify server', listing specific aspects: status, uptime, payment network configuration. It clearly distinguishes from sibling tools like get_service_quality or verify_outcome.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 2/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description offers no guidance on when to use this tool versus alternatives, and does not mention prerequisites such as connection requirements or whether it should be called before other operations.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

list_tools: A

List all available Harvey Verify tools with pricing and input requirements. Use this for discovery.

Parameters (JSON Schema)

No parameters

Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of transparency. It reveals that the tool returns a list of tools with pricing and input requirements, but does not mention behavioral traits such as idempotency, authentication needs, or response format. However, for a read-only listing tool, this level of disclosure is minimally adequate.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description consists of two concise, informative sentences. The first sentence states the tool's action and scope, and the second provides usage guidance. No unnecessary words or redundancy.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

The description sufficiently explains what the tool returns (list of all tools with pricing and input requirements), which is adequate given no output schema. It could be slightly more specific about the format (e.g., JSON), but overall it is complete enough for an agent to understand the tool's value.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 4/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has no parameters, so schema description coverage is 100%. The description does not need to add parameter semantics, and a baseline score of 4 is appropriate given there is nothing to add beyond what the schema already conveys.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the action (list), resource (all available Harvey Verify tools), and what it includes (pricing and input requirements). It also adds 'Use this for discovery,' which helps distinguish it from sibling tools like get_service_quality, health, report_outcome, and verify_outcome, which serve different purposes.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 4/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description advises using this tool 'for discovery,' implying it is the starting point for tool exploration. While it does not explicitly exclude other use cases or mention alternatives, the sibling tools are contextually distinct enough that an agent can infer when to use this tool versus others.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

report_outcome: A

Record a simple pass/fail outcome report for a service call. No LLM analysis - just logs the result to the quality database. Cheaper alternative to verify_outcome when you only need to record success/failure.

Parameters (JSON Schema)
  service_id (required): Service identifier (e.g. 'harvey-tools/scrape_url')
  was_successful (required): Whether the service call succeeded
  response_time_ms (optional): Response time in milliseconds
  notes (optional): Optional notes about the outcome
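A hedged sketch of a report_outcome call, built as an MCP `tools/call` payload. The argument values are illustrative, not real telemetry; only `service_id` and `was_successful` are required per the schema:

```python
import json

# Required fields per the schema: service_id, was_successful.
# response_time_ms and notes are optional extras.
arguments = {
    "service_id": "harvey-tools/scrape_url",  # example id from the schema
    "was_successful": True,
    "response_time_ms": 842,                  # illustrative value
    "notes": "Scrape returned all expected fields.",
}

request = {
    "jsonrpc": "2.0",
    "id": 2,
    "method": "tools/call",
    "params": {"name": "report_outcome", "arguments": arguments},
}

# Sanity-check the required keys before sending.
missing = {"service_id", "was_successful"} - arguments.keys()
assert not missing, f"missing required arguments: {missing}"

print(json.dumps(request, indent=2))
```

Because this tool only logs the result, a client can fire it after every service call without paying for LLM analysis.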
Behavior: 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

No annotations are provided, so the description carries the full burden. It explains that the tool is simple, logs to the database, performs no analysis, and is cheap. It could add more detail on idempotency or side effects, but it is sufficient for safe selection.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

Three sentences, no wasted words. First sentence immediately states purpose, second adds key differentiator, third gives cost context. Efficient and front-loaded.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the 4 parameters (all described in schema), no output schema, and no annotations, the description covers purpose, usage guidance, and cost trade-off. Complete for an agent to decide when to call this tool.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Schema coverage is 100% with descriptions for all parameters. The description adds no extra meaning beyond repeating schema info; baseline of 3 is appropriate since the schema already explains parameters.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the verb 'Record' and the resource 'simple pass/fail outcome report for a service call' and distinguishes from the sibling 'verify_outcome' by noting it's a 'cheaper alternative' with 'no LLM analysis'.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

Explicitly states when to use: 'Cheaper alternative to verify_outcome when you only need to record success/failure', implying not to use when more detailed analysis is needed.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

verify_outcome: A

Post-transaction verification using LLM-as-judge. Checks if a service delivered what was promised. Returns completeness score, accuracy score, format compliance, SLA adherence, issues list, and overall pass/fail.

Parameters (JSON Schema)
  request_description (required): What was requested from the service
  response_data (required): The actual response/output received from the service
  service_id (optional): Service identifier for quality tracking (e.g. 'harvey-tools/scrape_url')
  expected_schema (optional): Expected output format or JSON schema
  sla_requirements (optional): SLA requirements to check against (e.g. 'response under 5s, must include all fields')
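A verify_outcome call supplies what was asked and what came back so the LLM judge can compare them. The sketch below assembles that `tools/call` payload; all argument values are hypothetical, and the SLA string reuses the example from the schema:

```python
import json

# Required: request_description, response_data. The rest are optional.
arguments = {
    "request_description": "Scrape the product page and return title and price",
    "response_data": '{"title": "Widget", "price": 9.99}',
    "service_id": "harvey-tools/scrape_url",  # optional, example id from the schema
    "expected_schema": '{"title": "string", "price": "number"}',
    "sla_requirements": "response under 5s, must include all fields",
}

request = {
    "jsonrpc": "2.0",
    "id": 3,
    "method": "tools/call",
    "params": {"name": "verify_outcome", "arguments": arguments},
}

print(json.dumps(request, indent=2))
```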
Behavior: 3/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

Annotations are absent, so the description must disclose behavioral traits. It explains that the tool uses an LLM as judge and returns verification scores, but does not clarify whether the operation is read-only, has side effects, makes external calls, or carries cost implications. Adequate but not comprehensive.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 5/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is two sentences, immediately front-loading the purpose and output. Every word adds value with no redundancy or fluff.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

For a tool with 5 parameters and no output schema, the description summarizes the key output elements (completeness score, accuracy, etc.) and the verification approach. It could benefit from more detail on scoring criteria or edge cases, but is largely complete.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 3/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

Input schema coverage is 100% with descriptions for all 5 parameters. The tool description adds no additional parameter-level information beyond the schema, so it meets the baseline but does not enhance understanding.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose as 'post-transaction verification using LLM-as-judge' and lists the specific checks (completeness, accuracy, format compliance, SLA adherence) and outputs (issues list, pass/fail). This distinguishes it from sibling tools like get_service_quality or report_outcome.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 3/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description implies usage after a service transaction to verify outcome, but does not explicitly state when to use versus alternatives, nor does it provide exclusions or prerequisites. Context signals and sibling names help but the description itself lacks direct guidance.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
