zero-core-verify
Server Details
Post-tx outcome verification via LLM-as-judge. Zero Core Verify.
- Status: Healthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
- Repository: meltingpixelsai/harvey-verify
- GitHub Stars: 0
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average score of 4/5 across all 5 tools scored.
Each tool performs a distinct function: health checks server status, list_tools discovers other tools, get_service_quality retrieves aggregated metrics, and report_outcome/verify_outcome record results at different granularities. Descriptions clearly differentiate between simple and detailed outcome reporting.
Four tools follow a consistent verb_noun pattern (get_service_quality, list_tools, report_outcome, verify_outcome), but 'health' is a bare noun that breaks the pattern. The inconsistency is minor, and the names remain readable.
With 5 tools, the server is well-scoped for verification tasks: system health check, tool discovery, aggregated quality retrieval, and two outcome recording methods. Each tool has a clear purpose without redundancy.
The tool set covers the core verification workflows: recording outcomes and retrieving aggregated quality. However, the absence of a tool to retrieve individual verification records or to list tracked services leaves minor gaps that agents may need to work around.
Available Tools
5 tools
get_service_quality
Get aggregated quality scores for a service based on all past verifications. Returns average completeness, accuracy, pass rate, format compliance rate, SLA compliance rate, and quality trend.
| Name | Required | Description | Default |
|---|---|---|---|
| service_id | Yes | Service identifier to query (e.g. 'harvey-tools/scrape_url') | |
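As a minimal sketch (not taken from the listing itself), the raw MCP JSON-RPC request an agent's client would send to call this tool could look like the following; the service_id reuses the example from the parameter table, and the request id is arbitrary.

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_service_quality",
    "arguments": {
      "service_id": "harvey-tools/scrape_url"
    }
  }
}
```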
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations are empty, so the description carries the full burden. However, it only states what is returned, not behavioral traits such as whether the tool is read-only or idempotent, or whether it requires specific permissions. No side effects or rate limits are disclosed.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, no redundant wording. Every part adds value: action, resource, and return values.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With no output schema, the description adequately lists the returned metrics. However, it lacks behavioral context (e.g., whether the tool is read-only). For a simple one-parameter tool, it is mostly complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% for the single parameter, and the tool description does not add significant meaning beyond what the schema already provides (e.g., example values). A baseline score of 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action ('Get aggregated quality scores') and the resource ('for a service based on all past verifications'), and lists the specific metrics returned. It distinguishes itself from siblings like 'report_outcome' and 'verify_outcome' by focusing on quality scores aggregation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage when needing quality scores, but provides no explicit guidance on when to use this tool versus alternatives (e.g., when to use 'verify_outcome' instead). No exclusions or prerequisites are mentioned.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
health
Check Harvey Verify server status, uptime, and payment network configuration.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
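Since the tool takes no parameters, a sketched tools/call request (illustrative, not from the listing) simply passes an empty arguments object:

```json
{
  "jsonrpc": "2.0",
  "id": 2,
  "method": "tools/call",
  "params": {
    "name": "health",
    "arguments": {}
  }
}
```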
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. 'Check' implies read-only behavior, but the description does not explicitly state non-destructive behavior, authentication needs, or the response format. Adequate for a simple health check, but it lacks explicit safety disclosures.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence conveying core purpose and scope. Front-loaded with verb and resource, no unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a zero-parameter health check without an output schema, the description provides sufficient context about which aspects are checked. It omits details such as the response type (e.g., a JSON object with specific fields), but that is acceptable given the tool's simplicity.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters, so the baseline score of 4 applies. The description adds meaning by specifying what the tool checks (status, uptime, payment configuration), which effectively informs the agent of the tool's scope without needing parameter details.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description states a specific verb ('Check') and resource ('Harvey Verify server') and lists the specific aspects checked: status, uptime, and payment network configuration. It clearly distinguishes the tool from siblings like get_service_quality or verify_outcome.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
There is no guidance on when to use this tool versus alternatives. The description does not mention prerequisites, such as connection requirements, or when it is appropriate to call this tool before other operations.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_tools
List all available Harvey Verify tools with pricing and input requirements. Use this for discovery.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
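Note that this is a server-defined tool, distinct from the protocol-level tools/list request that MCP clients already use for discovery. A sketched call, again with no arguments, would look like:

```json
{
  "jsonrpc": "2.0",
  "id": 3,
  "method": "tools/call",
  "params": {
    "name": "list_tools",
    "arguments": {}
  }
}
```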
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of transparency. It reveals that the tool returns a list of tools with pricing and input requirements, but does not mention behavioral traits such as idempotency, authentication needs, or response format. However, for a read-only listing tool, this level of disclosure is minimally adequate.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description consists of two concise, informative sentences. The first sentence states the tool's action and scope, and the second provides usage guidance. No unnecessary words or redundancy.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
The description sufficiently explains what the tool returns (list of all tools with pricing and input requirements), which is adequate given no output schema. It could be slightly more specific about the format (e.g., JSON), but overall it is complete enough for an agent to understand the tool's value.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has no parameters, so schema description coverage is 100%. The description does not need to add parameter semantics, and a baseline score of 4 is appropriate given there is nothing to add beyond what the schema already conveys.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the action (list), resource (all available Harvey Verify tools), and what it includes (pricing and input requirements). It also adds 'Use this for discovery,' which helps distinguish it from sibling tools like get_service_quality, health, report_outcome, and verify_outcome, which serve different purposes.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description advises using this tool 'for discovery,' implying it is the starting point for tool exploration. While it does not explicitly exclude other use cases or mention alternatives, the sibling tools are contextually distinct enough that an agent can infer when to use this tool versus others.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
report_outcome
Record a simple pass/fail outcome report for a service call. No LLM analysis - just logs the result to the quality database. Cheaper alternative to verify_outcome when you only need to record success/failure.
| Name | Required | Description | Default |
|---|---|---|---|
| notes | No | Optional notes about the outcome | |
| service_id | Yes | Service identifier (e.g. 'harvey-tools/scrape_url') | |
| was_successful | Yes | Whether the service call succeeded | |
| response_time_ms | No | Response time in milliseconds | |
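A sketched call recording a successful outcome might look like the following; the service_id reuses the table's example, while the notes and response_time_ms values are hypothetical placeholders.

```json
{
  "jsonrpc": "2.0",
  "id": 4,
  "method": "tools/call",
  "params": {
    "name": "report_outcome",
    "arguments": {
      "service_id": "harvey-tools/scrape_url",
      "was_successful": true,
      "response_time_ms": 1200,
      "notes": "Scrape returned the expected page content"
    }
  }
}
```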
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden. It explains that the tool is simple, logs to the quality database, performs no analysis, and is cheap. It could add more detail on idempotency or side effects, but it is sufficient for safe selection.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three sentences, no wasted words. First sentence immediately states purpose, second adds key differentiator, third gives cost context. Efficient and front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the 4 parameters (all described in the schema), no output schema, and no annotations, the description covers purpose, usage guidance, and the cost trade-off. That is complete enough for an agent to decide when to call this tool.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 100%, with descriptions for all parameters. The tool description adds no extra meaning beyond repeating schema information; a baseline score of 3 is appropriate since the schema already explains the parameters.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Record' and the resource 'simple pass/fail outcome report for a service call' and distinguishes from the sibling 'verify_outcome' by noting it's a 'cheaper alternative' with 'no LLM analysis'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explicitly states when to use: 'Cheaper alternative to verify_outcome when you only need to record success/failure', implying not to use when more detailed analysis is needed.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
verify_outcome
Post-transaction verification using LLM-as-judge. Checks if a service delivered what was promised. Returns completeness score, accuracy score, format compliance, SLA adherence, issues list, and overall pass/fail.
| Name | Required | Description | Default |
|---|---|---|---|
| service_id | No | Service identifier for quality tracking (e.g. 'harvey-tools/scrape_url') | |
| response_data | Yes | The actual response/output received from the service | |
| expected_schema | No | Expected output format or JSON schema | |
| sla_requirements | No | SLA requirements to check against (e.g. 'response under 5s, must include all fields') | |
| request_description | Yes | What was requested from the service | |
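A sketched full verification call is shown below; only the parameter names and the sla_requirements example come from the table, and every other argument value is a hypothetical placeholder.

```json
{
  "jsonrpc": "2.0",
  "id": 5,
  "method": "tools/call",
  "params": {
    "name": "verify_outcome",
    "arguments": {
      "service_id": "harvey-tools/scrape_url",
      "request_description": "Scrape the pricing page and return its title and body text",
      "response_data": "{\"title\": \"Pricing\", \"body\": \"...\"}",
      "expected_schema": "JSON object with 'title' and 'body' string fields",
      "sla_requirements": "response under 5s, must include all fields"
    }
  }
}
```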
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations are absent, so the description must disclose behavioral traits. It explains that the tool uses an LLM judge and returns verification scores, but does not clarify whether the operation is read-only, has side effects, makes external calls, or carries cost implications. Adequate but not comprehensive.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two sentences, immediately front-loading the purpose and output. Every word adds value with no redundancy or fluff.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a tool with 5 parameters and no output schema, the description summarizes the key output elements (completeness score, accuracy, etc.) and the verification approach. It could benefit from more detail on scoring criteria or edge cases, but is largely complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Input schema coverage is 100% with descriptions for all 5 parameters. The tool description adds no additional parameter-level information beyond the schema, so it meets the baseline but does not enhance understanding.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose as 'post-transaction verification using LLM-as-judge' and lists the specific checks (completeness, accuracy, format compliance, SLA adherence) and outputs (issues list, pass/fail). This distinguishes it from sibling tools like get_service_quality or report_outcome.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage after a service transaction to verify the outcome, but it does not explicitly state when to use this tool versus alternatives, nor does it provide exclusions or prerequisites. Context signals and sibling tool names help, but the description itself lacks direct guidance.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
"$schema": "https://glama.ai/mcp/schemas/connector.json",
"maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
- Control your server's listing on Glama, including description and metadata
- Access analytics and receive server usage reports
- Get monitoring and health status updates for your server
- Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
- The server is experiencing an outage
- The URL of the server is wrong
- Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.