Hive A/B Test
Server Details
A/B experiment runner for the A2A network with x402 micropayments on Base USDC.
- Status: Healthy
- Last Tested:
- Transport: Streamable HTTP
- URL:
- Repository: srotzin/hive-mcp-abtest
- GitHub Stars: 0
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average score of 4/5 across all 3 tools scored.
Each tool targets a distinct action: assigning variants, recording conversions, and retrieving results. There is no overlap in purpose; an agent can easily distinguish them.
All tool names follow a consistent `abtest_verb` pattern using snake_case: `assign`, `record_conversion`, `results`. This makes the set predictable and easy to navigate.
Three tools cover the core A/B test workflow (assignment, conversion tracking, results). This is slightly minimal but appropriate for a focused service; a tool for managing experiments might be expected but not strictly necessary.
The surface covers the essential loop of assigning, converting, and analyzing results. However, it lacks tools for creating or managing experiments, which could cause agents to depend on external setup. This is a notable gap.
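To make that loop concrete, here is a hedged sketch of a full assign → convert → analyze pass using the MCP TypeScript SDK over Streamable HTTP. The endpoint URL, experiment ID, agent DID, and variant names are placeholders, and x402 payment handling (required for the two paid tools) is omitted.

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

// Placeholder endpoint; substitute the connector's actual Streamable HTTP URL.
const transport = new StreamableHTTPClientTransport(new URL("https://example.com/mcp"));
const client = new Client({ name: "abtest-demo", version: "1.0.0" });
await client.connect(transport);

// 1. Assign the agent to a variant (variants are only needed on the first call,
//    when the experiment has not been registered yet).
const assignment = await client.callTool({
  name: "abtest_assign",
  arguments: {
    experiment_id: "onboarding-copy-v1",           // placeholder
    agent_did: "did:example:agent-123",            // placeholder
    variants: [{ id: "control" }, { id: "treatment", weight: 2 }],
  },
});

// 2. Record a conversion later; the variant is read from the stored assignment,
//    so the caller does not pass it.
await client.callTool({
  name: "abtest_record_conversion",
  arguments: { experiment_id: "onboarding-copy-v1", agent_did: "did:example:agent-123" },
});

// 3. Read aggregated results, including the two-proportion Z-test.
const results = await client.callTool({
  name: "abtest_results",
  arguments: { experiment_id: "onboarding-copy-v1" },
});
console.log(assignment, results);
```

Because assignment is deterministic and sticky, repeating the `abtest_assign` call with the same identifiers returns the same variant, so the agent does not need to persist the bucket locally.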
Available Tools
3 tools

abtest_assign
Assign an agent DID to a variant for an experiment. Bucket is deterministic via SHA-256(experiment_id, agent_did) modulo total weight, sticky across calls. $0.001/assignment via x402.
| Name | Required | Description | Default |
|---|---|---|---|
| variants | No | Optional. Required only on first call when the experiment has not been registered. Items: { id, weight? }. | |
| agent_did | Yes | DID or any opaque agent identifier. | |
| experiment_id | Yes | Stable experiment identifier. | |
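The description specifies SHA-256 over (experiment_id, agent_did) modulo the total weight. A minimal sketch of that kind of deterministic weighted bucketing is shown below; the exact byte encoding of the inputs and the default weight of 1 are assumptions, not details published by the server.

```typescript
import { createHash } from "node:crypto";

interface Variant {
  id: string;
  weight?: number; // assumed to default to 1 when omitted
}

// Deterministically map (experiment_id, agent_did) onto a variant.
// The same inputs always hash to the same bucket, so assignment stays sticky.
function assignVariant(experimentId: string, agentDid: string, variants: Variant[]): string {
  const weights = variants.map((v) => v.weight ?? 1);
  const totalWeight = weights.reduce((a, b) => a + b, 0);

  // SHA-256 over the two identifiers; the ":" separator is an assumption.
  const digest = createHash("sha256")
    .update(`${experimentId}:${agentDid}`)
    .digest("hex");
  let bucket = Number(BigInt("0x" + digest) % BigInt(totalWeight));

  // Walk the cumulative weights to find the chosen variant.
  for (let i = 0; i < variants.length; i++) {
    if (bucket < weights[i]) return variants[i].id;
    bucket -= weights[i];
  }
  return variants[variants.length - 1].id; // not reached when totalWeight > 0
}
```

Bucketing from a hash like this keeps the traffic split proportional to the declared weights without storing any per-agent randomness.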
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Discloses deterministic hashing, stickiness across calls, cost, and variant optionality on first call. Lacks details on error handling or state changes, but for a simple assignment tool, the provided context is adequate.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
Three concise sentences cover purpose, algorithm, behavior, and cost. Could be slightly more structured, but no unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema, the description would benefit from mentioning what is returned (e.g., assigned variant ID). Otherwise, it adequately covers input semantics and behavior.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema already covers all parameters with descriptions. The description adds value by clarifying that 'variants' is required only on the first call (before the experiment is registered) and by explaining the hashing algorithm, both of which go beyond the schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states 'Assign an agent DID to a variant for an experiment', using a specific verb and resource. It adds algorithmic detail (SHA-256) and distinguishes from sibling tools abtest_record_conversion and abtest_results by its core function.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
Explains when the tool is used (first call with optional variants), the deterministic and sticky behavior, and cost. Does not explicitly exclude scenarios or compare to siblings, but context is sufficient.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
abtest_record_conversion
Record a conversion event for an agent that was previously assigned. Variant is read from the assignment, not supplied by caller. $0.005/event via x402.
| Name | Required | Description | Default |
|---|---|---|---|
| value | No | Optional numeric value. Default 1. | |
| metric | No | Optional metric label. Default 'conversion'. | |
| agent_did | Yes | | |
| experiment_id | Yes | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations, the description discloses that the variant is read from the assignment and includes cost, but does not clarify behavior for missing assignments or idempotency, leaving some behavioral gaps.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is two concise sentences, front-loaded with the purpose and key behavioral detail, with no wasted words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
While the description covers the main concept, it omits important details like error conditions, side effects, and whether multiple conversions per agent are allowed, making it incomplete for a 4-parameter tool with no output schema.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema coverage is 50% (metric and value have descriptions), but the description does not add meaning beyond the schema; it fails to explain the role of experiment_id and agent_did or how they relate to the conversion.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states that it records a conversion event for a previously assigned agent, and specifies that the variant is read from the assignment, which distinguishes it from siblings 'abtest_assign' and 'abtest_results'.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies it should be used after assignment by stating 'for an agent that was previously assigned' and mentions cost, but does not explicitly state when not to use it or provide alternatives.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
abtest_results
Return per-variant samples, conversions, conversion rate, and a two-proportion Z-test on the first two variants. Free, read-only.
| Name | Required | Description | Default |
|---|---|---|---|
| metric | No | Optional metric label. Default 'conversion'. | |
| experiment_id | Yes | | |
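The two-proportion Z-test named in the description can be sketched as follows; the function and its numeric example are illustrative only and do not reflect the server's actual output format.

```typescript
// Two-proportion Z-test for variants A and B
// (samples = assignments, conversions = recorded conversion events).
function twoProportionZTest(
  samplesA: number, conversionsA: number,
  samplesB: number, conversionsB: number,
): number {
  const pA = conversionsA / samplesA;
  const pB = conversionsB / samplesB;
  // Pooled conversion rate under the null hypothesis that both variants convert equally.
  const pooled = (conversionsA + conversionsB) / (samplesA + samplesB);
  const standardError = Math.sqrt(pooled * (1 - pooled) * (1 / samplesA + 1 / samplesB));
  return (pA - pB) / standardError;
}

// Example: 120 conversions of 1000 samples vs. 150 of 1000.
console.log(twoProportionZTest(1000, 120, 1000, 150)); // ≈ -1.96
```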
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries full burden. It discloses that the tool is free and read-only, and that the Z-test only covers the first two variants. This is helpful, but it does not disclose potential rate limits, error handling, or behavior with more than two variants.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single concise sentence plus the annotation-like phrase 'Free, read-only.' It is front-loaded with the core functionality and avoids unnecessary words.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no output schema and no annotations, the description explains what the tool returns and its safety profile. However, it lacks detail on error conditions, pagination, or handling of edge cases like experiments with more than two variants.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 50% (only 'metric' has a description). The tool description does not add meaning to the parameters beyond what is in the schema, failing to compensate for the undocumented 'experiment_id' parameter.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description explicitly states it returns per-variant samples, conversions, conversion rate, and a two-proportion Z-test on the first two variants. It clearly distinguishes itself from sibling tools (abtest_assign and abtest_record_conversion) by being the read-only results retrieval tool.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage for retrieving A/B test results and mentions 'Free, read-only' indicating safe use. However, it does not explicitly state when not to use it or provide alternatives beyond the sibling names.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}

The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.