Bench Agent Discovery
Server Details
Discover public AI agents, reusable recipes, and trusted benchmark evidence by task.
- Status: Healthy
- Last Tested
- Transport: Streamable HTTP
- URL
Glama MCP Gateway
Connect through Glama MCP Gateway for full control over tool access and complete visibility into every call.
Full call logging
Every tool call is logged with complete inputs and outputs, so you can debug issues and audit what your agents are doing.
Tool access control
Enable or disable individual tools per connector, so you decide what your agents can and cannot do.
Managed credentials
Glama handles OAuth flows, token storage, and automatic rotation, so credentials never expire on your clients.
Usage analytics
See which tools your agents call, how often, and when, so you can understand usage patterns and catch anomalies.
Tool Definition Quality
Average 4.1/5 across 3 of 3 tools scored.
Each tool has a clearly distinct purpose: get_agent retrieves details of a specific agent, list_benchmarks returns benchmark contracts and submissions, and search_agents finds agents by various criteria. No overlap or ambiguity.
All tool names follow a consistent verb_noun pattern (get_agent, list_benchmarks, search_agents) using snake_case, making the interface predictable.
Three tools is slightly low but appropriate for a read-only discovery server, covering core operations (get, list, search). Could potentially benefit from an additional tool for benchmark details, but the count is reasonable.
The tool set covers the main discovery workflows: retrieving an agent, searching agents, and listing benchmarks. A minor gap is the lack of a dedicated tool to get detailed benchmark information, but agents can still work around this.
Available Tools
3 tools
get_agent
Get a public agent
A, Read-only, Idempotent
Get one public agent's recipe, public capability manifest, coarse invocation status, owner telemetry, and verified benchmark submissions.
| Name | Required | Description | Default |
|---|---|---|---|
| handle | Yes | Bench handle in @owner/agent-slug form. | |
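As a rough illustration of how a client might invoke this tool, the sketch below builds an MCP `tools/call` JSON-RPC request for `get_agent`. The handle pattern is an assumption inferred from the `@owner/agent-slug` description; the server's actual validation rules are not documented here. `build_get_agent_request` and the example handle are hypothetical names.

```python
import json
import re

# Assumed shape of a Bench handle: "@owner/agent-slug".
# The real server may accept a wider or narrower character set.
HANDLE_PATTERN = re.compile(r"^@[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+$")


def build_get_agent_request(handle: str, request_id: int = 1) -> dict:
    """Build an MCP tools/call JSON-RPC 2.0 request for get_agent."""
    if not HANDLE_PATTERN.match(handle):
        raise ValueError(f"handle must look like @owner/agent-slug, got {handle!r}")
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "get_agent", "arguments": {"handle": handle}},
    }


if __name__ == "__main__":
    # Hypothetical handle, used only to show the request shape.
    request = build_get_agent_request("@example-owner/example-agent")
    print(json.dumps(request, indent=2))
```

Checking the handle on the client side is optional; it only avoids a round trip for input that is obviously malformed.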
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already indicate readOnlyHint=true, idempotentHint=true, and destructiveHint=false, so the safety profile is covered. The description adds valuable behavioral context by enumerating the returned data fields (recipe, capability manifest, coarse invocation status, owner telemetry, verified benchmark submissions), which goes beyond annotations.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is a single sentence that efficiently lists the returned data. No redundant words or filler. Perfectly concise for a simple GET endpoint.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool has only one parameter and no output schema, the description provides a reasonable enumeration of response components. It lacks mention of error conditions or availability constraints, but for a read-only single-agent retrieval, it is largely complete.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 100% (the only parameter 'handle' has a description: 'Bench handle in @owner/agent-slug form.'). The tool description does not add extra meaning beyond the schema; it only reiterates that the agent is public. Baseline 3 is appropriate.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb 'Get' and the resource 'one public agent', listing specific data components (recipe, capability manifest, invocation status, telemetry, benchmark submissions). This distinguishes it from sibling tools list_benchmarks and search_agents, which target different resources or operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description does not provide explicit guidance on when to use this tool versus its siblings. There is no mention of when not to use it (e.g., for searching across agents) or alternatives. The context is implied but not spelled out.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
list_benchmarks
List verified benchmark contracts
A, Read-only, Idempotent
List public, versioned benchmark contracts and only their trusted-runner-verified submissions.
| Name | Required | Description | Default |
|---|---|---|---|
| No parameters | | | |
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations already provide readOnlyHint, idempotentHint, destructiveHint. Description adds that only 'public, versioned' and 'trusted-runner-verified submissions' are listed, offering valuable context beyond annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Single sentence with no wasted words. Includes all necessary qualifiers (public, versioned, verified).
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a parameterless list tool, description fully covers what is listed and the filtering criteria. No output schema is needed as the purpose is clear.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
No parameters are needed, so baseline is 4. The description doesn't need to explain parameters, and it appropriately avoids doing so.
Does the description clearly state what the tool does and how it differs from similar tools?
Description clearly states 'List public, versioned benchmark contracts' with verb 'list' and specific resource. Specifies filtering to 'trusted-runner-verified submissions', distinguishing it from sibling tools about agents.
Does the description explain when to use this tool, when not to, or what alternatives exist?
No explicit when-to-use or alternatives, but context suggests it's for listing benchmarks. The simplicity of the tool (no parameters) reduces the need for extensive guidance.
search_agents
Search public AI agents
A, Read-only, Idempotent
Find listed public agents by task, capability, category, framework, model, verified evidence, or reuse configuration. Owner telemetry and controlled benchmark evidence are returned separately.
| Name | Required | Description | Default |
|---|---|---|---|
| sort | No | | verified |
| limit | No | | |
| model | No | | |
| query | No | Task or capability to search for, such as grounded research or code review. | |
| license | No | Exact SPDX-style license id from the agent's manifest provenance, such as MIT or Apache-2.0. | |
| category | No | | |
| reusable | No | True returns agents whose owners configured an invocation policy and capability manifest. | |
| verified | No | True returns agents with at least one trusted-runner-verified benchmark submission. | |
| framework | No | | |
| liveCallable | No | True returns agents with a reusable invocation policy and an owner-verified, currently reachable endpoint. | |
| maxP50LatencyMs | No | Upper bound on the agent's observed p50 latency in milliseconds. | |
| maxCostPerRunUsd | No | Upper bound on lifetime total_cost_usd / total_runs, i.e. average observed cost per run. | |
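Since every parameter is optional, a client would typically send only the filters it cares about. The sketch below shows one way to assemble the arguments and illustrates the average-cost formula behind `maxCostPerRunUsd`. The helper names and sample figures are hypothetical, and how the server treats agents with zero recorded runs is not documented, so returning `None` in that case is an assumption.

```python
from typing import Any, Optional


def average_cost_per_run(total_cost_usd: float, total_runs: int) -> Optional[float]:
    """Average observed cost per run: total_cost_usd / total_runs.

    Returns None when there are no runs (assumed handling, not the server's).
    """
    if total_runs <= 0:
        return None
    return total_cost_usd / total_runs


def build_search_arguments(**filters: Any) -> dict:
    """Drop unset (None) filters so only explicit criteria are sent."""
    return {name: value for name, value in filters.items() if value is not None}


if __name__ == "__main__":
    arguments = build_search_arguments(
        query="code review",
        verified=True,
        license="MIT",
        maxP50LatencyMs=2000,
        maxCostPerRunUsd=0.05,
        framework=None,  # unset, so it is omitted from the request
    )
    print(arguments)

    # Illustrative numbers only: 12.50 USD over 500 runs averages to
    # 0.025 USD per run, which is under the 0.05 bound used above.
    print(average_cost_per_run(12.50, 500))
```

The resulting dictionary would go in the `arguments` field of an MCP `tools/call` request with `name` set to `search_agents`.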
Tool Definition Quality
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Annotations provide readOnlyHint, idempotentHint, destructiveHint. The description adds value by stating that owner telemetry and controlled benchmark evidence are returned separately, and that only 'public' agents are searchable. No contradiction with annotations.
Is the description appropriately sized, front-loaded, and free of redundancy?
Two sentences, efficiently structured. First sentence enumerates searchable attributes, second clarifies return behavior. No unnecessary words. Front-loaded with key information.
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
With 12 parameters and no output schema, the description is brief. It does not explain sorting, default behavior (no filters), or pagination. The output format is hinted but not detailed. More could be said about the result structure given the complexity.
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 58%. The description lists types of search criteria but does not explain all parameters (e.g., sort, limit). It adds some semantic grouping but not per-parameter details. Baseline 3 because coverage is moderate and description partially compensates.
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the verb ('Find'), resource ('public agents'), and specifies multiple criteria (task, capability, category, framework, model, verified evidence, reuse configuration). It distinguishes from sibling tools: 'get_agent' retrieves a single agent, 'list_benchmarks' lists benchmarks.
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies that the tool is for searching agents by various filters. It mentions that owner telemetry and benchmark evidence are returned separately, which gives some context. However, it does not say when an alternative is the better choice (e.g., if you already know the agent's handle, use get_agent), and it gives no explicit when-not-to-use guidance.
Claim this connector by publishing a /.well-known/glama.json file on your server's domain with the following structure:
{
  "$schema": "https://glama.ai/mcp/schemas/connector.json",
  "maintainers": [{ "email": "your-email@example.com" }]
}
The email address must match the email associated with your Glama account. Once published, Glama will automatically detect and verify the file within a few minutes.
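For servers whose static files are deployed from a directory, the claim file could be generated with a short script like the one below. This is a minimal sketch: `write_glama_json` is a hypothetical helper, and the site root and email are placeholders to replace with your own values.

```python
import json
import tempfile
from pathlib import Path


def write_glama_json(site_root: Path, maintainer_email: str) -> Path:
    """Write <site_root>/.well-known/glama.json and return its path."""
    document = {
        "$schema": "https://glama.ai/mcp/schemas/connector.json",
        "maintainers": [{"email": maintainer_email}],
    }
    target = site_root / ".well-known" / "glama.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(document, indent=2) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    # A temporary directory stands in for the real web root here.
    with tempfile.TemporaryDirectory() as tmp:
        path = write_glama_json(Path(tmp), "your-email@example.com")
        print(path.read_text(encoding="utf-8"))
```

The file still has to be served from the server's domain at `/.well-known/glama.json` for verification to succeed.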
Control your server's listing on Glama, including description and metadata
Access analytics and receive server usage reports
Get monitoring and health status updates for your server
Feature your server to boost visibility and reach more users
For users:
Full audit trail – every tool call is logged with inputs and outputs for compliance and debugging
Granular tool control – enable or disable individual tools per connector to limit what your AI agents can do
Centralized credential management – store and rotate API keys and OAuth tokens in one place
Change alerts – get notified when a connector changes its schema, adds or removes tools, or updates tool definitions, so nothing breaks silently
For server owners:
Proven adoption – public usage metrics on your listing show real-world traction and build trust with prospective users
Tool-level analytics – see which tools are being used most, helping you prioritize development and documentation
Direct user feedback – users can report issues and suggest improvements through the listing, giving you a channel you would not have otherwise
The connector status is unhealthy when Glama is unable to successfully connect to the server. This can happen for several reasons:
The server is experiencing an outage
The URL of the server is wrong
Credentials required to access the server are missing or invalid
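When debugging an unhealthy status yourself, the three causes above map loosely onto what an HTTP probe of the server URL returns. The sketch below is a heuristic only: Glama's actual health-check logic is not documented here, and `classify_connection_failure` is a hypothetical helper.

```python
from typing import Optional


def classify_connection_failure(status_code: Optional[int]) -> str:
    """Map a failed probe to a likely cause (heuristic, not Glama's logic).

    status_code is the HTTP status returned by the server, or None when no
    HTTP response was received at all (DNS failure, refused connection,
    timeout).
    """
    if status_code is None:
        return "no response: possible outage or wrong URL"
    if status_code in (401, 403):
        return "credentials missing or invalid"
    if status_code == 404:
        return "wrong URL"
    if 500 <= status_code < 600:
        return "server outage"
    return "other"


if __name__ == "__main__":
    for code in (None, 401, 404, 503):
        print(code, "->", classify_connection_failure(code))
```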
If you are the owner of this MCP connector and would like to make modifications to the listing, including providing test credentials for accessing the server, please contact support@glama.ai.