
Server Quality Checklist

Profile completion: 67%

A complete profile improves this server's visibility in search results.
  • Disambiguation 3/5

    The two review tools (review_output and review_dual) have overlapping purposes—both perform adversarial reviews of AI outputs, differing only in reviewer count. While descriptions clarify the high-stakes distinction, agents may struggle to determine which strictness level applies. The marketplace tools are clearly distinct from each other and from the review utilities.

    Naming Consistency 3/5

    Three tools follow a verb_noun pattern (execute_service, list_services, review_output), but review_dual breaks convention by using an adjective descriptor (dual) rather than specifying the target object. This creates inconsistency: review_output indicates what is reviewed, while review_dual indicates how, making the relationship between the two unclear from names alone.

    Tool Count 3/5

    Four tools is borderline thin for the apparent scope, particularly as the server mixes two distinct domains: AgentDesk marketplace operations and general AI output review utilities. With only two tools per domain, the set feels like an incomplete surface or an awkward combination of separate concerns rather than a cohesive toolkit.

    Completeness 2/5

    Major gaps exist in the marketplace workflow: there is no tool to retrieve a specific service by ID (only bulk listing), and no execution status tracking, cancellation, or result retrieval after initiating execute_service. The review side offers only adversarial methodologies without alternatives, and the connection between marketplace services and review tools is undefined.

  • Average 3.7/5 across 4 of 4 tools scored. Lowest: 2.7/5.

    See the tool scores section below for per-tool breakdowns.

  • This repository includes a README.md file.

  • This repository includes a LICENSE file.

  • Latest release: v1.3.0

  • No tool usage detected in the last 30 days. Usage tracking helps demonstrate server value.

    Tip: use the "Try in Browser" feature on the server page to seed initial usage.

  • Add a glama.json file to provide metadata about your server.

  • This server provides 4 tools.
  • No known security issues or vulnerabilities reported.


  • This server has been verified by its author.

  • Add related servers to improve discoverability.

Tool Scores

  • execute_service

    Behavior 2/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden of behavioral disclosure, yet it fails to indicate whether execution is destructive or idempotent, or what return format to expect. It mentions an authentication requirement but contains inaccuracies: it claims an AgentDesk API key where the schema specifies an Anthropic key, and implies the key is required where the schema marks it optional.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 4/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    The description is appropriately brief at three sentences, front-loading the core purpose in the first sentence. The final sentence ('Pass service-specific input parameters') is slightly redundant given the schema already documents this, but overall structure is efficient.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 2/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given the lack of annotations and output schema, the description should disclose safety characteristics (destructive vs. read-only) and return value expectations for a marketplace execution tool. It omits both, and the contradiction regarding authentication requirements further undermines completeness.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 2/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Although the schema has 100% description coverage (baseline 3), the description introduces conflicting semantics for the 'api_key' parameter—describing it as an 'AgentDesk API key' when the schema specifies 'Anthropic API key', and implying it is required when it is optional per the required array. This creates confusion rather than additive value.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly states the core action ('Execute a service') and the domain ('AgentDesk marketplace'), providing specific verb and resource identification. However, it does not explicitly differentiate when to use this generic execution tool versus sibling-specific tools like 'review_dual' or 'review_output'.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 2/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    While the description mentions an authentication requirement ('Requires an AgentDesk API key'), it provides no guidance on when to use this tool versus the specialized review siblings, nor does it clarify prerequisites beyond the API key. The authentication claim also contradicts the schema, which lists the API key as optional.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • list_services

    Behavior 3/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries the full burden. It discloses that the tool returns a catalog with 'pricing, quality scores, and capabilities,' but omits behavioral details like pagination, result limits, or whether all parameters are optional (which they are).

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Three efficiently structured sentences: action ('List all available services...'), return value ('Returns service catalog...'), and parameters ('Filter by...'). Every sentence earns its place with zero redundancy.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    For a catalog listing tool with 100% schema coverage, the description adequately covers the return data structure (pricing, quality scores, capabilities). It could be improved by noting that filtering is optional or mentioning pagination behavior.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema coverage is 100%, so the baseline is 3. The description consolidates the four filter parameters ('Filter by category, minimum quality score, maximum price, or capability') but adds no semantic meaning beyond what the schema already provides for each individual field.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description uses a specific verb ('List') with a clear resource ('services on the AgentDesk marketplace'), clearly distinguishing it from sibling tools like 'execute_service' (which runs services) and 'review_dual/review_output' (which review outputs).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    While the filtering capabilities imply browsing behavior, the description lacks explicit guidance on when to use this versus 'execute_service' (e.g., 'Use this to find service IDs before execution'). Usage is implied by the verb 'List' but not explicitly stated.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • review_output

    Behavior 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    With no annotations provided, the description carries full disclosure burden. It successfully explains the adversarial methodology ('actively looks for problems') and detailed return structure (verdict types, scoring range, categorized issues). Could improve by explicitly stating this is a read-only analysis operation.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Four tightly constructed sentences with zero redundancy. Front-loaded with core purpose ('Adversarial quality review'), followed by methodology, return format specification, and applicability examples. Every sentence earns its place.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Despite lacking an output schema, the description comprehensively details return values (verdict, score, issues, checklist). With 100% parameter coverage and clear behavioral description, it provides sufficient context for a 4-parameter tool, though explicit safety declarations would make it complete.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100%, so the structured documentation already defines all four parameters (output, criteria, review_type, model). The description adds context that 'Works for any output type' relates to the review_type parameter, but largely relies on the schema for parameter semantics, meeting the baseline for high-coverage schemas.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 4/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    The description clearly defines the tool as performing 'Adversarial quality review of any AI-generated output' with specific methodology (independent reviewer assumes mistakes). It effectively distinguishes from execute_service and list_services by scope, though it could explicitly contrast with sibling review_dual regarding single vs. dual reviewer approaches.

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 3/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Provides implied usage through examples ('Works for any output type: code, content, summaries...'), helping agents understand applicability. However, lacks explicit guidance on when to use this versus review_dual, or prerequisites like output length limits (despite 100K max in schema).

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

  • review_dual

    Behavior 4/5

    Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

    No annotations provided, so description carries full burden. Discloses key behavioral traits: dual adversarial assessment, independent angle review, merge agent arbitration, and strict OR-logic failure condition. Missing operational details like timeout behavior, cost implications (2x inference), or exact return structure, but covers the essential business logic.

    Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

    Conciseness 5/5

    Is the description appropriately sized, front-loaded, and free of redundancy?

    Three sentences total with zero waste. First sentence defines mechanism, second explains behavioral logic/comparison, third gives usage context. Front-loaded with the core concept 'Dual adversarial review'. Every clause earns its place.

    Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

    Completeness 4/5

    Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

    Given 100% schema coverage and moderate complexity, description adequately explains the review process and verdict logic despite lacking output schema. Mentions 'merged verdict' indicating return value nature, though exact output structure could be clearer. Sufficient for correct agent invocation.

    Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

    Parameters 3/5

    Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

    Schema description coverage is 100% (output, criteria, review_type, model all documented). Description mentions 'output' generally but does not add syntax details, validation rules, or examples beyond what the schema already provides. Baseline score appropriate given schema completeness.

    Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

    Purpose 5/5

    Does the description clearly state what the tool does and how it differs from similar tools?

    Description opens with specific mechanism 'two independent reviewers assess... then a merge agent combines' and explicitly contrasts with sibling tool via 'Stricter than single review', clearly differentiating from review_output. Uses precise verbs (assess, combine, merge) and identifies the resource (output/findings).

    Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

    Usage Guidelines 5/5

    Does the description explain when to use this tool, when not to, or what alternatives exist?

    Explicitly states when to use ('Use for high-stakes outputs where quality is critical') and defines the critical failure condition ('if either reviewer finds a critical issue, the merged verdict is FAIL'), providing clear decision criteria for agent selection over single review alternatives.

    Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
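The merge rule stated in the description ("if either reviewer finds a critical issue, the merged verdict is FAIL") amounts to strict OR logic, which can be sketched as follows. This is an illustrative reconstruction only: the function and field names are ours, not the server's actual API or return structure.

```python
# Illustrative sketch of review_dual's merge rule as described above.
# Function and field names are hypothetical; the server's real
# implementation is not shown in this report.

def merge_verdicts(review_a, review_b):
    """Strict OR logic: if either independent reviewer flags a
    critical issue, the merged verdict is FAIL."""
    if review_a["critical_issues"] or review_b["critical_issues"]:
        return "FAIL"
    return "PASS"

a = {"critical_issues": []}
b = {"critical_issues": ["unverified claim in summary"]}
print(merge_verdicts(a, b))  # prints: FAIL
```

Because a single critical finding from either reviewer forces FAIL, the dual review is strictly harsher than either reviewer alone, matching the "stricter than single review" claim.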

GitHub Badge

Glama performs regular codebase and documentation scans to:

  • Confirm that the MCP server is working as expected.
  • Confirm that there are no obvious security issues.
  • Evaluate tool definition quality.

Our badge communicates server capabilities, safety, and installation instructions.

Card Badge and Score Badge

Two badge variants are available for the agentdesk-mcp MCP server, a card badge and a score badge, each with a snippet to copy into your README.md.

How to claim the server?

If you are the author of the server, you simply need to authenticate using GitHub.

However, if the MCP server belongs to an organization, you first need to add a glama.json file to the root of your repository.

{
  "$schema": "https://glama.ai/mcp/schemas/server.json",
  "maintainers": [
    "your-github-username"
  ]
}

Then, authenticate using GitHub.


How to make a release?

A "release" on Glama is not the same as a GitHub release. To create a Glama release:

  1. Claim the server if you haven't already.
  2. Go to the Dockerfile admin page, configure the build spec, and click Deploy.
  3. Once the build test succeeds, click Make Release, enter a version, and publish.

This process allows Glama to run security checks on your server and enables users to deploy it.

How to add a LICENSE?

Please follow the instructions in the GitHub documentation.

Once GitHub recognizes the license, the system will automatically detect it within a few hours.

If the license does not appear on the server after some time, you can manually trigger a new scan using the MCP server admin interface.

How to sync the server with GitHub?

Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.

To manually sync the server, click the "Sync Server" button in the MCP server admin interface.

How is the quality score calculated?

The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).

Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool receives a Tool Definition Quality Score (TDQS) of 1–5 across six dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The server-level definition quality score is 60% mean TDQS + 40% minimum TDQS, so a single poorly described tool pulls the score down.

Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).

Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.


MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Rih0z/agentdesk-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.