mcp-server-circleci

Official

by CircleCI-Public


TypeScript

Remote

Server Quality Checklist

Profile completion: A complete profile improves this server's visibility in search results.

Latest release: v1.0.0
Disambiguation: 3/5
Most tools have distinct purposes (e.g., config_helper vs. find_flaky_tests), but there is notable overlap between run_pipeline, run_evaluation_tests, and rerun_workflow, which all involve triggering or re-running CI workflows. Additionally, analyze_diff seems unrelated to CircleCI's core domain, creating confusion about the server's scope. Descriptions help clarify, but the overlap and outlier reduce clarity.
Naming Consistency: 4/5
The majority of tools follow a consistent verb_noun or verb_adjective_noun pattern (e.g., list_followed_projects, get_build_failure_logs, find_underused_resource_classes). However, there are minor deviations such as config_helper (noun_noun rather than verb_noun), which slightly break the pattern but do not severely impact readability.
Tool Count: 3/5
With 16 tools, the count is borderline high for a CI-focused server, especially given the inclusion of prompt-related tools (create_prompt_template, recommend_prompt_template_tests) that seem out of scope. This makes the set feel somewhat bloated and less focused on core CircleCI operations, though it's not extreme.
Completeness: 4/5
For CircleCI operations, the surface covers key areas like project listing, pipeline management, debugging, and usage analysis, with good lifecycle coverage (e.g., run_pipeline, rerun_workflow, get_build_failure_logs). However, gaps exist in areas like user management or detailed configuration editing, and the prompt tools are tangential, slightly reducing coherence for the main domain.
Average 4.4/5 across 16 of 16 tools scored. Lowest: 3.3/5.
See the Tool Scores section below for per-tool breakdowns.
- 1 of 10 issues responded to in the last 6 months
- 10 commits in the last 12 weeks
- No stable releases found
- No critical vulnerability alerts
- No high-severity vulnerability alerts
- No code scanning findings
- CI is passing
This repository is licensed under Apache 2.0.
This repository includes a README.md file.
No tool usage detected in the last 30 days. Usage tracking helps demonstrate server value.
Tip: use the "Try in Browser" feature on the server page to seed initial usage.
Add a glama.json file to provide metadata about your server.
If you are the author, simply .
If the server belongs to an organization, first add glama.json to the root of your repository:
```
{
  "$schema": "https://glama.ai/mcp/schemas/server.json",
  "maintainers": [
    "your-github-username"
  ]
}
```
Then . Browse examples.
Add related servers to improve discoverability.

How to sync the server with GitHub?

Servers are automatically synced at least once per day, but you can also sync manually at any time to instantly update the server profile.

To manually sync the server, click the "Sync Server" button in the MCP server admin interface.

How is the quality score calculated?

The overall quality score combines two components: Tool Definition Quality (70%) and Server Coherence (30%).

Tool Definition Quality measures how well each tool describes itself to AI agents. Every tool is scored 1–5 across six dimensions: Purpose Clarity (25%), Usage Guidelines (20%), Behavioral Transparency (20%), Parameter Semantics (15%), Conciseness & Structure (10%), and Contextual Completeness (10%). The weighted result for a tool is its Tool Definition Quality Score (TDQS). The server-level definition quality score is calculated as 60% mean TDQS + 40% minimum TDQS, so a single poorly described tool pulls the score down.
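
The weighting described above can be sketched in TypeScript. This is a minimal illustration rather than Glama's actual implementation: the function and field names are hypothetical, and mapping the short card labels (Behavior, Parameters, Conciseness, Completeness) onto the six named dimensions is an assumption. The worked example uses the first set of scores listed under Tool Scores.

```typescript
// Dimension weights as stated in the text (they sum to 1.0).
// Field names are hypothetical; only the percentages come from the page.
const WEIGHTS = {
  purposeClarity: 0.25,
  usageGuidelines: 0.2,
  behavioralTransparency: 0.2,
  parameterSemantics: 0.15,
  concisenessStructure: 0.1,
  contextualCompleteness: 0.1,
} as const;

type Dimension = keyof typeof WEIGHTS;
type ToolScores = Record<Dimension, number>; // each score is in the range 1-5

// Weighted per-tool score (TDQS).
function toolTdqs(scores: ToolScores): number {
  return (Object.keys(WEIGHTS) as Dimension[]).reduce(
    (sum, dim) => sum + WEIGHTS[dim] * scores[dim],
    0
  );
}

// Server-level definition quality: 60% mean TDQS + 40% minimum TDQS.
function serverDefinitionQuality(tdqsPerTool: number[]): number {
  if (tdqsPerTool.length === 0) {
    throw new Error("at least one tool score is required");
  }
  const mean = tdqsPerTool.reduce((a, b) => a + b, 0) / tdqsPerTool.length;
  const min = Math.min(...tdqsPerTool);
  return 0.6 * mean + 0.4 * min;
}

// Scores from the first card under Tool Scores:
// Purpose 4, Usage Guidelines 3, Behavior 2, Parameters 4, Conciseness 4, Completeness 3.
const firstTool = toolTdqs({
  purposeClarity: 4,
  usageGuidelines: 3,
  behavioralTransparency: 2,
  parameterSemantics: 4,
  concisenessStructure: 4,
  contextualCompleteness: 3,
});
console.log(firstTool.toFixed(1)); // prints "3.3"
```

Under these assumptions the first card works out to 3.3, which is consistent with the "Lowest: 3.3/5" figure reported in the checklist.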

Server Coherence evaluates how well the tools work together as a set, scoring four dimensions equally: Disambiguation (can agents tell tools apart?), Naming Consistency, Tool Count Appropriateness, and Completeness (are there gaps in the tool surface?).
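
As a rough sketch of how the two components might combine, the TypeScript below averages the four coherence dimensions equally and applies the 70/30 split. The names are hypothetical, and the example treats the checklist's "Average 4.4/5" and "Lowest: 3.3/5" as the mean and minimum TDQS, which is an assumption (both figures are also rounded, so the result is approximate).

```typescript
// Hypothetical field names; the four dimensions and equal weighting come from the text.
interface CoherenceScores {
  disambiguation: number;
  namingConsistency: number;
  toolCount: number;
  completeness: number;
}

// Equal-weight average of the four coherence dimensions.
function serverCoherence(s: CoherenceScores): number {
  return (s.disambiguation + s.namingConsistency + s.toolCount + s.completeness) / 4;
}

// Overall score: Tool Definition Quality (70%) plus Server Coherence (30%).
function overallQuality(definitionQuality: number, coherence: number): number {
  return 0.7 * definitionQuality + 0.3 * coherence;
}

// Coherence scores reported for this server: 3, 4, 3, 4.
const coherence = serverCoherence({
  disambiguation: 3,
  namingConsistency: 4,
  toolCount: 3,
  completeness: 4,
}); // (3 + 4 + 3 + 4) / 4 = 3.5

// Assumption: 4.4 is the mean TDQS and 3.3 the minimum (both rounded on the page).
const definitionQuality = 0.6 * 4.4 + 0.4 * 3.3; // about 3.96
const overall = overallQuality(definitionQuality, coherence);
console.log(coherence, overall.toFixed(2)); // prints 3.5 "3.82" (approximate, from rounded inputs)
```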

Tiers are derived from the overall score: A (≥3.5), B (≥3.0), C (≥2.0), D (≥1.0), F (<1.0). B and above is considered passing.
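
The tier thresholds above translate directly into a small lookup. The sketch below uses hypothetical function names; only the cutoffs and the passing rule are taken from the text.

```typescript
type Tier = "A" | "B" | "C" | "D" | "F";

// Map an overall score to a tier using the thresholds stated in the text.
function tierFor(score: number): Tier {
  if (score >= 3.5) return "A";
  if (score >= 3.0) return "B";
  if (score >= 2.0) return "C";
  if (score >= 1.0) return "D";
  return "F";
}

// B and above is considered passing.
function isPassing(tier: Tier): boolean {
  return tier === "A" || tier === "B";
}

console.log(tierFor(3.2), isPassing(tierFor(3.2))); // prints B true
```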

Tool Scores

recommend_prompt_template_tests

Behavior: 2/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It describes the tool's function and output format ('structured array of test cases'), but lacks details on permissions, rate limits, side effects, or error handling. For a tool with no annotations, this leaves significant gaps in understanding its operational behavior.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness: 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with clear sections (About, Parameters, Example usage, Tool output instructions) and front-loaded key information. It's appropriately sized for the tool's complexity, though some sentences could be more concise (e.g., the example usage is detailed but necessary). Overall, it's efficient with minimal waste.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness: 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's moderate complexity (1 parameter with nested objects), no annotations, and no output schema, the description provides a good foundation but has gaps. It explains the purpose, parameters, and output format, but lacks details on behavioral traits, error cases, and doesn't fully cover all schema parameters (e.g., temperature). It's adequate but not fully complete for safe and effective use.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters: 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description provides a 'Parameters' section that lists and briefly explains each parameter (promptTemplate, contextSchema, promptOrigin, model), adding meaning beyond the input schema. Since schema description coverage is 0%, the description compensates well by documenting the parameters, though it doesn't cover all schema parameters (e.g., temperature is omitted). The value added is substantial but not complete.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose: 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'generates an array of recommended tests for a given prompt template.' It specifies the verb ('generates') and resource ('recommended tests'), though it doesn't explicitly differentiate from sibling tools like 'run_evaluation_tests' or 'find_flaky_tests' which might also involve testing. The purpose is clear but lacks sibling differentiation.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines: 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context by mentioning it's 'part of a toolchain that generates and provides test cases for a prompt template,' suggesting it should be used in a testing workflow. However, it doesn't explicitly state when to use this tool versus alternatives like 'run_evaluation_tests' or provide clear exclusions. The guidance is implied but not explicit.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.

config_helper

Behavior: 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden. It discloses key behavioral traits: the tool analyzes, validates, and fixes configs; it returns errors and original config if invalid, and does nothing if valid. However, it misses details like rate limits, authentication needs, or whether 'fix' is automated or suggested, which are important for a mutation tool.
Conciseness: 3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is front-loaded with the purpose but includes redundant sections like 'Parameters:' that repeat schema info. The example and notes are helpful but could be more streamlined. Overall, it's adequately sized but has some inefficiencies in structure.
Completeness: 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations, no output schema, and 1 parameter with 0% schema coverage, the description provides basic completeness: purpose, param details, and output behavior. However, for a tool that 'fixes' configs (implying mutation), it lacks critical context like side effects, error handling specifics, or return format details, making it minimally adequate.
Parameters: 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema description coverage is 0%, so the description must compensate. It adds significant meaning beyond the schema: it explains that 'configFile' is the raw YAML content as a string, not a file path, and provides an example with formatting notes. This clarifies usage effectively, though it could detail YAML structure or constraints more.
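
To illustrate the string-versus-path distinction noted above, here is a minimal TypeScript sketch of preparing the argument. Only the `configFile` name comes from the description; the sample YAML, the `looksLikeFilePath` guard, and the shape of the argument object are assumptions for illustration, and the real tool may wrap its arguments differently.

```typescript
// Raw YAML content passed inline as a string, not a path such as ".circleci/config.yml".
// The config below is a made-up minimal example.
const configYaml = [
  "version: 2.1",
  "jobs:",
  "  build:",
  "    docker:",
  "      - image: cimg/base:stable",
  "    steps:",
  "      - checkout",
  "workflows:",
  "  main:",
  "    jobs:",
  "      - build",
].join("\n");

// Hypothetical guard: a single-line value ending in .yml or .yaml is probably a path,
// which is not what the description asks for.
function looksLikeFilePath(value: string): boolean {
  return value.indexOf("\n") === -1 && /\.ya?ml$/i.test(value.trim());
}

// Hypothetical argument object; only the `configFile` key is taken from the description.
const configHelperArgs: { configFile: string } = { configFile: configYaml };

if (looksLikeFilePath(configHelperArgs.configFile)) {
  throw new Error("configFile should contain YAML content, not a file path");
}
```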
Purpose: 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'analyze and validate and fix CircleCI configuration files.' It specifies the verb ('analyze, validate, fix') and resource ('CircleCI configuration files'), making the function unambiguous. However, it doesn't explicitly differentiate from sibling tools like 'run_pipeline' or 'rerun_workflow', which might involve config validation indirectly.
Usage Guidelines: 3/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description implies usage context through the example and output instructions, suggesting this tool is for validating configs before execution. However, it lacks explicit guidance on when to use this versus alternatives (e.g., 'run_pipeline' might handle validation internally) or any prerequisites, leaving some ambiguity for the agent.

analyze_diff

Behavior: 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
No annotations are provided, so the description carries the full burden of behavioral disclosure. It describes the analysis process and default behavior for diff selection, but lacks details on permissions, rate limits, error handling, or output format beyond 'a list of rule violations.' For a tool with no annotations, this leaves gaps in understanding its operational behavior.
Conciseness: 4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and appropriately sized. It starts with the core purpose, adds usage context, then details parameters and returns. Each sentence adds value, with no redundant information. A minor deduction because the parameter explanations could be slightly more concise, but overall it's efficient and front-loaded.
Completeness: 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (analyzing git diffs with configurable rules) and the absence of annotations and output schema, the description is adequate but incomplete. It covers the purpose, parameters, and basic return type, but lacks details on error cases, performance implications of speedMode, or examples of rule violation outputs. For a tool with no structured behavioral data, more context would be helpful.
Parameters: 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description adds significant meaning beyond the input schema. The schema has 0% description coverage (no parameter descriptions), but the tool description explains all four parameters (speedMode, filterBy, diff, rules) with practical context, including default behaviors and usage notes (e.g., 'Combine all rules from multiple files by separating them with ---'). This compensates well for the schema's lack of documentation.
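
The rule-combining note quoted above could look like the following in TypeScript. This is a sketch under assumptions: the `combineRules` helper is hypothetical, and placing the `---` separator on its own line is a guess at the expected layout, since the quoted description only says to separate rules with `---`.

```typescript
// Hypothetical helper: join the contents of several rule files into one string,
// separated by "---" as the quoted description suggests.
function combineRules(ruleFiles: string[]): string {
  return ruleFiles
    .map((content) => content.trim())
    .filter((content) => content.length > 0) // skip empty rule files
    .join("\n---\n");
}

// Made-up rule contents for illustration.
const rules = combineRules([
  "Prefer const over let.\n",
  "",
  "Do not commit secrets.",
]);
console.log(rules);
```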
Purpose: 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'analyze a git diff against IDE rules to identify rule violations.' It specifies the verb ('analyze'), resource ('git diff'), and scope ('against IDE rules'), distinguishing it from sibling tools like config_helper or run_pipeline which have unrelated functions. The description is specific and unambiguous.
Usage Guidelines: 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use the tool: 'analyze a git diff (unstaged, staged, or all changes) against IDE rules.' It also specifies default behavior: 'By default, the tool will use the staged changes, unless the user explicitly asks for unstaged or all changes.' However, it does not mention when NOT to use this tool or explicitly name alternatives among siblings, which prevents a score of 5.

rerun_workflow

Behavior: 3/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It explains the rerun behavior (from start or from failed) and the conditional logic when 'fromFailed' is omitted. However, it doesn't cover important aspects like authentication requirements, rate limits, error handling, or what happens to the original workflow. The description adds some behavioral context but leaves significant gaps.
Conciseness: 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured and efficiently organized. It starts with a clear purpose statement, lists common use cases, then presents input options in a logical format with bullet points. Every sentence serves a purpose - there's no wasted text. The information is front-loaded and easy to parse.
Completeness: 3/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (mutating operation with conditional logic) and the lack of both annotations and output schema, the description should do more. While it covers parameters well, it doesn't explain what the tool returns, error conditions, or system behavior during execution. For a mutation tool with no structured safety information, this leaves important gaps for an AI agent.
Parameters: 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema description coverage is 0%, so the description must fully compensate. It does this excellently by explaining the two input options (Workflow ID vs Workflow URL), the exclusive nature of these options ('EXACTLY ONE'), and the conditional behavior of the 'fromFailed' parameter. The description provides crucial semantic information that the schema lacks, including URL format examples and usage rules.
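
The "EXACTLY ONE" constraint described above can be checked on the caller's side before invoking the tool. The TypeScript below is a sketch only: `fromFailed` is named in the description, while `workflowId`, `workflowURL`, and `validateRerunInput` are hypothetical names chosen for illustration and may not match the real schema.

```typescript
interface RerunWorkflowInput {
  workflowId?: string; // hypothetical parameter name
  workflowURL?: string; // hypothetical parameter name
  fromFailed?: boolean; // named in the tool description; may be omitted
}

// Returns an error message when the input breaks the exactly-one rule, otherwise null.
function validateRerunInput(input: RerunWorkflowInput): string | null {
  const provided = [input.workflowId, input.workflowURL].filter(
    (value) => value !== undefined && value !== ""
  ).length;
  if (provided !== 1) {
    return "Provide exactly one of workflowId or workflowURL";
  }
  return null;
}

// Made-up ID for illustration.
console.log(validateRerunInput({ workflowId: "example-workflow-id", fromFailed: true })); // prints null
```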
Purpose: 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'rerun a workflow from start or from the failed job.' It specifies the verb ('rerun') and resource ('workflow'), but doesn't explicitly differentiate from sibling tools like 'run_pipeline' or 'run_rollback_pipeline' that might have overlapping functionality. The description is specific about what the tool does but lacks sibling comparison.
Usage Guidelines: 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use the tool: 'Common use cases: - Rerun a workflow from a failed job - Rerun a workflow from start.' It gives practical scenarios but doesn't explicitly state when NOT to use it or mention alternatives among sibling tools. The guidance is helpful but could be more comprehensive regarding exclusions.

find_flaky_tests

Behavior: 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes several behavioral traits: the tool's output may be truncated (with specific handling instructions), it requires exactly one of three parameter sets, and it has strict validation requirements for each parameter option. However, it doesn't mention authentication needs, rate limits, or what happens when flaky tests are found (beyond stating the agent should 'implement appropriate fixes').
Conciseness: 3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately front-loaded with the core purpose, but contains significant redundancy and instructional content that extends beyond tool description. The 'CRITICAL REQUIREMENTS' section includes agent instructions about output handling that belong in a different context. While well-structured with clear sections, it's verbose (over 400 words) with some sentences that don't directly describe the tool's behavior or parameters.
Completeness: 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (multiple parameter patterns, no annotations, no output schema), the description provides substantial context about parameter usage, validation rules, and output handling. It adequately covers the tool's operational context despite the lack of structured metadata. However, it doesn't explain what format the flaky test information returns in or what specific data fields are available, which would be helpful given the absence of an output schema.
Parameters: 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage (the schema provides only basic parameter names without meaningful descriptions), the description comprehensively compensates by explaining all four parameters in detail. It clarifies the three mutually exclusive usage patterns, provides specific format examples for each parameter, explains relationships between parameters (e.g., Option 3 requires BOTH workspaceRoot and gitRemoteURL), and gives practical guidance on parameter sourcing and validation.
Purpose: 4/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool 'retrieves information about flaky tests in a CircleCI project', providing a specific verb ('retrieves') and resource ('flaky tests'). It distinguishes from sibling tools like 'get_job_test_results' or 'get_build_failure_logs' by focusing specifically on flaky tests rather than general test results or failure logs. However, it doesn't explicitly contrast with these siblings in the description text.
Usage Guidelines: 5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides extensive, explicit guidance on when and how to use this tool through the 'CRITICAL REQUIREMENTS' and 'Input options' sections. It specifies three mutually exclusive parameter options with clear conditions ('EXACTLY ONE of these THREE options must be used'), includes prerequisites ('If using Option 1, make sure to extract the projectSlug exactly as provided by listFollowedProjects'), and gives explicit fallback instructions ('If none of the options can be fully satisfied, ask the user for the missing information before making the tool call').

create_prompt_template

Behavior: 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden and does well. It explains the tool's role in a toolchain, describes the transformation process (converting to structured format), specifies output components (template, contextSchema, promptOrigin), and mentions downstream usage with 'recommend_prompt_template_tests'. However, it doesn't address potential limitations like error conditions or processing constraints.
Conciseness: 3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with clear sections (ABOUT, WHEN, PARAMETERS, EXAMPLES, OUTPUT), but it's verbose with some redundancy. Sentences like 'This tool will return a structured prompt template...' could be more concise. While organized, it could be tightened without losing clarity.
Completeness: 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given no annotations, 0% schema coverage, no output schema, and moderate complexity, the description does well. It explains the tool's purpose, usage, parameters, examples, and output format. However, it doesn't fully address error handling, validation rules, or what happens with malformed inputs, leaving some gaps for a tool with significant transformation responsibility.
Parameters: 4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, so the description must compensate. It provides a dedicated PARAMETERS section explaining each parameter's purpose, including the distinction between 'codebase' and 'requirements' origins, default values for model/temperature, and usage examples. This adds substantial value beyond the bare schema, though it doesn't fully explain all edge cases for parameter values.
Purpose: 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'generate a prompt template based on feature requirements or pre-existing prompts.' It specifies the exact action (generate), resource (prompt template), and distinguishes between two distinct input scenarios. This is specific and unambiguous, with no sibling tools performing similar functions.
Usage Guidelines: 5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description explicitly states when to trigger the tool: for new AI application/feature requirements OR for pre-existing prompts from codebases. It also provides exclusion guidance: 'Similar files should NEVER be generated directly by the AI agent' and specifies to use this tool even when prompt files already exist. This gives clear when/when-not/alternative guidance.

find_underused_resource_classes

Behavior: 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries full burden and adds valuable behavioral context. It discloses that the tool expects specific CSV columns (job_name, resource_class, etc.), works with a subset of CircleCI usage API output, and returns a summary report. It also notes that the CSV path must be absolute and provides a default threshold. However, it doesn't mention error handling, performance, or output format details, leaving some gaps.
Conciseness: 5/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately sized and front-loaded: it starts with the core purpose, then details parameters and CSV requirements. Each sentence adds value—none are redundant. It uses bullet-like formatting for parameters and clear explanations without waste, making it easy to scan and understand.
Completeness: 4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (analyzing CSV data with specific columns), no annotations, no output schema, and 0% schema coverage, the description is largely complete. It covers purpose, parameters, CSV expectations, and output type ('summary report'). However, it doesn't detail the report's structure or potential errors, which could help an agent use it more effectively.
Parameters: 5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description adds significant meaning beyond the input schema, which has 0% description coverage. It explains that 'csvFilePath' must be an absolute path and requires resolution if relative, and that 'threshold' is a usage percentage with default 40. It also clarifies the CSV column expectations and how the tool processes them. This fully compensates for the schema's lack of descriptions.
Purpose: 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Analyzes a CircleCI usage data CSV file to find jobs/resource classes with average or max CPU/RAM usage below a given threshold.' It specifies the verb ('analyzes'), resource ('CircleCI usage data CSV file'), and outcome ('find jobs/resource classes with usage below threshold'). It distinguishes from siblings by focusing on underused resource analysis rather than other CircleCI operations like downloading data or running pipelines.
Usage Guidelines: 4/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides clear context for when to use this tool: to identify underused resource classes from CSV data. It implies usage with CircleCI usage API output. However, it does not explicitly state when not to use it or name alternatives among sibling tools (e.g., 'download_usage_api_data' for obtaining the CSV). The guidance is practical but lacks explicit exclusions or comparisons.

get_job_test_results

Behavior: 4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes critical behavioral traits: truncation handling with specific warning requirements, test result filtering logic, and strict parameter combination rules. However, it doesn't mention rate limits, authentication needs, or error handling, leaving some gaps for a tool with complex input requirements.
Conciseness: 3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately front-loaded with purpose and priority use cases, but becomes verbose with repetitive sections like 'Get test metadata for...' listing and detailed parameter explanations that could be more streamlined. While all content is valuable, the structure could be more efficient given the length.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (multiple parameter options, no annotations, no output schema), the description does an excellent job covering input requirements and behavioral expectations. It explains truncation handling, filtering logic, and parameter combinations thoroughly. The main gap is the lack of an output format description, which would help agents interpret results.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Schema description coverage is 0%, so the description must fully compensate. It provides comprehensive parameter semantics: explains three distinct input options with their required combinations, clarifies parameter purposes beyond schema names, and offers practical examples. The description adds significant value by organizing parameters into logical groups and explaining their relationships.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose with specific verb+resource: 'retrieves test metadata for a CircleCI job.' It distinguishes from siblings by focusing on test results rather than pipeline status, build logs, or other CI aspects. The title is null, so the description fully carries the purpose definition.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool with 'PRIORITY USE CASE' section listing specific scenarios like 'are tests passing in CI?' and 'fix failed tests in CI.' It also offers alternatives within the tool via parameter options and distinguishes from sibling tools by its test-focused nature versus general pipeline status or build logs.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Behavior4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's behavior by detailing three distinct input options with specific requirements (e.g., 'EXACTLY ONE of these THREE options must be used'), constraints like 'ALL THREE parameters must be provided' for Option 3, and error-handling guidance ('ask the user for the missing information'). It lacks details on rate limits or authentication needs, but covers operational constraints well.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with clear sections (purpose, use cases, input options, workflow, requirements), but it is verbose. Sentences like 'It can be used to check pipeline status, get latest build status, or view current pipeline state' are redundant with the opening statement. The 'Common use cases' list repeats similar ideas (e.g., 'Check latest pipeline status' and 'View pipeline state'), reducing efficiency. However, the structure aids readability.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity (multiple input options, no annotations, no output schema), the description is largely complete. It thoroughly explains parameter usage, dependencies, and workflows. The main gap is the lack of information on return values (e.g., what status data is provided), which is significant since there's no output schema. Otherwise, it adequately covers the tool's operational context and constraints.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema description coverage is 0%, so the description must fully compensate. It adds significant meaning beyond the bare schema by explaining the three input options in detail, specifying exact parameter combinations (e.g., 'Option 1 - Project Slug and branch (BOTH required)'), providing examples for each parameter, and clarifying interdependencies and usage rules. This transforms the schema from a simple list into actionable guidance.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
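The 'EXACTLY ONE of these THREE options' rule quoted above can be sketched as a small pre-call check. This is a hedged illustration only: `projectSlug`, `branch`, and `projectURL` appear in the review, while the Option 3 field names (`workspaceRoot`, `gitRemoteURL`) and the function itself are assumptions, not the server's actual code.

```typescript
// Input fields for the three options. The Option 3 names are assumed.
interface ProjectLookupInput {
  projectSlug?: string;   // Option 1 (with branch)
  branch?: string;        // used by Option 1 and, by assumption, Option 3
  projectURL?: string;    // Option 2
  workspaceRoot?: string; // Option 3 (assumed name)
  gitRemoteURL?: string;  // Option 3 (assumed name)
}

type InputOption = "slugAndBranch" | "projectURL" | "localDetection";

// Returns the single fully specified option, or null when zero or
// several options are complete (the caller should then ask the user).
function resolveInputOption(input: ProjectLookupInput): InputOption | null {
  const has = (v?: string): boolean => v !== undefined && v.trim().length > 0;
  const complete: InputOption[] = [];
  if (has(input.projectSlug) && has(input.branch)) complete.push("slugAndBranch");
  if (has(input.projectURL)) complete.push("projectURL");
  if (has(input.workspaceRoot) && has(input.gitRemoteURL) && has(input.branch)) {
    complete.push("localDetection");
  }
  const [only, ...rest] = complete;
  return only !== undefined && rest.length === 0 ? only : null;
}
```

Returning `null` rather than guessing mirrors the quoted guidance to 'ask the user for the missing information' instead of calling the tool with incomplete parameters.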
Purpose5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'retrieves the status of the latest pipeline for a CircleCI project.' It specifies the verb ('retrieves'), resource ('latest pipeline'), and scope ('CircleCI project'), distinguishing it from siblings like 'run_pipeline' (executes) or 'get_build_failure_logs' (focuses on logs).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool versus alternatives. It includes a 'Recommended Workflow' section directing users to first use 'listFollowedProjects' to obtain a projectSlug, and it lists 'Common use cases' like checking pipeline status or build progress. It also specifies when not to use it (e.g., 'Never call this tool with incomplete parameters').
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Behavior4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes key behaviors: it's a read-only listing operation (implied by 'lists'), discloses pagination limits ('If pagination limits are reached, the tool will indicate that not all projects could be displayed'), and specifies the return format ('Each entry includes the project name and its projectSlug'). However, it doesn't mention authentication requirements or rate limits, which would be helpful for a complete behavioral picture.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately front-loaded with the core purpose, but contains some redundancy (e.g., 'Returns' section repeats what's in the initial description, and the workflow section could be more concise). The 'IMPORTANT' warning about not automatically running tools is valuable but lengthy. Overall, it's comprehensive but could be more tightly structured.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's simplicity (0 parameters, no annotations, no output schema), the description provides complete contextual information. It explains what the tool does, when to use it, what it returns, workflow guidance, and important behavioral constraints. For a listing tool with no complex inputs or outputs, this description covers all necessary context.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters4/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The input schema has 0 parameters (empty object), so there are no parameters to document. The description appropriately doesn't discuss parameters, focusing instead on the tool's purpose and usage. With no parameters to cover, this exceeds the baseline expectation for parameter documentation.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('lists all projects that the user is following') and resource ('on CircleCI'), distinguishing it from siblings like 'run_pipeline' or 'get_latest_pipeline_status' which perform different operations. It goes beyond just restating the name by specifying the scope (user's followed projects).
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool ('Identify which CircleCI projects are available to the user', 'Select a project for subsequent operations', 'Obtain the projectSlug needed for other CircleCI tools') and includes a detailed workflow section. It also explicitly states when NOT to use it automatically ('Do not automatically run any additional tools after this tool is called'), addressing alternatives by requiring explicit user instruction for subsequent actions.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Behavior4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure and does well. It discloses key behaviors: it generates temporary configuration files, may return a list of pipelines for selection, requires follow-up calls with pipelineChoiceName when multiple pipelines exist, and returns a URL for monitoring. It doesn't mention rate limits or authentication requirements, but covers most operational aspects.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with clear sections (Input options, Test Files, Pipeline Selection, Additional Requirements, Returns), but is quite lengthy. While most sentences earn their place by providing necessary guidance, some redundancy exists (e.g., repeating URL formats in both Option 2 and projectURL description). It could be more front-loaded.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (multiple input options, conditional pipeline selection, no annotations, no output schema), the description is mostly complete. It explains what the tool does, how to use it, and what it returns. The main gap is the lack of detail on error handling and on what happens when tests fail, but overall it provides sufficient context for an agent to use the tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The description adds substantial value beyond the input schema, which has 0% description coverage. It explains the three input options in detail, clarifies mutual exclusivity ('EXACTLY ONE of these THREE options'), provides format examples for URLs, and explains the pipeline selection logic. This compensates fully for the schema's lack of descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'run evaluation tests on a circleci pipeline' and specifies it 'triggers a new CircleCI pipeline and returns the URL to monitor its progress.' It distinguishes from siblings like 'run_pipeline' by focusing specifically on evaluation/prompt tests, not general pipeline execution.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit usage guidelines with three distinct input options and clear conditions for each. It includes when-not-to-use guidance: 'Never call this tool with incomplete parameters' and 'If none of the options can be fully satisfied, ask the user for the missing information.' It also references sibling tool 'listFollowedProjects' for obtaining projectSlug.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Behavior4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes the tool's interactive flow, error handling (e.g., clear error messages for missing rollback configuration), constraints (e.g., not guessing project slugs), and fallback behaviors (e.g., suggesting documentation for setup). However, it lacks details on rate limits or authentication needs.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with sections like 'Typical Flow' and 'Parameters', but it is overly verbose (e.g., detailing a 10-step flow). Some sentences could be condensed (e.g., repetitive notes on project configuration), reducing clarity through excessive detail. It front-loads key information but includes redundant instructions.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness4/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity, no annotations, and no output schema, the description is largely complete, covering purpose, usage, parameters, behavior, and returns. However, it lacks explicit details on output formats (e.g., structure of rollback ID) and could better integrate with sibling tools like 'rerun_workflow' in the flow description.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
The schema description coverage is 0%, so the description must compensate fully. It provides a detailed 'Parameters' section explaining each parameter's purpose, optionality, and examples (e.g., 'projectSlug' from 'listFollowedProjects'), adding significant value beyond the bare schema. This compensates for the lack of schema descriptions.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Run a rollback pipeline for a CircleCI project.' It specifies the verb ('run'), resource ('rollback pipeline'), and scope ('CircleCI project'), distinguishing it from siblings like 'rerun_workflow' or 'run_pipeline' by focusing on rollback operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool, including prerequisites (e.g., needing projectSlug or projectID), when to call sibling tools like 'listFollowedProjects' or 'listComponentVersions', and alternatives like workflow rerun. It also specifies when not to use it (e.g., if a project lacks rollback configuration).
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Behavior4/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure and does an excellent job: it explains the multi-step pipeline selection process, clarifies that URLs must be user-provided (not constructed), describes the return value format, and specifies parameter interdependencies. It doesn't mention rate limits or authentication requirements, but provides substantial operational context.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with clear sections (Input options, Configuration, Pipeline Selection, Additional Requirements, Returns) and front-loads the core purpose. Some sentences could be more concise, although the detailed URL format list is necessary. Overall, the content adds value given the complex parameter interactions.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex tool with 7 parameters, 0% schema coverage, no annotations, and no output schema, the description provides exceptional completeness. It covers all usage scenarios, parameter interdependencies, behavioral workflows (multi-step pipeline selection), return values, and error prevention guidance. Nothing essential appears missing for agent understanding.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage, the description fully compensates by providing rich semantic context for all parameters. It explains the three distinct usage patterns, clarifies parameter relationships (mutual exclusivity, required groupings), provides concrete examples for URL formats, and explains conditional parameter usage (pipelineChoiceName only needed for multiple pipelines).
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the specific action ('triggers a new CircleCI pipeline') and outcome ('returns the URL to monitor its progress'). It distinguishes this tool from siblings like 'get_latest_pipeline_status' (monitoring) and 'rerun_workflow' (re-running existing workflows) by focusing on initiating new pipelines.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool vs alternatives, including detailed instructions for three distinct parameter options with clear requirements ('EXACTLY ONE of these THREE options must be used'), prerequisites ('Never call this tool with incomplete parameters'), and fallback actions ('ask the user for the missing information before making the tool call').
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Behavior5/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
Since no annotations are provided, the description carries the full burden of behavioral disclosure. It does this exceptionally well by describing: 1) The tool's two-phase operation (starts export job AND downloads CSV when ready), 2) Mandatory parameter requirements with consequences for omission (rejection, errors), 3) Directory selection logic with fallback hierarchy, 4) File output behavior (CSV saved to specified directory), and 5) Validation requirements for AI agents. This provides comprehensive behavioral context beyond what the schema alone offers.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is appropriately front-loaded with the core purpose, but contains significant repetition about outputDir requirements and directory selection rules. The content is valuable, but the multiple warnings and repeated outputDir instructions, while important, make the description longer than necessary; the structure could be more streamlined.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the complexity of the tool (two-phase operation, mandatory parameters, file output) and the complete lack of annotations and output schema, the description provides comprehensive context. It covers: purpose, usage requirements, behavioral workflow, parameter semantics, error conditions, and practical implementation guidance for AI agents. The description fully compensates for the absence of structured metadata, making the tool's behavior and requirements completely understandable.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
With 0% schema description coverage (schema has no descriptions), the description must fully compensate, which it does excellently. It provides detailed semantic information for outputDir including: why it's required, directory selection rules, consequences of omission, and practical guidance for AI agents. It also clarifies the purpose of orgId, startDate, and endDate, and explains the relationship between jobId and subsequent calls. The description adds substantial value beyond the bare schema.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'Downloads usage data from the CircleCI Usage API for a given organization and date range. This tool both starts the export job and downloads the resulting CSV file when ready.' This specifies the exact action (downloads usage data), resource (CircleCI Usage API), and scope (organization and date range). It distinguishes itself from sibling tools by focusing specifically on usage data export and download.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool: for downloading usage data from CircleCI. It also includes detailed mandatory requirements for usage: 'The handler will REJECT any call that does not include BOTH outputDir and originalUserMessage' and provides specific directory selection rules. The description clearly states what parameters are required and when they should be used.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
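The rejection rule quoted above (calls must include BOTH outputDir and originalUserMessage) can be sketched as a pre-flight check an agent might run before calling the tool. This is an illustrative sketch under assumptions: the parameter names come from the review, but the `UsageDownloadArgs` shape and the `missingRequired` helper are hypothetical and are not the server's handler.

```typescript
// Parameter names are taken from the review; the types are assumed.
interface UsageDownloadArgs {
  orgId?: string;
  startDate?: string;
  endDate?: string;
  jobId?: string;
  outputDir?: string;
  originalUserMessage?: string;
}

// Returns the names of the two mandatory fields that are absent or blank.
// An empty array means the call would not be rejected on these grounds.
function missingRequired(args: UsageDownloadArgs): string[] {
  const missing: string[] = [];
  if (args.outputDir === undefined || args.outputDir.trim() === "") {
    missing.push("outputDir");
  }
  if (args.originalUserMessage === undefined || args.originalUserMessage.trim() === "") {
    missing.push("originalUserMessage");
  }
  return missing;
}
```

If the returned list is non-empty, the agent would ask the user for those values rather than making the call.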
Behavior5/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure. It thoroughly explains the tool's interactive behavior: guiding through multi-step selection, automatically resolving orgID, listing options when parameters are missing, requiring multiple calls, and handling pagination limits. It also includes important notes about not automatically running additional tools.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
Conciseness3/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is comprehensive but lengthy, with repetitive sections (e.g., the parameters and behavior sections overlap). It is well-structured with headings and the content is valuable, but removing information repeated across sections would make it more concise.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
For a complex tool with no annotations, no output schema, and 0% schema coverage, the description is exceptionally complete. It covers purpose, usage flow, parameters, behavior, return values for different scenarios, common use cases, and important operational notes. Nothing essential appears missing given the tool's complexity.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Given 0% schema description coverage (schema has no descriptions for parameters), the description fully compensates by explaining each parameter's purpose, optionality, and how they interact (e.g., projectSlug vs projectID, automatic orgID resolution). It adds crucial context about parameter dependencies and the tool's response behavior based on which parameters are provided.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'lists all versions for a CircleCI component.' It specifies the resource (component versions) and distinguishes it from siblings like list_followed_projects by focusing on component version listing rather than project listing or other operations.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit guidance on when to use this tool, including initial requirements (projectSlug or projectID), typical flow steps, and common use cases such as identifying deployed versions or selecting versions for rollback. It distinguishes this from other tools by detailing its multi-step parameter-gathering process.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
Behavior5/5
Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?
With no annotations provided, the description carries the full burden of behavioral disclosure and does so comprehensively. It reveals critical behavioral traits including truncation handling requirements (checking for <MCPTruncationWarning>, required warning message), input validation rules (exactly one of three options, parameter completeness requirements), and workflow dependencies (recommends using listFollowedProjects first). This goes well beyond what a basic description would provide.
Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.
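The truncation handling mentioned above can be sketched as a check on the tool output for the `<MCPTruncationWarning>` tag. This is a minimal sketch, assuming the tag appears literally in the returned text; the helper name is hypothetical, and the notice text is passed in by the caller because the review does not quote the required warning message.

```typescript
// Tag named in the review; assumed to appear verbatim in truncated output.
const TRUNCATION_TAG = "<MCPTruncationWarning>";

// Prepends `notice` when the output carries the truncation tag,
// otherwise returns the output unchanged.
function withTruncationNotice(toolOutput: string, notice: string): string {
  return toolOutput.includes(TRUNCATION_TAG)
    ? `${notice}\n\n${toolOutput}`
    : toolOutput;
}
```

Keeping the notice as a parameter avoids hard-coding wording that the tool description itself defines.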
Conciseness4/5
Is the description appropriately sized, front-loaded, and free of redundancy?
The description is well-structured with clear sections (CRITICAL REQUIREMENTS, Input options, Recommended Workflow, Additional Requirements) but is quite lengthy. Every sentence provides essential guidance, but the front-loading could be improved: the core purpose appears early, while critical behavioral details are buried in later sections. The structure helps navigation, but the length reduces conciseness.
Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.
Completeness5/5
Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?
Given the tool's complexity (multiple input options, truncation handling, workflow dependencies) and the absence of both annotations and output schema, the description provides complete contextual information. It covers purpose, usage scenarios, parameter semantics, behavioral constraints, error handling, and integration with other tools. No additional information would be needed for an agent to use this tool correctly.
Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.
Parameters5/5
Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?
Despite having 0% schema description coverage (the schema has descriptions but they're not counted in coverage), the description provides extensive parameter semantics that fully compensate. It explains the three distinct parameter options, their relationships (mutual exclusivity, required combinations), specific format requirements (e.g., projectSlug format from listFollowedProjects), and practical usage examples. This adds substantial meaning beyond the basic schema properties.
Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.
Purpose: 5/5
Does the description clearly state what the tool does and how it differs from similar tools?
The description clearly states the tool's purpose: 'retrieving failure logs' to 'debug CircleCI build failures'. It specifies the exact resource (failure logs) and verb (retrieve), and distinguishes it from siblings like get_job_test_results or get_latest_pipeline_status by focusing specifically on failure logs rather than test results or status.
Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.
Usage Guidelines: 5/5
Does the description explain when to use this tool, when not to, or what alternatives exist?
The description provides explicit, detailed guidance on when and how to use this tool versus alternatives. It outlines three distinct input options with clear requirements, specifies that exactly one option must be used, and provides a recommended workflow starting with the listFollowedProjects tool. It also includes explicit exclusions ('Never call this tool with incomplete parameters') and prerequisites for each option.
Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
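The "exactly one input option, never incomplete parameters" rule described above can be checked on the caller's side before the tool is invoked. The following is a minimal sketch, not the server's actual validation: only `projectSlug` is named in the evaluation text, so the other field names, the group names, and the `selectOption` helper are all hypothetical.

```typescript
// Hypothetical option groups. Only `projectSlug` appears in the text above;
// every other field name is an illustrative assumption.
type OptionGroup = { name: string; requiredFields: string[] };
type Params = Record<string, string | undefined>;

const OPTION_GROUPS: OptionGroup[] = [
  { name: "slug", requiredFields: ["projectSlug", "branch"] },
  { name: "url", requiredFields: ["projectURL"] },
  { name: "local", requiredFields: ["workspaceRoot", "gitRemoteURL", "branch"] },
];

// Returns the name of the single fully specified option group.
// Throws if no group is complete or if more than one is complete.
function selectOption(
  params: Params,
  groups: OptionGroup[] = OPTION_GROUPS,
): string {
  const complete = groups.filter((group) =>
    group.requiredFields.every((field) => {
      const value = params[field];
      return typeof value === "string" && value !== "";
    }),
  );
  if (complete.length === 0) {
    throw new Error("Incomplete parameters: no input option is fully specified");
  }
  if (complete.length > 1) {
    const names = complete.map((group) => group.name).join(", ");
    throw new Error(`Ambiguous parameters: more than one option specified (${names})`);
  }
  return complete[0].name;
}
```

A caller could run `selectOption(args)` before issuing the tool call and surface the error to the user rather than sending a request the tool is documented to reject.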

GitHub Badge

Glama performs regular codebase and documentation scans to:

Confirm that the MCP server is working as expected.
Confirm that there are no obvious security issues.
Evaluate tool definition quality.

Our badge communicates server capabilities, safety, and installation instructions.

Card Badge

Copy to your README.md:

[![mcp-server-circleci MCP server](https://glama.ai/mcp/servers/CircleCI-Public/mcp-server-circleci/badges/card.svg)](https://glama.ai/mcp/servers/CircleCI-Public/mcp-server-circleci)

Score Badge

Copy to your README.md:

[![mcp-server-circleci MCP server](https://glama.ai/mcp/servers/CircleCI-Public/mcp-server-circleci/badges/score.svg)](https://glama.ai/mcp/servers/CircleCI-Public/mcp-server-circleci)
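Both badge snippets follow the same URL pattern, so they can be generated instead of copied by hand. This is a small sketch that assumes the pattern visible in the two snippets above holds for other servers too; the `badgeMarkdown` helper and the `BadgeKind` type are illustrative names, not part of any Glama tooling.

```typescript
// The two badge variants shown above.
type BadgeKind = "card" | "score";

// Builds the README Markdown for a badge, following the URL pattern
// used in the snippets above (assumed to generalize to other servers).
function badgeMarkdown(owner: string, repo: string, kind: BadgeKind): string {
  const page = `https://glama.ai/mcp/servers/${owner}/${repo}`;
  return `[![${repo} MCP server](${page}/badges/${kind}.svg)](${page})`;
}

// Example: the card badge for this server.
const cardBadge = badgeMarkdown("CircleCI-Public", "mcp-server-circleci", "card");
```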

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/CircleCI-Public/mcp-server-circleci'
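The same request can be made from TypeScript. The sketch below assumes a runtime with a global `fetch` (for example Node 18+ or a browser) and assumes the endpoint returns JSON; the response shape is not documented here, so it is typed as `unknown`. The helper names `serverUrl` and `fetchServer` are illustrative.

```typescript
const API_BASE = "https://glama.ai/api/mcp/v1/servers";

// Builds the directory API URL for a server, matching the curl example above.
function serverUrl(owner: string, repo: string): string {
  return `${API_BASE}/${encodeURIComponent(owner)}/${encodeURIComponent(repo)}`;
}

// Fetches the server record. The body is assumed to be JSON and is returned
// as `unknown` because its schema is not specified on this page.
async function fetchServer(owner: string, repo: string): Promise<unknown> {
  const response = await fetch(serverUrl(owner, repo), {
    method: "GET",
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    throw new Error(`Request failed: ${response.status} ${response.statusText}`);
  }
  return response.json();
}

// Usage (performs a network request):
// fetchServer("CircleCI-Public", "mcp-server-circleci").then(console.log);
```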

If you have feedback or need assistance with the MCP directory API, please join our Discord server.