
CircleCI MCP Server

by ampcome-mcps

run_evaluation_tests

Run evaluation tests on CircleCI pipelines by triggering new builds with custom prompt files. Returns a URL to monitor pipeline progress.

Instructions

This tool allows users to run evaluation tests on a CircleCI pipeline.
These tests may also be referred to as "Prompt Tests" or "Evaluation Tests".

This tool triggers a new CircleCI pipeline and returns a URL to monitor its progress.
The tool generates a temporary CircleCI configuration file and triggers a pipeline using it.
The tool also returns the project slug.

Input options (EXACTLY ONE of these THREE options must be used):

Option 1 - Project Slug and branch (BOTH required):
- projectSlug: The project slug obtained from listFollowedProjects tool (e.g., "gh/organization/project")
- branch: The name of the branch (required when using projectSlug)

Option 2 - Direct URL (provide ONE of these):
- projectURL: The URL of the CircleCI project in any of these formats:
  * Project URL with branch: https://app.circleci.com/pipelines/gh/organization/project?branch=feature-branch
  * Pipeline URL: https://app.circleci.com/pipelines/gh/organization/project/123
  * Workflow URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def
  * Job URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def/jobs/xyz
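Extracting the slug and branch from these URL formats reduces to simple path parsing. A minimal sketch (`parseProjectURL` is a hypothetical helper for illustration, not the server's actual `getProjectSlugFromURL` implementation):

```typescript
// Hypothetical parser for the URL formats listed above.
// Path always looks like /pipelines/:vcs/:org/:project[/123[/workflows/...]]
function parseProjectURL(url: string): { slug: string; branch?: string } {
  const parsed = new URL(url);
  const segments = parsed.pathname.split('/').filter(Boolean);
  // segments[0] is 'pipelines'; the next three form the project slug.
  const [, vcs, org, project] = segments;
  const slug = `${vcs}/${org}/${project}`;
  // Only the "Project URL with branch" format carries a ?branch= query param.
  const branch = parsed.searchParams.get('branch') ?? undefined;
  return { slug, branch };
}
```

All four formats share the same leading path segments, so one parser covers them; pipeline/workflow/job URLs simply ignore the trailing segments.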

Option 3 - Project Detection (ALL of these must be provided together):
- workspaceRoot: The absolute path to the workspace root
- gitRemoteURL: The URL of the git remote repository
- branch: The name of the current branch

Test Files:
- promptFiles: Array of prompt template file objects from the ./prompts directory, each containing:
  * fileName: The name of the prompt template file
  * fileContent: The contents of the prompt template file

Pipeline Selection:
- If the project has multiple pipeline definitions, the tool will return a list of available pipelines
- You must then make another call with the chosen pipeline name using the pipelineChoiceName parameter
- The pipelineChoiceName must exactly match one of the pipeline names returned by the tool
- If the project has only one pipeline definition, pipelineChoiceName is not needed
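From the caller's side, the selection rules above form a two-call pattern. A sketch of the payloads (all values are illustrative; field names follow the tool's input schema):

```typescript
// First call: no pipelineChoiceName. If the project has multiple pipeline
// definitions, the tool responds with the list of names instead of running.
const firstCall = {
  params: {
    projectSlug: 'gh/organization/project',
    branch: 'main',
    promptFiles: [{ fileName: 'example.prompt.yml', fileContent: '...' }],
  },
};

// Second call: same params plus the exact name chosen from the returned list.
const secondCall = {
  params: { ...firstCall.params, pipelineChoiceName: 'build-and-test' },
};
```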

Additional Requirements:
- Never call this tool with incomplete parameters
- If using Option 1, make sure to extract the projectSlug exactly as provided by listFollowedProjects
- If using Option 2, the URLs MUST be provided by the user - do not attempt to construct or guess URLs
- If using Option 3, ALL THREE parameters (workspaceRoot, gitRemoteURL, branch) must be provided
- If none of the options can be fully satisfied, ask the user for the missing information before making the tool call
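The three-way option check can be summarized as a small decision function. A hedged sketch of the branching logic (names are illustrative, not the server's actual code):

```typescript
type Inputs = {
  projectSlug?: string;
  branch?: string;
  projectURL?: string;
  workspaceRoot?: string;
  gitRemoteURL?: string;
};

// Option 1 needs projectSlug + branch, Option 2 needs only projectURL,
// Option 3 needs all three project-detection inputs; anything else is an error.
function selectInputOption(i: Inputs): 'option1' | 'option2' | 'option3' | 'error' {
  if (i.projectSlug) return i.branch ? 'option1' : 'error';
  if (i.projectURL) return 'option2';
  if (i.workspaceRoot && i.gitRemoteURL && i.branch) return 'option3';
  return 'error';
}
```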

Returns:
- A URL to the newly triggered pipeline that can be used to monitor its progress

Input Schema

Name   | Required | Description | Default
params | No       |             |

Implementation Reference

  • The core handler function that implements the 'run_evaluation_tests' tool logic: detects project/branch, selects pipeline, processes and compresses prompt files, generates CircleCI config for parallel evaluation jobs, and triggers the pipeline.
    export const runEvaluationTests: ToolCallback<{
      params: typeof runEvaluationTestsInputSchema;
    }> = async (args) => {
      const {
        workspaceRoot,
        gitRemoteURL,
        branch,
        projectURL,
        pipelineChoiceName,
        projectSlug: inputProjectSlug,
        promptFiles,
      } = args.params;
    
      let projectSlug: string | undefined;
      let branchFromURL: string | undefined;
    
      if (inputProjectSlug) {
        if (!branch) {
          return mcpErrorOutput(
            'Branch not provided. When using projectSlug, a branch must also be specified.',
          );
        }
        projectSlug = inputProjectSlug;
      } else if (projectURL) {
        projectSlug = getProjectSlugFromURL(projectURL);
        branchFromURL = getBranchFromURL(projectURL);
      } else if (workspaceRoot && gitRemoteURL && branch) {
        projectSlug = await identifyProjectSlug({
          gitRemoteURL,
        });
      } else {
        return mcpErrorOutput(
          'Missing required inputs. Please provide either: 1) projectSlug with branch, 2) projectURL, or 3) workspaceRoot with gitRemoteURL and branch.',
        );
      }
    
      if (!projectSlug) {
        return mcpErrorOutput(`
              Project not found. Ask the user to provide the inputs user can provide based on the tool description.
    
              Project slug: ${projectSlug}
              Git remote URL: ${gitRemoteURL}
              Branch: ${branch}
              `);
      }
      const foundBranch = branchFromURL || branch;
      if (!foundBranch) {
        return mcpErrorOutput(
          'No branch provided. Try using the current git branch.',
        );
      }
    
      if (!promptFiles || promptFiles.length === 0) {
        return mcpErrorOutput(
          'No prompt template files provided. Please ensure you have prompt template files in the ./prompts directory (e.g. <relevant-name>.prompt.yml) and include them in the promptFiles parameter.',
        );
      }
    
      const circleci = getCircleCIClient();
      const { id: projectId } = await circleci.projects.getProject({
        projectSlug,
      });
      const pipelineDefinitions = await circleci.pipelines.getPipelineDefinitions({
        projectId,
      });
    
      const pipelineChoices = [
        ...pipelineDefinitions.map((definition) => ({
          name: definition.name,
          definitionId: definition.id,
        })),
      ];
    
      if (pipelineChoices.length === 0) {
        return mcpErrorOutput(
          'No pipeline definitions found. Please make sure your project is set up on CircleCI to run pipelines.',
        );
      }
    
      const formattedPipelineChoices = pipelineChoices
        .map(
          (pipeline, index) =>
            `${index + 1}. ${pipeline.name} (definitionId: ${pipeline.definitionId})`,
        )
        .join('\n');
    
      if (pipelineChoices.length > 1 && !pipelineChoiceName) {
        return {
          content: [
            {
              type: 'text',
              text: `Multiple pipeline definitions found. Please choose one of the following:\n${formattedPipelineChoices}`,
            },
          ],
        };
      }
    
      const chosenPipeline = pipelineChoiceName
        ? pipelineChoices.find((pipeline) => pipeline.name === pipelineChoiceName)
        : undefined;
    
      if (pipelineChoiceName && !chosenPipeline) {
        return mcpErrorOutput(
          `Pipeline definition with name ${pipelineChoiceName} not found. Please choose one of the following:\n${formattedPipelineChoices}`,
        );
      }
    
      const runPipelineDefinitionId =
        chosenPipeline?.definitionId || pipelineChoices[0].definitionId;
    
      // Process each file for compression and encoding
      const processedFiles = promptFiles.map((promptFile) => {
        const fileExtension = promptFile.fileName.toLowerCase();
        let processedPromptFileContent: string;
    
        if (fileExtension.endsWith('.json')) {
          // For JSON files, parse and re-stringify to ensure proper formatting
          const json = JSON.parse(promptFile.fileContent);
          processedPromptFileContent = JSON.stringify(json, null);
        } else if (
          fileExtension.endsWith('.yml') ||
          fileExtension.endsWith('.yaml')
        ) {
          // For YAML files, keep as-is
          processedPromptFileContent = promptFile.fileContent;
        } else {
          // Default to treating as text content
          processedPromptFileContent = promptFile.fileContent;
        }
    
        // Gzip compress the content and then base64 encode for compact transport
        const gzippedContent = gzipSync(processedPromptFileContent);
        const base64GzippedContent = gzippedContent.toString('base64');
    
        return {
          fileName: promptFile.fileName,
          base64GzippedContent,
        };
      });
    
      // Generate file creation commands with conditional logic for parallelism
      const fileCreationCommands = processedFiles
        .map(
          (file, index) =>
            `          if [ "$CIRCLE_NODE_INDEX" = "${index}" ]; then
                sudo mkdir -p /prompts
                echo "${file.base64GzippedContent}" | base64 -d | gzip -d | sudo tee /prompts/${file.fileName} > /dev/null
              fi`,
        )
        .join('\n');
    
      // Generate individual evaluation commands with conditional logic for parallelism
      const evaluationCommands = processedFiles
        .map(
          (file, index) =>
            `          if [ "$CIRCLE_NODE_INDEX" = "${index}" ]; then
                python eval.py ${file.fileName}
              fi`,
        )
        .join('\n');
    
      const configContent = `
    version: 2.1
    
    jobs:
      evaluate-prompt-template-tests:
        parallelism: ${processedFiles.length}
        docker:
          - image: cimg/python:3.12.0
        steps:
          - run: |
              curl https://gist.githubusercontent.com/jvincent42/10bf3d2d2899033ae1530cf429ed03f8/raw/acf07002d6bfcfb649c913b01a203af086c1f98d/eval.py > eval.py
              echo "deepeval>=3.0.3
              openai>=1.84.0
              anthropic>=0.54.0
              PyYAML>=6.0.2
              " > requirements.txt
              pip install -r requirements.txt
          - run: |
    ${fileCreationCommands}
          - run: |
    ${evaluationCommands}
    
    workflows:
      mcp-run-evaluation-tests:
        jobs:
          - evaluate-prompt-template-tests
    `;
    
      const runPipelineResponse = await circleci.pipelines.runPipeline({
        projectSlug,
        branch: foundBranch,
        definitionId: runPipelineDefinitionId,
        configContent,
      });
    
      return {
        content: [
          {
            type: 'text',
            text: `Pipeline run successfully. View it at: https://app.circleci.com/pipelines/${projectSlug}/${runPipelineResponse.number}`,
          },
        ],
      };
    };
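The handler embeds each prompt file in the generated config as gzipped, base64-encoded text, which the job later decodes with `base64 -d | gzip -d`. That round trip can be checked in isolation with Node's built-in zlib (a standalone sketch, not the server code):

```typescript
import { gzipSync, gunzipSync } from 'zlib';

// Compress prompt file content for compact embedding in the generated config,
// mirroring the handler's processedFiles step.
const original = 'name: example prompt\nmodel: some-model\n';
const base64Gzipped = gzipSync(original).toString('base64');

// Node-side equivalent of the job's `base64 -d | gzip -d` decode pipeline.
const roundTripped = gunzipSync(Buffer.from(base64Gzipped, 'base64')).toString('utf8');
```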
  • Zod schema defining the input structure for the tool, including options for project identification and required promptFiles array.
    export const runEvaluationTestsInputSchema = z.object({
      projectSlug: z.string().describe(projectSlugDescription).optional(),
      branch: z.string().describe(branchDescription).optional(),
      workspaceRoot: z
        .string()
        .describe(
          'The absolute path to the root directory of your project workspace. ' +
            'This should be the top-level folder containing your source code, configuration files, and dependencies. ' +
            'For example: "/home/user/my-project" or "C:\\Users\\user\\my-project"',
        )
        .optional(),
      gitRemoteURL: z
        .string()
        .describe(
          'The URL of the remote git repository. This should be the URL of the repository that you cloned to your local workspace. ' +
            'For example: "https://github.com/user/my-project.git"',
        )
        .optional(),
      projectURL: z
        .string()
        .describe(
          'The URL of the CircleCI project. Can be any of these formats:\n' +
            '- Project URL with branch: https://app.circleci.com/pipelines/gh/organization/project?branch=feature-branch\n' +
            '- Pipeline URL: https://app.circleci.com/pipelines/gh/organization/project/123\n' +
            '- Workflow URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def\n' +
            '- Job URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def/jobs/xyz',
        )
        .optional(),
      pipelineChoiceName: z
        .string()
        .describe(
          'The name of the pipeline to run. This parameter is only needed if the project has multiple pipeline definitions. ' +
            'If not provided and multiple pipelines exist, the tool will return a list of available pipelines for the user to choose from. ' +
            'If provided, it must exactly match one of the pipeline names returned by the tool.',
        )
        .optional(),
      promptFiles: z
        .array(
          z.object({
            fileName: z.string().describe('The name of the prompt template file'),
            fileContent: z
              .string()
              .describe('The contents of the prompt template file'),
          }),
        )
        .describe(
          `Array of prompt template files in the ${promptsOutputDirectory} directory (e.g. ${fileNameTemplate}).`,
        ),
    });
  • Tool object definition exporting 'runEvaluationTestsTool' with name 'run_evaluation_tests', description, and inputSchema reference.
    export const runEvaluationTestsTool = {
      name: 'run_evaluation_tests' as const,
      description: `
        This tool allows the users to run evaluation tests on a circleci pipeline.
        They can be referred to as "Prompt Tests" or "Evaluation Tests".
    
        This tool triggers a new CircleCI pipeline and returns the URL to monitor its progress.
        The tool will generate an appropriate circleci configuration file and trigger a pipeline using this temporary configuration.
        The tool will return the project slug.
    
        Input options (EXACTLY ONE of these THREE options must be used):
    
        ${option1DescriptionBranchRequired}
    
        Option 2 - Direct URL (provide ONE of these):
        - projectURL: The URL of the CircleCI project in any of these formats:
          * Project URL with branch: https://app.circleci.com/pipelines/gh/organization/project?branch=feature-branch
          * Pipeline URL: https://app.circleci.com/pipelines/gh/organization/project/123
          * Workflow URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def
          * Job URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def/jobs/xyz
    
        Option 3 - Project Detection (ALL of these must be provided together):
        - workspaceRoot: The absolute path to the workspace root
        - gitRemoteURL: The URL of the git remote repository
        - branch: The name of the current branch
    
        Test Files:
        - promptFiles: Array of prompt template file objects from the ${promptsOutputDirectory} directory, each containing:
          * fileName: The name of the prompt template file
          * fileContent: The contents of the prompt template file
    
        Pipeline Selection:
        - If the project has multiple pipeline definitions, the tool will return a list of available pipelines
        - You must then make another call with the chosen pipeline name using the pipelineChoiceName parameter
        - The pipelineChoiceName must exactly match one of the pipeline names returned by the tool
        - If the project has only one pipeline definition, pipelineChoiceName is not needed
    
        Additional Requirements:
        - Never call this tool with incomplete parameters
        - If using Option 1, make sure to extract the projectSlug exactly as provided by listFollowedProjects
        - If using Option 2, the URLs MUST be provided by the user - do not attempt to construct or guess URLs
        - If using Option 3, ALL THREE parameters (workspaceRoot, gitRemoteURL, branch) must be provided
        - If none of the options can be fully satisfied, ask the user for the missing information before making the tool call
    
        Returns:
        - A URL to the newly triggered pipeline that can be used to monitor its progress
        `,
      inputSchema: runEvaluationTestsInputSchema,
    };
  • Registration of the runEvaluationTestsTool in the central CCI_TOOLS array used for MCP tool provision.
    export const CCI_TOOLS = [
      getBuildFailureLogsTool,
      getFlakyTestLogsTool,
      getLatestPipelineStatusTool,
      getJobTestResultsTool,
      configHelperTool,
      createPromptTemplateTool,
      recommendPromptTemplateTestsTool,
      runPipelineTool,
      listFollowedProjectsTool,
      runEvaluationTestsTool,
      rerunWorkflowTool,
      analyzeDiffTool,
      runRollbackPipelineTool,
    ];
  • Handler mapping for 'run_evaluation_tests' to the runEvaluationTests function in CCI_HANDLERS object.
    export const CCI_HANDLERS = {
      get_build_failure_logs: getBuildFailureLogs,
      find_flaky_tests: getFlakyTestLogs,
      get_latest_pipeline_status: getLatestPipelineStatus,
      get_job_test_results: getJobTestResults,
      config_helper: configHelper,
      create_prompt_template: createPromptTemplate,
      recommend_prompt_template_tests: recommendPromptTemplateTests,
      run_pipeline: runPipeline,
      list_followed_projects: listFollowedProjects,
      run_evaluation_tests: runEvaluationTests,
      rerun_workflow: rerunWorkflow,
      analyze_diff: analyzeDiff,
      run_rollback_pipeline: runRollbackPipeline,
    } satisfies ToolHandlers;
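A handler map like this is typically consumed by a small dispatcher. A hypothetical sketch of that routing (not the actual server code; the real handlers are async and call CircleCI):

```typescript
// Stand-in handler shape; the real ToolCallback signature is richer.
type Handler = (args: { params: unknown }) => { ok: boolean; received: unknown };

const handlers: Record<string, Handler> = {
  // Illustrative stand-in for runEvaluationTests.
  run_evaluation_tests: (args) => ({ ok: true, received: args.params }),
};

// Route an incoming MCP tool call by name; unknown names are an error.
function dispatch(toolName: string, params: unknown) {
  const handler = handlers[toolName];
  if (!handler) throw new Error(`Unknown tool: ${toolName}`);
  return handler({ params });
}
```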
Behavior: 5/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It thoroughly describes the tool's behavior: it triggers a new CircleCI pipeline, generates a temporary configuration file, returns a project slug or URL, handles multiple pipeline definitions with a selection process, and includes constraints like requiring exactly one of three input options and not constructing URLs. This covers operational details, constraints, and output behavior comprehensively.

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness: 4/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately sized and well-structured, with clear sections (e.g., Input options, Test Files, Pipeline Selection, Additional Requirements, Returns). It is front-loaded with the core purpose and key behavior. While comprehensive, some sentences could be more concise (e.g., the detailed URL formats in Option 2), but overall, it efficiently conveys necessary information without waste.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness: 5/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the tool's complexity (multiple input options, nested objects, no output schema, and no annotations), the description is complete. It covers all aspects: purpose, usage guidelines, behavioral traits, parameter semantics, and return values (URL to monitor progress). It addresses potential edge cases like multiple pipeline definitions and incomplete parameters, ensuring the agent has sufficient context to use the tool correctly.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters: 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

The input schema has 0% description coverage, so the description must compensate. It adds extensive meaning beyond the schema by explaining the three input options in detail, specifying the required parameters for each option, describing promptFiles as an array of file objects from the ./prompts directory, and clarifying pipelineChoiceName usage for multiple pipelines. This fully compensates for the lack of schema descriptions.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose: 5/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool's purpose: 'run evaluation tests on a circleci pipeline' and 'triggers a new CircleCI pipeline and returns the URL to monitor its progress.' It distinguishes from siblings like 'run_pipeline' by specifying it's for evaluation tests (referred to as 'Prompt Tests' or 'Evaluation Tests'), making the purpose specific and differentiated.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines: 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides explicit usage guidelines, including when to use this tool (for running evaluation tests on CircleCI pipelines) and detailed conditions for three input options. It also specifies prerequisites like 'Never call this tool with incomplete parameters' and references sibling tools (e.g., 'listFollowedProjects'), offering clear alternatives and exclusions.

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
