run_evaluation_tests

Run evaluation tests on CircleCI pipelines by triggering new builds with custom prompt files. Returns a URL to monitor pipeline progress.

Instructions

This tool allows users to run evaluation tests on a CircleCI pipeline. They can be referred to as "Prompt Tests" or "Evaluation Tests". The tool triggers a new CircleCI pipeline and returns the URL to monitor its progress: it generates a temporary CircleCI configuration file, triggers a pipeline using that configuration, and returns the project slug.

Input options (EXACTLY ONE of these THREE options must be used):

Option 1 - Project Slug and branch (BOTH required):
- projectSlug: The project slug obtained from the listFollowedProjects tool (e.g., "gh/organization/project")
- branch: The name of the branch (required when using projectSlug)

Option 2 - Direct URL (provide ONE of these):
- projectURL: The URL of the CircleCI project in any of these formats:
  * Project URL with branch: https://app.circleci.com/pipelines/gh/organization/project?branch=feature-branch
  * Pipeline URL: https://app.circleci.com/pipelines/gh/organization/project/123
  * Workflow URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def
  * Job URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def/jobs/xyz

Option 3 - Project Detection (ALL of these must be provided together):
- workspaceRoot: The absolute path to the workspace root
- gitRemoteURL: The URL of the git remote repository
- branch: The name of the current branch

Test Files:
- promptFiles: Array of prompt template file objects from the ./prompts directory, each containing:
  * fileName: The name of the prompt template file
  * fileContent: The contents of the prompt template file

Pipeline Selection:
- If the project has multiple pipeline definitions, the tool returns a list of available pipelines
- You must then make another call with the chosen pipeline name using the pipelineChoiceName parameter
- The pipelineChoiceName must exactly match one of the pipeline names returned by the tool
- If the project has only one pipeline definition, pipelineChoiceName is not needed

Additional Requirements:
- Never call this tool with incomplete parameters
- If using Option 1, extract the projectSlug exactly as provided by listFollowedProjects
- If using Option 2, the URLs MUST be provided by the user; do not attempt to construct or guess URLs
- If using Option 3, ALL THREE parameters (workspaceRoot, gitRemoteURL, branch) must be provided
- If none of the options can be fully satisfied, ask the user for the missing information before making the tool call

Returns:
- A URL to the newly triggered pipeline that can be used to monitor its progress
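For reference, a minimal Option 1 call might look like the following sketch. The project slug, branch, and prompt file are illustrative placeholders (not values from this page), and the arguments are shown wrapped in the tool's single params object:

    // Hypothetical Option 1 arguments (all values are placeholders).
    const exampleToolArguments = {
      params: {
        projectSlug: 'gh/organization/project',
        branch: 'main',
        promptFiles: [
          {
            fileName: 'summarize-article.prompt.yml',
            fileContent: 'name: summarize-article\n',
          },
        ],
      },
    };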

Input Schema

Name      Required    Description    Default
params    No

Implementation Reference

  • The core handler function that implements the 'run_evaluation_tests' tool logic: detects project/branch, selects pipeline, processes and compresses prompt files, generates CircleCI config for parallel evaluation jobs, and triggers the pipeline.
    export const runEvaluationTests: ToolCallback<{
      params: typeof runEvaluationTestsInputSchema;
    }> = async (args) => {
      const {
        workspaceRoot,
        gitRemoteURL,
        branch,
        projectURL,
        pipelineChoiceName,
        projectSlug: inputProjectSlug,
        promptFiles,
      } = args.params;

      let projectSlug: string | undefined;
      let branchFromURL: string | undefined;

      if (inputProjectSlug) {
        if (!branch) {
          return mcpErrorOutput(
            'Branch not provided. When using projectSlug, a branch must also be specified.',
          );
        }
        projectSlug = inputProjectSlug;
      } else if (projectURL) {
        projectSlug = getProjectSlugFromURL(projectURL);
        branchFromURL = getBranchFromURL(projectURL);
      } else if (workspaceRoot && gitRemoteURL && branch) {
        projectSlug = await identifyProjectSlug({
          gitRemoteURL,
        });
      } else {
        return mcpErrorOutput(
          'Missing required inputs. Please provide either: 1) projectSlug with branch, 2) projectURL, or 3) workspaceRoot with gitRemoteURL and branch.',
        );
      }

      if (!projectSlug) {
        return mcpErrorOutput(`
          Project not found. Ask the user to provide the inputs user can provide based on the tool description.

          Project slug: ${projectSlug}
          Git remote URL: ${gitRemoteURL}
          Branch: ${branch}
          `);
      }

      const foundBranch = branchFromURL || branch;
      if (!foundBranch) {
        return mcpErrorOutput(
          'No branch provided. Try using the current git branch.',
        );
      }

      if (!promptFiles || promptFiles.length === 0) {
        return mcpErrorOutput(
          'No prompt template files provided. Please ensure you have prompt template files in the ./prompts directory (e.g. <relevant-name>.prompt.yml) and include them in the promptFiles parameter.',
        );
      }

      const circleci = getCircleCIClient();
      const { id: projectId } = await circleci.projects.getProject({
        projectSlug,
      });

      const pipelineDefinitions = await circleci.pipelines.getPipelineDefinitions({
        projectId,
      });

      const pipelineChoices = [
        ...pipelineDefinitions.map((definition) => ({
          name: definition.name,
          definitionId: definition.id,
        })),
      ];

      if (pipelineChoices.length === 0) {
        return mcpErrorOutput(
          'No pipeline definitions found. Please make sure your project is set up on CircleCI to run pipelines.',
        );
      }

      const formattedPipelineChoices = pipelineChoices
        .map(
          (pipeline, index) =>
            `${index + 1}. ${pipeline.name} (definitionId: ${pipeline.definitionId})`,
        )
        .join('\n');

      if (pipelineChoices.length > 1 && !pipelineChoiceName) {
        return {
          content: [
            {
              type: 'text',
              text: `Multiple pipeline definitions found. Please choose one of the following:\n${formattedPipelineChoices}`,
            },
          ],
        };
      }

      const chosenPipeline = pipelineChoiceName
        ? pipelineChoices.find((pipeline) => pipeline.name === pipelineChoiceName)
        : undefined;

      if (pipelineChoiceName && !chosenPipeline) {
        return mcpErrorOutput(
          `Pipeline definition with name ${pipelineChoiceName} not found. 
    Please choose one of the following:\n${formattedPipelineChoices}`,
        );
      }

      const runPipelineDefinitionId =
        chosenPipeline?.definitionId || pipelineChoices[0].definitionId;

      // Process each file for compression and encoding
      const processedFiles = promptFiles.map((promptFile) => {
        const fileExtension = promptFile.fileName.toLowerCase();
        let processedPromptFileContent: string;

        if (fileExtension.endsWith('.json')) {
          // For JSON files, parse and re-stringify to ensure proper formatting
          const json = JSON.parse(promptFile.fileContent);
          processedPromptFileContent = JSON.stringify(json, null);
        } else if (
          fileExtension.endsWith('.yml') ||
          fileExtension.endsWith('.yaml')
        ) {
          // For YAML files, keep as-is
          processedPromptFileContent = promptFile.fileContent;
        } else {
          // Default to treating as text content
          processedPromptFileContent = promptFile.fileContent;
        }

        // Gzip compress the content and then base64 encode for compact transport
        const gzippedContent = gzipSync(processedPromptFileContent);
        const base64GzippedContent = gzippedContent.toString('base64');

        return {
          fileName: promptFile.fileName,
          base64GzippedContent,
        };
      });

      // Generate file creation commands with conditional logic for parallelism
      const fileCreationCommands = processedFiles
        .map(
          (file, index) => `
            if [ "$CIRCLE_NODE_INDEX" = "${index}" ]; then
              sudo mkdir -p /prompts
              echo "${file.base64GzippedContent}" | base64 -d | gzip -d | sudo tee /prompts/${file.fileName} > /dev/null
            fi`,
        )
        .join('\n');

      // Generate individual evaluation commands with conditional logic for parallelism
      const evaluationCommands = processedFiles
        .map(
          (file, index) => `
            if [ "$CIRCLE_NODE_INDEX" = "${index}" ]; then
              python eval.py ${file.fileName}
            fi`,
        )
        .join('\n');

      const configContent = `
    version: 2.1
    jobs:
      evaluate-prompt-template-tests:
        parallelism: ${processedFiles.length}
        docker:
          - image: cimg/python:3.12.0
        steps:
          - run: |
              curl https://gist.githubusercontent.com/jvincent42/10bf3d2d2899033ae1530cf429ed03f8/raw/acf07002d6bfcfb649c913b01a203af086c1f98d/eval.py > eval.py
              echo "deepeval>=3.0.3
              openai>=1.84.0
              anthropic>=0.54.0
              PyYAML>=6.0.2
              " > requirements.txt
              pip install -r requirements.txt
          - run: |
              ${fileCreationCommands}
          - run: |
              ${evaluationCommands}
    workflows:
      mcp-run-evaluation-tests:
        jobs:
          - evaluate-prompt-template-tests
    `;

      const runPipelineResponse = await circleci.pipelines.runPipeline({
        projectSlug,
        branch: foundBranch,
        definitionId: runPipelineDefinitionId,
        configContent,
      });

      return {
        content: [
          {
            type: 'text',
            text: `Pipeline run successfully. View it at: https://app.circleci.com/pipelines/${projectSlug}/${runPipelineResponse.number}`,
          },
        ],
      };
    };
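    The generated config recreates each prompt file on its parallel node with `base64 -d | gzip -d`, the shell inverse of the gzipSync + base64 encoding performed in the handler. A minimal standalone sketch of that round trip, assuming Node's built-in zlib module (not taken from the source), is:

    import { gzipSync, gunzipSync } from 'node:zlib';

    // Compress and base64-encode, as the handler does before embedding the
    // payload in the generated CircleCI config.
    const fileContent = 'name: example-prompt\n';
    const base64Gzipped = gzipSync(fileContent).toString('base64');

    // Decode and decompress: the Node equivalent of `base64 -d | gzip -d`
    // in the generated run step.
    const restored = gunzipSync(Buffer.from(base64Gzipped, 'base64')).toString('utf8');
    console.log(restored === fileContent); // true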
  • Zod schema defining the input structure for the tool, including options for project identification and required promptFiles array.
    export const runEvaluationTestsInputSchema = z.object({
      projectSlug: z.string().describe(projectSlugDescription).optional(),
      branch: z.string().describe(branchDescription).optional(),
      workspaceRoot: z
        .string()
        .describe(
          'The absolute path to the root directory of your project workspace. ' +
            'This should be the top-level folder containing your source code, configuration files, and dependencies. ' +
            'For example: "/home/user/my-project" or "C:\\Users\\user\\my-project"',
        )
        .optional(),
      gitRemoteURL: z
        .string()
        .describe(
          'The URL of the remote git repository. This should be the URL of the repository that you cloned to your local workspace. ' +
            'For example: "https://github.com/user/my-project.git"',
        )
        .optional(),
      projectURL: z
        .string()
        .describe(
          'The URL of the CircleCI project. Can be any of these formats:\n' +
            '- Project URL with branch: https://app.circleci.com/pipelines/gh/organization/project?branch=feature-branch\n' +
            '- Pipeline URL: https://app.circleci.com/pipelines/gh/organization/project/123\n' +
            '- Workflow URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def\n' +
            '- Job URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def/jobs/xyz',
        )
        .optional(),
      pipelineChoiceName: z
        .string()
        .describe(
          'The name of the pipeline to run. This parameter is only needed if the project has multiple pipeline definitions. ' +
            'If not provided and multiple pipelines exist, the tool will return a list of available pipelines for the user to choose from. ' +
            'If provided, it must exactly match one of the pipeline names returned by the tool.',
        )
        .optional(),
      promptFiles: z
        .array(
          z.object({
            fileName: z.string().describe('The name of the prompt template file'),
            fileContent: z
              .string()
              .describe('The contents of the prompt template file'),
          }),
        )
        .describe(
          `Array of prompt template files in the ${promptsOutputDirectory} directory (e.g. ${fileNameTemplate}).`,
        ),
    });
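    As a quick illustration (not part of the source), the schema above can be exercised against a minimal Option 1 payload; all values are placeholders:

    // Hypothetical validation sketch: the option-specific fields are optional,
    // but promptFiles is always required.
    const validated = runEvaluationTestsInputSchema.parse({
      projectSlug: 'gh/organization/project',
      branch: 'main',
      promptFiles: [
        { fileName: 'example.prompt.yml', fileContent: 'name: example\n' },
      ],
    });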
  • Tool object definition exporting 'runEvaluationTestsTool' with name 'run_evaluation_tests', description, and inputSchema reference.
    export const runEvaluationTestsTool = {
      name: 'run_evaluation_tests' as const,
      description: `
        This tool allows the users to run evaluation tests on a circleci pipeline.
        They can be referred to as "Prompt Tests" or "Evaluation Tests".
        This tool triggers a new CircleCI pipeline and returns the URL to monitor its progress.
        The tool will generate an appropriate circleci configuration file and trigger a pipeline using this temporary configuration.
        The tool will return the project slug.

        Input options (EXACTLY ONE of these THREE options must be used):

        ${option1DescriptionBranchRequired}

        Option 2 - Direct URL (provide ONE of these):
        - projectURL: The URL of the CircleCI project in any of these formats:
          * Project URL with branch: https://app.circleci.com/pipelines/gh/organization/project?branch=feature-branch
          * Pipeline URL: https://app.circleci.com/pipelines/gh/organization/project/123
          * Workflow URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def
          * Job URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def/jobs/xyz

        Option 3 - Project Detection (ALL of these must be provided together):
        - workspaceRoot: The absolute path to the workspace root
        - gitRemoteURL: The URL of the git remote repository
        - branch: The name of the current branch

        Test Files:
        - promptFiles: Array of prompt template file objects from the ${promptsOutputDirectory} directory, each containing:
          * fileName: The name of the prompt template file
          * fileContent: The contents of the prompt template file

        Pipeline Selection:
        - If the project has multiple pipeline definitions, the tool will return a list of available pipelines
        - You must then make another call with the chosen pipeline name using the pipelineChoiceName parameter
        - The pipelineChoiceName must exactly match one of the pipeline names returned by the tool
        - If the project has only one pipeline definition, pipelineChoiceName is not needed

        Additional Requirements:
        - Never call this tool with incomplete parameters
        - If using Option 1, make sure to extract the projectSlug exactly as provided by listFollowedProjects
        - If using Option 2, the URLs MUST be provided by the user - do not attempt to construct or guess URLs
        - If using Option 3, ALL THREE parameters (workspaceRoot, gitRemoteURL, branch) must be provided
        - If none of the options can be fully satisfied, ask the user for the missing information before making the tool call

        Returns:
        - A URL to the newly triggered pipeline that can be used to monitor its progress
      `,
      inputSchema: runEvaluationTestsInputSchema,
    };
  • Registration of the runEvaluationTestsTool in the central CCI_TOOLS array used for MCP tool provision.
    export const CCI_TOOLS = [
      getBuildFailureLogsTool,
      getFlakyTestLogsTool,
      getLatestPipelineStatusTool,
      getJobTestResultsTool,
      configHelperTool,
      createPromptTemplateTool,
      recommendPromptTemplateTestsTool,
      runPipelineTool,
      listFollowedProjectsTool,
      runEvaluationTestsTool,
      rerunWorkflowTool,
      analyzeDiffTool,
      runRollbackPipelineTool,
    ];
  • Handler mapping for 'run_evaluation_tests' to the runEvaluationTests function in CCI_HANDLERS object.
    export const CCI_HANDLERS = {
      get_build_failure_logs: getBuildFailureLogs,
      find_flaky_tests: getFlakyTestLogs,
      get_latest_pipeline_status: getLatestPipelineStatus,
      get_job_test_results: getJobTestResults,
      config_helper: configHelper,
      create_prompt_template: createPromptTemplate,
      recommend_prompt_template_tests: recommendPromptTemplateTests,
      run_pipeline: runPipeline,
      list_followed_projects: listFollowedProjects,
      run_evaluation_tests: runEvaluationTests,
      rerun_workflow: rerunWorkflow,
      analyze_diff: analyzeDiff,
      run_rollback_pipeline: runRollbackPipeline,
    } satisfies ToolHandlers;
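    As a rough sketch of how such a map is typically consumed (this dispatch helper is assumed and does not appear in the source), a server can look up a handler by tool name and pass the validated arguments under params, matching the destructuring used by runEvaluationTests above:

    // Hypothetical dispatch sketch, not from the source.
    async function dispatchToolCall(name: keyof typeof CCI_HANDLERS, params: unknown) {
      const handler = CCI_HANDLERS[name] as unknown as (args: {
        params: unknown;
      }) => Promise<unknown>;
      return handler({ params });
    }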

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/ampcome-mcps/circleci-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server.