CircleCI-Public

mcp-server-circleci

Official

find_flaky_tests

Retrieve information about flaky tests in CircleCI projects to analyze test reliability and implement targeted fixes.

Instructions

This tool retrieves information about flaky tests in a CircleCI project. 

The agent receiving this output MUST analyze the flaky test data and implement appropriate fixes based on the specific issues identified.

CRITICAL REQUIREMENTS:
1. Truncation Handling (HIGHEST PRIORITY):
   - ALWAYS check for <MCPTruncationWarning> in the output
   - When present, you MUST start your response with:
     "WARNING: The logs have been truncated. Only showing the most recent entries. Earlier build failures may not be visible."
   - Only proceed with log analysis after acknowledging the truncation
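The truncation check above can be sketched as a small helper. This is an illustration only; `withTruncationNotice` and `TRUNCATION_WARNING` are hypothetical names, not part of the server:

```typescript
// Hypothetical helper (not part of the server): prepend the required
// warning whenever the truncation marker appears in the tool output.
const TRUNCATION_WARNING =
  'WARNING: The logs have been truncated. Only showing the most recent entries. ' +
  'Earlier build failures may not be visible.';

function withTruncationNotice(output: string): string {
  return output.includes('<MCPTruncationWarning>')
    ? `${TRUNCATION_WARNING}\n\n${output}`
    : output;
}
```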

Input options (EXACTLY ONE of these THREE options must be used):

Option 1 - Project Slug:
- projectSlug: The project slug obtained from listFollowedProjects tool (e.g., "gh/organization/project")

Option 2 - Direct URL (provide ONE of these):
- projectURL: The URL of the CircleCI project in any of these formats:
  * Project URL: https://app.circleci.com/pipelines/gh/organization/project
  * Pipeline URL: https://app.circleci.com/pipelines/gh/organization/project/123
  * Workflow URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def
  * Job URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def/jobs/xyz

Option 3 - Project Detection (ALL of these must be provided together):
- workspaceRoot: The absolute path to the workspace root
- gitRemoteURL: The URL of the git remote repository

Additional Requirements:
- Never call this tool with incomplete parameters
- If using Option 1, make sure to extract the projectSlug exactly as provided by listFollowedProjects
- If using Option 2, the URLs MUST be provided by the user - do not attempt to construct or guess URLs
- If using Option 3, BOTH parameters (workspaceRoot, gitRemoteURL) must be provided
- If none of the options can be fully satisfied, ask the user for the missing information before making the tool call
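A minimal pre-flight check for these rules might look like the sketch below. The `FlakyTestParams` type mirrors the schema shown later on this page; `hasUsableOption` is a hypothetical name, not part of the server:

```typescript
// Hypothetical pre-flight check (not part of the server): verify that at
// least one of the three input options is fully satisfied before calling
// the tool. Option 3 requires BOTH workspaceRoot and gitRemoteURL.
type FlakyTestParams = {
  projectSlug?: string;
  projectURL?: string;
  workspaceRoot?: string;
  gitRemoteURL?: string;
};

function hasUsableOption(p: FlakyTestParams): boolean {
  const viaSlug = Boolean(p.projectSlug);
  const viaURL = Boolean(p.projectURL);
  const viaDetection = Boolean(p.workspaceRoot && p.gitRemoteURL);
  return viaSlug || viaURL || viaDetection;
}
```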

Input Schema

Name | Required | Description | Default
params | No | — | —

Implementation Reference

  • Primary MCP tool handler function. Resolves project slug from inputs, fetches flaky tests data, handles file output if enabled, otherwise formats for MCP response.
    export const getFlakyTestLogs: ToolCallback<{
      params: typeof getFlakyTestLogsInputSchema;
    }> = async (args) => {
      const {
        workspaceRoot,
        gitRemoteURL,
        projectURL,
        projectSlug: inputProjectSlug,
      } = args.params ?? {};
    
      let projectSlug: string | null | undefined;
    
      if (inputProjectSlug) {
        projectSlug = inputProjectSlug;
      } else if (projectURL) {
        projectSlug = getProjectSlugFromURL(projectURL);
      } else if (workspaceRoot && gitRemoteURL) {
        projectSlug = await identifyProjectSlug({
          gitRemoteURL,
        });
      } else {
        return mcpErrorOutput(
          'Missing required inputs. Please provide either: 1) projectSlug, 2) projectURL, or 3) workspaceRoot with gitRemoteURL.',
        );
      }
    
      if (!projectSlug) {
        return mcpErrorOutput(`
              Project not found. Ask the user to provide one of the inputs described in the tool description.
    
              Project slug: ${projectSlug}
              Git remote URL: ${gitRemoteURL}
              `);
      }
    
      const tests = await getFlakyTests({
        projectSlug,
      });
    
      if (process.env.FILE_OUTPUT_DIRECTORY) {
        try {
          return await writeTestsToFiles({ tests });
        } catch (error) {
          console.error(error);
          return formatFlakyTests(tests);
        }
      }
    
      return formatFlakyTests(tests);
    };
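The handler above calls `getProjectSlugFromURL`, whose implementation is not shown on this page. A minimal sketch, assuming the slug is simply the first three path segments after `/pipelines/` (an illustration, not the actual helper):

```typescript
// Hypothetical sketch: extract "gh/organization/project" from any of the
// supported CircleCI URL shapes (project, pipeline, workflow, or job URL).
function getProjectSlugFromURL(projectURL: string): string | undefined {
  const match = new URL(projectURL).pathname.match(
    /\/pipelines\/([^/]+\/[^/]+\/[^/]+)/,
  );
  return match?.[1];
}
```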
  • Zod schema defining the input parameters for the find_flaky_tests tool.
    export const getFlakyTestLogsInputSchema = z.object({
      projectSlug: z.string().describe(projectSlugDescriptionNoBranch).optional(),
      workspaceRoot: z
        .string()
        .describe(
          'The absolute path to the root directory of your project workspace. ' +
            'This should be the top-level folder containing your source code, configuration files, and dependencies. ' +
            'For example: "/home/user/my-project" or "C:\\Users\\user\\my-project"',
        )
        .optional(),
      gitRemoteURL: z
        .string()
        .describe(
          'The URL of the remote git repository. This should be the URL of the repository that you cloned to your local workspace. ' +
            'For example: "https://github.com/user/my-project.git"',
        )
        .optional(),
      projectURL: z
        .string()
        .describe(
          'The URL of the CircleCI project. Can be any of these formats:\n' +
            '- Project URL: https://app.circleci.com/pipelines/gh/organization/project\n' +
            '- Project URL with branch: https://app.circleci.com/pipelines/gh/organization/project?branch=feature-branch\n' +
            '- Pipeline URL: https://app.circleci.com/pipelines/gh/organization/project/123\n' +
            '- Workflow URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def\n' +
            '- Job URL: https://app.circleci.com/pipelines/gh/organization/project/123/workflows/abc-def/jobs/xyz',
        )
        .optional(),
    });
  • Registers the 'find_flaky_tests' tool name mapped to its handler function in the central CCI_HANDLERS object.
    export const CCI_HANDLERS = {
      get_build_failure_logs: getBuildFailureLogs,
      find_flaky_tests: getFlakyTestLogs,
      get_latest_pipeline_status: getLatestPipelineStatus,
      get_job_test_results: getJobTestResults,
      config_helper: configHelper,
      create_prompt_template: createPromptTemplate,
      recommend_prompt_template_tests: recommendPromptTemplateTests,
      run_pipeline: runPipeline,
      list_followed_projects: listFollowedProjects,
      run_evaluation_tests: runEvaluationTests,
      rerun_workflow: rerunWorkflow,
      download_usage_api_data: downloadUsageApiData,
      find_underused_resource_classes: findUnderusedResourceClasses,
      analyze_diff: analyzeDiff,
      run_rollback_pipeline: runRollbackPipeline,
      list_component_versions: listComponentVersions,
    } satisfies ToolHandlers;
  • Includes the getFlakyTestLogsTool in the exported CCI_TOOLS array for tool registration.
    export const CCI_TOOLS = [
      getBuildFailureLogsTool,
      getFlakyTestLogsTool,
      getLatestPipelineStatusTool,
      getJobTestResultsTool,
      configHelperTool,
      createPromptTemplateTool,
      recommendPromptTemplateTestsTool,
      runPipelineTool,
      listFollowedProjectsTool,
      runEvaluationTestsTool,
      rerunWorkflowTool,
      downloadUsageApiDataTool,
      findUnderusedResourceClassesTool,
      analyzeDiffTool,
      runRollbackPipelineTool,
      listComponentVersionsTool,
    ];
  • Core helper function that fetches flaky tests from the CircleCI API via insights.getProjectFlakyTests, then retrieves detailed test data from the corresponding jobs.
    const getFlakyTests = async ({ projectSlug }: { projectSlug: string }) => {
      const circleci = getCircleCIClient();
      const flakyTests = await circleci.insights.getProjectFlakyTests({
        projectSlug,
      });
    
      if (!flakyTests || !flakyTests.flaky_tests) {
        throw new Error('Flaky tests not found');
      }
    
      const flakyTestDetails = [
        ...new Set(
          flakyTests.flaky_tests.map((test) => ({
            jobNumber: test.job_number,
            test_name: test.test_name,
          })),
        ),
      ];
    
      const testsArrays = await rateLimitedRequests(
        flakyTestDetails.map(({ jobNumber, test_name }) => async () => {
          try {
            const tests = await circleci.tests.getJobTests({
              projectSlug,
              jobNumber,
            });
            const matchingTest = tests.find((test) => test.name === test_name);
            if (matchingTest) {
              return matchingTest;
            }
            console.error(`Test ${test_name} not found in job ${jobNumber}`);
            return tests.filter((test) => test.result === 'failure');
          } catch (error) {
            if (error instanceof Error && error.message.includes('404')) {
              console.error(`Job ${jobNumber} not found:`, error);
              return undefined;
            } else if (error instanceof Error && error.message.includes('429')) {
              console.error(`Rate limited for job request ${jobNumber}:`, error);
              return undefined;
            }
            throw error;
          }
        }),
      );
    
      const filteredTestsArrays = testsArrays
        .flat()
        .filter((test) => test !== undefined);
    
      return filteredTestsArrays;
    };
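The helper above relies on `rateLimitedRequests`, which is referenced but not shown. A plausible shape, assuming it simply caps the number of in-flight requests (a hypothetical sketch, not the actual utility):

```typescript
// Hypothetical sketch: run async tasks with at most `limit` in flight,
// preserving the order of results relative to the input tasks.
async function rateLimitedRequests<T>(
  tasks: Array<() => Promise<T>>,
  limit = 5,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const i = next++; // claim the next task index
      results[i] = await tasks[i]();
    }
  }

  await Promise.all(
    Array.from({ length: Math.min(limit, tasks.length) }, () => worker()),
  );
  return results;
}
```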
Behavior 4/5

Does the description disclose side effects, auth requirements, rate limits, or destructive behavior?

With no annotations provided, the description carries the full burden of behavioral disclosure. It effectively describes several behavioral traits: the tool's output may be truncated (with specific handling instructions), it requires exactly one of three parameter sets, and it has strict validation requirements for each parameter option. However, it doesn't mention authentication needs, rate limits, or what happens when flaky tests are found (beyond stating the agent should 'implement appropriate fixes').

Agents need to know what a tool does to the world before calling it. Descriptions should go beyond structured annotations to explain consequences.

Conciseness 3/5

Is the description appropriately sized, front-loaded, and free of redundancy?

The description is appropriately front-loaded with the core purpose, but contains significant redundancy and instructional content that extends beyond tool description. The 'CRITICAL REQUIREMENTS' section includes agent instructions about output handling that belong in a different context. While well-structured with clear sections, it's verbose (over 400 words) with some sentences that don't directly describe the tool's behavior or parameters.

Shorter descriptions cost fewer tokens and are easier for agents to parse. Every sentence should earn its place.

Completeness 4/5

Given the tool's complexity, does the description cover enough for an agent to succeed on first attempt?

Given the complexity (multiple parameter patterns, no annotations, no output schema), the description provides substantial context about parameter usage, validation rules, and output handling. It adequately covers the tool's operational context despite the lack of structured metadata. However, it doesn't explain what format the flaky test information returns in or what specific data fields are available, which would be helpful given the absence of an output schema.

Complex tools with many parameters or behaviors need more documentation. Simple tools need less. This dimension scales expectations accordingly.

Parameters 5/5

Does the description clarify parameter syntax, constraints, interactions, or defaults beyond what the schema provides?

With 0% schema description coverage (the schema provides only basic parameter names without meaningful descriptions), the description comprehensively compensates by explaining all four parameters in detail. It clarifies the three mutually exclusive usage patterns, provides specific format examples for each parameter, explains relationships between parameters (e.g., Option 3 requires BOTH workspaceRoot and gitRemoteURL), and gives practical guidance on parameter sourcing and validation.

Input schemas describe structure but not intent. Descriptions should explain non-obvious parameter relationships and valid value ranges.

Purpose 4/5

Does the description clearly state what the tool does and how it differs from similar tools?

The description clearly states the tool 'retrieves information about flaky tests in a CircleCI project', providing a specific verb ('retrieves') and resource ('flaky tests'). It can be told apart from sibling tools like 'get_job_test_results' or 'get_build_failure_logs' by its specific focus on flaky tests rather than general test results or failure logs. However, it doesn't explicitly contrast itself with these siblings in the description text.

Agents choose between tools based on descriptions. A clear purpose with a specific verb and resource helps agents select the right tool.

Usage Guidelines 5/5

Does the description explain when to use this tool, when not to, or what alternatives exist?

The description provides extensive, explicit guidance on when and how to use this tool through the 'CRITICAL REQUIREMENTS' and 'Input options' sections. It specifies three mutually exclusive parameter options with clear conditions ('EXACTLY ONE of these THREE options must be used'), includes prerequisites ('If using Option 1, make sure to extract the projectSlug exactly as provided by listFollowedProjects'), and gives explicit fallback instructions ('If none of the options can be fully satisfied, ask the user for the missing information before making the tool call').

Agents often have multiple tools that could apply. Explicit usage guidance like "use X instead of Y when Z" prevents misuse.
