
Server Configuration

Describes the environment variables required to run the server.

| Name | Required | Description | Default |
| --- | --- | --- | --- |
| GAFFER_API_KEY | Yes | Your Gaffer API Key (starts with gaf_) | |
| GAFFER_API_URL | No | API base URL | https://app.gaffer.sh |

Capabilities

Features and capabilities supported by this server

| Capability | Details |
| --- | --- |
| tools | `{ "listChanged": true }` |

Tools

Functions exposed to the LLM to take actions

execute_code

Execute JavaScript code that calls Gaffer API functions via the codemode namespace.

Write async JavaScript — all functions are available as codemode.<function_name>(input). Use return to send results back. Use console.log() for debug output.

Available Functions

/** Get the health metrics for a project.

Returns:
- Health score (0-100): Overall project health based on pass rate and trend
- Pass rate: Percentage of tests passing
- Test run count: Number of test runs in the period
- Flaky test count: Number of tests with inconsistent results
- Trend: Whether test health is improving (up), declining (down), or stable

Use this to understand the current state of your test suite. */
get_project_health(input: { projectId?: string; days?: number }): Promise<any>

/** Get the pass/fail history for a specific test.

Search by either:
- testName: The exact name of the test (e.g., "should handle user login")
- filePath: The file path containing the test (e.g., "tests/auth.test.ts")

Returns:
- History of test runs showing pass/fail status over time
- Duration of each run
- Branch and commit information
- Error messages for failed runs
- Summary statistics (pass rate, total runs)

Use this to investigate flaky tests or understand test stability. */
get_test_history(input: { projectId?: string; testName?: string; filePath?: string; limit?: number }): Promise<any>

/** Get the list of flaky tests in a project.

A test is considered flaky if it frequently switches between pass and fail states.
Tests are ranked by a composite flakinessScore that factors in flip behavior,
failure rate, and duration variability.

Returns:
- List of flaky tests sorted by flakinessScore (most flaky first), with:
  - name: Test name
  - flipRate: How often the test flips between pass/fail (0-1)
  - flipCount: Number of status transitions
  - totalRuns: Total test executions analyzed
  - lastSeen: When the test last ran
  - flakinessScore: Composite score (0-1) combining flip proximity, failure rate, and duration variability
- Summary with threshold used and total count

Use this after get_project_health shows flaky tests exist, to identify which
specific tests are flaky and need investigation. */
get_flaky_tests(input: { projectId?: string; threshold?: number; limit?: number; days?: number }): Promise<any>

/** List recent test runs for a project with optional filtering.

Filter by:
- commitSha: Filter by commit SHA (supports prefix matching)
- branch: Filter by branch name
- status: Filter by "passed" (no failures) or "failed" (has failures)

Returns:
- List of test runs with:
  - id: Test run ID (can be used with get_test_run for details)
  - commitSha: Git commit SHA
  - branch: Git branch name
  - passedCount/failedCount/skippedCount: Test counts
  - createdAt: When the test run was created
- Pagination info (total count, hasMore flag)

Use cases:
- "What tests failed in commit abc123?"
- "Show me recent test runs on main branch"
- "What's the status of tests on my feature branch?" */
list_test_runs(input: { projectId?: string; commitSha?: string; branch?: string; status?: 'passed' | 'failed'; limit?: number }): Promise<any>
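
Example: a minimal execute_code sketch that finds the latest failed run on main and drills into its failed tests. "proj_abc" is a placeholder, and the testRuns field name is an assumption based on the return description above.

// Find the latest failed run on main, then pull its failed tests
const runs = await codemode.list_test_runs({ projectId: "proj_abc", branch: "main", status: "failed", limit: 1 });
const latest = runs.testRuns?.[0]; // field name assumed from the documented return shape
if (!latest) return { message: "No failed runs found" };
return await codemode.get_test_run_details({ testRunId: latest.id, status: "failed" });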

/** Get URLs for report files uploaded with a test run.

IMPORTANT: This tool returns download URLs, not file content. You must fetch the URLs separately.

Returns for each file:
- filename: The file name (e.g., "report.html", "results.json", "junit.xml")
- size: File size in bytes
- contentType: MIME type (e.g., "text/html", "application/json", "application/xml")
- downloadUrl: Presigned URL to download the file (valid for ~5 minutes)

How to use the returned URLs:

1. **JSON files** (results.json, coverage.json):
   Use WebFetch with the downloadUrl to retrieve and parse the JSON content.
   Example: WebFetch(url=downloadUrl, prompt="Extract test results from this JSON")

2. **XML files** (junit.xml, xunit.xml):
   Use WebFetch with the downloadUrl to retrieve and parse the XML content.
   Example: WebFetch(url=downloadUrl, prompt="Parse the test results from this JUnit XML")

3. **HTML reports** (Playwright, pytest-html, Vitest):
   These are typically bundled React/JavaScript applications that require a browser.
   They cannot be meaningfully parsed by WebFetch.
   For programmatic analysis, use get_test_run_details instead.

Recommendations:
- For analyzing test results programmatically: Use get_test_run_details (returns parsed test data)
- For JSON/XML files: Use this tool + WebFetch on the downloadUrl
- For HTML reports: Direct users to view in browser, or use get_test_run_details

Use cases:
- "What files are in this test run?" (list available reports)
- "Get the coverage data from this run" (then WebFetch the JSON URL)
- "Parse the JUnit XML results" (then WebFetch the XML URL) */
get_report(input: { testRunId: string }): Promise<any>
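
Example: a minimal sketch that lists the report files attached to a run so the JSON/XML download URLs can be fetched afterwards (e.g., with WebFetch). "run_123" is a placeholder test run ID.

// List report files and their presigned download URLs
const report = await codemode.get_report({ testRunId: "run_123" });
return report; // each file: filename, size, contentType, downloadUrl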

/** Get the slowest tests in a project, sorted by P95 duration.

Parameters:
- projectId (optional): Project ID — required for user API keys, auto-resolved for project tokens
- days (optional): Analysis period in days (default: 30, max: 365)
- limit (optional): Max tests to return (default: 20, max: 100)
- framework (optional): Filter by framework (e.g., "playwright", "vitest")
- branch (optional): Filter by git branch (e.g., "main", "develop")

Returns:
- List of slowest tests with:
  - name: Short test name
  - fullName: Full test name including describe blocks
  - filePath: Test file path (if available)
  - framework: Test framework used
  - avgDurationMs: Average test duration in milliseconds
  - p95DurationMs: 95th percentile duration (used for sorting)
  - runCount: Number of times the test ran in the period
- Summary with project info and period

Use cases:
- "Which tests are slowing down my CI pipeline?"
- "Find the slowest Playwright tests to optimize"
- "Show me e2e tests taking over 30 seconds"
- "What are the slowest tests on the main branch?" */
get_slowest_tests(input: { projectId?: string; days?: number; limit?: number; framework?: string; branch?: string }): Promise<any>
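
Example: a minimal sketch that pulls the ten slowest Playwright tests on main from the last 14 days. "proj_abc" is a placeholder project ID.

// Slowest Playwright tests on main, last 14 days, sorted by P95 duration
const slow = await codemode.get_slowest_tests({ projectId: "proj_abc", days: 14, framework: "playwright", branch: "main", limit: 10 });
return slow;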

/** Get parsed test results for a specific test run.

Parameters:
- testRunId (required): The test run ID to get details for
- projectId (optional): Project ID — required for user API keys, auto-resolved for project tokens
- status (optional): Filter by test status: "passed", "failed", or "skipped"
- limit (optional): Max tests to return (default: 100, max: 500)
- offset (optional): Pagination offset (default: 0)

Returns:
- testRunId: The test run ID
- commitSha: Git commit SHA (null if not recorded)
- branch: Git branch name (null if not recorded)
- framework: Test framework (e.g., "playwright", "vitest")
- createdAt: When the test run was created (ISO 8601)
- summary: Overall counts (passed, failed, skipped, total)
- tests: Array of individual test results with:
  - name: Short test name
  - fullName: Full test name including describe blocks
  - status: Test status (passed, failed, skipped)
  - durationMs: Test duration in milliseconds (null if not recorded)
  - filePath: Test file path (null if not recorded)
  - error: Error message for failed tests (null otherwise)
  - errorStack: Full stack trace for failed tests (null otherwise)
- pagination: Pagination info (total, limit, offset, hasMore)

Use cases:
- "Show me all failed tests from this test run"
- "Get the test results from commit abc123"
- "List tests that took the longest in this run"
- "Find tests with errors in the auth module"

Note: For aggregate analytics like flaky test detection or duration trends,
use get_test_history, get_flaky_tests, or get_slowest_tests instead. */
get_test_run_details(input: { testRunId: string; projectId?: string; status?: 'passed' | 'failed' | 'skipped'; limit?: number; offset?: number }): Promise<any>
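
Example: a sketch that pages through all failed tests in a run using the documented pagination fields. "run_123" is a placeholder; keep the 20-call-per-execution limit in mind for runs with many pages.

// Collect every failed test in a run, page by page
const pageSize = 100;
let offset = 0;
const failed = [];
while (true) {
  const page = await codemode.get_test_run_details({ testRunId: "run_123", status: "failed", limit: pageSize, offset });
  failed.push(...page.tests);
  if (!page.pagination.hasMore) break;
  offset += pageSize;
}
return { total: failed.length, names: failed.map((t) => t.fullName) };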

/** Group failed tests by root cause using error message similarity.

Parameters:
- projectId (optional): Project ID — required for user API keys, auto-resolved for project tokens
- testRunId (required): The test run ID to analyze

Returns:
- clusters: Array of failure clusters, each containing:
  - representativeError: The error message representing this cluster
  - count: Number of tests with this same root cause
  - tests: Array of individual failed tests in this cluster
    - name: Short test name
    - fullName: Full test name including describe blocks
    - errorMessage: The specific error message
    - filePath: Test file path (null if not recorded)
  - similarity: Similarity threshold used for clustering (0-1)
- totalFailures: Total number of failed tests across all clusters

Use cases:
- "Group these 15 failures by root cause" — often reveals 2-3 distinct bugs
- "Which error affects the most tests?" — fix the largest cluster first
- "Are these failures related?" — check if they land in the same cluster

Tip: Use get_test_run_details with status='failed' first to see raw failures,
then use this tool to understand which failures share the same root cause. */
get_failure_clusters(input: { projectId?: string; testRunId: string }): Promise<any>
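
Example: a minimal sketch that finds the cluster affecting the most tests. "run_123" is a placeholder; the sort is explicit because the description above does not state a cluster ordering.

// Find the root cause affecting the most tests
const { clusters, totalFailures } = await codemode.get_failure_clusters({ testRunId: "run_123" });
const biggest = [...clusters].sort((a, b) => b.count - a.count)[0];
return { totalFailures, topError: biggest?.representativeError, affected: biggest?.count };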

/** Compare test metrics between two commits or test runs.

Useful for measuring the impact of code changes on test performance or reliability.

Parameters:
- projectId (optional): Project ID — required for user API keys, auto-resolved for project tokens
- testName (required): The test name to compare (short name or full name)
- Option 1 - Compare by commit:
  - beforeCommit: Commit SHA for "before" measurement
  - afterCommit: Commit SHA for "after" measurement
- Option 2 - Compare by test run:
  - beforeRunId: Test run ID for "before" measurement
  - afterRunId: Test run ID for "after" measurement

Returns:
- testName: The test that was compared
- before: Metrics from the before commit/run
  - testRunId, commit, branch, status, durationMs, createdAt
- after: Metrics from the after commit/run
  - testRunId, commit, branch, status, durationMs, createdAt
- change: Calculated changes
  - durationMs: Duration difference (negative = faster)
  - percentChange: Percentage change (negative = improvement)
  - statusChanged: Whether pass/fail status changed

Use cases:
- "Did my fix make this test faster?"
- "Compare test performance between these two commits"
- "Did this test start failing after my changes?"
- "Show me the before/after for the slow test I optimized"

Tip: Use get_test_history first to find the commit SHAs or test run IDs you want to compare. */
compare_test_metrics(input: { projectId?: string; testName: string; beforeCommit?: string; afterCommit?: string; beforeRunId?: string; afterRunId?: string }): Promise<any>
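
Example: a minimal sketch comparing one test across two commits. The test name and SHAs ("abc123", "def456") are placeholders; in practice, find them with get_test_history first.

// Did the fix in def456 speed this test up relative to abc123?
const diff = await codemode.compare_test_metrics({
  projectId: "proj_abc",
  testName: "should handle user login",
  beforeCommit: "abc123",
  afterCommit: "def456",
});
return diff.change; // durationMs (negative = faster), percentChange, statusChanged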

/** Get the coverage metrics summary for a project.

Returns:
- Current coverage percentages (lines, branches, functions)
- Trend direction (up, down, stable) and change amount
- Total number of coverage reports
- Latest report date
- Top 5 files with lowest coverage

Use this to understand your project's overall test coverage health.

After getting the summary, use get_coverage_for_file with path prefixes to drill into
specific areas (e.g., "server/services", "src/api", "lib/core"). This helps identify
high-value targets in critical code paths rather than just the files with lowest coverage. */
get_coverage_summary(input: { projectId?: string; days?: number }): Promise<any>

/** Get coverage metrics for a specific file or files matching a path pattern.

Parameters:
- projectId (optional): Project ID — required for user API keys, auto-resolved for project tokens
- filePath: File path to search for (exact or partial match)

Returns:
- Line coverage (covered/total/percentage)
- Branch coverage (covered/total/percentage)
- Function coverage (covered/total/percentage)

This is the preferred tool for targeted coverage analysis. Use path prefixes to focus on
specific areas of the codebase:
- "server/services" - Backend service layer
- "server/utils" - Backend utilities
- "src/api" - API routes
- "lib/core" - Core business logic

Before querying, explore the codebase to identify critical paths - entry points,
heavily-imported files, and code handling auth/payments/data mutations.
Prioritize: high utilization + low coverage = highest impact. */
get_coverage_for_file(input: { projectId?: string; filePath: string }): Promise<any>
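
Example: a minimal sketch that drills into two backend areas by path prefix. "proj_abc" and the paths are placeholders.

// Compare coverage across two areas of the codebase
const services = await codemode.get_coverage_for_file({ projectId: "proj_abc", filePath: "server/services" });
const api = await codemode.get_coverage_for_file({ projectId: "proj_abc", filePath: "src/api" });
return { services, api };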

/** Find areas of code that have both low coverage AND test failures.

This cross-references test failures with coverage data to identify high-risk
areas in your codebase that need attention. Files are ranked by a "risk score"
calculated as: (100 - coverage%) × failureCount.

Parameters:
- projectId (optional): Project ID — required for user API keys, auto-resolved for project tokens
- days: Analysis period for test failures (default: 30)
- coverageThreshold: Include files below this coverage % (default: 80)

Returns:
- List of risk areas sorted by risk score (highest risk first)
- Each area includes: file path, coverage %, failure count, risk score, test names

Use this to prioritize which parts of your codebase need better test coverage. */
find_uncovered_failure_areas(input: { projectId?: string; days?: number; coverageThreshold?: number }): Promise<any>
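
Example: a minimal sketch using a stricter threshold over a shorter window. "proj_abc" is a placeholder.

// Highest-risk files: failing tests plus sub-70% coverage, last 14 days
const risky = await codemode.find_uncovered_failure_areas({ projectId: "proj_abc", days: 14, coverageThreshold: 70 });
return risky;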

/** Get files with little or no test coverage.

Returns files sorted by coverage percentage (lowest first), filtered
to only include files below a coverage threshold.

Parameters:
- projectId (optional): Project ID — required for user API keys, auto-resolved for project tokens
- maxCoverage: Include files with coverage at or below this % (default: 10)
- limit: Maximum number of files to return (default: 20, max: 100)

Returns:
- List of files sorted by coverage (lowest first)
- Each file includes line/branch/function coverage metrics
- Total count of files matching the criteria

IMPORTANT: Results may be dominated by certain file types (e.g., UI components) that are
numerous but not necessarily the highest priority. For targeted analysis of specific code
areas (backend, services, utilities), use get_coverage_for_file with path prefixes instead.

To prioritize effectively, explore the codebase to understand which code is heavily utilized
(entry points, frequently-imported files, critical business logic) and then query coverage
for those specific paths. */
get_untested_files(input: { projectId?: string; maxCoverage?: number; limit?: number }): Promise<any>
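
Example: a sketch that applies the advice above by narrowing the results to backend paths client-side. "proj_abc" is a placeholder, and the files/filePath field names are assumptions based on the return description.

// Least-covered files, narrowed to server code
const untested = await codemode.get_untested_files({ projectId: "proj_abc", maxCoverage: 20, limit: 50 });
const backend = (untested.files ?? []).filter((f) => (f.filePath ?? "").startsWith("server/")); // field names assumed
return { matched: backend.length, files: backend };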

/** Get a browser-navigable URL for viewing a test report (Playwright, Vitest, etc.).

Returns a signed URL that can be opened directly in a browser without requiring
the user to log in. The URL expires after 30 minutes for security.

Parameters:
- projectId (optional): Project ID — required for user API keys, auto-resolved for project tokens
- testRunId: The test run to view (required)
- filename: Specific file to open (optional, defaults to index.html)

Returns:
- url: Browser-navigable URL with signed token
- filename: The file being accessed
- expiresAt: ISO timestamp when the URL expires
- expiresInSeconds: Time until expiration

The returned URL can be shared with users who need to view the report.
Note: URLs expire after 30 minutes for security. */
get_report_browser_url(input: { projectId?: string; testRunId: string; filename?: string }): Promise<any>
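
Example: a minimal sketch that produces a shareable link to a run's HTML report. "run_123" is a placeholder.

// Get a login-free, 30-minute browser link to the report
const link = await codemode.get_report_browser_url({ testRunId: "run_123" });
return { url: link.url, expiresAt: link.expiresAt };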

/** Check if CI results have been uploaded and processed.

Use this tool to answer "are my test results ready?" after pushing code.

Parameters:
- projectId (optional): Project ID — required for user API keys, auto-resolved for project tokens
- sessionId (optional): Specific upload session ID for detailed status
- commitSha (optional): Filter by commit SHA to find uploads for a specific commit
- branch (optional): Filter by branch name

Behavior:
- If sessionId is provided: returns detailed status with linked test runs and coverage reports
- Otherwise: returns a list of recent upload sessions (filtered by commitSha/branch if provided)

Processing statuses:
- "pending" — upload received, processing not started
- "processing" — files are being parsed
- "completed" — all files processed successfully, results are ready
- "error" — some files failed to process

Workflow:
1. After pushing code, call with commitSha to find the upload session
2. Check processingStatus — if "completed", results are ready
3. If "processing" or "pending", wait and check again
4. Once completed, use the linked testRunIds with get_test_run_details

Returns (list mode):
- sessions: Array of upload sessions with processing status
- pagination: Pagination info

Returns (detail mode):
- session: Upload session details
- testRuns: Linked test run summaries (id, framework, pass/fail counts)
- coverageReports: Linked coverage report summaries (id, format) */
get_upload_status(input: { projectId?: string; sessionId?: string; commitSha?: string; branch?: string }): Promise<any>
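
Example: a minimal sketch for step 1 of the workflow above. "proj_abc" and "abc123" are placeholders; the sessions field name follows the documented list-mode return.

// Are the results for commit abc123 ready?
const status = await codemode.get_upload_status({ projectId: "proj_abc", commitSha: "abc123" });
const latest = status.sessions?.[0];
return latest ? { processingStatus: latest.processingStatus } : { message: "No uploads found for this commit" };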

/** Search across test failures by error message, stack trace, or test name.

Use this to find specific failures across test runs — like grep for your test history.

Examples:
- "TypeError: Cannot read properties of undefined" → find all occurrences of this error
- "timeout" → find timeout-related failures
- "auth" with searchIn="names" → find failing auth tests

Returns matching failures with test run context (branch, commit, timestamp) for investigation. */
search_failures(input: { projectId?: string; query: string; searchIn?: 'errors' | 'names' | 'all'; days?: number; branch?: string; limit?: number }): Promise<any>
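
Example: a minimal sketch that greps the last week of failures on main for timeouts. "proj_abc" is a placeholder.

// Timeout-related failures on main, last 7 days
const hits = await codemode.search_failures({ projectId: "proj_abc", query: "timeout", searchIn: "errors", days: 7, branch: "main", limit: 20 });
return hits;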

Examples

// Single call
const health = await codemode.get_project_health({ projectId: "proj_abc" });
return health;

// Multi-step: get flaky tests and check history for each
const flaky = await codemode.get_flaky_tests({ projectId: "proj_abc", limit: 5 });
const histories = [];
for (const test of flaky.flakyTests) {
  const history = await codemode.get_test_history({ projectId: "proj_abc", testName: test.name, limit: 5 });
  histories.push({ test: test.name, score: test.flakinessScore, history: history.summary });
}
return { flaky: flaky.summary, details: histories };

// Coverage analysis
const summary = await codemode.get_coverage_summary({ projectId: "proj_abc" });
const lowFiles = await codemode.get_untested_files({ projectId: "proj_abc", maxCoverage: 50, limit: 10 });
return { summary, lowCoverageFiles: lowFiles };

Constraints

  • Max 20 API calls per execution

  • 30s timeout

  • No access to Node.js globals (process, require, etc.)

search_tools

Search for available Gaffer API functions by keyword.

Returns matching functions with their TypeScript declarations so you can use them with execute_code.

Examples:

  • "coverage" → coverage-related functions

  • "flaky" → flaky test detection

  • "" (empty) → list all available functions

list_projects

List all projects you have access to.

Returns a list of projects with their IDs, names, and organization info. Use this to find project IDs for other tools like get_project_health.

Requires a user API Key (gaf_). Get one from Account Settings in the Gaffer dashboard.

Prompts

Interactive templates invoked by user choice

No prompts

Resources

Contextual data attached and managed by the client

No resources
