Chaos-MCP

by AraneaDev

On-demand micro-mutation sandbox for AI test verification — maps holes in unit tests by running isolated mutation testing via the Model Context Protocol.

Pre-release / in active development. Chaos-MCP is not yet published to npm. The source is public on GitHub — install from source (see Installation). Any npm install -g / npx commands in this README describe the planned published experience and do not work yet.

Chaos-MCP is an MCP (Model Context Protocol) server that exposes three tools — audit_code_resilience (audit a single file), triage_test_coverage (rank a whole tree weakest-first), and estimate_audit (cheap pre-flight mutant count / timing estimate) — which run isolated mutation testing against your source to find weaknesses in the local test suite. It intentionally injects logical faults (like changing > to >=) and checks whether your tests catch them. Surviving mutants indicate test coverage holes.
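The idea behind mutation testing can be shown without any tooling. The sketch below is a hand-rolled toy, not how StrykerJS or Chaos-MCP work internally. The function `isPositive`, its mutant and the test cases are all hypothetical. It applies a single `>` → `>=` mutation and checks whether a small test suite notices the change.

```typescript
// Toy illustration of mutation testing (hypothetical code, not Chaos-MCP internals).
type Impl = (n: number) => boolean;

const original: Impl = (n) => n > 0; // "isPositive"
// Hand-written mutant: the relational operator `>` replaced with `>=`.
const mutant: Impl = (n) => n >= 0;

type TestCase = { input: number; expected: boolean };

// True when every test case passes against the given implementation.
function suitePasses(impl: Impl, tests: TestCase[]): boolean {
  return tests.every((t) => impl(t.input) === t.expected);
}

// A mutant is "killed" if the suite passes on the original but fails on the mutant.
function mutantStatus(tests: TestCase[]): "killed" | "survived" {
  if (!suitePasses(original, tests)) throw new Error("suite must pass on the original");
  return suitePasses(mutant, tests) ? "survived" : "killed";
}

const weakSuite: TestCase[] = [
  { input: 5, expected: true },
  { input: -3, expected: false },
];
// The boundary case n = 0 is the only input that distinguishes `>` from `>=`.
const strongSuite: TestCase[] = [...weakSuite, { input: 0, expected: false }];

console.log(mutantStatus(weakSuite)); // prints survived (a coverage hole)
console.log(mutantStatus(strongSuite)); // prints killed
```

The surviving mutant in the weak suite is exactly the kind of gap Chaos-MCP reports. The fix is a boundary test, not more lines of coverage.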

Features

4 Languages Supported — TypeScript/JavaScript (StrykerJS), Python (cosmic-ray), Rust (cargo-mutants), PHP (Infection)
Sandbox Isolation — all mutation runs execute in temporary directories; your real workspace is never touched
Auto-Detection — automatically detects project type, test runner, and workspace root
Async Subprocesses — all mutation-tool execution uses async execFile/exec (subprocess runs never block the event loop; the one-time sandbox copy is synchronous)
Rich Tool Schema — supports line scoping, mutator denylists, concurrency control, dry-run mode, incremental runs, and output format selection
Pre-flight Estimation — estimate_audit gives a fast mutant count (exact for Rust, approximate for others) and optional timing estimate before you commit to a full run
Gate Mode — pass minScore to audit_code_resilience or triage_test_coverage to get a machine-readable pass/fail field for CI pipelines
Cross-Platform — works on macOS, Linux, and Windows (with junction fallback for symlinks)

Installation

While in development, the only supported install path is from source — clone the repo, build, and register the built entrypoint with your MCP client.

git clone https://github.com/AraneaDev/Chaos-MCP.git
cd Chaos-MCP
npm install
npm run build      # compiles to build/index.js

claude mcp add chaos-mcp -- node /absolute/path/to/Chaos-MCP/build/index.js

Planned (not available yet): once published, install will be npm install -g chaos-mcp or run on demand via npx chaos-mcp. These do not work until the package ships to npm.

Prerequisites — language mutation tools

Chaos-MCP does not bundle the per-language mutation engines or install them for you; it shells out to whichever tool matches the file you audit. Install only the one(s) for the languages you intend to audit. If a tool is missing, the audit returns a clear error naming the exact install command — it never fails silently.

Language	Engine	Install
TypeScript / JavaScript	StrykerJS	`npm install --save-dev @stryker-mutator/core` (in the target project) — note: StrykerJS 9.x's vitest-runner is not compatible with vitest 3.x's dropped `--related` API. If the target uses vitest 3.x, downgrade it to `vitest@^2.1.x` for the audit, or wait for StrykerJS 10.x.
Python	cosmic-ray	`pipx install cosmic-ray` — or `pip install cosmic-ray` inside a virtualenv
Rust	cargo-mutants	`cargo install cargo-mutants`
PHP	Infection	`composer require --dev infection/infection` — also enable a coverage driver (Xdebug or PCOV)

Notes:

The tool itself must be on PATH (or, for StrykerJS, resolvable from the target project's node_modules), and the language toolchain it builds on must already be present — Node.js for StrykerJS, a Python interpreter for cosmic-ray, a Rust/Cargo toolchain for cargo-mutants, and PHP + Composer with a coverage driver (Xdebug or PCOV) for Infection.
Python / cosmic-ray: on modern distros a bare pip install cosmic-ray is blocked by PEP 668 ("externally-managed-environment"); use pipx install cosmic-ray (isolated) or install inside an activated virtualenv. Chaos-MCP generates cosmic-ray's config.toml for you (scoped to the target file) and runs baseline → init → exec → dump in the sandbox — no per-project config needed. cosmic-ray runs its full operator set (no per-file line-scoping), so auditing a large file is slow. Two cosmicray config knobs keep big audits tractable: testSelection scopes the per-mutant test run (e.g. ["tests/unit/test_x.py"] or ["-m","unit"]), and excludeOperators (regexes, applied via cr-filter-operators) bounds the mutant count by skipping whole operator families — e.g. ["core/NumberReplacer", "core/ReplaceBinaryOperator.*"] drops ~half the mutants on an arithmetic-heavy file. Excluded mutants are omitted from the score (a scoped audit).
These engines run inside the sandbox against a copy of your workspace; Chaos-MCP never installs or modifies anything in your real project.

Quick Start

1. Start the Server

Normally your MCP client launches the server for you (see Installation). To run it directly from a source checkout:

# From the repo root, after `npm run build`
npm start                                  # → node build/index.js
node build/index.js --verbose              # diagnostic logging to stderr
node build/index.js --config ./chaos-mcp.config.json

2. Call the Tool from Your MCP Client

The primary tool is audit_code_resilience. The batch tool triage_test_coverage and the lightweight pre-flight tool estimate_audit are both documented below.

Minimal example:

{
  "filePath": "src/utils/math.ts"
}

Example with several common options:

{
  "filePath": "src/utils/math.ts",
  "timeoutMs": 120000,
  "lineScope": { "start": 10, "end": 80 },
  "mutatorDenylist": ["StringLiteral"],
  "concurrency": 4,
  "incremental": true,
  "ignorePatterns": ["fixtures/", "snapshots/"],
  "outputFormat": "text",
  "enrich": true,
  "maxSurvivors": 20,
  "severityFloor": "medium"
}

Get enriched, severity-ranked guidance on survivors (on by default):

Enrichment is enabled by default. Each surviving / no-coverage line is augmented with four fields: a severity rating (high, medium, or low) based on the mutator's semantics (e.g. boundary operators and logical operators rank high), a why explanation of why the gap is dangerous, a hint describing the kind of test that would kill it, and a context snippet of the surrounding source lines. Survivors are re-ranked severity-first so the most critical gaps appear first. To disable enrichment and return the plain unranked output, pass "enrich": false.

TypeScript targets produce the richest output because StrykerJS exposes per-mutant operator detail; Python (cosmic-ray) targets also produce severity-ranked output, mapping the tool's authoritative operator name to a canonical category; targets whose tool can't expose a per-mutant operator fall back to severity: "unknown" with a generic why/hint.

Cap and filter the survivor list:

{
  "filePath": "src/utils/math.ts",
  "maxSurvivors": 5,
  "severityFloor": "high"
}

maxSurvivors caps how many survivor (and no-coverage) line groups are returned after severity ranking (default: 10; configurable via defaultMaxSurvivors). Hidden groups are counted in survivorsTruncated / noCoverageTruncated in the output. severityFloor drops groups below the given severity level (requires enrichment, which is on by default); dropped groups are counted in survivorsFiltered / noCoverageFiltered.
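The interaction of severityFloor and maxSurvivors can be sketched as a rank-filter-cap pipeline. This is a minimal sketch under stated assumptions. The `SurvivorGroup` shape and `capAndFilter` are hypothetical. The sketch also applies the floor before the cap; the README states only that capping happens after severity ranking.

```typescript
// Hypothetical sketch of severity ranking, floor filtering and capping.
type Severity = "high" | "medium" | "low" | "unknown";

interface SurvivorGroup {
  line: number;
  severity: Severity;
}

// Higher rank = more severe; "unknown" sits below "low", so any floor drops it.
const RANK: Record<Severity, number> = { high: 3, medium: 2, low: 1, unknown: 0 };

function capAndFilter(
  groups: SurvivorGroup[],
  maxSurvivors = 10,
  severityFloor?: Exclude<Severity, "unknown">,
): { kept: SurvivorGroup[]; filtered: number; truncated: number } {
  // Severity-first ranking; ties fall back to line order.
  const ranked = [...groups].sort(
    (a, b) => RANK[b.severity] - RANK[a.severity] || a.line - b.line,
  );
  const aboveFloor = severityFloor
    ? ranked.filter((g) => RANK[g.severity] >= RANK[severityFloor])
    : ranked;
  const kept = aboveFloor.slice(0, maxSurvivors);
  return {
    kept,
    filtered: ranked.length - aboveFloor.length, // analogous to survivorsFiltered
    truncated: aboveFloor.length - kept.length, // analogous to survivorsTruncated
  };
}

const result = capAndFilter(
  [
    { line: 7, severity: "low" },
    { line: 42, severity: "high" },
    { line: 13, severity: "unknown" },
    { line: 30, severity: "medium" },
  ],
  1,
  "medium",
);
console.log(result.kept.map((g) => g.line), result.filtered, result.truncated); // [ 42 ] 2 1
```

In this example the "medium" floor drops two groups (low and unknown). The cap of 1 then hides the medium group behind the high one.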

Scope to just your uncommitted changes:

{
  "filePath": "src/utils/math.ts",
  "diffBase": "HEAD"
}

Mutation-tests only the lines you've changed since the last commit. Line-level scoping is StrykerJS-only; other languages run the whole file and report a scopeNote.
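A common way to turn a git diff into changed line numbers is to read the hunk headers of `git diff -U0`. The sketch below shows that technique in isolation. It is an assumption about how line scoping could be derived, not a description of Chaos-MCP's implementation, and `changedLines` is a hypothetical name.

```typescript
// Hypothetical helper: collect new-file line numbers touched by each hunk of `git diff -U0`.
// Hunk header format: @@ -oldStart[,oldCount] +newStart[,newCount] @@
function changedLines(diffText: string): number[] {
  const lines = new Set<number>();
  const hunk = /^@@ -\d+(?:,\d+)? \+(\d+)(?:,(\d+))? @@/;
  for (const row of diffText.split("\n")) {
    const m = hunk.exec(row);
    if (!m) continue;
    const start = Number(m[1]);
    // An omitted count means 1; a count of 0 is a pure deletion (no new lines).
    const count = m[2] === undefined ? 1 : Number(m[2]);
    for (let i = 0; i < count; i++) lines.add(start + i);
  }
  return [...lines].sort((a, b) => a - b);
}

const sample = [
  "@@ -10,0 +11,2 @@",
  "+  if (a > b) {",
  "+    return a;",
  "@@ -40 +42 @@",
  "-  return b;",
  "+  return a;",
  "@@ -60,3 +61,0 @@",
].join("\n");
console.log(changedLines(sample)); // [ 11, 12, 42 ]
```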

Verify your new tests killed the previous survivors:

{
  "filePath": "src/utils/math.ts",
  "baseline": { "survivors": [{ "line": 42, "mutators": { "ConditionalExpression": 1 } }] }
}

Re-runs only the baseline lines and reports which previously-uncaught mutants are now killed:

{ "mode": "verify", "baselineTotal": 1, "killedCount": 1,
  "nowKilled": [{ "line": 42, "mutator": "ConditionalExpression" }],
  "stillSurviving": [], "newSurvivors": [] }
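The verify delta can be read as a set comparison keyed on line + mutator, consistent with verify mode keying on line numbers. The sketch below makes two assumptions: per-mutator counts are flattened to unique (line, mutator) pairs, and `verifyDelta` and its types are hypothetical names.

```typescript
// Hypothetical sketch of computing a verify-mode delta.
type MutatorCounts = Record<string, number>;
interface LineGroup { line: number; mutators: MutatorCounts }
interface Hit { line: number; mutator: string }

// Flatten { line, mutators: { Name: count } } groups into unique line+mutator keys.
function keys(groups: LineGroup[]): Map<string, Hit> {
  const out = new Map<string, Hit>();
  for (const g of groups)
    for (const mutator of Object.keys(g.mutators))
      out.set(`${g.line}:${mutator}`, { line: g.line, mutator });
  return out;
}

function verifyDelta(baseline: LineGroup[], rerunSurvivors: LineGroup[]) {
  const before = keys(baseline);
  const after = keys(rerunSurvivors);
  const nowKilled = [...before].filter(([k]) => !after.has(k)).map(([, h]) => h);
  const stillSurviving = [...before].filter(([k]) => after.has(k)).map(([, h]) => h);
  const newSurvivors = [...after].filter(([k]) => !before.has(k)).map(([, h]) => h);
  return {
    mode: "verify",
    baselineTotal: before.size,
    killedCount: nowKilled.length,
    nowKilled,
    stillSurviving,
    newSurvivors,
  };
}

// Baseline from the example above; the re-run reports no survivors.
console.log(JSON.stringify(verifyDelta([{ line: 42, mutators: { ConditionalExpression: 1 } }], [])));
```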

3. Interpret the Results

The output is bundled and deduplicated to stay token-efficient: mutants are grouped by line (with a per-line count of each mutator type), survivors (tests ran but didn't catch) and noCoverage (no test reached the mutant) are reported separately at line+mutator granularity, and the explanatory note appears once instead of being repeated for every mutant. Because the split is per-mutator, the same line can appear in both lists (e.g. a live expression that survived next to an unreachable fallback that no test reached). Survivors and no-coverage entries also include a changes sample — a capped, deduped list of original → mutated edits — for TypeScript and Rust targets (best-effort; absent for Python, which doesn't expose per-mutant detail). When diffBase is used, the output may include a scopeNote (a top-level JSON field / a Scope: text line) reporting scoping decisions — e.g. a skipped run when nothing changed, or a whole-file fallback for Python/Rust targets.
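The per-line grouping described above can be sketched as follows. The `RawMutant` shape is an assumption, and so are the status names, which are modelled on StrykerJS's status vocabulary (`Killed`, `Survived`, `NoCoverage`). `groupByLine` is an illustrative helper, not Chaos-MCP's code.

```typescript
// Hypothetical sketch: bucket mutants of one status by line, counting mutator types.
type Status = "Killed" | "Survived" | "NoCoverage";
interface RawMutant { line: number; mutator: string; status: Status }
interface Group { line: number; mutators: Record<string, number> }

function groupByLine(mutants: RawMutant[], status: Status): Group[] {
  const byLine = new Map<number, Record<string, number>>();
  for (const m of mutants) {
    if (m.status !== status) continue;
    const counts = byLine.get(m.line) ?? {};
    counts[m.mutator] = (counts[m.mutator] ?? 0) + 1;
    byLine.set(m.line, counts);
  }
  return [...byLine.entries()]
    .sort(([a], [b]) => a - b)
    .map(([line, mutators]) => ({ line, mutators }));
}

const raw: RawMutant[] = [
  { line: 42, mutator: "ConditionalExpression", status: "Survived" },
  { line: 42, mutator: "ConditionalExpression", status: "Survived" },
  { line: 42, mutator: "LogicalOperator", status: "NoCoverage" },
  { line: 50, mutator: "EqualityOperator", status: "Killed" },
];
// Line 42 lands in both lists because the split is per mutator.
console.log(JSON.stringify(groupByLine(raw, "Survived"))); // [{"line":42,"mutators":{"ConditionalExpression":2}}]
console.log(JSON.stringify(groupByLine(raw, "NoCoverage"))); // [{"line":42,"mutators":{"LogicalOperator":1}}]
```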

JSON output (default — emitted as a single compact line):

{
  "target": "src/utils/math.ts",
  "mutationScore": "91.67%",
  "summary": { "total": 12, "killed": 11, "survived": 1, "worstSeverity": "high" },
  "survivors": [
    {
      "line": 42, "mutators": { "ConditionalExpression": 1 }, "changes": ["a > b → true"],
      "severity": "high",
      "why": "a branch condition was forced to a constant; a test passed without exercising both arms.",
      "hint": "add tests that take BOTH the true and the false branch.",
      "context": ["41: function max(a, b) {", "42:   if (a > b) {", "43:     return a;"]
    }
  ],
  "noCoverage": [],
  "suggestedTestFile": { "path": "src/utils/__tests__/math.test.ts", "exists": false },
  "note": "survivors: mutants your tests ran but did not kill. noCoverage: mutants no test reached (per line+mutator, so a line may appear here and in survivors). mutators = type→count. Add or strengthen tests targeting these. changes = sampled original→mutated edits for that line (capped)."
}

The tool response also carries a structuredContent field (in addition to the standard text content block) so MCP clients that support it can consume the data directly without parsing JSON from text. The text block is retained for compatibility with clients that read content[0].text.

suggestedTestFile is included when there are survivors or no-coverage entries (i.e. when the mutation score is below 100%), pointing to the conventional test file path for the audited source file (e.g. src/utils/__tests__/math.test.ts for src/utils/math.ts). The exists flag indicates whether the file already exists on disk.
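A client can prefer structuredContent and fall back to parsing the first text block. The sketch below does that. It assumes the default JSON output format; with "outputFormat": "text" the text block is not JSON. `ToolResult` is a hand-written minimal stand-in for the MCP tool-result shape (only the fields used here), and `readPayload` is hypothetical.

```typescript
// Minimal stand-in for the parts of an MCP tool result used here (not the SDK's type).
interface ToolResult {
  content: Array<{ type: string; text?: string }>;
  structuredContent?: Record<string, unknown>;
  isError?: boolean;
}

// Hypothetical helper: prefer structuredContent, else parse content[0].text as JSON.
function readPayload(result: ToolResult): Record<string, unknown> {
  if (result.isError) throw new Error(result.content[0]?.text ?? "tool error");
  if (result.structuredContent) return result.structuredContent;
  const text = result.content[0]?.text;
  if (typeof text !== "string") throw new Error("no text content block to parse");
  return JSON.parse(text) as Record<string, unknown>;
}

const legacy: ToolResult = {
  content: [{ type: "text", text: '{"target":"src/utils/math.ts","mutationScore":"91.67%"}' }],
};
console.log(readPayload(legacy).mutationScore); // 91.67%
```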

Text output ("outputFormat": "text"):

Chaos-MCP Audit Report: src/utils/math.ts
Mutation score: 91.67% (11/12 killed, 1 survived)
Survivors (line: mutators):
  42: ConditionalExpression  (a > b → true)
Add or strengthen tests targeting these lines to kill the survivors.

Tool Parameters

Parameter	Type	Required	Description
`filePath`	`string`	Yes	Workspace-relative path to the file (`.ts`, `.js`, `.tsx`, `.jsx`, `.py`, `.rs`, `.php`)
`timeoutMs`	`number`	No	Max run time in ms (default: 300000 / 5 min)
`lineScope`	`{ start, end }`	No	1-based line range (StrykerJS only)
`diffBase`	`string`	No	Auto-scope mutation to git-changed lines. `"HEAD"` (uncommitted), `"staged"`, or a git ref (e.g. `"main"`, via merge-base). Mutually exclusive with `lineScope`. Line-level scoping is StrykerJS-only; other languages run whole-file with a note. No changes vs base → run skipped.
`baseline`	`object`	No	Verify mode. Pass back a prior run's `{ survivors, noCoverage }` to re-test only those mutants and get a delta (`nowKilled` / `stillSurviving` / `newSurvivors`). Re-run auto-scopes to the baseline lines (StrykerJS) or whole-file (other languages). Mutually exclusive with `diffBase`/`lineScope`. Verify mode keys on line numbers, so run it after adding tests — not after editing the source under test, since edits shift line numbers and would misreport which mutants were killed.
`mutatorAllowlist`	`string[]`	No	Not supported in StrykerJS v9 — ignored (use `mutatorDenylist`)
`mutatorDenylist`	`string[]`	No	Stryker mutator names to exclude
`concurrency`	`number`	No	Parallel mutation workers (StrykerJS only)
`dryRun`	`boolean`	No	Validate test suite only, no mutations (StrykerJS only)
`outputFormat`	`"json"` \| `"text"`	No	Output format (default: `"json"`)
`incremental`	`boolean`	No	Reuse previous run results (StrykerJS only)
`ignorePatterns`	`string[]`	No	Substring patterns to exclude from sandbox copy
`enrich`	`boolean`	No	Annotate each survivor with severity, why-it-matters, a test hint, and source context, and rank severity-first. Default: `true` (pass `false` to disable and return plain unranked output). Richest for TypeScript; targets whose engine can't expose a per-mutant operator degrade to `severity: "unknown"`.
`maxSurvivors`	`integer ≥ 1`	No	Cap on how many survivor (and no-coverage) line groups are returned after severity ranking. Hidden groups counted in `survivorsTruncated`/`noCoverageTruncated`. Precedence: arg > `defaultMaxSurvivors` config > 10.
`severityFloor`	`"high"` \| `"medium"` \| `"low"`	No	Drop survivor groups below this severity (requires enrichment, on by default). Dropped groups counted in `survivorsFiltered`/`noCoverageFiltered`. `"unknown"`-severity groups are below `"low"` and are dropped by any floor.
`runId`	`string`	No	Verify mode by cached id: re-run against the survivor baseline saved from a prior audit (the `runId` it returned). Mutually exclusive with `baseline`, `diffBase`, and `lineScope`. Unknown or expired ids (cache TTL: ~24 h) return an error.
`suppress`	`object[]`	No	Mark mutants as equivalent (unkillable). Each entry: `{ "line": N, "mutator": "MutatorName", "reason": "..." }`, where the optional `reason` explains why the mutant is equivalent. Persisted to `.chaos-mcp/suppressions.json`; suppressed mutants are auto-excluded from the score denominator and from future `audit_code_resilience` and `triage_test_coverage` output. The output field `suppressedCount` reports how many were excluded.
`unsuppress`	`object[]`	No	Remove previously-suppressed mutants for this file. Each entry: `{ "line": N, "mutator": "MutatorName" }`.
`minScore`	`number 0–100`	No	Gate threshold. When the mutation score is below this value, the output includes `gate: { minScore, passed: false }`. Never an error. Uses the suppression-adjusted score.
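A client-side type for these arguments can check the documented mutual-exclusion rules before a call. In the sketch below, `AuditArgs` and `checkExclusivity` are hypothetical helpers derived from the table above, with several optional fields omitted for brevity. The server still performs its own validation.

```typescript
// Hypothetical client-side argument type (subset of the table above).
interface AuditArgs {
  filePath: string;
  timeoutMs?: number;
  lineScope?: { start: number; end: number };
  diffBase?: string;
  baseline?: { survivors?: unknown[]; noCoverage?: unknown[] };
  runId?: string;
  maxSurvivors?: number;
  severityFloor?: "high" | "medium" | "low";
  minScore?: number;
  outputFormat?: "json" | "text";
}

// Documented exclusions: diffBase vs lineScope; baseline vs diffBase/lineScope;
// runId vs baseline/diffBase/lineScope.
const EXCLUSIVE: Array<[keyof AuditArgs, keyof AuditArgs]> = [
  ["diffBase", "lineScope"],
  ["baseline", "diffBase"],
  ["baseline", "lineScope"],
  ["runId", "baseline"],
  ["runId", "diffBase"],
  ["runId", "lineScope"],
];

function checkExclusivity(args: AuditArgs): string[] {
  return EXCLUSIVE.filter(([a, b]) => args[a] !== undefined && args[b] !== undefined).map(
    ([a, b]) => `${a} and ${b} are mutually exclusive`,
  );
}

console.log(checkExclusivity({ filePath: "src/utils/math.ts", runId: "a1b2c3d4", diffBase: "HEAD" }));
// [ 'runId and diffBase are mutually exclusive' ]
```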

See CONTRIBUTING.md for development setup and the full parameter semantics.

State & the verify loop

Verify loop via `runId`

Every successful, non-verify audit_code_resilience call returns a runId (an 8-character id) in its JSON output. Use it to re-verify without copying the full baseline object:

Audit: { "filePath": "src/utils/math.ts" } → response includes "runId": "a1b2c3d4".
Fix or add tests.
Verify: { "filePath": "src/utils/math.ts", "runId": "a1b2c3d4" } → reports which previously-surviving mutants are now killed.

runId is mutually exclusive with baseline, diffBase, and lineScope. The baseline cache lives in os.tmpdir()/chaos-mcp-runs/ and is ephemeral (default TTL: 24 h; default max: 200 entries). Passing an unknown or expired runId returns an error.
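The run cache's documented behaviour has two parts: entries expire after a TTL, and the oldest entries are evicted once the cap is exceeded. The sketch below models this as an in-memory map. It is hypothetical. The real cache persists under os.tmpdir()/chaos-mcp-runs/, and the class name, the random-hex id scheme and the injectable clock are assumptions.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical in-memory model of a TTL + max-entries run cache.
class RunCache<T> {
  // Map iteration order is insertion order, so the first key is the oldest entry.
  private entries = new Map<string, { value: T; storedAt: number }>();

  constructor(
    private ttlMs = 86_400_000, // 24 h, matching the documented default
    private max = 200, // matching the documented default
    private now: () => number = Date.now,
  ) {}

  put(value: T): string {
    const id = randomBytes(4).toString("hex"); // 8 hex characters
    this.entries.set(id, { value, storedAt: this.now() });
    while (this.entries.size > this.max) {
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
    return id;
  }

  get(id: string): T {
    const hit = this.entries.get(id);
    if (!hit || this.now() - hit.storedAt > this.ttlMs) {
      this.entries.delete(id);
      throw new Error(`unknown or expired runId: ${id}`);
    }
    return hit.value;
  }
}

let clock = 0;
const cache = new RunCache<string>(1_000, 2, () => clock);
const first = cache.put("baseline A");
cache.put("baseline B");
cache.put("baseline C"); // exceeds max = 2, so "baseline A" is evicted
```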

triage_test_coverage also mints and returns a runId per ranking row, so you can drill into a weak file and immediately verify after fixing its tests.

Suppressing equivalent mutants

Some mutants are equivalent — logically identical to the original under all possible inputs — and cannot be killed by any test. Suppress them so they stop appearing in the output and stop dragging down the score:

{
  "filePath": "src/utils/math.ts",
  "suppress": [{ "line": 99, "mutator": "StringLiteral", "reason": "guard always true for this type" }]
}

Suppressed mutants are:

Persisted to <workspaceRoot>/.chaos-mcp/suppressions.json (keyed by workspace-relative file path).
Auto-excluded from every future audit and triage call for that file — no flag needed.
Removed from the score denominator — mutationScore rises and the output field suppressedCount tells you how many were excluded.
Excluded from verify mode — suppressed mutants won't appear as "still surviving".
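Removing suppressed mutants from the denominator can be shown with a small worked function. The sketch makes two assumptions. Suppressions match on a file + line + mutator key, and the score is killed / (total − suppressed). `adjustedScore` and the key format are hypothetical. Note that a key matches every mutant with that line and mutator.

```typescript
// Hypothetical sketch of a suppression-adjusted mutation score.
interface Mutant { line: number; mutator: string; killed: boolean }

// Hypothetical key format: workspace-relative file + line + mutator.
const key = (file: string, line: number, mutator: string) => `${file}:${line}:${mutator}`;

function adjustedScore(file: string, mutants: Mutant[], suppressed: Set<string>) {
  const counted = mutants.filter((m) => !suppressed.has(key(file, m.line, m.mutator)));
  const killed = counted.filter((m) => m.killed).length;
  // Assumption: with nothing left to count, report 100%.
  const score = counted.length === 0 ? 100 : (killed / counted.length) * 100;
  return { mutationScore: `${score.toFixed(2)}%`, suppressedCount: mutants.length - counted.length };
}

const mutants: Mutant[] = [
  ...Array.from({ length: 11 }, (_, i) => ({ line: i + 1, mutator: "EqualityOperator", killed: true })),
  { line: 99, mutator: "StringLiteral", killed: false },
];
const suppressions = new Set([key("src/utils/math.ts", 99, "StringLiteral")]);
console.log(adjustedScore("src/utils/math.ts", mutants, new Set())); // 91.67%, 0 suppressed
console.log(adjustedScore("src/utils/math.ts", mutants, suppressions)); // 100.00%, 1 suppressed
```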

To undo a wrong suppression:

{
  "filePath": "src/utils/math.ts",
  "unsuppress": [{ "line": 99, "mutator": "StringLiteral" }]
}

.gitignore or commit? Add .chaos-mcp/ to .gitignore if the suppression list is personal, or commit it to share the equivalent-mutant list with the team. Suppression keys are workspace-relative, so the file is portable across machines.

Staleness caveat: entries are keyed by file + line + mutator. Edits that shift line numbers can stale an entry. Each entry records an optional reason and an addedAt timestamp so you can audit and prune the list over time.

Config keys for state

Key	Default	Description
`suppressionsPath`	`.chaos-mcp/suppressions.json`	Path to the suppression file (workspace-relative or absolute)
`runCacheTtlMs`	`86400000` (24 h)	Run-cache entry TTL in milliseconds
`runCacheMax`	`200`	Max cached run entries; oldest are evicted when exceeded

Batch Triage — `triage_test_coverage`

A second tool ranks where your test suite is weakest across many files in one call.

{ "paths": ["src/utils", "src/index.ts"], "maxFiles": 25 }

Directories are recursively expanded to supported source files (test files are skipped). At most maxFiles files are audited (precedence: maxFiles argument → defaultMaxFiles config → 25). The audits run in parallel, by default max(1, min(4, cpus-1)) files at a time, and the results are ranked weakest-first by mutation score:

{ "mode": "triage",
  "summary": { "filesDiscovered": 30, "filesAudited": 25, "filesSkipped": 5, "filesErrored": 0 },
  "ranking": [ { "file": "src/a.ts", "mutationScore": "62.50%", "total": 16, "killed": 10, "survived": 5, "noCoverage": 1 } ],
  "errors": [],
  "note": "Ranked weakest-first by mutation score. Drill into a file with audit_code_resilience for survivor detail." }
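Bounded-parallel auditing with weakest-first ranking can be sketched with a small worker pool. This is an illustrative sketch, not Chaos-MCP's implementation. `auditFile` stands in for a real per-file audit, and `triage` and `defaultFileConcurrency` are hypothetical names. Only the default formula max(1, min(4, cpus-1)), the maxFiles cap and the ranking order come from the text above.

```typescript
import { cpus } from "node:os";

interface Row { file: string; score: number }

const defaultFileConcurrency = () => Math.max(1, Math.min(4, cpus().length - 1));

// Hypothetical triage loop: audit up to maxFiles files, fileConcurrency at a time.
async function triage(
  files: string[],
  auditFile: (file: string) => Promise<number>, // resolves to a mutation score (0-100)
  maxFiles = 25,
  fileConcurrency = defaultFileConcurrency(),
): Promise<Row[]> {
  const queue = files.slice(0, maxFiles);
  const rows: Row[] = [];
  // Each worker pulls the next file until the queue is empty, so at most
  // `fileConcurrency` audits are in flight at once.
  const worker = async () => {
    for (let file = queue.shift(); file !== undefined; file = queue.shift()) {
      rows.push({ file, score: await auditFile(file) });
    }
  };
  await Promise.all(Array.from({ length: Math.min(fileConcurrency, queue.length) }, worker));
  return rows.sort((a, b) => a.score - b.score); // weakest first
}

const fakeScores: Record<string, number> = { "src/a.ts": 62.5, "src/b.ts": 90, "src/c.ts": 75 };
triage(Object.keys(fakeScores), async (f) => fakeScores[f], 25, 2).then((rows) =>
  console.log(rows.map((r) => r.file)), // [ 'src/a.ts', 'src/c.ts', 'src/b.ts' ]
);
```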

The tool response carries a structuredContent field (in addition to the text block) so MCP clients can consume the ranked payload directly without parsing JSON. The outputSchema on the tool definition describes the payload shape.

Drill into a weak file with audit_code_resilience for per-mutant survivor detail.

PR-diff scan — diffBase:

Pass diffBase to limit the triage to files changed in a PR or branch. paths becomes optional in this mode:

{ "diffBase": "main" }

diffBase alone audits every changed supported source file in the workspace (relative to main via merge-base). Passing both diffBase and paths limits the scan to changed files under those paths:

{ "diffBase": "main", "paths": ["src/utils"] }

TypeScript/JavaScript files are mutated only on the changed lines. Files in other languages (Python, Rust, PHP) run whole-file, and a per-file scopeNote is included in the ranking row.

Inline survivor detail — survivorsPerFile:

{ "paths": ["src"], "survivorsPerFile": 3 }

survivorsPerFile (default 0, scores-only) inlines the top-N severity-ranked, enriched survivor groups into each ranking row so you can triage and inspect in one call. Leave it at 0 for the compact leaderboard; raise it when you want to see the worst gaps immediately.

Parallel file auditing — fileConcurrency:

{ "paths": ["src"], "fileConcurrency": 8 }

fileConcurrency controls how many files are audited in parallel (default max(1, min(4, cpus-1)); range 1–64). When fileConcurrency > 1, each StrykerJS run (TypeScript/JavaScript files) has its worker count automatically capped at floor((cpus-1) / fileConcurrency). This keeps total CPU use near the core count instead of oversubscribing it. Other languages run their mutation tool without a worker-count override.
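The per-run worker cap is simple arithmetic. A hypothetical helper is below. The clamp to at least one worker is an assumption: the README gives only floor((cpus-1) / fileConcurrency), which would be 0 when fileConcurrency exceeds cpus-1.

```typescript
// Hypothetical helper: StrykerJS workers per run when files are audited in parallel.
function strykerWorkersPerRun(cpuCount: number, fileConcurrency: number): number {
  const raw = Math.floor((cpuCount - 1) / fileConcurrency);
  return Math.max(1, raw); // assumed clamp: never drop below one worker
}

// 8 cores, 2 files in parallel: floor(7 / 2) = 3 workers per run, 6 in total.
console.log(strykerWorkersPerRun(8, 2)); // 3
// 8 cores, 8 files in parallel: floor(7 / 8) = 0, clamped to 1.
console.log(strykerWorkersPerRun(8, 8)); // 1
```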

Parameters:

Parameter	Type	Description
`paths`	`string[]`	Workspace-relative files/dirs to triage. Optional when `diffBase` is provided.
`maxFiles`	`integer ≥ 1`	Cap on files audited (precedence: arg → `defaultMaxFiles` config → 25).
`timeoutMs`	`number`	Per-file mutation-run timeout in ms (default: 300000).
`mutatorDenylist`	`string[]`	Stryker mutator names to exclude, applied to every TypeScript/JS file.
`outputFormat`	`"json"` \| `"text"`	Output format (default: `"json"`).
`diffBase`	`string`	Auto-scope to git-changed files. `"HEAD"`, `"staged"`, or any git ref/SHA. Makes `paths` optional; with `paths`, intersects changed files under those paths. TypeScript: changed lines only. Other languages: whole-file.
`survivorsPerFile`	`integer ≥ 0`	Inline top-N enriched survivors per ranked file (default `0` = scores-only).
`fileConcurrency`	`integer 1–64`	Files audited in parallel (default `max(1, min(4, cpus-1))`). Per-file StrykerJS worker count is automatically capped (TypeScript/StrykerJS only; other engines ignore the worker-count cap).
`minScore`	`number 0–100`	Gate threshold. Per-row `passed` field + top-level `gate: { minScore, passed, failingFiles }` in output. Never an error.

Pre-flight Estimate — `estimate_audit`

Before committing to a full mutation run, use estimate_audit to check how many mutants a file will produce and (optionally) how long the run will take. It never runs the mutation cycle. By default it does not run your tests either; only withTiming: true runs the test suite once.

{ "filePath": "src/utils/math.ts" }

Output:

{
  "target": "src/utils/math.ts",
  "language": "typescript",
  "mutants": 47,
  "fidelity": "approx",
  "basis": "source heuristic: 23 constructs",
  "note": "Approximate mutant count from a source-parse heuristic; the real audit may differ. Run audit_code_resilience for exact results."
}

With timing (withTiming: true): runs the test suite once to measure a baseline, then estimates total wall-clock time as mutants × baseline / concurrency. This provisions a sandbox and counts against your machine's resources — use it when you want a time budget before a large audit.

{ "filePath": "src/utils/math.ts", "withTiming": true }

Additional output fields when withTiming: true:

{
  "baselineMs": 4200,
  "estimatedMs": 197400,
  "concurrency": 1
}
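The timing estimate is plain arithmetic. With the figures above, 47 mutants × 4200 ms / 1 = 197400 ms, about 3.3 minutes. The helper below reproduces the formula; its name and the rounding are assumptions.

```typescript
// Hypothetical helper: estimatedMs = mutants × baselineMs / concurrency.
function estimateMs(mutants: number, baselineMs: number, concurrency = 1): number {
  if (concurrency < 1) throw new RangeError("concurrency must be >= 1");
  return Math.round((mutants * baselineMs) / concurrency);
}

console.log(estimateMs(47, 4200, 1)); // 197400
console.log(estimateMs(47, 4200, 4)); // 49350
```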

Fidelity

Language	Fidelity	Basis
Rust	`exact`	`cargo mutants --list` (no tests run)
TypeScript / JavaScript	`approx`	source-parse heuristic
Python	`approx`	source-parse heuristic

For Rust, the estimate is exact because cargo mutants --list enumerates every planned mutant without running tests. For all other languages the count is approximate — a lightweight heuristic over the source AST; the actual audit may differ. Run audit_code_resilience for exact results.

If cargo-mutants is not installed, the Rust path falls back to the heuristic and reports fidelity: "approx" with a note.

Parameters

Parameter	Type	Required	Description
`filePath`	`string`	Yes	Workspace-relative path to the file to estimate.
`withTiming`	`boolean`	No	When `true`, runs the test suite once to measure `baselineMs` and computes `estimatedMs`. Default: `false`.

Use case

Call estimate_audit first when you are unsure whether a file is too large to audit interactively:

estimate_audit { "filePath": "src/big.ts" } → 300 mutants, approx.
Consider scoping with lineScope or diffBase, or scheduling the full run with a longer timeoutMs.
audit_code_resilience { "filePath": "src/big.ts", "diffBase": "HEAD" } → audits only your changed lines.

Gate Mode — `minScore`

Both audit_code_resilience and triage_test_coverage accept a minScore parameter (0–100). When the mutation score falls below the threshold, the result reports the gate as failed. A failing gate is never an error — it is a data field for an agent or CI pipeline to read and act on.

Gate on a single file

{ "filePath": "src/utils/math.ts", "minScore": 80 }

If the mutation score is below 80, the output includes:

{ "gate": { "minScore": 80, "passed": false } }

If the score meets or exceeds the threshold, gate.passed is true. The field is absent when minScore is not provided.

The gate uses the suppression-adjusted mutation score (i.e. equivalent mutants excluded via suppress are removed from the denominator).

Gate on a triage run

{ "paths": ["src"], "minScore": 75 }

Each ranking row gains a passed field. The top-level output includes:

{
  "gate": {
    "minScore": 75,
    "passed": false,
    "failingFiles": ["src/utils/math.ts", "src/parser.ts"]
  }
}

gate.passed is false if any file's score is below minScore. failingFiles lists the workspace-relative paths that did not pass. Files that errored during triage are reported in errors[] and do not affect the gate.
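The triage gate rules reduce to a short function. The gate fails if any audited file scores below minScore, and errored files never reach the ranking. The row shape and `triageGate` name are hypothetical. Scores are parsed from the "62.50%" string form used in the output.

```typescript
// Hypothetical sketch of the triage gate computation.
interface RankingRow { file: string; mutationScore: string }

function triageGate(rows: RankingRow[], minScore: number) {
  // parseFloat("62.50%") === 62.5; errored files are not in `rows`, so they cannot fail the gate.
  const failingFiles = rows
    .filter((r) => parseFloat(r.mutationScore) < minScore)
    .map((r) => r.file);
  return {
    ranking: rows.map((r) => ({ ...r, passed: parseFloat(r.mutationScore) >= minScore })),
    gate: { minScore, passed: failingFiles.length === 0, failingFiles },
  };
}

const out = triageGate(
  [
    { file: "src/utils/math.ts", mutationScore: "62.50%" },
    { file: "src/parser.ts", mutationScore: "70.00%" },
    { file: "src/index.ts", mutationScore: "91.67%" },
  ],
  75,
);
console.log(JSON.stringify(out.gate));
// {"minScore":75,"passed":false,"failingFiles":["src/utils/math.ts","src/parser.ts"]}
```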

CI use case

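# Fail CI if any audited file scores below 80%.
# "mcp call" stands in for whichever MCP CLI client you use; it must print the tool's JSON payload to stdout.
# jq -e exits non-zero when .gate.passed is false or missing, which fails the CI step.
mcp call triage_test_coverage '{"paths":["src"],"minScore":80}' \
  | jq -e '.gate.passed'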

An agent or CI script reads gate.passed and decides whether to block the build, open an issue, or continue. The tool call itself always succeeds (never isError) regardless of the gate outcome.

Configuration

Create a chaos-mcp.config.json in your workspace root for default settings:

{
  "defaultTimeoutMs": 300000,
  "mutatorDenylist": ["StringLiteral"],
  "concurrency": 4,
  "defaultMaxFiles": 25,
  "defaultMaxSurvivors": 10,
  "defaultSeverityFloor": "medium",
  "defaultFileConcurrency": 4
}

Tool call arguments override config defaults.

Config key	Type	Default	Description
`defaultTimeoutMs`	`number`	`300000`	Per-file timeout in ms
`mutatorDenylist`	`string[]`	`[]`	Mutator names to exclude globally
`concurrency`	`number`	`4`	Parallel mutation workers
`defaultMaxFiles`	`number`	`25`	Default triage file cap (integer ≥ 1); overridden by the `maxFiles` argument
`defaultMaxSurvivors`	`number`	`10`	Default cap on survivor/no-coverage groups returned by `audit_code_resilience` (integer ≥ 1); overridden by the `maxSurvivors` argument
`defaultSeverityFloor`	`"high"` \| `"medium"` \| `"low"`	—	Default severity floor for survivor reporting; overridden by the `severityFloor` argument
`defaultFileConcurrency`	`number`	`max(1, min(4, cpus-1))`	Default parallel file count for `triage_test_coverage` (integer 1–64); overridden by the `fileConcurrency` argument
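The precedence rule is tool argument, then config default, then built-in fallback. It reduces to nullish coalescing. The sketch below applies it to maxSurvivors; `ChaosConfig` and `resolveMaxSurvivors` are hypothetical names.

```typescript
// Hypothetical sketch of argument > config > built-in precedence.
interface ChaosConfig { defaultMaxSurvivors?: number }

// `??` falls through only on undefined/null, so explicit values always win.
function resolveMaxSurvivors(arg: number | undefined, config: ChaosConfig): number {
  return arg ?? config.defaultMaxSurvivors ?? 10;
}

console.log(resolveMaxSurvivors(5, { defaultMaxSurvivors: 20 })); // 5
console.log(resolveMaxSurvivors(undefined, { defaultMaxSurvivors: 20 })); // 20
console.log(resolveMaxSurvivors(undefined, {})); // 10
```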

Enabling `prebuildCommand`

The prebuildCommand tool argument runs an arbitrary shell command in the sandbox. Because the command is arbitrary, nothing confines it to the sandbox and it can reach outside it, so it is disabled by default. Enable it explicitly with "allowPrebuild": true in chaos-mcp.config.json, or by setting the CHAOS_MCP_ALLOW_PREBUILD=1 environment variable. The auto-detected prebuild for Rust (cargo check) runs without this flag.

Supported Test Runners (Auto-Detected)

Language	Mutation Tool	Detected Runners
TypeScript/JS	StrykerJS	vitest, jest, mocha, jasmine, bun, node:test
Python	cosmic-ray	pytest, unittest
Rust	cargo-mutants	cargo test, cargo-nextest
PHP	Infection	phpunit

CLI Flags

chaos-mcp [flags]

  --version   Print version and exit
  --help      Show help text and exit
  --config    Path to a JSON config file
  --verbose   Enable diagnostic logging to stderr

Protocol features

Progress notifications

When an MCP client includes a progressToken in a tool call's _meta field, Chaos-MCP emits notifications/progress events during the run. Clients that omit progressToken receive no notifications — there is zero overhead for clients that do not opt in.

Triage emits one notification per file as it completes:

Field	Value
`progress`	files completed so far
`total`	total files to audit
`message`	`"audited X/N"`

Audit emits four coarse milestones:

`progress`	`total`	`message`
1	4	`"validating"`
2	4	`"provisioning sandbox"`
3	4	`"running mutation engine"`
4	4	`"complete"`

Estimate does not emit progress notifications.
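On the wire, each audit milestone is a JSON-RPC notification. The sketch below builds the four messages. It uses the `notifications/progress` method and the `progressToken`, `progress`, `total` and `message` params from the MCP specification. The token value and the `auditProgress` helper are hypothetical; in practice an MCP SDK sends these for you.

```typescript
// Hypothetical sketch of the progress notifications emitted for an audit.
type ProgressToken = string | number;

interface ProgressNotification {
  jsonrpc: "2.0";
  method: "notifications/progress";
  params: { progressToken: ProgressToken; progress: number; total?: number; message?: string };
}

const AUDIT_MILESTONES = ["validating", "provisioning sandbox", "running mutation engine", "complete"];

// No token in the request's _meta means the client did not opt in: send nothing.
function auditProgress(token: ProgressToken | undefined): ProgressNotification[] {
  if (token === undefined) return [];
  return AUDIT_MILESTONES.map((message, i): ProgressNotification => ({
    jsonrpc: "2.0",
    method: "notifications/progress",
    params: { progressToken: token, progress: i + 1, total: AUDIT_MILESTONES.length, message },
  }));
}

console.log(JSON.stringify(auditProgress("tok-1")[1]));
// {"jsonrpc":"2.0","method":"notifications/progress","params":{"progressToken":"tok-1","progress":2,"total":4,"message":"provisioning sandbox"}}
```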

Cancellation

Cancelling an in-flight MCP request aborts the run cleanly:

The abort signal propagates through the tool handler into RunOptions.signal and from there into the mutation engine subprocess, terminating it.
The sandbox is always cleaned up even if cancellation occurs mid-run.
The cancelled call returns "Operation cancelled." as a tool error rather than throwing.

All three tools (audit_code_resilience, triage_test_coverage, estimate_audit) respect cancellation.
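The cancellation path has two parts: the abort signal kills the subprocess, and the sandbox is always removed. Both map onto Node's built-in `signal` option for `child_process.execFile`. The sketch below assumes this general pattern and is not Chaos-MCP's code. `runInSandbox` is hypothetical, and a long-idling Node child stands in for a real mutation engine. Chaos-MCP reports the cancellation message as a tool error result; the sketch simply resolves with it.

```typescript
import { execFile } from "node:child_process";
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: run a subprocess in a temp sandbox, abortable via AbortSignal.
function runInSandbox(signal: AbortSignal): Promise<string> {
  const sandbox = mkdtempSync(join(tmpdir(), "sandbox-sketch-"));
  return new Promise<string>((resolve, reject) => {
    // Stand-in for a mutation engine: a Node child that idles for 10 s.
    execFile(
      process.execPath,
      ["-e", "setTimeout(() => {}, 10000)"],
      { cwd: sandbox, signal }, // aborting the signal kills the child with an AbortError
      (err, stdout) => (err ? reject(err) : resolve(stdout)),
    );
  })
    .catch((err: unknown) => {
      if (signal.aborted) return "Operation cancelled."; // report instead of rethrowing
      throw err;
    })
    .finally(() => rmSync(sandbox, { recursive: true, force: true })); // always clean up
}

const controller = new AbortController();
const pending = runInSandbox(controller.signal);
setTimeout(() => controller.abort(), 100);
pending.then((msg) => console.log(msg)); // Operation cancelled.
```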

Resources

The server exposes three static resources, discoverable via resources/list and readable via resources/read:

URI	MIME type	Contents
`chaos://languages`	`application/json`	Per-language entry: engine name, `supportsLineScope`, estimate fidelity (`"exact"` or `"approx"`), config key, and whether an auto-prebuild runs.
`chaos://config-schema`	`application/json`	Every `chaos-mcp.config.json` key with its type and a short description.
`chaos://capabilities`	`text/markdown`	All three tools (args summary) and the triage → audit → verify workflow loop.

Prompts

The server exposes two prompts, discoverable via prompts/list and retrieved via prompts/get:

Prompt	Required argument	Purpose
`harden_file`	`filePath`	Returns a `user`-role message walking an agent through: optional estimate → audit → write tests for survivors → verify by `runId` → repeat until clean.
`triage_changes`	`diffBase`	Returns a `user`-role message walking an agent through: triage changed files weakest-first → harden the weakest → move down the ranking until the score bar is met.

Development

npm run check         # Full CI pipeline: build + lint + format + test
npm run test:watch    # Watch mode for iterative development
npm run test:coverage # Tests with coverage report

The suite runs on every push/PR to main via CI (Node 22/24). v8 line/statement coverage of src/ sits at ~99%, and the source is additionally hardened by running Chaos-MCP against its own code — so the suite is graded by mutation score, not just line coverage.

See CONTRIBUTING.md for detailed development setup and contribution guidelines.

License

MIT — See LICENSE for details.
