Rutherford MCP Server
The Rutherford MCP Server lets you orchestrate multiple AI coding CLI agents (Claude Code, Codex, Cursor, etc.) from a single MCP interface—delegating tasks, running parallel consensus, structured debates, and code reviews—without managing new API keys.
Delegate (delegate): Send a prompt to a single CLI agent and get a normalized result; supports sync/async modes, file context, roles, session resumption, and safety modes (read_only, propose, write, yolo).
Consensus (consensus): Ask the same prompt to multiple CLI agents in parallel; optionally synthesize a combined verdict via majority, unanimous, plurality, or weighted voting.
Debate (debate): Have multiple CLI agents argue across rounds (each sees the others' positions and revises), returning a full transcript plus a closing synthesis.
Review (review): Submit a diff or file paths for read-only code review by one or more CLI agents, with findings organized by file/line and severity.
Plan (plan): Direct a single CLI agent to produce an ordered, step-by-step implementation plan for a given goal.
Doctor (doctor): Health-probe each CLI adapter for binary presence, version, auth status, and runtime reachability.
Capabilities (capabilities): Instantly list all known CLIs, their install/auth status, and supported models, with no live model calls needed.
Background job management: Use job_status, job_result, and cancel_job to track and retrieve results from long-running async tasks.
List roles (list_roles): Discover available role personas (e.g., planner, codereviewer, security, debugger) that can guide any delegation.
All operations default to read_only safety; write and yolo modes require explicitly trusted workspaces, and a depth guard prevents recursive CLI call chains.
Allows delegation of coding tasks, code reviews, and consensus-building to Codex CLI, using OpenAI's models for code generation and analysis.
Click on "Install Server".
Wait a few minutes for the server to deploy. Once ready, it will show a "Started" state.
In the chat, type @ followed by the MCP server name and your instructions, e.g., "@Rutherford MCP Server Ask Claude Code and Codex to implement the login feature and compare solutions."
That's it! The server will respond to your query, and you can continue using it as needed.
uv tool install rutherford-mcp-server
Rutherford is a Model Context Protocol server. Your MCP client (a coding CLI or a desktop app) calls it, and it spawns the target CLIs as fresh, isolated headless subprocesses, built from argv arrays (never shell strings) and read-only by default. Every answer comes back in one normalized shape, so your agent compares like with like. You drive it in plain language; your agent translates your words into Rutherford's tools, and you rarely name the tools yourself.
.---------.
| \/\/\/ |
| O [==]|
| < |
| \___/ |
'---------'
-- Ensign Sam Rutherford --
USS Cerritos . Engineering
Named for the cheerful engineer aboard the USS Cerritos in Star Trek: Lower Decks, who has a gift for getting heterogeneous systems to cooperate. That is the job here: one agent hands work to a crew of others and brings the results back. Star Trek and Lower Decks are trademarks of their respective owners; this is an unaffiliated, fan-named open-source project.
See it work
The mode that isn't just parallel answers is debate. Round one is each voice's independent take; in
every later round, each voice sees the others' latest positions and is asked to rebut and revise. Here
is one real run of a three-round debate across Claude Code, Codex, and Kiro, condensed to the moment
that matters:
prompt "Is UUIDv7 or ULID the better primary key for a high-write event table?"
panel claude_code, codex, kiro rounds: 3
round 1 claude_code UUIDv7 — the timestamp prefix gives B-tree index locality
codex UUIDv7 — standardized and DB-native; monotonic within a process
kiro UUIDv7 — but argues ULID is BOTH lexicographically sortable AND
collision-resistant across concurrent writers
round 2 claude_code flags that Kiro conflates two properties: ULID's sortability and
its per-process monotonicity are not the same guarantee
codex agrees — the monotonic guarantee is per-process, not cross-node
kiro revises its position: cross-node, UUIDv7's timestamp prefix gives
the locality without relying on a per-process assumption
result converged on UUIDv7, with a closing synthesis of where the panel agreed and why
In round two, two models corrected a factual error in the third's argument, and that third model changed its mind. The call returns the full per-round transcript plus the closing synthesis, so you can retrace exactly who said what and where someone revised. This is one run (debates do not always converge or change a mind), but when they do, the transcript shows precisely where.
Why you'd want this
You are deep in a session with one coding agent, and you hit a moment where one opinion isn't enough:
You're about to commit to a design and want a second and third opinion before you do.
Two models disagree and you want to watch them argue it out, not just answer in parallel.
A diff is risky and you want several reviewers on it, with the must-fix issues separated from nits.
You want to hand off a long refactor to a different agent and keep working while it runs.
You want a fresh critique of the code you just wrote, from an instance with no memory of the conversation that produced it.
Most multi-model tools ask you to wire up provider API keys and pay per token; Rutherford reuses the CLI logins you already have.
Quickstart
You bring the crew. The prerequisite that surprises people: Rutherford does not install or authenticate any coding CLI — it drives the ones you already have. You need Python 3.11+ and at least two target CLIs installed and signed in (two is enough for a consensus or a debate). If you already use Claude Code or Codex, you have most of what you need.
1. Install Rutherford.
uv tool install rutherford-mcp-server
# or: pipx install rutherford-mcp-server / pip install rutherford-mcp-server
2. Register it with your client. One-click for Cursor and VS Code with the badges at the top of this page, or by hand:
claude mcp add rutherford -- rutherford-mcp-server # Claude Code
codex mcp add rutherford -- rutherford-mcp-server # Codex
For Claude Desktop, Cursor, and other JSON-config clients:
{ "mcpServers": { "rutherford": { "command": "rutherford-mcp-server" } } }
If rutherford-mcp-server isn't on your PATH, use an absolute path or python -m rutherford with the interpreter from the environment where you installed it. WSL and more clients: docs/mcp-client-integration.md.
3. Scaffold a config.
rutherford-mcp-server init
init detects which CLIs you're signed in to, prints the plan, and writes a starter config.toml plus a
panel only after you confirm (it never overwrites an existing file). You can do the same conversationally
once Rutherford is registered: ask your agent to "set up Rutherford."
4. Run doctor first. Multi-CLI auth and PATH problems are the most common things that go wrong, so confirm
the crew is reachable before your first real task:
Run Rutherford's doctor and tell me which CLIs are installed, authenticated, and reachable.
doctor
claude_code ok authenticated models: opus, sonnet, haiku
codex ok authenticated
qwen ok auth: unknown (verified live — a round trip succeeded)
cursor not-found install it, then re-run doctor
kiro needs-login run `kiro-cli login` or set KIRO_API_KEY
Green on two or more CLIs means you're ready. Any other line tells you exactly what to fix.
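The first thing doctor reports for each CLI is whether its binary can be found at all. A minimal sketch of that presence check is below, assuming a plain PATH lookup; the `probe_binary` and `probe_all` helpers and the adapter-to-binary mapping are hypothetical and are not Rutherford's own code. Version, auth, and live reachability checks are left out.

```python
import shutil
import sys


def probe_binary(binary: str) -> str:
    """Return 'ok' if the binary resolves on PATH, else 'not-found'."""
    # shutil.which does the same lookup a shell would, without running anything.
    return "ok" if shutil.which(binary) else "not-found"


def probe_all(binaries: dict[str, str]) -> dict[str, str]:
    """Map each (hypothetical) adapter id to its presence status."""
    return {adapter_id: probe_binary(binary) for adapter_id, binary in binaries.items()}


# The running interpreter stands in for an installed CLI so the example
# behaves the same on any machine; the second entry is deliberately missing.
statuses = probe_all({
    "this_python": sys.executable,
    "missing_cli": "definitely-not-a-real-cli-binary",
})
print(statuses)
```

A `not-found` here corresponds to the PATH problem described above: the MCP client may launch the server with a different PATH than your interactive shell.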
No paid CLI subscription? Run your first consensus for free against local models. Install Ollama, pull one model, and Rutherford will use it — no key, no account:
ollama pull llama3.2
Ask the ollama model and any other CLI I'm signed into the same question ("UUID or ULID for a primary key?") and show me their answers side by side.
The tools
You rarely call these by name; your agent picks them from your request. Everything defaults to read-only.
Tool | When to reach for it |
delegate | Hand one task to one CLI; get one normalized result back. |
consensus | Ask several CLIs the same thing in parallel; optionally aggregate to a verdict. |
debate | Have several CLIs argue across rounds and return the full transcript. |
review | Multi-reviewer, read-only code review of a diff or a set of paths. |
plan | Ask one CLI for an ordered implementation plan. |
Long tasks run as background jobs (mode=async): the call returns a job id immediately, and list_jobs,
job_status, job_result, and cancel_job manage them. setup, list_roles, and reload_panels
round out the surface.
How it works
your MCP client (Claude Code, Cursor, Codex, Claude Desktop, ...)
| MCP over stdio
v
rutherford-mcp-server
| fresh subprocess per call (read_only by default, argv arrays never shell strings)
+--> claude -p "..." --output-format json
+--> codex exec --json
+--> cursor-agent -p --output-format json
+--> ... seven more, each behind one adapter file
A CLI that errors or isn't installed comes back as one failed voice without sinking the rest of a panel. Parallel-agent runners point N agents at N tasks and let you read the diffs; Rutherford points several agents at one task and reconciles their answers into one shape.
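To illustrate why argv arrays matter, here is a small sketch of building an invocation as a list and running it without a shell. The `build_claude_argv` and `run_argv` helpers are hypothetical; the flags are copied from the diagram above rather than from any adapter source. The current Python interpreter stands in for a CLI so the pass-through can be observed on any machine.

```python
import subprocess
import sys


def build_claude_argv(prompt: str) -> list[str]:
    """Mirror the `claude -p "..." --output-format json` line as an argv list."""
    return ["claude", "-p", prompt, "--output-format", "json"]


def run_argv(argv: list[str], timeout_s: float = 60.0) -> subprocess.CompletedProcess:
    # With a list and no shell=True, each element reaches the child process
    # as one literal argument; nothing is parsed as shell syntax.
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout_s, check=False)


# A prompt full of shell metacharacters stays a single inert argument.
hostile = 'explain this; rm -rf / && echo "$HOME"'
argv = build_claude_argv(hostile)

# Stand-in child: a Python one-liner that prints its first argument back.
echo = run_argv([sys.executable, "-c", "import sys; print(sys.argv[1])", hostile])
print(echo.stdout.strip())
```

Had the same prompt been interpolated into a shell string, the `;` and `&&` would have started new commands.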
Recipes
These are how-to recipes — paste the prompt to your agent, which translates it into a tool call. The longer ones, plus saved panels and the strategy walkthrough, live in docs/recipes.md.
Hand one task to one agent
Use Rutherford to have Codex read src/auth/session.py and explain how token refresh works. Read-only.
A delegate to one CLI. You get back the answer, timing, token cost, and a session id you can resume.
Get a second and third opinion
I think the deadlock is in queue.py. Ask Claude Code, Codex, and Qwen the same question (where is it and how would you fix it?) and show me their answers side by side.
A consensus across three targets in parallel. To poll everyone you're signed into, don't name targets.
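A parallel fan-out in which one failing target does not sink the others might look like the sketch below. The `Voice` dataclass, its field names, and the `ask` and `consensus` functions are all illustrative assumptions; plain Python callables stand in for real CLI adapters.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Voice:
    """Hypothetical normalized result; field names are illustrative."""
    target: str
    ok: bool
    answer: str = ""
    error: str = ""


def ask(target: str, runner: Callable[[str], str], prompt: str) -> Voice:
    try:
        return Voice(target=target, ok=True, answer=runner(prompt))
    except Exception as exc:  # a broken target becomes one failed voice
        return Voice(target=target, ok=False, error=f"{type(exc).__name__}: {exc}")


def consensus(prompt: str, runners: dict[str, Callable[[str], str]]) -> list[Voice]:
    with ThreadPoolExecutor(max_workers=max(1, len(runners))) as pool:
        futures = [pool.submit(ask, name, fn, prompt) for name, fn in runners.items()]
        # Collect in submission order so output lines up with the target list.
        return [future.result() for future in futures]


def broken(prompt: str) -> str:
    raise FileNotFoundError("qwen not installed")


voices = consensus(
    "Where is the deadlock?",
    {
        "claude_code": lambda p: "lock ordering in queue.py",
        "codex": lambda p: "missing release on the error path",
        "qwen": broken,
    },
)
```

Because every target returns the same shape, the caller can lay the answers side by side and still see which voice failed and why.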
Run a debate
Run a 3-round debate between Claude Code, Codex, and Kiro on whether UUIDv7 or ULID is the better primary key for a high-write event table. Show how each position shifted, plus a closing summary.
Round one is each voice's independent take; later rounds feed each voice the others' positions to rebut and revise. The result carries the full per-round transcript and a closing synthesis.
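The round structure can be sketched as a short loop. This is a simplified model under stated assumptions: voices are plain callables that take the prompt and the other voices' latest positions, `run_debate` is a hypothetical name, and the closing synthesis step is omitted.

```python
from typing import Callable

# A voice takes (prompt, others' latest positions) and returns its position.
VoiceFn = Callable[[str, dict[str, str]], str]


def run_debate(prompt: str, voices: dict[str, VoiceFn], rounds: int) -> list[dict[str, str]]:
    """Return one {voice: position} dict per round."""
    transcript: list[dict[str, str]] = []
    latest: dict[str, str] = {}
    for _ in range(rounds):
        current: dict[str, str] = {}
        for name, voice in voices.items():
            # Round one sees nothing; later rounds see the previous round's
            # positions from every other voice.
            others = {n: p for n, p in latest.items() if n != name}
            current[name] = voice(prompt, others)
        transcript.append(current)
        latest = current
    return transcript


def steady(prompt: str, others: dict[str, str]) -> str:
    return "UUIDv7"


def persuadable(prompt: str, others: dict[str, str]) -> str:
    # Switches only when every other voice already says UUIDv7.
    if others and all(p == "UUIDv7" for p in others.values()):
        return "UUIDv7"
    return "ULID"


transcript = run_debate(
    "UUIDv7 or ULID?", {"a": steady, "b": steady, "c": persuadable}, rounds=2
)
```

Keeping the whole per-round list is what makes it possible to point at the exact round where a voice revised.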
Review a diff across several reviewers
Review my staged diff with Claude Code and Codex as reviewers. Findings by file and line, must-fix separated from nits, and call out anything only one of them flagged.
A review — read-only, using the codereviewer role. Point it at paths instead and the reviewers read
the files themselves.
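One way to picture how several reviewers' findings could be combined by file and line is sketched below. The `Finding` shape, the two severity labels, and the rule that the stricter severity wins are assumptions made for illustration; Rutherford's real schema may differ.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding shape for this sketch."""
    reviewer: str
    file: str
    line: int
    severity: str  # "must-fix" or "nit" in this sketch
    message: str


def merge_findings(findings: list[Finding]) -> dict[tuple[str, int], dict]:
    """Group findings by (file, line), keeping the stricter severity."""
    grouped: dict[tuple[str, int], dict] = defaultdict(
        lambda: {"severity": "nit", "reviewers": set(), "messages": []}
    )
    for finding in findings:
        entry = grouped[(finding.file, finding.line)]
        entry["reviewers"].add(finding.reviewer)
        entry["messages"].append(finding.message)
        if finding.severity == "must-fix":
            entry["severity"] = "must-fix"
    return dict(sorted(grouped.items(), key=lambda item: item[0]))


def flagged_by_one(merged: dict[tuple[str, int], dict]) -> list[tuple[str, int]]:
    """Locations that only a single reviewer raised."""
    return [loc for loc, entry in merged.items() if len(entry["reviewers"]) == 1]


findings = [
    Finding("claude_code", "src/auth/session.py", 42, "must-fix", "token never refreshed"),
    Finding("codex", "src/auth/session.py", 42, "nit", "rename this variable"),
    Finding("codex", "src/auth/session.py", 90, "nit", "unused import"),
]
merged = merge_findings(findings)
only_one = flagged_by_one(merged)
```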
Get an implementation plan
Use Rutherford's planner on Claude Code to turn "add OAuth2 device-code login to the CLI" into an ordered, step-by-step plan, with the files each step touches and the risky parts flagged.
A plan — one target, the planner role, read-only.
Kick off a long job and keep working
Start a big refactor on OpenCode in the background (convert the data layer to the repository pattern in C:\work\myrepo) and just give me the job id.
Async mode returns a job id immediately. Ask "is that Rutherford job done yet?" to poll it.
Safety model
Every delegation runs in one of four modes, defaulting to the most restrictive.
Mode | Meaning |
read_only | Inspect only. |
propose | May propose changes (a diff) but not apply them. |
write | May modify the workspace, subject to the CLI's own approvals. |
yolo | May act without approval prompts (the CLI's bypass mode). |
A call that omits safety_mode adopts the configured default_safety_mode (read_only out of the box);
an explicit value always wins. write and yolo — explicit or configured — require a trusted workspace:
the target directory must be on the trusted_workspaces allowlist, or the call must pass
trust_workspace=true. No adapter ever defaults to its permission-bypass flag, and invocations are
always built as an argv list, never a shell string. A depth guard (max_depth, default 3) keeps a
CLI-calls-itself chain bounded. Full detail:
docs/security.md.
Supported CLIs
Each adapter keeps all of its CLI-specific details in one file, so a change is a one-file edit. An eleventh, well-behaved CLI can be added without code — see docs/adding-a-cli.md.
CLI | Adapter id | How Rutherford runs it | Auth |
Claude Code | | | subscription/OAuth or |
Codex | | | ChatGPT login or |
Cursor | | | |
Qwen Code | | | |
Kiro | | | |
OpenCode | | | provider key or |
Goose | | | |
Antigravity | | | Google login |
Ollama (local) | | | none; local daemon |
LM Studio (local) | | | none; local |
Confirmed CLI versions. Rutherford's own code is production-stable; its CLI integrations target third-party tools whose headless flags and output formats change between releases. Each Rutherford release records the CLI versions it was last verified against. Re-check after a CLI upgrade, and pin if you can.
CLI | Confirmed with Rutherford 1.3.0 | Check yours |
Claude Code | 2.1.172 | |
Codex | 0.135.0 | |
Cursor | 2026.05.28 | |
Qwen Code | 0.17.0 | |
Kiro | 2.6.1 | |
OpenCode | 1.15.13 | |
Goose | 1.36.0 | |
Antigravity | 1.0.7 | |
Ollama | 0.30.6 | |
LM Studio ( | build efce996 | |
Ollama and LM Studio are optional, bring-your-own local models: name a model per call with model=, or
set [adapters.<id>] default_model. capabilities/doctor mark them optional: true, and they stay
out of an auto-all panel unless you name them. Local CPU/iGPU inference is slow, so a longer
[adapters.<id>] timeout_s is worth setting. LM Studio also reaches remote models over
LM Link: a model loaded on another machine on your network is addressed by its
normal model key and runs on that machine, reached through lms rather than any vendor API.
Strategies and saved panels
Give consensus a strategy and each voice is asked for a verdict, which Rutherford aggregates. Verdicts
are read from a final VERDICT: <token> line, or as JSON if you pass a verdict_schema.
Strategy | What it does |
| Every voice, no aggregation (the default). |
unanimous | Every eligible voice must weigh in and agree; a failed or unparseable voice vetoes. |
majority | A verdict must exceed 50% of all eligible voices (failed/unparseable count in the denominator). |
plurality | The single top-scoring verdict wins even below 50%; a tie at the top is |
| Like |
parity-pair | Compares a proposer against parity counterweights; disagreement escalates. |
The min_quorum field (default 1) sets how many parseable voices an aggregating strategy needs. An
optional judge target (ideally a non-participant) writes the synthesis, recorded as synthesis_by;
the same option applies to debate. Full mechanics:
docs/configuration.md.
Save a crew you keep reaching for as a named panel:
# ~/.rutherford/panels.toon
panels:
  design-roundtable:
    description: Lineage-diverse design review
    strategy: parity-pair
    targets[3]:
      - cli: claude_code
        model: opus
        label: proposer
      - cli: codex
        label: implementer
      - cli: kiro
        model: deepseek-3.2
        label: dissenter
        parity: true
parity: true
consensus, debate, and review all accept panel="design-roundtable". Panels live in
~/.rutherford/panels.toon (global) or <project>/.rutherford/panels.toon (project-specific, which
overrides a global panel of the same name). After editing the file, ask your agent to "reload Rutherford's
panels" and it picks up the change without a restart.
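The override rule between global and project panels amounts to a name-keyed merge. A minimal sketch, assuming whole-panel replacement rather than field-by-field merging (the text does not say which) and using a hypothetical `resolve_panels` helper over already-parsed panel dictionaries:

```python
def resolve_panels(
    global_panels: dict[str, dict], project_panels: dict[str, dict]
) -> dict[str, dict]:
    """Project-specific panels replace global panels of the same name."""
    # Later entries win in a dict merge, so project definitions take priority.
    return {**global_panels, **project_panels}


global_panels = {
    "design-roundtable": {"strategy": "parity-pair", "targets": ["claude_code", "codex", "kiro"]},
    "quick-check": {"strategy": "majority", "targets": ["claude_code", "codex"]},
}
project_panels = {
    "design-roundtable": {"strategy": "unanimous", "targets": ["claude_code", "codex"]},
}
panels = resolve_panels(global_panels, project_panels)
```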
Troubleshooting
Symptom | Fix |
A CLI shows as not-found | It isn't on Rutherford's PATH; install it and re-run doctor. |
A CLI shows as needs-login | Sign in with that CLI's own login, or set its API key; Rutherford never logs in for you. |
| The target dir isn't on |
| A CLI-calls-itself chain hit |
A local model times out | Raise |
More in docs/troubleshooting.md.
Configuration
The main config is a small TOML file (config.toml in your platform config dir, or a project-local
rutherford.toml); panels and custom roles live in their own files under ~/.rutherford/ and a
project's .rutherford/. The bundled roles are planner, codereviewer, security, and debugger;
add your own as markdown or TOON under ~/.rutherford/roles/. Full reference:
docs/configuration.md.
Status
Rutherford's orchestration core — the safety model, the normalized envelope, the aggregation strategies — is production-stable and covered by a strict test gate. Its CLI integrations are version-sensitive: they drive independent third-party CLIs whose flags, output formats, and auth mechanisms change between releases, and a CLI update can break something an adapter relies on. The versions each release was verified against are listed under Supported CLIs. Pin your CLI versions where you can, and re-verify after upgrades.
Documentation
docs/configuration.md — config file, panels, custom roles, strategies, generic adapters.
docs/recipes.md — the full cookbook of paste-able prompts.
docs/architecture.md — the layered design and the two core interfaces.
docs/adding-a-cli.md — the contract and checklist for adding a CLI.
docs/mcp-client-integration.md — registration for many clients.
docs/integration-testing.md — installing and authenticating each CLI.
docs/security.md — the security model in depth.
docs/troubleshooting.md — common problems and fixes.
Contributing
See CONTRIBUTING.md. The
whole core is testable without a real CLI; run just check before pushing, then just test-integration
for whatever CLIs your machine has installed and authenticated.
License
MIT (c) John Chapman. See LICENSE.