Bernstein - Multi-agent orchestration
The Bernstein MCP server lets you orchestrate, monitor, and control multi-agent AI coding workflows programmatically.
Health check (bernstein_health): Verify the server is running and responsive.
Start a run (bernstein_run): Kick off a new orchestration task by specifying a goal, role, priority, scope, complexity, and time estimate; returns the created task ID and status.
Monitor status (bernstein_status): Get a summary of task counts (total, open, claimed, done, failed) with a per-role breakdown.
List tasks (bernstein_tasks): Retrieve tasks, optionally filtered by status (open, claimed, in_progress, done, failed, blocked, or cancelled).
Track costs (bernstein_cost): View total USD spent and a per-role cost breakdown.
Stop the orchestrator (bernstein_stop): Trigger a graceful shutdown by writing a SHUTDOWN signal to the project directory.
Approve tasks (bernstein_approve): Mark pending or blocked tasks as complete at a human-in-the-loop gate, with an optional approval note.
Create subtasks (bernstein_create_subtask): Decompose a parent task into linked subtasks, automatically transitioning the parent to a waiting state.
Load skill packs (load_skill): Fetch full skill pack content, reference documents, or scripts on demand to guide agent behavior.
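As a rough illustration of how a client might address these tools, the sketch below builds JSON-RPC 2.0 `tools/call` requests, the request shape MCP uses for tool invocation. The tool names come from the list above; the `status` argument name and the `tool_call` helper are assumptions made for illustration, so check the schema the server reports via `tools/list` before relying on them.

```python
import json
from itertools import count
from typing import Optional

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def tool_call(name: str, arguments: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 `tools/call` request body for an MCP tool."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }
    return json.dumps(request)


# Tool names are from the list above.
# The "status" argument name is an assumption, not a documented parameter.
print(tool_call("bernstein_health"))
print(tool_call("bernstein_tasks", {"status": "open"}))
```

In practice an MCP client library would send these over the server's transport; the sketch only shows the request body.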
Provides GitHub App integration for connecting with GitHub repositories, enabling automated workflows and repository management within the orchestration ecosystem.
Provides Grafana dashboards for observability and monitoring, allowing visualization of orchestration metrics, task execution status, and cost tracking data.
Orchestrates Ollama CLI agent for running local AI models offline, enabling parallel task execution with local models without cloud dependencies.
Exposes Prometheus /metrics endpoint for monitoring orchestration performance, agent health, task metrics, and cost tracking with built-in metrics export.
"To achieve great things, two things are needed: a plan and not quite enough time." - Leonard Bernstein
why the name?
Bernstein is named after Leonard Bernstein, the American conductor and composer. The project orchestrates a crew of CLI coding agents the way Bernstein conducted the New York Philharmonic: every player on cue, the score deterministic, the conductor accountable for the result. He is the original orchestrator the project takes its name from.
deterministic multi-agent CLI orchestration
website · docs · install · first run · glossary · limitations · sponsor
Bernstein is a deterministic Python scheduler that runs a crew of CLI coding agents (Claude Code, Codex, Gemini CLI, and 40 more) against a single goal in parallel git worktrees, with an HMAC-signed audit chain over every step.
at a glance
44 CLI agent adapters in v2.2.x: 41 third-party wrappers, 2 leaf-node delegators, plus a generic --prompt wrapper. Source of truth: the supported agents table below.
HMAC-SHA256 audit chain per RFC 2104, one record per scheduling decision, tamper-evident. Operator guide: docs/security/audit-log.md.
Bearer-token task server authenticates the manager and every worker. Per-session zero-trust JWT in .sdd/runtime/agent_tokens/, legacy BERNSTEIN_AUTH_TOKEN fallback, opt-out via BERNSTEIN_AUTH_DISABLED=1. Flow + diagnostics: docs/security/manager-auth.md.
Signed agent cards use detached JWS (RFC 7515 §A.5) over RFC 8785 (JCS) canonicalization, with Ed25519 / EdDSA keys. Code: src/bernstein/core/security/agent_card_signer.py.
Per-artefact lineage records every file write linked back to producer + inputs + prompt SHA + model + cost. CLI: bernstein lineage verify <run_id>.
Deterministic scheduler: zero LLM in the coordination loop. Plain Python decides who runs, where, with what budget. Replay yesterday's plan, get yesterday's task graph.
why this exists
i wrote bernstein because i was paying $400/month in claude bills running three coding agents in parallel and getting nondeterministic merges.
Apache 2.0, solo maintained. Live stats: bernstein.run.
install in 30 seconds
pipx install bernstein
bernstein init
bernstein run -g "fix the failing test in tests/test_foo.py"
See installed integrations: bernstein integrations list --installed.
sponsor
If Bernstein routed a model that saved you a Claude bill, $25 covers a month of my coffee.
github.com/sponsors/chernistry
who this is for
Specific shapes where the value lands:
engineering teams running >=3 CLI coding agents in parallel: each agent gets its own git worktree, the merge queue serialises landings, no race conditions
operators running compliance-sensitive workflows: every routing decision is plaintext, the audit log is HMAC-signed and tamper-evident, no SaaS hop, no third-party data plane
platform teams that need an audit log of agent decisions: the orchestrator writes one row per scheduling decision, you can grep it
anyone burning more than $1k/mo on coding agents who wants determinism: you can replay yesterday's plan and get yesterday's task graph
forward-deployed engineers dropping into a client repo: credentials stay in your env, not the client's; agents you spawn are whichever CLI tool the client already trusts
If you nodded at two of those bullets, this fits.
who this is NOT for
"I want one pair-programmer to chat with about my code": a single CLI agent is fine. Bernstein adds orchestration overhead you don't need.
prototypes where merge gates are overkill: the lint/types/tests/cross-model-review pipeline pays off when the cost of a bad merge is real, and is pure friction when you're throwing the repo away on Friday.
non-coding tasks (research, writing, data analysis pipelines): Bernstein wraps CLI coding agents specifically, not generic LLM workflows.
anyone who wants a SaaS wrapper with a credit-card form: Bernstein is on-prem only by design.
teams that need a vendor with a support SLA and a contract: solo open-source project. GitHub issues are how support happens.
research-shape "let the agents collaborate emergently" use cases: the deterministic scheduler is a hard wall there.
how it compares
Closest neighbours in this category live in docs/compare/README.md. What Bernstein does well is the auditability surface: HMAC-chained audit, signed agent cards, per-artefact lineage, air-gap deploy profile, plus the widest CLI adapter coverage.
what is this, in one paragraph
You tell Bernstein what you want built. It splits the work across several AI coding agents, runs them in parallel inside isolated git worktrees, records every handoff in an HMAC-SHA256-chained audit log (RFC 2104), runs the tests, and merges the code that actually passes. File-based state (.sdd/), per-agent credential scoping, signed audit trail.
other install methods
curl -fsSL https://bernstein.run/install.sh | sh # macOS / Linux one-liner
irm https://bernstein.run/install.ps1 | iex # Windows PowerShell
pip install bernstein # pip
uv tool install bernstein # uv
brew tap chernistry/tap && brew install bernstein # Homebrew
See the full install matrix for dnf copr, npx, optional extras, and the wheelhouse path for air-gapped sites.
why the scheduler is plain Python
Most agent orchestrators use an LLM to decide who does what. That is non-deterministic and burns tokens on scheduling instead of code. Bernstein does one LLM call to break down your goal, then the rest (running agents in parallel, isolating their git branches, running tests, routing retries) is plain Python. Every run is reproducible. Every step is logged and replayable.
No framework to learn. No vendor lock-in. Swap any agent, any model, any provider.
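To make the reproducibility claim concrete, here is a minimal sketch of deterministic ordering over a task graph: a topological sort that breaks ties by task name, so the same plan always yields the same order. It is not Bernstein's scheduler; the `deterministic_order` function and the task names are hypothetical.

```python
import heapq


def deterministic_order(deps: dict[str, set[str]]) -> list[str]:
    """Topologically order tasks; ties are broken by task name.

    `deps` maps each task to the set of tasks it depends on.
    """
    tasks = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {t: set(deps.get(t, ())) for t in tasks}
    dependents: dict[str, list[str]] = {t: [] for t in tasks}
    for task, ds in remaining.items():
        for dep in ds:
            dependents[dep].append(task)

    # A min-heap of ready tasks makes the pick independent of dict/set order.
    ready = [t for t, ds in remaining.items() if not ds]
    heapq.heapify(ready)

    order: list[str] = []
    while ready:
        task = heapq.heappop(ready)
        order.append(task)
        for child in dependents[task]:
            remaining[child].discard(task)
            if not remaining[child]:
                heapq.heappush(ready, child)

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical task graph for an "Add JWT auth" goal.
plan = {
    "tests": {"middleware"},
    "docs": {"middleware"},
    "middleware": {"schema"},
    "schema": set(),
}
print(deterministic_order(plan))  # ['schema', 'middleware', 'docs', 'tests']
```

Because no model call sits in this loop, replaying the same plan gives the same order every time, which is the property the section above describes.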
What you see while it runs:
$ bernstein -g "Add JWT auth"
[manager] decomposed into 4 tasks
[agent-1] claude-sonnet: src/auth/middleware.py (done, 2m 14s)
[agent-2] codex: tests/test_auth.py (done, 1m 58s)
[verify] all gates pass. merging to main.
YAML workflow manifests (optional)
When bernstein run -g "<goal>" is too coarse-grained, bernstein workflow runs a declarative DAG of agent / command / loop nodes. Manifests are plain YAML, validated up-front, dispatched through the same AgentSpawner the rest of Bernstein uses.
bernstein workflow list # bundled + user-installed
bernstein workflow run idea-to-pr -g "Add JWT auth"
bernstein workflow init my-flow # scaffold a starter manifest
bernstein workflow validate path/to/flow.yaml
Stock workflows shipping in the wheel: idea-to-pr, refactor-with-tests, security-review, doc-update, dependency-bump, hot-fix. Loop nodes re-fire until a bash predicate exits 0. fresh_context: true mints a new agent session per iteration. Per-step CLI/model routing: docs/workflows/per-step-routing.md.
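The loop-node behaviour described above (re-fire until a predicate exits 0) can be sketched in a few lines. This is an illustration under stated assumptions, not the workflow engine's code: `run_loop`, the iteration cap, and the body-then-predicate ordering are all hypothetical, and `shell=True` uses the platform's default shell (`/bin/sh` on POSIX) rather than bash specifically.

```python
import subprocess
from typing import Callable


def run_loop(body: Callable[[int], None], predicate: str, max_iterations: int = 5) -> int:
    """Run `body`, then the shell predicate; stop once the predicate exits 0.

    Returns the number of iterations that ran. The cap is an assumption added
    so a predicate that never succeeds cannot loop forever.
    """
    for iteration in range(1, max_iterations + 1):
        body(iteration)  # hypothetical stand-in for dispatching an agent step
        result = subprocess.run(predicate, shell=True)
        if result.returncode == 0:
            return iteration
    raise RuntimeError(f"predicate still failing after {max_iterations} iterations")


# `true` always exits 0, so this loop stops after its first iteration.
print(run_loop(lambda i: None, "true"))
```

A real predicate would typically be something like a test command, so the loop keeps re-firing the agent step until the tests pass or the cap is reached.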
use cases
forward-deployed engineering: drop the crew onto a client repo when you arrive, take it with you when you leave.
self-evolving projects: point Bernstein at its own repo and let it execute the backlog (this codebase is one).
CI fleets: run a crew of agents in parallel on PRs, with per-agent credential scoping and signed audit trail.
air-gapped deployment: install from a signed wheelhouse, run with --profile airgap to deny outbound by default. See Air-gap installation.
supported agents
Bernstein auto-discovers installed CLI agents. Mix them in the same run. Cheap local models for boilerplate, heavier cloud models for architecture.
44 CLI agent adapters: 41 third-party wrappers, 2 leaf-node delegators, plus a generic wrapper for anything with --prompt.
Agent | Models | Install |
Opus 4, Sonnet 4.6, Haiku 4.5 |
| |
GPT-5, GPT-5 mini |
| |
GPT-5, GPT-5 mini, o4 |
| |
Copilot-managed (GPT-5, Sonnet 4.6) |
| |
Gemini 2.5 Pro, Gemini Flash |
| |
Sonnet 4.6, Opus 4, GPT-5 | ||
Devin Terminal (Cognition) | Devin-managed |
|
Any OpenAI/Anthropic-compatible |
| |
Amp-managed |
| |
CLM gateway (sovereign / on-prem LLM) | Any OpenAI-compatible CLM endpoint |
|
Sourcegraph-hosted |
| |
Any OpenAI/Anthropic-compatible |
| |
Any provider Goose supports | See Goose docs | |
IaC (Terraform/Pulumi) | Any provider the base agent uses | Built-in |
BYOK (Anthropic, OpenAI, Google, xAI, OpenRouter, Copilot) |
| |
Kilo-hosted | See Kilo docs | |
Kiro-hosted | See Kiro docs | |
Amazon Q-managed (Claude-backed) |
| |
Ollama + Aider | Local models (offline) |
|
Any provider OpenCode supports | See OpenCode docs | |
Qwen Code models |
| |
Workers AI models |
| |
Any LiteLLM-supported (Anthropic, OpenAI, ...) |
| |
Any (LiteLLM-backed) |
| |
Anthropic, OpenAI, OpenRouter |
| |
Plandex Cloud or self-hosted models |
| |
OpenAI, Anthropic, OpenRouter, Groq, Gemini |
| |
Letta-routed (Anthropic, OpenAI) |
| |
Generic | Any CLI with --prompt | Built-in |
Any adapter also works as the internal scheduler LLM:
internal_llm_provider: gemini # or qwen, ollama, codex, goose, ...
internal_llm_model: gemini-3.1-pro
Run bernstein --headless for CI pipelines. No TUI, structured JSON output, non-zero exit on failure.
quick start
cd your-project
bernstein init # creates .sdd/ workspace + bernstein.yaml
bernstein -g "Add rate limiting" # agents spawn, work in parallel, verify, exit
bernstein live # watch progress in the TUI dashboard
bernstein stop # graceful shutdown with drain
For multi-stage projects, define a YAML plan:
bernstein run plan.yaml # skips LLM planning, goes straight to execution
bernstein run --dry-run plan.yaml # preview tasks and estimated cost
web UI
v2.0.0 ships a minimal web UI (operator-requested; UI is a side surface, core orchestrator is the priority).
bernstein gui serve # http://127.0.0.1:8052/ui/
bernstein gui serve --dev # expects `npm run dev` on :5173
bernstein gui serve --minimal # skip the full /api/v1/* surface
The Vite bundle is committed under src/bernstein/gui/static/, so wheel installs work without a Node toolchain. Surface tour + per-task drawer: docs/web-ui.md.
how it works
Bernstein runs a four-stage pipeline per goal:
Decompose. The manager breaks your goal into tasks with roles, owned files, and completion signals. One LLM call, then plain Python from there.
Spawn. Agents start in isolated git worktrees, one per task. Main branch stays clean.
Verify. The janitor checks concrete signals: tests pass, files exist, lint clean, types correct.
Merge. Verified work lands in main. Failed tasks get retried or routed to a different model.
The orchestrator is a Python scheduler, not an LLM. Scheduling decisions are deterministic, auditable, and reproducible. Every step writes a record to the HMAC-chained audit log (.sdd/audit/YYYY-MM-DD.jsonl) per RFC 2104.
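For readers unfamiliar with HMAC chaining, the sketch below shows the general idea of a tamper-evident log: each record's MAC covers the previous record's MAC, so editing or dropping a record invalidates the chain from that point on. The record layout, the genesis value, and the function names are assumptions for illustration; the real format is described in docs/security/audit-log.md.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder for the "previous MAC" of the first record


def _mac(key: bytes, prev_mac: str, payload: dict) -> str:
    """HMAC-SHA256 over the previous MAC plus a stable JSON encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, prev_mac.encode() + body, hashlib.sha256).hexdigest()


def append_record(chain: list, key: bytes, payload: dict) -> None:
    """Append a record whose MAC is chained to the previous record."""
    prev = chain[-1]["mac"] if chain else GENESIS
    chain.append({"payload": payload, "prev": prev, "mac": _mac(key, prev, payload)})


def verify_chain(chain: list, key: bytes) -> bool:
    """Recompute every MAC in order; any edit, reorder, or deletion fails."""
    prev = GENESIS
    for record in chain:
        expected = _mac(key, prev, record["payload"])
        if record["prev"] != prev or not hmac.compare_digest(record["mac"], expected):
            return False
        prev = record["mac"]
    return True


key = b"demo-key-not-for-production"
log: list = []
append_record(log, key, {"step": "decompose", "tasks": 4})
append_record(log, key, {"step": "spawn", "agent": "agent-1"})
print(verify_chain(log, key))   # True
log[0]["payload"]["tasks"] = 5  # tamper with the first record
print(verify_chain(log, key))   # False
```

Verification needs the same key that signed the chain, which is why key handling matters as much as the chaining itself.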
cloud execution (Cloudflare)
bernstein cloud runs agents on Cloudflare Workers with R2-backed workspace sync. See docs/cloudflare/.
bernstein cloud login # authenticate with Bernstein Cloud
bernstein cloud deploy # push agent workers
bernstein cloud run plan.yaml # execute a plan on Cloudflare
capabilities
Bernstein ships parallel execution + worktree isolation + a janitor that gates merges on tests/lint/types, signed lineage records, MCP server mode, an HMAC-SHA256 audit chain, and 44 CLI adapters out of the box. Pluggable sandbox backends (worktree, Docker, E2B, Modal), pluggable artifact sinks (local, S3, GCS, Azure Blob, R2), progressive-disclosure skill packs, and a lethal-trifecta capability gate round it out.
Full feature matrix: docs/reference/FEATURE_MATRIX.md. Recent features: docs/whats-new.md.
regulatory anchors
Regulatory mappings (EU AI Act Article 12, SOC 2 CC4/CC7, DORA / NIS2, OWASP ASI06, RFC 2104/7515/8785/8037/7636/8707) live in docs/compliance/. These are mappings, not certifications.
operator commands
Highest-value commands; full list in docs/operations/commands.md.
Command | What it does |
| Auto-creates a GitHub PR from a completed session; body carries the janitor's gate results and cost breakdown. |
| Imports a Linear / GitHub Issues / Jira ticket as a Bernstein task. |
| Daemon that monitors open Bernstein PRs; spawns a fixer agent when CI fails. |
| Lifecycle hooks ( |
| Atomically claims one eligible row from |
| Drive runs from chat with |
| Run a YAML workflow manifest. |
| Manage operator-registered recurring schedules; |
retrieval & caching: what's actually under the hood
Bernstein deliberately uses no neural embeddings, no vector databases, and no external embedding APIs. There are two retrieval/caching layers, both keyword/lexical:
Codebase RAG (core/knowledge/rag.py): SQLite FTS5 with BM25 ranking and AST-aware chunking for Python files.
Semantic cache (core/knowledge/semantic_cache.py): TF (term-frequency) cosine similarity over word counts, not learned embeddings.
If you need real semantic retrieval (vector DB, neural embeddings), wire it yourself via the retrieval role/skill in templates/; nothing in core performs vector search.
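The term-frequency approach named above fits in a few lines, which is part of its appeal. The sketch below is a generic TF cosine cache, not the code in core/knowledge/semantic_cache.py: the `LexicalCache` class, the tokenizer, and the 0.8 threshold are all assumptions for illustration.

```python
import math
import re
from collections import Counter
from typing import Optional


def term_counts(text: str) -> Counter:
    """Lowercase the text and count word tokens."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LexicalCache:
    """Return a cached response when a new prompt is lexically close to an old one."""

    def __init__(self, threshold: float = 0.8) -> None:
        self.threshold = threshold  # assumed cut-off, tune per workload
        self._entries: list[tuple[Counter, str]] = []

    def put(self, prompt: str, response: str) -> None:
        self._entries.append((term_counts(prompt), response))

    def get(self, prompt: str) -> Optional[str]:
        query = term_counts(prompt)
        best = max(self._entries, key=lambda e: cosine(query, e[0]), default=None)
        if best is not None and cosine(query, best[0]) >= self.threshold:
            return best[1]
        return None


cache = LexicalCache()
cache.put("add jwt auth to the api", "cached plan")
print(cache.get("Add JWT auth to the API"))     # cached plan
print(cache.get("rename the database column"))  # None
```

The trade-off is visible in the example: matching is exact on words, so paraphrases with different vocabulary miss the cache, while nothing leaves the machine and no embedding model is needed.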
install
Method | Command |
One-liner (macOS / Linux) |
|
One-liner (Windows) |
|
pip |
|
pipx |
|
uv |
|
Homebrew |
|
Fedora / RHEL |
|
npm (wrapper) |
|
Docker (GHCR) |
|
The one-liner scripts check for Python 3.12+, bootstrap pipx when it's missing, fix PATH for the current session, and install (or upgrade) bernstein. Script sources: install.sh · install.ps1.
optional extras
Provider SDKs are optional so the base install stays lean.
Extra | Enables |
| OpenAI Agents SDK v2 adapter ( |
| Docker sandbox backend |
| E2B microVM sandbox backend (needs |
| Modal sandbox backend, optional GPU (needs |
| S3 artifact sink (via |
| Google Cloud Storage artifact sink |
| Azure Blob artifact sink |
| Cloudflare R2 artifact sink (S3-compatible |
| gRPC bridge |
| Kubernetes integrations |
Combine extras with brackets, e.g. pip install 'bernstein[openai,docker,s3]'.
Editor extensions: VS Marketplace · Open VSX
security
OpenSSF Scorecard. Weekly run via .github/workflows/scorecard.yml. Results uploaded to GitHub Code Scanning. Badge above.
Fuzzing. ClusterFuzzLite config at .clusterfuzzlite/ plus a cifuzz-pr workflow (.github/workflows/cifuzz-pr.yml) provide an OSSF-recognized fuzzing harness on top of the existing Hypothesis property-test suite.
Vulnerability disclosure. See SECURITY.md.
contributing
PRs welcome. See CONTRIBUTING.md for setup and code style.
support
If Bernstein saves you time: GitHub Sponsors.
Contact: forte@bernstein.run.
featured in
Augment Code - 9 Open-Source Agent Orchestrators for AI Coding (2026); editorial roundup.
nibzard/awesome-agentic-patterns; Bernstein cited as the production implementation of the "deterministic zero-LLM orchestration" pattern.
Python Weekly; newsletter mention.
Future Digest; cost-cutting playbook write-up.
Awesome lists: Jenqyang/Awesome-AI-Agents, jamesmurdza/awesome-ai-devtools, jim-schwoebel/awesome_ai_agents, Piebald-AI/awesome-gemini-cli, ComposioHQ/awesome-codex-skills, punkpeye/awesome-mcp-servers, jxzhangjhu/Awesome-LLM-RAG, rohitg00/awesome-claude-code-toolkit, numtide/llm-agents.nix, andyrewlee/awesome-agent-orchestrators, bradAGI/awesome-cli-coding-agents, milisp/awesome-codex-cli, yaolifeng0629/Awesome-independent-tools, caramaschiHG/awesome-ai-agents-2026, ai-for-developers/awesome-vibe-coding, taishi-i/awesome-ChatGPT-repositories, eudk/awesome-ai-tools, killop/anything_about_game, vinta/awesome-python, Zijian-Ni/awesome-ai-agents-2026, rohitg00/awesome-devops-mcp-servers, Glama MCP Catalog. Mirrors: icopy-site/awesome, icopy-site/awesome-cn, trackawesomelist/trackawesomelist.
Prior-art citations by peer projects: mkb23/overcode, Vintersong/NOVA-Cognition-Framework, AJV009/drupal-contrib-workbench, danielvaughan/codex-blog.
Directories: AlternativeTo.
cite
Machine-readable metadata lives in CITATION.cff (CFF 1.2.0); GitHub renders the "Cite this repository" button automatically. A Zenodo DOI will be minted on the next release.
license
Alex Chernysh · GitHub · X · bernstein.run
Translations available in 11 languages: see docs/i18n/.