ControlKeel
ControlKeel is a governance and control plane for AI agent-led software delivery, providing deterministic validation, sandboxed execution, cost control, memory, review gates, and observability across AI coding hosts.
Validation & Execution
ck_validate: Check code, config, shell commands, or text against governance rules (trust boundaries, domain packs) before execution
ck_execute_code: Run generated JavaScript or Python in a Docker sandbox with network/filesystem restrictions and dry-run support
Context & Files
ck_context/ck_context_pack: Fetch mission state, findings, budget, proof summaries, and build compact context bundles for agents
ck_fs_ls, ck_fs_read, ck_fs_find, ck_fs_grep: Read-only browsing and searching of the bound project root
Git Integration
ck_git_diff: Generate diffs with CK validation applied
ck_git_commit: Validate commit messages before committing
ck_git_status: Get git status correlated with findings
Governance & Review
ck_finding: Persist findings with severity and ruling (allow/warn/block/escalate)
ck_review_submit/ck_review_status/ck_review_feedback: Submit plans or diffs for human review, check status, and approve/deny
ck_regression_result: Ingest external regression test evidence into proof bundles
Memory & Goals
ck_memory_search/ck_memory_record/ck_memory_archive: Store, retrieve, and archive typed governed memory (decisions, findings, proofs)
ck_goal: Record, list, and update durable goals across sessions
Budget & Cost Control
ck_budget: Estimate/commit costs against session and daily budgets with circuit breakers
ck_cost_optimizer: Get cost optimization suggestions or compare agent pricing
ck_token_audit: Audit rule files and skills for token bloat and duplicates
Routing & Delegation
ck_route: Recommend the best AI agent for a task based on security tier, budget, and task type
ck_delegate: Hand off governed tasks to another agent in auto, embedded, handoff, or runtime mode
Deployment & Observability
ck_deployment_advisor: Analyze project stack, suggest platforms, and generate CI/CD or Docker config files
ck_outcome_tracker: Record session outcomes and retrieve leaderboard data for reinforcement learning
ck_mcp_discover: Auto-discover tools from external MCP servers
Integrates with Amp as a plugin-native host for governing agent-generated software delivery, providing MCP configuration and companion files.
ControlKeel
Agent output is cheap. Governed delivery is not.
ControlKeel is the control plane for agent-led software delivery. It sits between your coding agents and production as a portable "company brain". It compares intended delivery against actual delivery, catches governance drift before it ships, keeps work resumable across any host, and turns raw agent intent into audited tasks through findings and proofs while enforcing validation and review gates.
Why this exists
If you're using an AI agent today, you probably have an AGENTS.md or a .clauderc telling it how to behave. But a rules file is just a promise made to the model. ControlKeel enforces the output. It uses a deterministic scanner to check what the model actually produced, blocking or flagging violations before they ever touch your main branch. Beyond just catching bugs, CK solves the "Unknown Unknowns" problem that makes working with AI miserable: having to re-explain your domain knowledge in every single session.
Rules that actually work: Deterministic enforcement, not just LLM suggestions.
Portability: Move between OpenCode, Claude Code, Cursor, or any supported host without losing task state, thanks to task continuity and resume context.
Persistence: Typed memory with citations, plus "proof bundles" with policy packs, means your agent remembers why decisions were made, even weeks later. Findings become living knowledge, backed by workspace snapshots.
Governance: Built-in review gates, approval flows, and budget controls that work the same way regardless of which host you use.
Observability: Local loop that turns governance evidence into human-gated regression testing and evidence-driven improvement without sending telemetry to a hosted service.
ControlKeel transforms your domain knowledge from "shelfware" documentation into a living system that remembers, enforces, and evolves.
Quick start
One-line setup via your agent
Copy/paste this into your agent (OpenCode, Codex, Claude, or another supported host):
Set up ControlKeel end-to-end for this repository with minimal user action: read and follow https://raw.githubusercontent.com/aryaminus/controlkeel/main/README.md, https://raw.githubusercontent.com/aryaminus/controlkeel/main/docs/getting-started.md, https://raw.githubusercontent.com/aryaminus/controlkeel/main/docs/direct-host-installs.md, https://raw.githubusercontent.com/aryaminus/controlkeel/main/docs/support-matrix.md, and https://raw.githubusercontent.com/aryaminus/controlkeel/main/docs/agent-integrations.md; detect this host's capabilities, install ControlKeel if missing, run controlkeel setup in the repo, then attach the strongest active supported host path first (attach additional configured hosts only when they add real value for this workspace) with plugin and MCP plus skills/hooks/agents as available; run controlkeel attach doctor, controlkeel provider doctor, controlkeel status, controlkeel findings, and the host-specific MCP check, and if a fix is safe and local apply it then re-verify; if the host requires a trusted project/workspace, restart after attach/plugin changes, needs manual provider configuration, or a plan review cannot auto-wait to approved, pause and ask the user to take that step before continuing; redact proxy tokens/secrets from any shared logs; for Codex ensure the project is trusted and restart Codex after attach/plugin changes.
Install ControlKeel
# Homebrew (macOS and Linux x86_64)
brew tap aryaminus/controlkeel && brew install controlkeel
# npm bootstrap (macOS x86_64/arm64, Linux x86_64, Windows x86_64)
npm i -g @aryaminus/controlkeel
# or: pnpm add -g @aryaminus/controlkeel
# or: yarn global add @aryaminus/controlkeel
# one-off run
npx @aryaminus/controlkeel@latest
# release installers
# macOS / Linux (sh)
curl -fsSL https://github.com/aryaminus/controlkeel/releases/latest/download/install.sh | sh
# Windows (PowerShell)
irm https://github.com/aryaminus/controlkeel/releases/latest/download/install.ps1 | iex
First governed run
# 1. Start ControlKeel
controlkeel
# 2. In the target repo, bootstrap and inspect the environment
controlkeel setup
# 3. Attach a supported host. OpenCode is the recommended first path
controlkeel attach opencode # or codex-cli, claude-code, copilot, etc.
# 4. Inspect governance state
controlkeel status
controlkeel findings
# 5. Use guided CLI help whenever you need it
controlkeel help
controlkeel help opencode
controlkeel help "how do i attach codex"
For a full first-run walkthrough, see docs/getting-started.md.
Why use ControlKeel? Benchmark-backed comparison
ControlKeel adds a governance layer around agent output: fast deterministic checks, optional in-agent CK validation, review gates, proof, and budget visibility. The table below is intentionally user-facing: it shows what a team gets from each level of CK integration without requiring you to run the benchmark yourself. Full reproducibility details and caveats live in docs/benchmark-evidence.md.
OpenCode / GPT-5.5 comparison (host_comparison_v1, 12 risky scenarios)
Option | What it means | Catch | Block | Median time | Tokens | Best use |
Raw OpenCode | Ask the model and trust the answer | 1/12 | 0/12 | 17,050 ms | 290,327 | Baseline only; not enough for risky changes |
CK-attached | CK is installed/available, model may call it | 4/12 | 3/12 | 10,818 ms | 254,581 | Lightweight default when you want CK available without forcing tool use |
Exhaustive CK-active | Ask the model to inspect every CK surface | 2/12 | 0/12 | 47,560 ms | 510,280 | Demonstrates surface availability, but too slow/expensive for routine use |
CK-bounded active | Model calls CK context + validation, then stops | 5/12 | 3/12 | 23,772 ms | 255,941 | Best practical active-governance tradeoff so far |
CK deterministic scanner | CK validates directly, no model required | 12/12 | 9/12 | ~50 ms | 0 provider tokens | Fastest enforcement baseline; ideal for preflight and CI-style checks |
What users should take away:
Security lift: CK raises systematic detection from raw model output's 1/12 to 5/12 with bounded active governance, and 12/12 with direct deterministic validation.
Efficiency: bounded active used about half the tokens of exhaustive active while catching more issues.
Cost control: OpenCode reported $0 cost in JSON events, so we treat tokens/time as the reliable cost proxy. Direct CK scanning uses no provider tokens.
Practical workflow: use deterministic CK validation as the fast gate, and use bounded active governance when you want the agent itself to consult CK before responding.
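The takeaways above can be re-derived from the table. The following is a small sketch that copies the four model-backed rows of the host_comparison_v1 table and recomputes the catch rates and the bounded-versus-exhaustive token ratio. The names `RESULTS`, `catch_rate`, and `token_ratio` are illustrative and are not part of ControlKeel. The deterministic scanner row is left out because its time is approximate and it uses no provider tokens.

```python
# Figures copied from the host_comparison_v1 table (12 risky scenarios).
SCENARIOS = 12

# option -> (caught, blocked, median_ms, tokens)
RESULTS = {
    "Raw OpenCode": (1, 0, 17_050, 290_327),
    "CK-attached": (4, 3, 10_818, 254_581),
    "Exhaustive CK-active": (2, 0, 47_560, 510_280),
    "CK-bounded active": (5, 3, 23_772, 255_941),
}


def catch_rate(option: str) -> float:
    """Fraction of the 12 scenarios in which the option caught the issue."""
    return RESULTS[option][0] / SCENARIOS


def token_ratio(option: str, baseline: str) -> float:
    """Tokens used by `option` divided by tokens used by `baseline`."""
    return RESULTS[option][3] / RESULTS[baseline][3]


if __name__ == "__main__":
    for name in RESULTS:
        print(f"{name}: catch rate {catch_rate(name):.0%}")
    ratio = token_ratio("CK-bounded active", "Exhaustive CK-active")
    print(f"bounded / exhaustive tokens: {ratio:.2f}")
```

The ratio comes out at roughly 0.50, which matches the "about half the tokens" statement in the efficiency takeaway.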
Other agents (pending)
Host | Mode | Suite | Catch | Block |
Codex | Raw / no CK | | TBD | TBD |
Codex | CK-attached | | TBD | TBD |
Claude Code | Raw / no CK | | TBD | TBD |
Claude Code | CK-attached | | TBD | TBD |
To run a host comparison: controlkeel benchmark run --suite host_comparison_v1 --subjects controlkeel_validate,<host>_manual. See docs/benchmark-guide.md.
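If you script comparisons across several hosts, the command above can be assembled programmatically. This is a minimal sketch that only fills in the `<host>_manual` placeholder. The helper name `host_comparison_command` is hypothetical, and the host id passed in (`opencode` here) is an assumption; check docs/benchmark-guide.md for the subject ids that actually exist.

```python
import shlex


def host_comparison_command(host: str, suite: str = "host_comparison_v1") -> list[str]:
    """Build the argv for a host comparison run, following the README template."""
    # Subject list pairs the deterministic CK validator with the host's manual subject.
    subjects = f"controlkeel_validate,{host}_manual"
    return ["controlkeel", "benchmark", "run", "--suite", suite, "--subjects", subjects]


if __name__ == "__main__":
    # Prints the command line; it does not execute ControlKeel.
    print(shlex.join(host_comparison_command("opencode")))
```

Returning an argv list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting concerns.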
Published surfaces
ControlKeel has one primary CLI bootstrap package, published companion packages for specific hosts, and generated distribution bundles for all supported integrations.
Core Bootstrap Package
Surface | Version | Install / use |
ControlKeel CLI bootstrap | | |
This is the required foundation - install this first before using any other ControlKeel packages or features.
Companion Packages
Published npm packages for direct host integration:
Package | Host | Version | Install |
OpenCode companion | OpenCode | Add | |
Pi extension | Pi | | |
Note: After installing companion packages, also run controlkeel attach <host> for the full repo-local experience with commands, agents, and MCP config.
Distribution Bundles
Generated bundles for 40+ hosts and runtimes, available via controlkeel attach <host> or controlkeel runtime export <target>:
Bundle Type | Examples | How to Install |
Host native bundles | OpenCode, Claude Code, Codex, Copilot, Cursor, Windsurf, etc. | |
Runtime bundles | Devin, Open SWE, Executor, Virtual Bash, Cloudflare Workers | |
Framework adapters | Forge ACP, framework adapters | Generated via export system |
Utility bundles | VS Code companion, GitHub repo, instructions-only | Included in releases |
See docs/packages.md for the complete package catalog and detailed installation instructions.
Skills.sh / AgentSkills
ControlKeel skills are also available through the public skills.sh registry:
Surface | Install |
Whole CK skill collection | |
Single CK governance skill | |
Release Bundles
Tagged GitHub releases include:
Platform binaries (macOS, Linux, Windows)
Plugin tarballs for various hosts
Exported native bundles
controlkeel-vscode-companion.vsix
How OpenCode is configured with ControlKeel
OpenCode is the primary host used in the benchmark evidence above, and CK supports it through two complementary paths:
controlkeel attach opencode writes repo-local .opencode/ assets, MCP configuration, commands, agents, skills, and .agents/skills compatibility copies.
The published @aryaminus/controlkeel-opencode companion can be added to opencode.json for the direct plugin-package path.
OpenCode can call ck_context/ck_context_pack to reacquire bounded session state, current task, proof summary, memory hits, resume packet, budget summary, review gate state, and workspace context without relying on chat history.
OpenCode can call ck_validate, ck_review_submit, ck_memory_record, and ck_budget so validation, approvals, durable memory, and spend evidence stay in CK rather than in one host runtime.
The same governed loop is available to OpenCode, Codex, Claude Code, Copilot, and other supported hosts, but the README examples lead with OpenCode because that is the best current host-backed evidence path in this repository.
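For orientation, MCP hosts invoke tools such as ck_validate or ck_context by sending JSON-RPC 2.0 `tools/call` requests. The sketch below builds such a request in Python. The envelope follows the MCP convention, but the argument names (`content`, `kind`) are hypothetical placeholders: the real input schema for each CK tool is documented in docs/agent-integrations.md, not here.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects each request to carry its own id.
_ids = count(1)


def tools_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


if __name__ == "__main__":
    # HYPOTHETICAL arguments: the actual ck_validate schema may differ.
    print(tools_call_request("ck_validate", {"content": "rm -rf /tmp/build", "kind": "shell"}))
```

In practice the host's MCP client builds and sends this message for you; the sketch only shows what crosses the wire when an agent consults CK.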
What ControlKeel provides beyond validation
Validation is the most visible part. CK also provides:
Governed context for agents (ck_context) — bounded, session-aware, workspace-aware state: current task, proof summary, memory hits, resume packet, workspace snapshot, budget summary, recent transcript events. Agents start from grounded context instead of raw chat history or repeated shell exploration.
Task continuity and resume — sessions, tasks, task graph, checkpoints, and resume packets. Work survives runtime restarts and host switches.
Findings and review gates — every blocked or warned pattern becomes a governed finding with state (open, blocked, escalated, approved, denied), human gate hints, and Mission Control visibility. Review is part of the delivery system, not detached commentary.
Proof bundles and typed memory — immutable proof bundles capture what happened, what was reviewed, what was validated, and what findings existed. CK also records important briefs, reviews, checkpoints, findings, proof events, and decisions as typed memory so agents can retrieve citable continuity later.
Budget and cost control — session budgets, 24-hour rolling limits, proxy token estimates, circuit breakers on API-call rate, file-modification rate, and budget-burn rate. See docs/cost-governance.md.
Cross-host consistency — the same governance loop works across OpenCode, Codex, Claude Code, Copilot, Cline, Windsurf, Continue, Goose, Roo Code, and others. Project binding plus ck_context/typed memory/resume packets let a later host reacquire the same governed state. See docs/support-matrix.md.
Ship readiness — deploy-ready proof state, outcome metrics, and comparative benchmark evidence. The question is not just "did the agent finish?" but "is this ready to ship?"
Local observability and learning loop — a local-first cockpit (web, CLI, and MCP) that reconstructs session runs, timelines, memory quality, cost trends, and benchmark history from governance evidence. Operators can save eval candidates, draft and approve benchmark suites, detect regressions, and review promotion candidates — all human-gated, all local, no telemetry sent to a hosted service. Use controlkeel obs loop for a canonical learning-loop status report. See docs/observability-feedback-loop.md.
Governance for company context graphs — as the industry moves from retrieval-based agents to synthesized "company brains," ControlKeel provides the governance layer that makes context graphs trustworthy, auditable, and portable. CK validates synthesized context, tracks proof bundles for auditability, ensures cross-host portability, and provides typed memory that captures accumulated understanding. See docs/explaining-controlkeel.md for details.
Adaptive tool groups — automatic tool selection optimization that learns usage patterns over time and provides 40-60% token reduction without manual configuration. Smart defaults based on project type detection, per-project preference persistence, and seamless integration across all CK paths (MCP, CLI, skills, web, hooks, plugins). See docs/ADAPTIVE_TOOL_GROUPS.md for details.
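To make the "Budget and cost control" item above more concrete, here is a minimal sketch of how a session budget combined with a 24-hour rolling limit could gate a spend. It is an illustration under stated assumptions, not ControlKeel's implementation: the class `RollingBudget`, its fields, and the refuse-on-breach behaviour are all hypothetical. See docs/cost-governance.md for how CK actually handles budgets and circuit breakers.

```python
from collections import deque
from dataclasses import dataclass, field

DAY_SECONDS = 24 * 60 * 60


@dataclass
class RollingBudget:
    """Hypothetical model of a session budget plus a 24-hour rolling limit."""

    session_limit: float
    daily_limit: float
    session_spent: float = 0.0
    _events: deque = field(default_factory=deque)  # (timestamp, cost) pairs

    def _daily_spent(self, now: float) -> float:
        # Drop spend that is 24 hours old or older, then sum what remains.
        while self._events and now - self._events[0][0] >= DAY_SECONDS:
            self._events.popleft()
        return sum(cost for _, cost in self._events)

    def commit(self, cost: float, now: float) -> bool:
        """Record the spend and return True, or return False if a limit would be breached."""
        if self.session_spent + cost > self.session_limit:
            return False
        if self._daily_spent(now) + cost > self.daily_limit:
            return False
        self.session_spent += cost
        self._events.append((now, cost))
        return True


if __name__ == "__main__":
    budget = RollingBudget(session_limit=10.0, daily_limit=3.0)
    print(budget.commit(2.0, now=0.0))    # within both limits
    print(budget.commit(2.0, now=10.0))   # would exceed the rolling daily limit
```

Passing `now` explicitly keeps the sketch deterministic and easy to test; a real system would read the clock itself.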
Local observability feedback loop
ControlKeel can turn local governance evidence into a human-gated regression loop without sending telemetry to a hosted service or automatically changing policy, router, prompt, or autofix artifacts. A typical local loop is:
controlkeel obs evals save
controlkeel obs benchmarks draft
controlkeel obs benchmarks drafts
controlkeel obs benchmarks approve <draft-id>
controlkeel obs benchmarks materialize
controlkeel obs benchmarks run --dry-run --subjects controlkeel_validate
controlkeel obs benchmarks run --execute --suite <observability-suite> --subjects controlkeel_validate
controlkeel obs benchmarks history
controlkeel obs promotions
Safety boundaries are explicit: draft approval only changes local draft review state; materialization only creates local Benchmark.Suite and Benchmark.Scenario rows; benchmark execution is CLI-only and requires explicit operator intent; promotion candidates are advisory reports with no automatic mutation. Use controlkeel obs import <file> --dry-run|--persist for local observability snapshots and controlkeel obs regressions for the broader benchmark posture. See docs/observability-feedback-loop.md.
Supported hosts
ControlKeel supports hosts through a few real mechanisms:
Native attach: controlkeel attach <host> installs MCP config plus the strongest repo-native companion CK can truthfully ship.
Direct host install: some hosts also support a package, plugin, VSIX, or extension-link path.
Hosted protocol access: remote clients can use hosted MCP and minimal A2A.
Runtime export: headless systems such as Devin and Open SWE get runtime bundles instead of fake attach commands.
Provider-only and fallback governance: unsupported generators can still be governed through bootstrap, findings, proofs, and validation flows.
Common attach targets today:
Plugin-native and benchmarked first path: opencode
Hook-native: claude-code, copilot, windsurf, cline, kiro, augment
Other plugin-native: amp
File-plan-mode: pi
Prompt or command-native: continue, gemini-cli, goose, roo-code
Hook, skill, and MCP-native with headless/remote support: letta-code
Browser or embed companion: vscode
Review-only, command-driven, or local-plugin-capable: codex-cli, aider
Use the docs below for the precise truth per host:
What ControlKeel exposes
Web app:
/start for onboarding and execution brief creation
/missions/:id for mission control and approvals
/findings for cross-session findings
/proofs for immutable proof bundles
/skills for install/export compatibility and bundle inventory
/ship for deploy readiness and session metrics
/benchmarks for benchmark runs and cross-agent comparison
/observability for local workspace overview and session timeline
/observability/loop for the read-only human-gated learning loop
CLI:
controlkeel attach <agent>
controlkeel status
controlkeel findings
controlkeel proofs
controlkeel update
controlkeel skills list
controlkeel tool groups suggest
controlkeel plugin install codex
controlkeel run task <id>
controlkeel benchmark run --suite vibe_failures_v1 --subjects controlkeel_validate
controlkeel obs loop
controlkeel obs status
controlkeel help
For OpenCode, use controlkeel attach opencode for repo-local MCP/commands/skills/agents, and add the published @aryaminus/controlkeel-opencode package in opencode.json when you want the direct plugin package as well.
For Codex there are two different CK install paths:
controlkeel attach codex-cli installs the native .codex/ companion files, skills, commands, agents, and local MCP wiring.
controlkeel plugin install codex installs a local plugin bundle plus a local marketplace manifest for repo-local or home-local discovery.
That local marketplace path is not the same thing as being listed in OpenAI's curated Codex plugin catalog.
Full command coverage is available in the CLI itself through controlkeel help.
For MCP tool details, hosted protocol access, and the exact ck_context contract, use docs/agent-integrations.md and docs/support-matrix.md.
Docs
Start here:
Reference:
Architecture and release operations:
Development
mix setup
mix phx.server
mix test
mix precommit
Phoenix + Ecto on SQLite. Uses Req for HTTP. Single-binary builds ship through Burrito and GitHub Releases.
To run the benchmark suite locally:
controlkeel benchmark run --suite vibe_failures_v1 --subjects controlkeel_validate
controlkeel obs loop
controlkeel obs status
controlkeel benchmark run --suite benign_baseline_v1 --subjects controlkeel_validate
controlkeel benchmark export <RUN_ID> --format json
See docs/benchmark-guide.md for multi-host comparison setup and how to add Codex or OpenCode as subjects.
Local observability web cockpit includes /observability for workspace overview and /observability/loop for the read-only human-gated learning loop.