ControlKeel

by aryaminus

ControlKeel is a governance and control plane for AI agent-led software delivery, providing deterministic validation, sandboxed execution, cost control, memory, review gates, and observability across AI coding hosts.

Validation & Execution

ck_validate: Check code, config, shell commands, or text against governance rules (trust boundaries, domain packs) before execution
ck_execute_code: Run generated JavaScript or Python in a Docker sandbox with network/filesystem restrictions and dry-run support

Context & Files

ck_context / ck_context_pack: Fetch mission state, findings, budget, proof summaries, and build compact context bundles for agents
ck_fs_ls, ck_fs_read, ck_fs_find, ck_fs_grep: Read-only browsing and searching of the bound project root

Git Integration

ck_git_diff: Generate diffs with CK validation applied
ck_git_commit: Validate commit messages before committing
ck_git_status: Get git status correlated with findings

Governance & Review

ck_finding: Persist findings with severity and ruling (allow/warn/block/escalate)
ck_review_submit / ck_review_status / ck_review_feedback: Submit plans or diffs for human review, check status, and approve/deny
ck_regression_result: Ingest external regression test evidence into proof bundles

Memory & Goals

ck_memory_search / ck_memory_record / ck_memory_archive: Store, retrieve, and archive typed governed memory (decisions, findings, proofs)
ck_goal: Record, list, and update durable goals across sessions

Budget & Cost Control

ck_budget: Estimate/commit costs against session and daily budgets with circuit breakers
ck_cost_optimizer: Get cost optimization suggestions or compare agent pricing
ck_token_audit: Audit rule files and skills for token bloat and duplicates

Routing & Delegation

ck_route: Recommend the best AI agent for a task based on security tier, budget, and task type
ck_delegate: Hand off governed tasks to another agent in auto, embedded, handoff, or runtime mode

Deployment & Observability

ck_deployment_advisor: Analyze project stack, suggest platforms, and generate CI/CD or Docker config files
ck_outcome_tracker: Record session outcomes and retrieve leaderboard data for reinforcement learning
ck_mcp_discover: Auto-discover tools from external MCP servers
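
As a rough illustration of how a host reaches the tools listed above: MCP tools are invoked with a JSON-RPC 2.0 `tools/call` request that carries the tool name and an `arguments` object. The sketch below only prints such a request. The `mcp_tool_call` helper and the `content` and `kind` argument names are hypothetical; the real ck_validate parameters are defined by the server's tool schema, which is not reproduced here.

```shell
#!/bin/sh
# Print a JSON-RPC 2.0 "tools/call" request for an MCP tool.
# ASSUMPTION: the "content" and "kind" argument names are hypothetical
# placeholders, not the documented ck_validate parameters.
# Usage: mcp_tool_call ID TOOL CONTENT KIND
# CONTENT must not contain double quotes or backslashes (no JSON escaping here).
mcp_tool_call() {
  printf '{"jsonrpc":"2.0","id":%s,"method":"tools/call","params":{"name":"%s","arguments":{"content":"%s","kind":"%s"}}}\n' \
    "$1" "$2" "$3" "$4"
}
```

For example, `mcp_tool_call 1 ck_validate 'rm -rf /tmp/build' shell` prints one request line that an MCP client could send over its transport. In practice the host's MCP client builds this message for you.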

ControlKeel

Agent output is cheap. Governed delivery is not.

ControlKeel is the control plane for agent-led software delivery. It sits between your coding agents and production as a portable "company brain": it compares intended delivery against actual delivery, catches governance drift before it ships, keeps work resumable across hosts, and turns raw agent intent into audited tasks backed by findings and proofs, with validation and review gates enforced along the way.

Why this exists

If you're using an AI agent today, you probably have an AGENTS.md or a .clauderc telling it how to behave. But a rules file is just a promise made to the model. ControlKeel enforces the output. It uses a deterministic scanner to check what the model actually produced, blocking or flagging violations before they ever touch your main branch. Beyond just catching bugs, CK solves the "Unknown Unknowns" problem that makes working with AI miserable: having to re-explain your domain knowledge in every single session.

Rules that actually work: Deterministic enforcement, not just LLM suggestions.
Portability: Move between OpenCode, Claude Code, Cursor, or any supported host without losing task state; task continuity and resume context carry over.
Persistence: Typed memory with citations, plus "proof bundles" with policy packs, means your agent remembers why decisions were made, even weeks later. Findings become living knowledge, backed by workspace snapshots.
Governance: Built-in review gates, approval flows, and budget controls that work the same way regardless of which host you use.
Observability: A local loop that turns governance evidence into human-gated regression testing and evidence-driven improvement, without sending telemetry to a hosted service.

ControlKeel transforms your domain knowledge from "shelfware" documentation into a living system that remembers, enforces, and evolves.

Quick start

One-line setup via your agent

Copy/paste this into your agent (OpenCode, Codex, Claude, or another supported host):

Set up ControlKeel end-to-end for this repository with minimal user action: read and follow https://raw.githubusercontent.com/aryaminus/controlkeel/main/README.md, https://raw.githubusercontent.com/aryaminus/controlkeel/main/docs/getting-started.md, https://raw.githubusercontent.com/aryaminus/controlkeel/main/docs/direct-host-installs.md, https://raw.githubusercontent.com/aryaminus/controlkeel/main/docs/support-matrix.md, and https://raw.githubusercontent.com/aryaminus/controlkeel/main/docs/agent-integrations.md; detect this host's capabilities, install ControlKeel if missing, run controlkeel setup in the repo, then attach the strongest active supported host path first (attach additional configured hosts only when they add real value for this workspace) with plugin and MCP plus skills/hooks/agents as available; run controlkeel attach doctor, controlkeel provider doctor, controlkeel status, controlkeel findings, and the host-specific MCP check, and if a fix is safe and local apply it then re-verify; if the host requires a trusted project/workspace, restart after attach/plugin changes, needs manual provider configuration, or a plan review cannot auto-wait to approved, pause and ask the user to take that step before continuing; redact proxy tokens/secrets from any shared logs; for Codex ensure the project is trusted and restart Codex after attach/plugin changes.

Install ControlKeel

# Homebrew (macOS and Linux x86_64)
brew tap aryaminus/controlkeel && brew install controlkeel

# npm bootstrap (macOS x86_64/arm64, Linux x86_64, Windows x86_64)
npm i -g @aryaminus/controlkeel
# or: pnpm add -g @aryaminus/controlkeel
# or: yarn global add @aryaminus/controlkeel

# one-off run
npx @aryaminus/controlkeel@latest

# release installer (POSIX shell)
curl -fsSL https://github.com/aryaminus/controlkeel/releases/latest/download/install.sh | sh

# release installer (PowerShell)
irm https://github.com/aryaminus/controlkeel/releases/latest/download/install.ps1 | iex

First governed run

# 1. Start ControlKeel
controlkeel

# 2. In the target repo, bootstrap and inspect the environment
controlkeel setup

# 3. Attach a supported host. OpenCode is the recommended first path
controlkeel attach opencode   # or codex-cli, claude-code, copilot, etc.

# 4. Inspect governance state
controlkeel status
controlkeel findings

# 5. Use guided CLI help whenever you need it
controlkeel help
controlkeel help opencode
controlkeel help "how do i attach codex"

For a full first-run walkthrough, see docs/getting-started.md.
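
Steps 2 to 4 of the first governed run can be chained so that a failure stops the sequence. This is a minimal sketch, not something ControlKeel ships: the `ck_first_run` function and the `CK_BIN` override are hypothetical names introduced here, and only the `controlkeel` subcommands come from the steps above. Step 1 (starting ControlKeel) is left out and assumed to be handled separately.

```shell
#!/bin/sh
# Chain the README's setup, attach, and inspection steps for one host.
# ASSUMPTION: ck_first_run and CK_BIN are hypothetical helpers, not CK features.
# CK_BIN lets you preview the sequence (CK_BIN=echo) without ControlKeel installed.
ck_first_run() {
  host="${1:-opencode}"          # OpenCode is the recommended first path
  ck="${CK_BIN:-controlkeel}"
  "$ck" setup &&                 # bootstrap and inspect the environment
  "$ck" attach "$host" &&        # attach the chosen host
  "$ck" status &&                # inspect governance state
  "$ck" findings
}
```

Running `CK_BIN=echo ck_first_run codex-cli` prints the four subcommands in order instead of executing them, which is a cheap way to check the sequence before running it for real.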

Why use ControlKeel? Benchmark-backed comparison

ControlKeel adds a governance layer around agent output: fast deterministic checks, optional in-agent CK validation, review gates, proof, and budget visibility. The table below is intentionally user-facing: it shows what a team gets from each level of CK integration without requiring you to run the benchmark yourself. Full reproducibility details and caveats live in docs/benchmark-evidence.md.

OpenCode / GPT-5.5 comparison (`host_comparison_v1`, 12 risky scenarios)

Option	What it means	Catch	Block	Median time	Tokens	Best use
Raw OpenCode	Ask the model and trust the answer	1/12	0/12	17,050 ms	290,327	Baseline only; not enough for risky changes
CK-attached	CK is installed/available, model may call it	4/12	3/12	10,818 ms	254,581	Lightweight default when you want CK available without forcing tool use
Exhaustive CK-active	Ask the model to inspect every CK surface	2/12	0/12	47,560 ms	510,280	Demonstrates surface availability, but too slow/expensive for routine use
CK-bounded active	Model calls CK context + validation, then stops	5/12	3/12	23,772 ms	255,941	Best practical active-governance tradeoff so far
CK deterministic scanner	CK validates directly, no model required	12/12	9/12	~50 ms	0 provider tokens	Fastest enforcement baseline; ideal for preflight and CI-style checks

What users should take away:

Security lift: CK raises systematic detection from raw model output's 1/12 to 5/12 with bounded active governance, and 12/12 with direct deterministic validation.
Efficiency: bounded active used about half the tokens of exhaustive active while catching more issues.
Cost control: OpenCode reported $0 cost in JSON events, so we treat tokens/time as the reliable cost proxy. Direct CK scanning uses no provider tokens.
Practical workflow: use deterministic CK validation as the fast gate, and use bounded active governance when you want the agent itself to consult CK before responding.
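
The ratios behind these takeaways can be rechecked from the table with a little arithmetic. The helper names below (`catch_rate`, `ratio`) are illustrative only; the inputs are the table's own numbers.

```shell
#!/bin/sh
# Recompute rates from the benchmark table using awk.
# catch_rate CAUGHT TOTAL -> percentage with one decimal place
catch_rate() { awk -v c="$1" -v n="$2" 'BEGIN { printf "%.1f%%\n", 100 * c / n }'; }
# ratio A B -> A divided by B, two decimal places
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

# Examples using table values:
#   catch_rate 1 12        -> 8.3%   (raw OpenCode)
#   catch_rate 5 12        -> 41.7%  (CK-bounded active)
#   catch_rate 12 12       -> 100.0% (deterministic scanner)
#   ratio 255941 510280    -> 0.50   (bounded vs exhaustive tokens)
```

The last line is where "about half the tokens" comes from: 255,941 / 510,280 is roughly 0.50.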

Other agents (pending)

Host	Mode	Suite	Catch	Block
Codex	Raw / no CK	`host_comparison_v1`	TBD	TBD
Codex	CK-attached	`host_comparison_v1`	TBD	TBD
Claude Code	Raw / no CK	`host_comparison_v1`	TBD	TBD
Claude Code	CK-attached	`host_comparison_v1`	TBD	TBD

To run a host comparison: controlkeel benchmark run --suite host_comparison_v1 --subjects controlkeel_validate,<host>_manual. See docs/benchmark-guide.md.
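
To make the `<host>_manual` placeholder concrete, the sketch below fills it in and prints the resulting command without running it. The `host_comparison_cmd` helper is hypothetical, and whether a given `<host>_manual` subject exists for your host is an assumption to confirm in docs/benchmark-guide.md.

```shell
#!/bin/sh
# Print the host-comparison command for a given host name.
# ASSUMPTION: the subject id is formed as "<host>_manual", per the template above;
# this helper only prints the command and does not validate the subject.
host_comparison_cmd() {
  printf 'controlkeel benchmark run --suite host_comparison_v1 --subjects controlkeel_validate,%s_manual\n' "$1"
}
```

For instance, `host_comparison_cmd opencode` prints the command with `opencode_manual` as the second subject.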

Published surfaces

ControlKeel has one primary CLI bootstrap package, published companion packages for specific hosts, and generated distribution bundles for all supported integrations.

Core Bootstrap Package

Surface	Version	Install / use
ControlKeel CLI bootstrap		`npm i -g @aryaminus/controlkeel`

This is the required foundation - install this first before using any other ControlKeel packages or features.

Companion Packages

Published npm packages for direct host integration:

Package	Host	Version	Install
OpenCode companion	OpenCode		Add `"plugin": ["@aryaminus/controlkeel-opencode"]` to `opencode.json`
Pi extension	Pi		`pi install npm:@aryaminus/controlkeel-pi-extension`

Note: After installing companion packages, also run controlkeel attach <host> for the full repo-local experience with commands, agents, and MCP config.
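
For the OpenCode companion, the `opencode.json` entry from the table looks like the fragment printed below. This is a sketch that assumes an otherwise empty config; if your `opencode.json` already has content, merge the `plugin` key into it rather than replacing the file. The `opencode_plugin_fragment` helper is a hypothetical name used only to display the fragment.

```shell
#!/bin/sh
# Print the minimal opencode.json fragment that enables the CK companion plugin.
# ASSUMPTION: shown as a standalone file; merge into an existing config by hand.
opencode_plugin_fragment() {
  cat <<'EOF'
{
  "plugin": ["@aryaminus/controlkeel-opencode"]
}
EOF
}
```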

Distribution Bundles

Generated bundles for 40+ hosts and runtimes, available via controlkeel attach <host> or controlkeel runtime export <target>:

Bundle Type	Examples	How to Install
Host native bundles	OpenCode, Claude Code, Codex, Copilot, Cursor, Windsurf, etc.	`controlkeel attach <host>`
Runtime bundles	Devin, Open SWE, Executor, Virtual Bash, Cloudflare Workers	`controlkeel runtime export <runtime>`
Framework adapters	Forge ACP, framework adapters	Generated via export system
Utility bundles	VS Code companion, GitHub repo, instructions-only	Included in releases

See docs/packages.md for the complete package catalog and detailed installation instructions.

Skills.sh / AgentSkills

ControlKeel skills are also available through the public skills.sh registry:

Surface	Install
Whole CK skill collection	`npx skills add https://github.com/aryaminus/controlkeel`
Single CK governance skill	`npx skills add https://github.com/aryaminus/controlkeel --skill controlkeel-governance`

Release Bundles

Tagged GitHub releases include:

Platform binaries (macOS, Linux, Windows)
Plugin tarballs for various hosts
Exported native bundles
controlkeel-vscode-companion.vsix

How OpenCode is configured with ControlKeel

OpenCode is the primary host used in the benchmark evidence above, and CK supports it through two complementary paths:

controlkeel attach opencode writes repo-local .opencode/ assets, MCP configuration, commands, agents, skills, and .agents/skills compatibility copies.
The published @aryaminus/controlkeel-opencode companion can be added to opencode.json for the direct plugin-package path.

Once connected, OpenCode can use the CK tools directly:

ck_context / ck_context_pack let it reacquire bounded session state, current task, proof summary, memory hits, resume packet, budget summary, review gate state, and workspace context without relying on chat history.
ck_validate, ck_review_submit, ck_memory_record, and ck_budget keep validation, approvals, durable memory, and spend evidence in CK rather than in one host runtime.

The same governed loop is available to OpenCode, Codex, Claude Code, Copilot, and other supported hosts, but the README examples lead with OpenCode because that is the best current host-backed evidence path in this repository.

What ControlKeel provides beyond validation

Validation is the most visible part. CK also provides:

Governed context for agents (ck_context) — bounded, session-aware, workspace-aware state: current task, proof summary, memory hits, resume packet, workspace snapshot, budget summary, recent transcript events. Agents start from grounded context instead of raw chat history or repeated shell exploration.

Task continuity and resume — sessions, tasks, task graph, checkpoints, and resume packets. Work survives runtime restarts and host switches.

Findings and review gates — every blocked or warned pattern becomes a governed finding with state (open, blocked, escalated, approved, denied), human gate hints, and Mission Control visibility. Review is part of the delivery system, not detached commentary.

Proof bundles and typed memory — immutable proof bundles capture what happened, what was reviewed, what was validated, and what findings existed. CK also records important briefs, reviews, checkpoints, findings, proof events, and decisions as typed memory so agents can retrieve citable continuity later.

Budget and cost control — session budgets, 24-hour rolling limits, proxy token estimates, circuit breakers on API-call rate, file-modification rate, and budget-burn rate. See docs/cost-governance.md.

Cross-host consistency — the same governance loop works across OpenCode, Codex, Claude Code, Copilot, Cline, Windsurf, Continue, Goose, Roo Code, and others. Project binding plus ck_context/typed memory/resume packets let a later host reacquire the same governed state. See docs/support-matrix.md.

Ship readiness — deploy-ready proof state, outcome metrics, and comparative benchmark evidence. The question is not just "did the agent finish?" but "is this ready to ship?"

Local observability and learning loop — a local-first cockpit (web, CLI, and MCP) that reconstructs session runs, timelines, memory quality, cost trends, and benchmark history from governance evidence. Operators can save eval candidates, draft and approve benchmark suites, detect regressions, and review promotion candidates — all human-gated, all local, no telemetry sent to a hosted service. Use controlkeel obs loop for a canonical learning-loop status report. See docs/observability-feedback-loop.md.

Governance for company context graphs — as the industry moves from retrieval-based agents to synthesized "company brains," ControlKeel provides the governance layer that makes context graphs trustworthy, auditable, and portable. CK validates synthesized context, tracks proof bundles for auditability, ensures cross-host portability, and provides typed memory that captures accumulated understanding. See docs/explaining-controlkeel.md for details.

Adaptive tool groups — automatic tool selection optimization that learns usage patterns over time and provides 40-60% token reduction without manual configuration. Smart defaults based on project type detection, per-project preference persistence, and seamless integration across all CK paths (MCP, CLI, skills, web, hooks, plugins). See docs/ADAPTIVE_TOOL_GROUPS.md for details.
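
To picture the budget and cost control item above, here is a toy estimate-then-gate check. It is not ControlKeel's implementation: the `budget_gate` name, the 80% warning threshold, and the reuse of the allow/warn/block ruling names for budget decisions are all assumptions made for illustration. See docs/cost-governance.md for the actual behavior.

```shell
#!/bin/sh
# Toy budget gate: compare (already spent + estimated cost) against a limit.
# ASSUMPTION: the 80% warn threshold and the ruling names are illustrative,
# not ControlKeel's documented policy.
# Usage: budget_gate SPENT ESTIMATE LIMIT   (all in the same unit)
budget_gate() {
  awk -v s="$1" -v e="$2" -v l="$3" 'BEGIN {
    total = s + e
    if (total > l)            print "block"   # would exceed the budget
    else if (total > 0.8 * l) print "warn"    # close to the limit
    else                      print "allow"
  }'
}
```

With a limit of 10000 units, `budget_gate 4000 500 10000` prints `allow`, `budget_gate 7800 500 10000` prints `warn`, and `budget_gate 9800 500 10000` prints `block`.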

Local observability feedback loop

ControlKeel can turn local governance evidence into a human-gated regression loop without sending telemetry to a hosted service or automatically changing policy, router, prompt, or autofix artifacts. A typical local loop is:

controlkeel obs evals save
controlkeel obs benchmarks draft
controlkeel obs benchmarks drafts
controlkeel obs benchmarks approve <draft-id>
controlkeel obs benchmarks materialize
controlkeel obs benchmarks run --dry-run --subjects controlkeel_validate
controlkeel obs benchmarks run --execute --suite <observability-suite> --subjects controlkeel_validate
controlkeel obs benchmarks history
controlkeel obs promotions

Safety boundaries are explicit: draft approval only changes local draft review state; materialization only creates local Benchmark.Suite and Benchmark.Scenario rows; benchmark execution is CLI-only and requires explicit operator intent; promotion candidates are advisory reports with no automatic mutation. Use controlkeel obs import <file> --dry-run|--persist for local observability snapshots and controlkeel obs regressions for the broader benchmark posture. See docs/observability-feedback-loop.md.
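
One way to keep the "explicit operator intent" boundary in your own scripts is to default to the dry run and require an explicit flag plus a suite name before executing. The wrapper below is a hypothetical sketch: `ck_obs_benchmark` and `CK_BIN` are names introduced here, while the two `controlkeel obs benchmarks run` invocations are taken from the loop above.

```shell
#!/bin/sh
# Default to a dry run; only execute when asked explicitly and given a suite.
# ASSUMPTION: ck_obs_benchmark and CK_BIN are hypothetical helpers, not CK features.
# Usage: ck_obs_benchmark                    -> dry run
#        ck_obs_benchmark --execute SUITE    -> real run against SUITE
ck_obs_benchmark() {
  ck="${CK_BIN:-controlkeel}"
  if [ "$1" = "--execute" ] && [ -n "$2" ]; then
    "$ck" obs benchmarks run --execute --suite "$2" --subjects controlkeel_validate
  else
    "$ck" obs benchmarks run --dry-run --subjects controlkeel_validate
  fi
}
```

Calling it with `--execute` but no suite name falls back to the dry run, so a half-typed command cannot trigger a real benchmark.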

Supported hosts

ControlKeel supports hosts through a few real mechanisms:

Native attach: controlkeel attach <host> installs MCP config plus the strongest repo-native companion CK can truthfully ship.
Direct host install: some hosts also support a package, plugin, VSIX, or extension-link path.
Hosted protocol access: remote clients can use hosted MCP and minimal A2A.
Runtime export: headless systems such as Devin and Open SWE get runtime bundles instead of fake attach commands.
Provider-only and fallback governance: unsupported generators can still be governed through bootstrap, findings, proofs, and validation flows.

Common attach targets today:

Plugin-native and benchmarked first path: opencode
Hook-native: claude-code, copilot, windsurf, cline, kiro, augment
Other plugin-native: amp
File-plan-mode: pi
Prompt or command-native: continue, gemini-cli, goose, roo-code
Hook, skill, and MCP-native with headless/remote support: letta-code
Browser or embed companion: vscode
Review-only, command-driven, or local-plugin-capable: codex-cli, aider
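
If you script attachments across several hosts, the groupings above can be captured as a simple lookup. This sketch just restates the list; the `ck_host_category` function is a hypothetical helper, and docs/support-matrix.md remains the source of truth if the groupings change.

```shell
#!/bin/sh
# Map an attach target to the integration category listed above.
# ASSUMPTION: ck_host_category is an illustrative helper, not a CK command.
ck_host_category() {
  case "$1" in
    opencode) echo "plugin-native (benchmarked first path)" ;;
    claude-code|copilot|windsurf|cline|kiro|augment) echo "hook-native" ;;
    amp) echo "plugin-native" ;;
    pi) echo "file-plan-mode" ;;
    continue|gemini-cli|goose|roo-code) echo "prompt or command-native" ;;
    letta-code) echo "hook, skill, and MCP-native" ;;
    vscode) echo "browser or embed companion" ;;
    codex-cli|aider) echo "review-only, command-driven, or local-plugin-capable" ;;
    *) echo "unknown"; return 1 ;;   # not in the common attach targets list
  esac
}
```

The non-zero return for unknown names lets a calling script stop early instead of attempting `controlkeel attach` with a target that is not on the list.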

Use the docs below for the precise truth per host:

What ControlKeel exposes

Web app:

/start for onboarding and execution brief creation
/missions/:id for mission control and approvals
/findings for cross-session findings
/proofs for immutable proof bundles
/skills for install/export compatibility and bundle inventory
/ship for deploy readiness and session metrics
/benchmarks for benchmark runs and cross-agent comparison
/observability for local workspace overview and session timeline
/observability/loop for the read-only human-gated learning loop

CLI:

controlkeel attach <agent>
controlkeel status
controlkeel findings
controlkeel proofs
controlkeel update
controlkeel skills list
controlkeel tool groups suggest
controlkeel plugin install codex
controlkeel run task <id>
controlkeel benchmark run --suite vibe_failures_v1 --subjects controlkeel_validate
controlkeel obs loop
controlkeel obs status
controlkeel help

For OpenCode, use controlkeel attach opencode for repo-local MCP/commands/skills/agents, and add the published @aryaminus/controlkeel-opencode package in opencode.json when you want the direct plugin package as well.

For Codex there are two different CK install paths:

controlkeel attach codex-cli installs the native .codex/ companion files, skills, commands, agents, and local MCP wiring.
controlkeel plugin install codex installs a local plugin bundle plus a local marketplace manifest for repo-local or home-local discovery.

That local marketplace path is not the same thing as being listed in OpenAI's curated Codex plugin catalog.

Full command coverage is available in the CLI itself through controlkeel help.

For MCP tool details, hosted protocol access, and the exact ck_context contract, use docs/agent-integrations.md and docs/support-matrix.md.

Docs

Start here:

Reference:

Architecture and release operations:

Development

mix setup
mix phx.server
mix test
mix precommit

Phoenix + Ecto on SQLite. Uses Req for HTTP. Single-binary builds ship through Burrito and GitHub Releases.

To run the benchmark suite locally:

controlkeel benchmark run --suite vibe_failures_v1 --subjects controlkeel_validate
controlkeel obs loop
controlkeel obs status
controlkeel benchmark run --suite benign_baseline_v1 --subjects controlkeel_validate
controlkeel benchmark export <RUN_ID> --format json

See docs/benchmark-guide.md for multi-host comparison setup and how to add Codex or OpenCode as subjects.

Local observability web cockpit includes /observability for workspace overview and /observability/loop for the read-only human-gated learning loop.
