Agent Lab
Run and test agentic systems in isolation. Agent Lab runs OpenCode in a Docker sandbox ("vacuum") with controlled settings and lets you observe how an agent behaves under varied system prompts, models, and task prompts — one run or many in parallel. It is built primarily to be called by agents (over MCP), and secondarily by humans (CLI).
- Vary system prompt / model / task prompt; run isolated, capture the full behavior trace.
- Two interfaces over one engine: MCP (stdio) and CLI, both agent-friendly.
- Three network modes and guaranteed sandbox teardown.
Prerequisites
- Bun 1.x (`bun --version`)
- Docker running (`docker --version`)
- OpenCode configured on the host: a provider set up in `~/.config/opencode` (auth in `~/.local/share/opencode`). These are mounted read-only into each sandbox; nothing is baked into the image.
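To make "mounted read-only into each sandbox" concrete, here is a minimal sketch of the `docker run` arguments such a launch could use. It assumes the container's home directory is passed in (`/root` in the example is hypothetical), and `buildDockerRunArgs` is an illustrative name, not Agent Lab's actual code. The `agent-lab.sandbox=1` label is the one the README uses for cleanup.

```typescript
// Hypothetical sketch: build `docker run` args that bind-mount the host's
// OpenCode config and auth read-only and label the container for cleanup.
// Not Agent Lab's implementation; container paths are assumptions.
interface SandboxMountOptions {
  hostHome: string;      // host home directory, e.g. os.homedir()
  containerHome: string; // home directory inside the image (assumed)
  image: string;         // e.g. "agent-lab-opencode:latest"
}

function buildDockerRunArgs(opts: SandboxMountOptions): string[] {
  // [path relative to host home, path relative to container home]
  const mounts: Array<[string, string]> = [
    [".config/opencode", ".config/opencode"],           // provider config
    [".local/share/opencode", ".local/share/opencode"], // auth
  ];
  const args = ["run", "--detach", "--label", "agent-lab.sandbox=1"];
  for (const [hostRel, containerRel] of mounts) {
    // `readonly` makes the bind mount read-only inside the container.
    // Note: --mount values are comma-separated, so paths must not contain commas.
    args.push(
      "--mount",
      `type=bind,source=${opts.hostHome}/${hostRel},target=${opts.containerHome}/${containerRel},readonly`,
    );
  }
  args.push(opts.image); // the image must follow all options
  return args;
}
```

The resulting array can be handed to `spawn("docker", args)` from `node:child_process`. Using `--mount ...,readonly` rather than `-v host:container:ro` is only a readability choice; both produce a read-only bind mount.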
Install
```sh
bun install
bun link   # exposes `agent-lab` + `agent-lab-mcp` on PATH
```

Get the sandbox image (`opencode serve`), either by pulling the published multi-arch image:

```sh
docker pull ghcr.io/shutovks/agent-lab-opencode:latest
docker tag ghcr.io/shutovks/agent-lab-opencode:latest agent-lab-opencode:latest
```

…or by building it locally:

```sh
docker build -t agent-lab-opencode:latest docker/
```

The engine, CLI, and MCP server all run on the host (where Docker and your OpenCode config live). Experiments run inside isolated containers. Runs are persisted under `runs/<runId>/`, relative to the working directory the server/CLI is launched from.
Use from an agent — MCP (recommended)
Agent Lab exposes an MCP stdio server with four tools:
| Tool | Arguments | Returns |
|---|---|---|
| `run_experiment` | … | … |
| … | none | known runs |
| … | … | full run record + trace (steps, tool calls, tokens, output, git diff) |
| `compare_runs` | … | structural behavior diff vs. the first (baseline) |
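Under the hood, an MCP client invokes these tools with JSON-RPC `tools/call` requests. The sketch below shows how such a request is framed for a stdio server, where each message is one line of JSON terminated by a newline. It omits the required `initialize` handshake, and the `run_experiment` argument names are assumptions borrowed from the `--config` fields; the server's `tools/list` response is the authoritative input schema. In practice an MCP SDK client does this framing for you.

```typescript
// Sketch: frame an MCP `tools/call` request for a stdio transport.
// A real session must first complete the `initialize` handshake (omitted).
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// JSON.stringify without an indent argument never emits raw newlines
// (newlines inside strings are escaped), so one message stays on one line.
function frame(msg: ToolCallRequest): string {
  return JSON.stringify(msg) + "\n";
}

// Assumed argument names (taken from the config file fields, not the tool schema).
const req = toolCall(1, "run_experiment", {
  systemPrompt: "You are careful.",
  model: "cpa/glm-5.2",
  taskPrompt: "Refactor the parser.",
});
```

Writing `frame(req)` to the server's stdin sends one request; the response arrives as one JSON line on its stdout with the same `id`.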
Claude Code
This repo ships a `.mcp.json`, so opening the project in Claude Code registers the server automatically. To use it from any project after `bun link`:

```json
{
  "mcpServers": {
    "agent-lab": {
      "command": "agent-lab-mcp"
    }
  }
}
```

OpenCode
In `opencode.json` (or `~/.config/opencode/opencode.jsonc`):

```json
{
  "mcp": {
    "agent-lab": {
      "type": "local",
      "command": ["agent-lab-mcp"]
    }
  }
}
```

Typical agent flow
1. `run_experiment` with prompt variant A → `runId_A`
2. `run_experiment` with prompt variant B → `runId_B`
3. `compare_runs [runId_A, runId_B]` → see which variant used fewer steps/tokens or a different tool sequence.

Results come back as text and as `structuredContent` (machine-readable).
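The comparison step can be pictured as a diff of each run against the first (baseline) run. This is a simplified sketch over a hypothetical `RunSummary` shape, not the actual `compare_runs` algorithm, whose structural diff is richer; it only shows how "fewer steps/tokens or a different tool sequence" can be computed.

```typescript
// Hypothetical, simplified run summary; field names are illustrative.
interface RunSummary {
  runId: string;
  steps: number;
  totalTokens: number;
  toolSequence: string[]; // tool names in call order
}

interface VariantDiff {
  runId: string;
  stepDelta: number;  // candidate minus baseline; negative means fewer steps
  tokenDelta: number; // candidate minus baseline; negative means fewer tokens
  sameToolSequence: boolean;
}

// Compare every run against the first one, treated as the baseline.
function compareToBaseline(runs: RunSummary[]): VariantDiff[] {
  if (runs.length < 2) throw new Error("need a baseline and at least one other run");
  const [base, ...rest] = runs;
  return rest.map((r) => ({
    runId: r.runId,
    stepDelta: r.steps - base.steps,
    tokenDelta: r.totalTokens - base.totalTokens,
    sameToolSequence:
      r.toolSequence.length === base.toolSequence.length &&
      r.toolSequence.every((t, i) => t === base.toolSequence[i]),
  }));
}
```

Treating the first run as the baseline keeps the output linear in the number of variants, rather than producing every pairwise diff.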
Use from a shell — CLI
Agents with a shell tool (and humans) can call the CLI; every command prints parseable JSON.
```sh
agent-lab run --system "You are careful." --model cpa/glm-5.2 --task "Refactor the parser."
agent-lab run --config matrix.json --concurrency 3   # variation matrix, run in parallel
agent-lab run --from <runId>                          # replay a stored experiment
agent-lab list
agent-lab show <runId>
agent-lab compare <runId-a> <runId-b>
```

The config file (`--config`) is either a single definition or a variation matrix:
```json
{
  "base": {
    "systemPrompt": "You are a concise agent.",
    "model": "cpa/glm-5.2",
    "taskPrompt": "placeholder",
    "sandbox": { "image": "agent-lab-opencode:latest", "networkAllowlist": ["cpa.funxyz.fun"], "timeoutMs": 120000 }
  },
  "variations": { "taskPrompt": ["Task A", "Task B"] }
}
```

Sandbox backends
Set `backend` on the sandbox options:

- `docker` (default): one container per run; strong FS/PID/network isolation; the `vacuum` network mode is enforced with an in-container iptables allowlist. Requires Docker.
- `microsandbox`: a libkrun microVM per run, no Docker daemon. Same behavior behind the same contract (port publish, `NetworkPolicy` egress allowlist, guaranteed teardown). Requires the microsandbox runtime (`curl -fsSL https://install.microsandbox.dev | sh`) and a registry image (microsandbox pulls images from a registry, not from a local Docker build), on macOS Apple Silicon or Linux with KVM. The SDK is lazy-loaded, so the Docker path never touches it.
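The "same contract, lazy-loaded SDK" design can be sketched as a factory that only loads the microVM SDK when that backend is selected. All names here are hypothetical; `loadMicrosandboxSdk` stands in for a dynamic `import()` of the real SDK package, injected so the sketch stays self-contained.

```typescript
// Hypothetical backend factory: both backends return the same Sandbox shape,
// and the microVM SDK is loaded only when that backend is requested.
type Backend = "docker" | "microsandbox";

interface Sandbox {
  backend: Backend;
  stop(): Promise<void>;
}

interface BackendDeps {
  startDocker(): Promise<Sandbox>;
  // Stand-in for something like `await import("<sdk package>")`.
  loadMicrosandboxSdk(): Promise<{ start(): Promise<Sandbox> }>;
}

async function createSandbox(deps: BackendDeps, backend: Backend = "docker"): Promise<Sandbox> {
  if (backend === "microsandbox") {
    // Only this branch pays the cost (and failure risk) of loading the SDK.
    const sdk = await deps.loadMicrosandboxSdk();
    return sdk.start();
  }
  return deps.startDocker();
}
```

Deferring the load this way means a host without the microsandbox runtime installed can still run the default Docker backend without errors.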
Network modes
Set `networkMode` on the sandbox options:

- `open` (default): bridge networking; the agent can reach its LLM. Fast, egress open.
- `vacuum`: strict deny-by-default egress via an in-container iptables allowlist (only DNS plus the resolved allowlist hosts, e.g. the LLM endpoint and opencode infra). IPv6 fails closed.
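As an illustration of what a deny-by-default allowlist looks like, the sketch below turns a resolver address and the already-resolved `networkAllowlist` hosts into iptables commands. These are not Agent Lab's actual rules: the function name, the restriction to TCP port 443, and the rule order are assumptions. The conntrack rule lets replies flow back on connections the host opens to the published `opencode serve` port.

```typescript
// Illustrative only: express a "vacuum"-style egress policy as iptables commands.
interface VacuumPolicy {
  resolver: string;     // DNS server IPv4 address used by the container
  allowedIps: string[]; // allowlist hostnames, already resolved to IPv4
}

function vacuumRules(p: VacuumPolicy): string[] {
  const rules = [
    "iptables -P OUTPUT DROP",            // deny all egress by default
    "iptables -A OUTPUT -o lo -j ACCEPT", // loopback traffic inside the container
    // Replies on established connections (e.g. host -> published server port).
    "iptables -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT",
    // DNS only to the configured resolver, over UDP and TCP.
    `iptables -A OUTPUT -d ${p.resolver} -p udp --dport 53 -j ACCEPT`,
    `iptables -A OUTPUT -d ${p.resolver} -p tcp --dport 53 -j ACCEPT`,
  ];
  // Several hostnames may resolve to the same address; emit one rule per IP.
  const unique = p.allowedIps.filter((ip, i, all) => all.indexOf(ip) === i);
  for (const ip of unique) {
    rules.push(`iptables -A OUTPUT -d ${ip} -p tcp --dport 443 -j ACCEPT`); // HTTPS only (assumed)
  }
  rules.push("ip6tables -P OUTPUT DROP"); // IPv6 fails closed
  return rules;
}
```

Because the allowlist holds addresses resolved once, a host whose IP changes mid-run would be blocked until the rules are regenerated, which is the usual trade-off of IP-based allowlists.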
What gets captured (RunTrace)
`runId`, experiment metadata, status (success/error/timeout), timings, ordered steps (assistant messages and tool calls with ok/error), `tokenUsage`, `finalOutput` (text + git diff), and error/partial when relevant.
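A rough TypeScript model of the captured record can help when consuming `structuredContent` or `agent-lab show` output. It is built only from the fields listed above; the exact property names, nesting, and timing format are assumptions, so check a real run record before relying on them. The helper counts tool calls and failures per run.

```typescript
// Approximate RunTrace shape; property names and nested types are assumptions.
type RunStatus = "success" | "error" | "timeout";

type Step =
  | { kind: "message"; text: string }                          // assistant message
  | { kind: "tool"; name: string; ok: boolean; error?: string }; // tool call

interface RunTrace {
  runId: string;
  experiment: { systemPrompt: string; model: string; taskPrompt: string };
  status: RunStatus;
  timings: { startedAt: string; finishedAt: string }; // assumed ISO-8601 strings
  steps: Step[]; // in execution order
  tokenUsage: { input: number; output: number };
  finalOutput: { text: string; gitDiff: string };
  error?: string;
  partial?: boolean;
}

// Count tool calls and failed calls, keeping the call order.
function toolStats(trace: RunTrace): { calls: number; failed: number; tools: string[] } {
  const toolSteps = trace.steps.filter(
    (s): s is Extract<Step, { kind: "tool" }> => s.kind === "tool",
  );
  return {
    calls: toolSteps.length,
    failed: toolSteps.filter((s) => !s.ok).length,
    tools: toolSteps.map((s) => s.name),
  };
}
```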
More
- `docs/LIVE_RUN.md`: end-to-end live run walkthrough.
- `docs/`: GRACE artifacts (requirements, technology, development plan, verification plan, knowledge graph).
- `AGENTS.md`: engineering protocol.
Known limitations
Teardown is guaranteed on normal, error, timeout, and container-crash paths, but not if the host `agent-lab` process is hard-killed (SIGKILL). Containers are labeled `agent-lab.sandbox=1` for cleanup:

```sh
docker ps -aq --filter label=agent-lab.sandbox=1 | xargs docker rm -f
```

Vacuum: IPv6 is only reachable under a non-default Docker IPv6 setup; DNS exfiltration to the configured resolver remains theoretically possible.
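The teardown guarantee, and its SIGKILL gap, follow from the usual `try`/`finally` pattern. In this minimal sketch (hypothetical names, not Agent Lab's code), `finally` runs on success, on error, and when the timeout wins the race. No code runs when a process receives SIGKILL, because that signal cannot be caught, which is why the label-based cleanup command is still needed.

```typescript
// Hypothetical sketch of guaranteed teardown with a timeout.
interface SandboxHandle {
  destroy(): Promise<void>;
}

async function withSandbox<S extends SandboxHandle, T>(
  create: () => Promise<S>,
  run: (sandbox: S) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const sandbox = await create();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timeout after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first decides the outcome; a late rejection from
    // `run` is already handled by Promise.race, so it does not go unhandled.
    return await Promise.race([run(sandbox), timeout]);
  } finally {
    clearTimeout(timer);
    await sandbox.destroy(); // attempted on success, error, and timeout
  }
}
```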