nano-vm-mcp
What nano-vm-mcp Is
nano-vm-mcp is an MCP gateway that turns the Model Context Protocol into a governance-bound execution environment. It wraps the llm-nano-vm execution kernel and exposes it to any MCP client — Claude Desktop, Claude Code, custom agents, or API callers — through stdio or SSE transport.
Most MCP servers expose stateless tools. nano-vm-mcp exposes stateful, governed, auditable workflows.
Capability | Typical MCP Server | nano-vm-mcp |
Tool execution | ✅ | ✅ |
Stateful workflows | ❌ | ✅ |
Deterministic FSM | ❌ | ✅ |
Replayable traces | ❌ | ✅ |
Suspend / resume | ❌ | ✅ |
LLM output enforcement | ❌ | ✅ |
Capability enforcement (double gate) | ❌ | ✅ |
Append-only audit trail | ❌ | ✅ |
GDPR tombstoning | ❌ | ✅ |
Evaluator blindness by design | ❌ | ✅ |
Inter-session idempotency | ❌ | ✅ |
Core invariant: the gateway does not own execution logic — the FSM kernel does.
δ(S, E) → S'
S — current execution state
E — validated event
S' — next deterministic state
Architecture
MCP Client / Claude Code
↓
nano-vm-mcp (Gateway) ← decides how execution is allowed to proceed
→ GovernedRunProgramHandler ← PolicySnapshot, idempotency_key, CapabilityRef
→ llm-nano-vm (Kernel) ← deterministic FSM, ASTEngine, ProjectionLayer
→ GovernanceEnvelope store ← SQLite WAL, append-only audit log
→ idempotency_keys store ← idempotent re-execution across restarts
↓
deterministic FSM ← guarantees correctness
↓
GovernanceEnvelope ← proves it happened
Strict isolation: the gateway never touches execution logic. The kernel never touches persistence or policy. Each layer has a single responsibility and cannot cross the boundary.
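The core invariant δ(S, E) → S' can be read as a pure function over immutable state. The sketch below is illustrative only: `State`, `Event`, `delta`, and `replay` are hypothetical names, not the llm-nano-vm API. It shows why replaying the same validated events reproduces the same final state, and why an out-of-order event is rejected rather than applied.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class State:
    """Hypothetical execution state S: next step index plus outputs so far."""
    step_index: int = 0
    outputs: Tuple[Tuple[str, str], ...] = ()


@dataclass(frozen=True)
class Event:
    """Hypothetical validated event E: the result of one named step."""
    step_id: str
    output: str


def delta(state: State, event: Event, step_ids: Tuple[str, ...]) -> State:
    """Pure transition: accept only the event for the step that is due."""
    if state.step_index >= len(step_ids):
        raise ValueError("program already finished")
    expected = step_ids[state.step_index]
    if event.step_id != expected:
        raise ValueError(f"expected step {expected!r}, got {event.step_id!r}")
    return replace(
        state,
        step_index=state.step_index + 1,
        outputs=state.outputs + ((event.step_id, event.output),),
    )


def replay(events, step_ids) -> State:
    """Fold delta over a recorded event list, starting from the initial state."""
    state = State()
    for event in events:
        state = delta(state, event, step_ids)
    return state
```

Because `delta` has no side effects and the state is frozen, two replays of the same event list compare equal, which is the property a replayable trace depends on.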
Install
pip install nano-vm-mcp
pip install 'nano-vm-mcp[litellm]' # for llm steps
MCP Tools
Tool | Description |
run_program | Execute a program |
get_trace | Retrieve full trace |
| List saved programs ( |
| Retrieve saved |
| Delete a program and all its traces |
Quick Start
stdio — Claude Desktop / local MCP client
nano-vm-mcp --transport stdio
claude_desktop_config.json or .mcp.json:
{
  "mcpServers": {
    "nano-vm-mcp": {
      "command": "nano-vm-mcp",
      "args": ["--transport", "stdio"]
    }
  }
}
SSE — VPS / remote clients
NANO_VM_MCP_API_KEY=your-secret-token nano-vm-mcp --transport sse --port 8080
MCP client URL: http://<host>:8080/sse
Auth header: Authorization: Bearer your-secret-token
Docker Compose
services:
  nano-vm-mcp:
    image: ghcr.io/ale007xd/nano-vm-mcp:latest
    ports:
      - "8080:8080"
    volumes:
      - ./data:/data
    environment:
      NANO_VM_MCP_DB: /data/nano_vm_mcp.db
      NANO_VM_MCP_PORT: 8080
      NANO_VM_MCP_API_KEY: your-secret-token
    command: ["nano-vm-mcp", "--transport", "sse"]
Claude Code Dynamic Workflows
Claude Code decides what to do. nano-vm-mcp decides how execution is allowed to proceed.
Claude Code Dynamic Workflows give you parallel subagents and dynamic orchestration. They don't give you deterministic step execution, replayable audit trails per step, or idempotent re-execution across restarts. nano-vm-mcp closes exactly that gap.
Claude Code ← decides what to do
↓
nano-vm-mcp ← enforces how execution proceeds
↓
deterministic FSM ← guarantees correctness
↓
GovernanceEnvelope ← proves it happened
Capability | Claude Code Dynamic Workflows | + nano-vm-mcp |
Parallel subagents | ✅ | ✅ |
Dynamic orchestration | ✅ | ✅ |
Deterministic step execution | ❌ | ✅ |
Replayable audit trail per step | ❌ | ✅ |
LLM output enforcement | ❌ | ✅ |
Inter-session idempotency | ❌ | ✅ |
GDPR tombstoning | ❌ | ✅ |
Evaluator blindness | ❌ | ✅ |
Use this combination when a workflow subagent must execute a governed process — payment pipeline, approval chain, compliance check — where correctness and auditability matter beyond the LLM layer.
Example: governed payment step inside a Claude Code workflow
# Claude Code subagent calls this tool directly
result = await session.call_tool(
    "run_program",
    {
        "program": {
            "name": "payment_pipeline",
            "steps": [
                {"id": "validate", "type": "tool", "tool": "validate_amount"},
                {"id": "reserve", "type": "tool", "tool": "reserve_funds"},
                {"id": "capture", "type": "tool", "tool": "capture_payment"},
                {"id": "receipt", "type": "tool", "tool": "send_receipt",
                 "is_terminal": True},
            ],
        },
        "idempotency_key": "order-abc-123",
    },
)
# Returns: trace_id, status, step count, cost
# Every step: GovernanceEnvelope in SQLite (tamper-evident, append-only)
The subagent cannot skip steps, reorder execution, or bypass capability checks, regardless of what the LLM decides at the orchestration layer.
Retrieve the audit trail
trace = await session.call_tool("get_trace", {"trace_id": result["trace_id"]})
# Returns: per-step status, duration_ms, usage, state_snapshots
Traces persist across sessions in SQLite WAL. trace_id is UUID4-stable for OTel propagation.
Idempotency — Inter-session Re-execution Safety
Pass idempotency_key to run_program so that a program runs to success at most once per key, even across process restarts:
# First call: executes normally, result cached
result = await session.call_tool("run_program", {
    "program": program,
    "idempotency_key": "payment-order-xyz-001",
})
# Second call with same key: returns cached result immediately, no re-execution
result = await session.call_tool("run_program", {
    "program": program,
    "idempotency_key": "payment-order-xyz-001",
})
Crash recovery: if the process crashes after program start but before completion (status=pending), the next call with the same key overwrites the pending entry and re-executes. Once the result is written as status=success, it is immutable for that key.
Note on "exactly-once": the FSM guarantees idempotent re-execution — the same key never triggers a second run after success. External side effects (payment capture, webhook delivery) are only as idempotent as the tools you register. This is the same contract Temporal and Cadence operate under.
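The pending/success behaviour described in this section can be sketched with the standard-library `sqlite3` module. This is a minimal illustration under stated assumptions, not the gateway's code: the table layout, `open_store`, and `run_once` are hypothetical, and only the status values `pending` and `success` come from the text.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Create the (hypothetical) idempotency table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS idempotency_keys ("
        "key TEXT PRIMARY KEY, status TEXT NOT NULL, result TEXT)"
    )
    return conn


def run_once(conn: sqlite3.Connection, key: str, execute):
    """Return the cached result for a finished key, otherwise (re-)execute."""
    row = conn.execute(
        "SELECT status, result FROM idempotency_keys WHERE key = ?", (key,)
    ).fetchone()
    if row is not None and row[0] == "success":
        return json.loads(row[1])  # finished earlier: no re-execution

    # New key, or a stale 'pending' row left by a crash: overwrite and run.
    conn.execute(
        "INSERT OR REPLACE INTO idempotency_keys VALUES (?, 'pending', NULL)",
        (key,),
    )
    conn.commit()
    result = execute()
    conn.execute(
        "UPDATE idempotency_keys SET status = 'success', result = ? WHERE key = ?",
        (json.dumps(result), key),
    )
    conn.commit()
    return result
```

Note that the sketch only protects the program run itself. A tool that already charged a card before the crash will be called again on re-execution unless that tool is idempotent on its own.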
Governance Layer
GovernanceEnvelope
Each successful execution step produces an immutable GovernanceEnvelope stored in the governance_envelopes table. Envelopes are written only on error=None — they form a tamper-evident, append-only audit trail of successful transitions only.
Field | Type | Description |
| | Session / trace identifier |
| | Step index within the execution |
policy_hash | | SHA-256 of the active PolicySnapshot |
| | Merkle/delta hash of |
payload | | Projected (sanitized) step output |
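One common way to make an append-only log tamper-evident is a hash chain, where each entry's hash covers the previous entry's hash. The text does not spell out how nano-vm-mcp computes its hashes, so the following is a generic sketch of the technique; `GENESIS`, `envelope_hash`, `append`, and `verify` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def envelope_hash(prev_hash: str, envelope: dict) -> str:
    """Hash an envelope together with the hash of its predecessor."""
    body = json.dumps(envelope, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, envelope: dict) -> None:
    """Append-only write: each entry stores the envelope and its chain hash."""
    prev_hash = log[-1]["chain_hash"] if log else GENESIS
    log.append(
        {"envelope": envelope, "chain_hash": envelope_hash(prev_hash, envelope)}
    )


def verify(log: list) -> bool:
    """Recompute the chain from the start; any edited entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if envelope_hash(prev_hash, entry["envelope"]) != entry["chain_hash"]:
            return False
        prev_hash = entry["chain_hash"]
    return True
```

Editing any stored envelope changes its recomputed hash, so `verify` fails from that entry onward.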
PolicySnapshot and CapabilityRef
PolicySnapshot is a frozen Pydantic model created once per session. It carries the set of allowed tool names and is hashed (SHA-256) before execution starts. Every GovernanceEnvelope records this hash — post-hoc modification of the policy is detectable.
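To illustrate why hashing a frozen policy makes later modification detectable, here is a standard-library sketch. It uses a frozen dataclass instead of Pydantic to stay dependency-free; `PolicySketch` and its canonical serialization are assumptions, not the real PolicySnapshot implementation.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical stand-in for PolicySnapshot: an immutable set of tool names."""
    tool_capabilities: FrozenSet[str]

    def hash(self) -> str:
        # Sort before serializing so set ordering cannot change the digest.
        canonical = json.dumps(sorted(self.tool_capabilities))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two policies with the same tools produce the same digest regardless of insertion order, while adding or removing a tool produces a different one.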
from nano_vm.contracts import PolicySnapshot, CapabilityRef
policy = PolicySnapshot(
    tool_capabilities={"reserve_funds", "capture_payment", "send_receipt"},
)
# policy.hash() → SHA-256 hex, stored in every GovernanceEnvelope.policy_hash
CapabilityRef wraps sensitive values as opaque tokens (vault://secret/<id>) rather than storing raw plaintext in CanonicalState. The token is resolved JIT during tool execution and never written to the audit log.
ref = CapabilityRef(ref_id="card-4242", value="4242424242424242")
# Stored in state as: vault://secret/card-4242
# GovernanceEnvelope.payload contains the token, not the card number
GDPR Tombstoning
On a GDPR erasure event:
Target ref is tombstoned (is_tombstone=True)
All subsequent projections return [REDACTED_TOMBSTONE]
The canonical_snapshot_hash chain remains valid, so forensic auditability is preserved
The secret is permanently gone
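The erasure steps listed here can be sketched as follows. `VaultSketch` is a hypothetical in-memory stand-in, not the library's storage. It only shows the idea that state holds an opaque token, so deleting the secret does not alter anything that was hashed.

```python
TOMBSTONE = "[REDACTED_TOMBSTONE]"


class VaultSketch:
    """Hypothetical in-memory secret store keyed by ref_id."""

    def __init__(self) -> None:
        self._secrets = {}
        self._tombstones = set()

    def put(self, ref_id: str, value: str) -> str:
        self._secrets[ref_id] = value
        return f"vault://secret/{ref_id}"  # opaque token kept in state

    def erase(self, ref_id: str) -> None:
        # Drop the secret itself and remember that the ref was tombstoned.
        self._secrets.pop(ref_id, None)
        self._tombstones.add(ref_id)

    def project(self, token: str) -> str:
        """Return what a projection may show for a token."""
        ref_id = token.removeprefix("vault://secret/")
        return TOMBSTONE if ref_id in self._tombstones else token
```

Because stored state contains only the token, hashes computed over that state are unaffected by the erasure, while every later projection shows the tombstone marker.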
vm.erase(ref_id="card-4242")
# Hash chain remains intact; the erasure itself is auditable
Execution Traces
Every step writes a TRACE projection to the execution_traces table — a sanitized snapshot of state with sensitive values replaced by CapabilityRef tokens, protected by a unique index on (execution_id, step_index):
steps = store.get_trace_steps(execution_id="exec-abc-123")
# [
# {"step_index": 0, "step_id": "validate", "projected_json": "...", "canonical_hash": "..."},
# {"step_index": 1, "step_id": "reserve", "projected_json": "...", "canonical_hash": "..."},
# ]
Evaluator Blindness by Design
Benchmark research across 10 frontier models found that every model changes behavior when it detects it's being monitored. The best model scored 84% on "acts the same whether watched or not."
The ProjectionLayer gives the LLM only a target=LLM projection of state. Governance metadata — GovernanceEnvelope, canonical_hash, policy_hash, retry counters — never reaches the prompt. The model cannot observe its own audit trail.
Evaluator blindness is structural, not configured. It is a consequence of the strict isolation between the kernel and gateway layers.
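As a rough illustration of a target=LLM projection, the sketch below filters governance metadata out of a state dictionary before it could reach a prompt. The key names in `GOVERNANCE_KEYS` and the function `project_for_llm` are assumptions chosen for the example; the real ProjectionLayer may work differently.

```python
# Hypothetical set of keys that must never be shown to the model.
GOVERNANCE_KEYS = frozenset({"canonical_hash", "policy_hash", "retry_count"})


def project_for_llm(state: dict) -> dict:
    """Return a copy of state without governance metadata keys."""
    return {k: v for k, v in state.items() if k not in GOVERNANCE_KEYS}
```

The function returns a new dictionary, so the full state stays available to the audit path while the model only ever sees the filtered view.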
Determinism and LLM Steps
State determinism — the FSM kernel guarantees execution order, no step skipping, and reproducible trace structure regardless of LLM output. The graph of transitions is fixed at program definition time. This is unconditional.
Semantic determinism — the text produced by an LLM step may differ across runs even at temperature=0.0. nano-vm does not guarantee semantic determinism and does not try to.
These are orthogonal concerns. The runtime enforces state determinism; you control semantic determinism through prompt engineering and allowed_outputs.
LLM output enforcement at the runtime level
allowed_outputs (v0.8.0) validates the model's raw output against an explicit enum before it enters the FSM context. This isn't a prompt hint — it's a runtime gate.
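A runtime gate of this kind could look like the sketch below. The function name and the whitespace stripping are assumptions; the fallback to the first allowed value under on_error="skip" follows the behaviour noted in this section's step example.

```python
def enforce_allowed_outputs(raw: str, allowed: list, on_error: str = "fail") -> str:
    """Gate a model's raw output against an explicit list of allowed values."""
    candidate = raw.strip()  # assumption: surrounding whitespace is ignored
    if candidate in allowed:
        return candidate
    if on_error == "skip":
        return allowed[0]  # fall back to the first allowed value
    raise ValueError(f"output {raw!r} not in allowed_outputs {allowed!r}")
```

Whatever the model writes, only a member of the allowed list can leave this function, so downstream conditions never see free-form text.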
{
    "id": "classify",
    "type": "llm",
    "prompt": "Is this a valid refund request? Reply ONLY with: yes or no",
    "output_key": "decision",
    "allowed_outputs": ["yes", "no"],  # runtime enforcement, not a prompt hint
    "on_error": "skip",  # output → "yes" (first element) on mismatch
}
Security
ASTEngine — sandboxed condition evaluation
Conditions are evaluated by the ASTEngine — a deterministic sandboxed interpreter with no access to Python builtins, attribute access, or callable invocation. eval() is not used anywhere in the production execution path.
Rules for safe use:
Condition logic must be authored by you, not generated from untrusted input at runtime.
LLM output may appear as a value being tested ('yes' in '$decision'), never as the condition expression itself.
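To show what evaluating conditions without eval() can look like, here is a small whitelist-based evaluator built on Python's standard `ast` module. It is not the ASTEngine itself: `safe_condition` is a hypothetical name, it handles only literals, comparisons, and boolean operators, and variable substitution such as $decision is left out.

```python
import ast
import operator

_ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Eq, ast.NotEq, ast.In, ast.NotIn, ast.Constant,
)
_COMPARE = {
    ast.Eq: operator.eq,
    ast.NotEq: operator.ne,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}


def safe_condition(expr: str) -> bool:
    """Evaluate a tiny boolean expression over literals without eval()."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"disallowed syntax: {type(node).__name__}")
    return bool(_eval(tree.body))


def _eval(node):
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.UnaryOp):  # only `not` passes the whitelist
        return not _eval(node.operand)
    if isinstance(node, ast.BoolOp):
        values = [_eval(v) for v in node.values]
        return all(values) if isinstance(node.op, ast.And) else any(values)
    # ast.Compare: evaluate chained comparisons left to right
    left = _eval(node.left)
    for op, right_node in zip(node.ops, node.comparators):
        right = _eval(right_node)
        if not _COMPARE[type(op)](left, right):
            return False
        left = right
    return True
```

Names, attribute access, and calls are not in the whitelist, so an expression such as `__import__('os')` is rejected before anything is evaluated.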
Capability enforcement — double gate
Tool execution passes through two independent enforcement layers:
Layer | Mechanism |
| Verifies tool name against |
| Rejects any tool name not registered in the tool registry with |
Neither gate can be bypassed by LLM output.
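The two-gate idea can be sketched as two independent checks that a tool call must both pass. The names below (`CapabilityError`, `call_tool`, the `allowed` set, and the `registry` dict) are hypothetical, and treating the first gate as a policy lookup is an assumption based on the PolicySnapshot description.

```python
from typing import Callable, Dict, FrozenSet


class CapabilityError(Exception):
    """Raised when a tool call fails either gate."""


def call_tool(
    name: str,
    args: dict,
    allowed: FrozenSet[str],
    registry: Dict[str, Callable[..., object]],
):
    # Gate 1: the session policy must list the tool name.
    if name not in allowed:
        raise CapabilityError(f"tool {name!r} is not permitted by the policy")
    # Gate 2: the tool must exist in the registry, independently of the policy.
    handler = registry.get(name)
    if handler is None:
        raise CapabilityError(f"tool {name!r} is not registered")
    return handler(**args)
```

A name that appears in the policy but not in the registry, or the other way round, is rejected, so a mistake in one layer is caught by the other.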
SSE transport and auth
Set NANO_VM_MCP_API_KEY to enable bearer token authentication (secrets.compare_digest — timing-safe). If unset, a warning is logged and all requests are allowed — suitable for localhost only.
Do not expose the SSE endpoint to the public internet without NANO_VM_MCP_API_KEY set.
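For reference, a timing-safe bearer check with `secrets.compare_digest` might look like this. The function name and the header parsing are assumptions for illustration; only the use of `compare_digest` and the allow-all behaviour when no key is set come from the text.

```python
import secrets
from typing import Optional


def is_authorized(header_value: Optional[str], expected_token: Optional[str]) -> bool:
    """Check an Authorization header against the configured bearer token."""
    if not expected_token:
        return True  # no key configured: open access, localhost use only
    if not header_value or not header_value.startswith("Bearer "):
        return False
    presented = header_value[len("Bearer "):]
    # compare_digest takes time independent of where the strings differ.
    return secrets.compare_digest(presented.encode(), expected_token.encode())
```

A plain `==` comparison can return early on the first mismatching character, which is what the constant-time comparison avoids.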
Configuration
Variable | Default | Description |
NANO_VM_MCP_DB | | SQLite WAL database path |
| | SSE bind host |
NANO_VM_MCP_PORT | | SSE bind port |
NANO_VM_MCP_API_KEY | (unset) | Bearer token for SSE auth |
| (unset) | LiteLLM model string for |
Endpoints
Path | Auth | Description |
| none | Liveness probe — always returns |
/sse | bearer | SSE transport entry point |
| bearer | MCP message endpoint |
Performance
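The FSM runtime adds little overhead of its own: in the mock-adapter benchmarks below, the sequential refund pipeline has a p95 latency of 0.66 ms. In real deployments the bottleneck is typically the LLM API or external I/O.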
Sequential execution (single FSM instance): one step at a time per execution_id — deliberate design choice, makes traces deterministic and replayable.
Parallel execution across independent workflows: fan out across multiple execution_id instances. SQLite WAL mode lets readers run concurrently with a writer; writes are still serialized, one writer at a time.
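The split between sequential steps inside one execution and parallelism across executions can be sketched with `asyncio`. `run_workflow` and `fan_out` are hypothetical helpers, and the `asyncio.sleep(0)` call stands in for real tool or LLM I/O.

```python
import asyncio


async def run_workflow(execution_id: str, steps: list) -> list:
    """Run one workflow's steps strictly in order and record them."""
    trace = []
    for index, step in enumerate(steps):
        await asyncio.sleep(0)  # stand-in for tool or LLM I/O
        trace.append((execution_id, index, step))
    return trace


async def fan_out(workflows: dict) -> dict:
    """Run independent workflows concurrently; each stays sequential inside."""
    ids = list(workflows)
    traces = await asyncio.gather(*(run_workflow(i, workflows[i]) for i in ids))
    return dict(zip(ids, traces))
```

A caller could start it with `asyncio.run(fan_out({"exec-a": ["validate", "reserve"], "exec-b": ["validate"]}))`. Each trace keeps its own step order even though the workflows interleave.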
Benchmarks (v0.7.3, Mock adapter, QEMU/KVM · Intel Xeon E5-2697A v4 · 2 cores · Python 3.12)
Scenario | Mean TPS | p95 |
Refund pipeline (sequential) | 2,300/s | 0.66 ms |
MCP store round-trip | 3,000/s | 0.42 ms |
GovernanceEnvelope write | 1,300/s | 171 ms |
Parallel throughput ( | 436/s | 542 ms |
Replay equivalence | 1,300/s | 1.30 ms |
Long-horizon (30-step program) | 30/s | 3,606 ms |
Observability
trace.trace_id # UUID4 — stable for OTel propagation
trace.status # SUCCESS | FAILED | SUSPENDED | BUDGET_EXCEEDED | STALLED
trace.final_output
trace.steps # per-step: step_id, status, duration_ms, usage
trace.state_snapshots # list[(step_index, sha256_hex)]
Traces are persisted to SQLite and retrievable by trace_id across sessions via get_trace.
Execution State Model
CREATED
↓
RUNNING ──── tool returns "PENDING" ──→ SUSPENDED
│ │
│ resume_with_program()
│ │
└──────────────────────────────────────────┘
│
├── no more steps ──→ SUCCESS
├── tool error (on_error=fail) ──→ FAILED
├── max_steps / max_tokens exceeded ──→ BUDGET_EXCEEDED
└── max_stalled_steps exceeded ──→ STALLED
Terminal states: SUCCESS, FAILED, BUDGET_EXCEEDED, STALLED. All are immutable.
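The diagram can be encoded as a transition table, which makes the immutability of terminal states explicit. This is a sketch read off the diagram; `Status`, `TRANSITIONS`, and `advance` are illustrative names rather than the kernel's own types.

```python
from enum import Enum


class Status(Enum):
    CREATED = "CREATED"
    RUNNING = "RUNNING"
    SUSPENDED = "SUSPENDED"
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    BUDGET_EXCEEDED = "BUDGET_EXCEEDED"
    STALLED = "STALLED"


# Allowed moves, read off the diagram; terminal states have no outgoing edges.
TRANSITIONS = {
    Status.CREATED: {Status.RUNNING},
    Status.RUNNING: {
        Status.SUSPENDED, Status.SUCCESS, Status.FAILED,
        Status.BUDGET_EXCEEDED, Status.STALLED,
    },
    Status.SUSPENDED: {Status.RUNNING},
}


def advance(current: Status, target: Status) -> Status:
    """Return target if the move is allowed, otherwise raise."""
    if target not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```

Since the four terminal states have no entry in `TRANSITIONS`, any attempt to leave them raises.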
Relationship to llm-nano-vm
Layer | Responsibility |
llm-nano-vm | Deterministic FSM execution, ASTEngine, ProjectionLayer, step lifecycle |
nano-vm-mcp | MCP transport, persistence, governance, idempotency, capability enforcement |
The gateway never owns transition logic. The FSM kernel does.
The kernel is MIT-licensed, independently versioned on PyPI (llm-nano-vm), and fully documented. Either layer can be used standalone or replaced — the boundary between them is a stable Python interface.
Contact & Support
Author: @ale007xd on Telegram · @ale007xd on X
USDT (TON): UQCakyytrEGBikOi3eYMpveGHXDB1-fd6lcuQC9VvKqMrI-9