Sandbox Agent

LangGraph agent with Docker-based sandboxed code execution. Each session runs in an isolated, hardened Docker container with a persistent kernel — IPython for Python, vm.createContext for Node.js, and a dedicated R environment. Supports 3 runtimes, provider-agnostic LLM configuration, and vision (auto-detection of multimodal models). Available as an interactive CLI, MCP server (Cursor, Claude Desktop), REST API (Aegra), and React frontend.

Features

  • Docker isolation — each session runs in its own container, no ports exposed, no host volumes

  • Hardened containers — non-root user (UID 65532), PID limits, memory+swap limits, tmpfs-only writable dirs, no-new-privileges

  • Crash detection — OOM-kill, fork bombs, segfaults are detected and reported clearly to the agent

  • Persistent state — variables survive between code executions (like Jupyter cells)

  • PostgreSQL checkpointer — conversation history persists across restarts (shared with Aegra)

  • Async support — Promises (Node.js) and coroutines (Python) are automatically awaited

  • Multi-runtime — Python, Node.js, and R

  • Rich display outputs — captures matplotlib/ggplot figures, Plotly charts, IPython Audio, HTML widgets, and more; auto-sends images to multimodal LLMs

  • Provider-agnostic — works with OpenAI, Anthropic, Google Gemini, Ollama, or any compatible provider via langchain init_chat_model

  • Runtime package install — pip install / npm install / install.packages() at session creation or via terminal

  • 6 tools — create_session, execute_code, execute_terminal, import_files, export_files, stop_session

  • MCP server — expose the same tools via Model Context Protocol (stdio transport)

  • REST API — full LangGraph Platform API via Aegra with OpenAPI docs, streaming, thread management

  • Input validation — Pydantic schemas validate all tool inputs before execution, returning structured errors on failure

  • React frontend — SPA with chat, tool visualization, file upload/download, settings dialog (React 19 + Vite + Tailwind CSS)

  • File upload — upload files to the API for import into sandbox sessions (POST /threads/{id}/files/upload)

  • File export — register files for download (no host copy); download via API or use in cross-session import

  • File import — import from host paths, inline content, or from another session (files exported in same conversation)

  • Cross-session transfer — export from session A, import into session B with {session_id, path}

  • Session garbage collection — idle timeout, max lifetime, thread eviction, orphan container cleanup

  • Auto-cleanup — all containers are stopped and removed when the agent exits

Prerequisites

  • Python 3.11+

  • Docker Engine

  • API key for your LLM provider (CHAT_MODEL_API_KEY)

  • PostgreSQL (for API/CLI mode — checkpointer + Aegra)

  • Node.js 18+ and npm (for the React frontend)

Setup

# Docker — installs (if needed), configures permissions, and builds all 3 images
sudo ./setup-docker.sh

# Install Python dependencies (open a new terminal so the docker group is active)
uv sync

# Install frontend dependencies
cd frontend && npm install && cd ..

# Configure environment
cp .env.example .env
# Edit .env with your CHAT_MODEL_API_KEY, POSTGRES_PASSWORD, and other settings

# Docker images are also built automatically on first use if not already present

PostgreSQL (required for CLI, API, and UI)

PostgreSQL is auto-started via Docker Compose when using localhost. The CLI checks whether PostgreSQL is reachable and, if it is not, starts it automatically:

# Manual start (if needed)
docker compose up postgres -d

Or point to an existing PostgreSQL instance via POSTGRES_* env vars in .env.

Usage

All commands use the unified sandbox-agent entry point:

uv run sandbox-agent cli       # Interactive CLI (default)
uv run sandbox-agent mcp       # MCP server (Cursor, Claude Desktop)
uv run sandbox-agent api       # REST API (Aegra, no reload)
uv run sandbox-agent api dev   # REST API with hot reload
uv run sandbox-agent ui        # React UI (auto-starts API if needed)

CLI

uv run sandbox-agent cli
# or simply
uv run sandbox-agent

The CLI is a thin client on top of the Aegra REST API, so the API must be running (uv run sandbox-agent api). Features:

  • Rich panels with syntax-highlighted tool I/O (per-runtime lexer)

  • Streaming agent output with Markdown rendering

  • Persistent thread across restarts (~/.local/state/sandbox-agent/cli-thread.json)

  • /new command to start a fresh conversation

  • Passes model/provider/key settings to the API via configurable

MCP Server

Run the MCP server (stdio transport) for integration with Cursor, Claude Desktop, or any MCP-compatible client:

uv run sandbox-agent mcp

Cursor or Claude Desktop

Add the following MCP config:

{
  "mcpServers": {
    "sandbox-agent": {
      "command": "uv",
      "args": ["--directory", "/path/to/sandbox-agent", "run", "sandbox-agent", "mcp"]
    }
  }
}

The MCP server exposes the same 6 tools as the CLI agent with identical behavior. It maintains a persistent thread_id in ~/.local/state/sandbox-agent/mcp-thread.json for export URL consistency.

The import_files tool accepts file content directly (as text or base64 via file_content/encoding keys), host paths (via source/destination), or cross-session references (session_id+path). The export_files tool registers files for download via GET /threads/{thread_id}/files/download?session_id=...&path=....
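As a rough illustration of the inline-content form described above, the sketch below builds one import_files entry that carries a file as base64. The file_content and encoding keys come from the description; the literal value "base64" and the use of a destination key for inline content are assumptions, and inline_import_entry is a hypothetical helper, not part of the project.

```python
import base64
import json


def inline_import_entry(data: bytes, destination: str) -> dict:
    """Build one import_files entry that carries the file content inline as base64."""
    return {
        "file_content": base64.b64encode(data).decode("ascii"),
        # Assumption: the encoding key takes the literal string "base64".
        "encoding": "base64",
        # Assumption: "destination" applies to inline content as it does to host paths.
        "destination": destination,
    }


entry = inline_import_entry(b"x,y\n1,4\n2,5\n", "data.csv")
print(json.dumps(entry, indent=2))
```

Base64 keeps binary files (images, archives) intact inside a JSON tool call; plain text files could presumably be sent as text instead.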

REST API (Aegra)

Run the agent as a REST API via Aegra (self-hosted LangGraph Platform alternative):

uv run sandbox-agent api       # Production mode (no reload, auto-starts PostgreSQL)
uv run sandbox-agent api dev   # Development mode (hot reload via aegra dev)

The production command auto-starts PostgreSQL via Docker Compose if it's not reachable on localhost. The server runs at http://localhost:8000 with OpenAPI docs at /docs. Use the LangGraph SDK or curl to create assistants, threads, and stream runs. Compatible with Agent Chat UI, LangGraph Studio, and CopilotKit.

Custom endpoints:

  • GET /threads/{thread_id}/files/download?session_id=...&path=... — streams exported files from containers

  • POST /threads/{thread_id}/files/upload — uploads files to be available for import into sandbox sessions

  • DELETE /threads/{thread_id} — also cleans up Docker sessions and storage for that thread (via middleware)

  • GET /settings — returns persisted frontend settings merged over backend .env defaults

  • PUT /settings — persist frontend settings to PostgreSQL (encrypted)
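The download endpoint takes the session ID and the in-container path as query parameters, so both need URL encoding. A minimal sketch using only the standard library follows; build_download_url is a hypothetical helper and the thread and session IDs are made-up values.

```python
from urllib.parse import quote, urlencode


def build_download_url(api_base_url: str, thread_id: str, session_id: str, path: str) -> str:
    """Compose the URL for GET /threads/{thread_id}/files/download."""
    query = urlencode({"session_id": session_id, "path": path})
    base = api_base_url.rstrip("/")
    return f"{base}/threads/{quote(thread_id, safe='')}/files/download?{query}"


# Hypothetical IDs, for illustration only.
url = build_download_url("http://127.0.0.1:8000", "thread-1", "abc123", "/workspace/output.csv")
print(url)
```

The resulting URL can be fetched with curl or any HTTP client while the API is running.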

React Frontend

A web UI for chatting with the agent via the Aegra API (React 19 + Vite + Tailwind CSS):

# Install frontend dependencies (if not done during setup)
cd frontend && npm install && cd ..

# Start the UI (auto-starts API + PostgreSQL if needed)
uv run sandbox-agent ui

The frontend runs at http://localhost:5173 (Vite dev server with API proxy to :8000). Features:

  • Thread management (create, resume, delete conversations) via sidebar

  • Streaming responses with expandable tool blocks (syntax-highlighted per runtime)

  • File upload and download support

  • Thinking block visualization

  • Settings dialog (model, provider, API key, base URL, vision toggle)

  • Persistent settings via server-side API (GET/PUT /settings), with backend .env defaults as fallback

Programmatic

from sandbox_agent.sandbox import SandboxManager

manager = SandboxManager()

info = manager.create_session(
    runtime="python",
    dependencies={"pandas": "2.2.3", "matplotlib": ""},
)
sid = info.session_id

r1 = manager.execute_code(sid, """
import pandas as pd
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
print(df.describe())
""")
print(r1.stdout)

# Variables persist between calls
r2 = manager.execute_code(sid, "df.shape")
print(r2.result)

# Export files from the sandbox (registers for download, no host copy)
manager.execute_code(sid, "df.to_csv('/workspace/output.csv', index=False)")
export = manager.export_files(sid, [{"source": "output.csv"}])
print(export.files[0].session_id, export.files[0].path)

manager.stop_session(sid)

Exporting Files

export_files registers files for download and cross-session import (no host copy). Files become available via the API (GET /threads/{thread_id}/files/download?session_id=...&path=...) and for import_files in other sessions:

# Export a single file
result = manager.export_files(sid, [{"source": "report.pdf"}])

# Export an entire directory
result = manager.export_files(sid, [{"source": "results/"}])

# Export multiple files at once
result = manager.export_files(sid, [
    {"source": "data.csv"},
    {"source": "chart.png"},
    {"source": "/workspace/logs/"},
])

for f in result.files:
    print(f"{f.session_id}:{f.path} ({'OK' if f.success else f.error})")

Cross-Session File Transfer

Use export_files + import_files to move files between sessions (even across different runtimes):

# Session A (Python): produce data
sid_a = manager.create_session(runtime="python", dependencies={"pandas": ""}).session_id
manager.execute_code(sid_a, """
import pandas as pd
df = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})
df.to_csv('/workspace/data.csv', index=False)
""")
export = manager.export_files(sid_a, [{"source": "data.csv"}])
path = export.files[0].path  # /workspace/data.csv

# Session B (R): consume the same data
sid_b = manager.create_session(runtime="r", dependencies={"readr": ""}).session_id
manager.import_files(sid_b, [{"session_id": sid_a, "path": path, "destination": "data.csv"}])
manager.execute_code(sid_b, 'df <- readr::read_csv("/workspace/data.csv"); summary(df)')

Importing Files

import_files copies files into the sandbox from the host or from another session:

# Import from host
result = manager.import_files(sid, [
    {"source": "/home/user/data.csv", "destination": "data.csv"},
    {"source": "/home/user/project/", "destination": "project/"},
])

# Import from another session (file must have been exported first)
result = manager.import_files(sid, [
    {"session_id": "abc123", "path": "/workspace/out.csv", "destination": "out.csv"},
])

Other runtimes work the same way — pass runtime="node" or runtime="r" to create_session.

Async Code

Node.js — if the last expression returns a Promise, the kernel awaits it before collecting output. Top-level await is also supported (falls back to an async IIFE wrapper when needed).

const axios = require('axios');
async function fetchData() {
    const resp = await axios.get('https://api.example.com/data');
    console.log(resp.data);
}
fetchData(); // Promise is awaited automatically

Python — IPython's autoawait handles top-level await. If a cell returns an unawaited coroutine, the kernel detects it and runs it with asyncio.run().

import aiohttp

async def fetch_data():
    async with aiohttp.ClientSession() as session:
        resp = await session.get('https://api.example.com/data')
        print(await resp.text())

fetch_data()  # coroutine is detected and executed automatically
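The detection step for Python can be pictured roughly as follows. This is a simplified sketch of the idea, not the kernel's actual code: run_cell is a hypothetical stand-in for cell execution, and it assumes no event loop is already running when asyncio.run() is called.

```python
import asyncio
import inspect


def run_cell(func):
    """Call a cell-like function; if it returns an unawaited coroutine, run it to completion."""
    result = func()
    if inspect.iscoroutine(result):
        # The cell produced a coroutine object instead of a value: drive it to completion.
        result = asyncio.run(result)
    return result


async def fetch_value():
    await asyncio.sleep(0)  # stands in for real async I/O
    return 42


print(run_cell(fetch_value))  # coroutine is detected and executed
print(run_cell(lambda: 7))    # plain values pass through unchanged
```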

Container Security

Each container is created with the following protections:

| Protection | Setting | Effect |
| --- | --- | --- |
| Memory limit | 2048m (no swap) | OOM-kill on overflow, host unaffected |
| PID limit | 512 | Fork bombs are contained and killed |
| CPU quota | 2 cores | Prevents CPU starvation on host |
| Writable dirs | tmpfs (/workspace, /tmp, /home/sandbox) | tmpfs dirs never touch host disk |
| tmpfs size | 200m per mount | Limits in-container disk usage |
| User | sandbox (UID 65532) | No root inside container |
| Privileges | no-new-privileges | Cannot escalate via setuid/setgid |
| Network | Configurable (enabled by default) | Can be disabled per session |

When a container crashes, the agent receives a clear CONTAINER_DIED error with the reason (OOM-killed, SIGKILL, segfault, etc.) and a hint to recreate the session.
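For readers who want to see how the protections listed above could translate into Docker SDK for Python arguments, here is a sketch that only builds the keyword arguments for containers.run(). It is not the project's SandboxManager code: hardened_run_kwargs is a hypothetical helper, and the read-only root filesystem is inferred from "tmpfs-only writable dirs" rather than stated explicitly.

```python
def hardened_run_kwargs(
    memory: str = "2048m",
    pids: int = 512,
    cpu_quota: int = 200_000,
    tmpfs_size: str = "200m",
    network_enabled: bool = True,
) -> dict:
    """Keyword arguments for docker-py's containers.run() mirroring the documented limits."""
    tmpfs_options = f"size={tmpfs_size}"
    return {
        "mem_limit": memory,
        "memswap_limit": memory,  # equal to mem_limit, so the container gets no swap
        "pids_limit": pids,
        "cpu_period": 100_000,    # with cpu_quota=200000 this allows 2 cores
        "cpu_quota": cpu_quota,
        "tmpfs": {path: tmpfs_options for path in ("/workspace", "/tmp", "/home/sandbox")},
        "user": "65532",
        "security_opt": ["no-new-privileges"],
        "read_only": True,        # assumption: root filesystem is read-only
        "network_disabled": not network_enabled,
        "detach": True,
    }


kwargs = hardened_run_kwargs()
print(kwargs["cpu_quota"] / kwargs["cpu_period"], "cores")
```

With the docker package installed, these could be passed as docker.from_env().containers.run(image, **kwargs), where image is whichever runtime image is being started.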

Session Lifecycle

Sessions are automatically managed with garbage collection:

| Behavior | Default | Setting |
| --- | --- | --- |
| Idle timeout | 30 min | SESSION_IDLE_TTL_SECONDS |
| Max lifetime | 2 hours | SESSION_MAX_LIFETIME_SECONDS |
| GC interval | 60 sec | SESSION_GC_INTERVAL_SECONDS |
| Max active threads | 10 | SESSION_MAX_ACTIVE_THREADS |
| Max sessions (global) | 5 | CONTAINER_MAX_SESSIONS |
| Max sessions per thread | 3 | CONTAINER_MAX_SESSIONS_PER_THREAD |
| Orphan cleanup age | 5 min | CONTAINER_ORPHAN_MIN_AGE_SECONDS |

On startup, the manager removes orphan containers older than the minimum age. On exit, all containers are stopped and removed via atexit and signal handlers (SIGTERM/SIGINT).
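The idle-timeout and max-lifetime rules can be expressed as a small decision function that a periodic GC pass might call for each session. This is an illustrative sketch under the documented defaults, not the manager's real implementation; SessionRecord and expiry_reason are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

IDLE_TTL_SECONDS = 1800      # SESSION_IDLE_TTL_SECONDS default (30 min)
MAX_LIFETIME_SECONDS = 7200  # SESSION_MAX_LIFETIME_SECONDS default (2 hours)


@dataclass
class SessionRecord:
    """Hypothetical bookkeeping record kept per session."""
    session_id: str
    created_at: float    # seconds since some epoch
    last_used_at: float


def expiry_reason(session: SessionRecord, now: float) -> Optional[str]:
    """Return why a session should be collected, or None if it should stay alive."""
    if now - session.created_at >= MAX_LIFETIME_SECONDS:
        return "max_lifetime"
    if now - session.last_used_at >= IDLE_TTL_SECONDS:
        return "idle"
    return None


sessions = [
    SessionRecord("active", created_at=0, last_used_at=3000),
    SessionRecord("idle", created_at=0, last_used_at=1000),
]
for s in sessions:
    print(s.session_id, expiry_reason(s, now=3600))
```

Checking the hard lifetime cap first means a session that is both old and idle is reported as having hit the cap, which is the stricter of the two limits.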

Configuration

All settings can be overridden via environment variables or .env. The defaults below are taken from settings.py:

# ── LLM (provider-agnostic) ──
CHAT_MODEL=gpt-4o                    # Model name
CHAT_MODEL_PROVIDER=openai           # Provider: openai, anthropic, google_genai, ollama
CHAT_MODEL_API_KEY=sk-...            # API key (required)
CHAT_MODEL_BASE_URL=                 # Custom API base URL (optional)
CHAT_MODEL_SUPPORTS_VISION=          # Override vision detection (true/false, empty = auto)

# ── Container Limits ──
CONTAINER_MEMORY_LIMIT=2048m         # Docker memory limit (no swap)
CONTAINER_CPU_QUOTA=200000           # CPU quota (100000 = 1 core)
CONTAINER_PIDS_LIMIT=512             # Max PIDs per container
CONTAINER_TMPFS_SIZE=200m            # tmpfs size for writable dirs
CONTAINER_EXECUTION_TIMEOUT_SECONDS=30  # Default code execution timeout
CONTAINER_MAX_SESSIONS=5             # Max concurrent sessions (global)
CONTAINER_MAX_SESSIONS_PER_THREAD=3  # Max sessions per conversation
CONTAINER_EXECUTE_AS_ROOT=False      # Run terminal commands as root
CONTAINER_NETWORK_ENABLED=True       # Enable container networking (can be disabled per session)
CONTAINER_ORPHAN_MIN_AGE_SECONDS=300 # Min age before orphan cleanup (5 min)

# ── Session Lifecycle / GC ──
SESSION_IDLE_TTL_SECONDS=1800        # Idle timeout (30 min)
SESSION_MAX_LIFETIME_SECONDS=7200    # Hard lifetime cap (2 hours)
SESSION_GC_INTERVAL_SECONDS=60       # GC check interval
SESSION_MAX_ACTIVE_THREADS=10        # Max active threads before eviction

# ── Output Truncation (characters) ──
MAX_STDOUT_CHARS=50000
MAX_STDERR_CHARS=120000
MAX_RESULT_CHARS=30000
MAX_TRACEBACK_CHARS=8000

# ── Encryption ──
ENCRYPTION_KEY=                      # Fernet key for settings encryption (optional)

# ── Storage ──
STORAGE_DIR=./storage                # Base dir for uploads
IMPORT_ALLOWED_DIRS=                 # Comma-separated host dirs allowed for import (empty = all)

# ── API ──
API_BASE_URL=http://127.0.0.1:8000   # API URL (for export download URLs)

# ── Agent ──
MAX_ITERATIONS=25                    # Max LangGraph iterations (recursion limit)

# ── PostgreSQL (checkpointer + Aegra) — all required, no defaults ──
POSTGRES_USER=sandbox_agent
POSTGRES_PASSWORD=sandbox_agent_secret
POSTGRES_DB=sandbox_agent
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
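To make two of these settings concrete, the sketch below converts CONTAINER_CPU_QUOTA into a core count and assembles a PostgreSQL connection URL from the POSTGRES_* variables, failing when any of them is missing (they have no defaults). The helper names and the postgresql:// URL form are assumptions for illustration; the project's settings.py may read and combine these values differently.

```python
from typing import Mapping
from urllib.parse import quote

POSTGRES_KEYS = ("POSTGRES_USER", "POSTGRES_PASSWORD", "POSTGRES_DB", "POSTGRES_HOST", "POSTGRES_PORT")


def cpu_cores(env: Mapping[str, str]) -> float:
    """Convert CONTAINER_CPU_QUOTA to cores (100000 = 1 core)."""
    return int(env.get("CONTAINER_CPU_QUOTA", "200000")) / 100_000


def postgres_dsn(env: Mapping[str, str]) -> str:
    """Build a connection URL; every POSTGRES_* variable is required."""
    missing = [key for key in POSTGRES_KEYS if not env.get(key)]
    if missing:
        raise ValueError(f"missing required settings: {', '.join(missing)}")
    user = quote(env["POSTGRES_USER"], safe="")
    password = quote(env["POSTGRES_PASSWORD"], safe="")  # escape characters such as '@' or ':'
    return f"postgresql://{user}:{password}@{env['POSTGRES_HOST']}:{env['POSTGRES_PORT']}/{env['POSTGRES_DB']}"


# Example values copied from the listing above; pass os.environ in real use.
example_env = {
    "CONTAINER_CPU_QUOTA": "200000",
    "POSTGRES_USER": "sandbox_agent",
    "POSTGRES_PASSWORD": "sandbox_agent_secret",
    "POSTGRES_DB": "sandbox_agent",
    "POSTGRES_HOST": "localhost",
    "POSTGRES_PORT": "5432",
}
print(cpu_cores(example_env))
print(postgres_dsn(example_env))
```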

Runtimes

| Runtime | Base Image | Kernel | IPC | Pre-installed |
| --- | --- | --- | --- | --- |
| Python | python:3.12-slim | IPython shell | UNIX socket | IPython + system libs |
| Node.js | node:22-slim | vm.createContext | UNIX socket | Bare runtime |
| R | rocker/r-ver:4 | Dedicated R env | TCP :8765 | jsonlite, base64enc, tidyverse, data.table, readxl, haven, httr2, DBI, RSQLite, rmarkdown, knitr, devtools, glmnet, randomForest |

The R container uses a compiled C client binary for IPC, while Python and Node.js use native clients.

Architecture

flowchart TB
    CLI["CLI · Rich REPL"]
    MCP["MCP Server · FastMCP (stdio)"]
    UI["React · Frontend"]

    CLI --> API["Aegra REST API<br/>(LangGraph Platform)"]
    UI --> API
    API --> Agent["LangGraph ReAct Agent"]
    Agent --> Tools["LangChain Tools"]
    MCP --> Core["Core Tool Functions"]

    Tools --> Core
    Core --> SM["SandboxManager<br/>Docker SDK"]

    SM -->|"docker exec -i + JSON pipe"| Docker

    subgraph Docker ["Docker Containers<br/>isolated, hardened"]
        direction LR
        PY["Python<br/>IPython · UNIX socket"]
        JS["Node.js<br/>vm.createContext · UNIX socket"]
        R["R<br/>R env · TCP :8765"]
    end

    subgraph Storage ["Persistence"]
        PG["PostgreSQL<br/>checkpoints, exports"]
    end

    API --> PG
    SM --> PG

Inside each container, a persistent kernel (PID 1) holds execution state, and an ephemeral client connects to it via UNIX socket (Python/Node.js) or TCP (R) for each docker exec call:

flowchart TB
    SM["SandboxManager"] -->|"docker exec -i"| Client["Client (ephemeral)"]

    subgraph container ["Container"]
        Client -->|"UNIX socket / TCP"| Kernel["Kernel (PID 1, persistent)"]
        Kernel --- State["State<br/>variables, imports, data"]
    end
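The persistent-kernel and ephemeral-client split can be illustrated with a toy version built on Python's standard library. This is only a sketch of the pattern: the newline-delimited JSON protocol, the eval/exec-based "kernel", and all names here are assumptions, and the real kernels are IPython, vm.createContext, and R. It requires a platform with UNIX domain sockets.

```python
import json
import os
import socket
import socketserver
import tempfile
import threading


class KernelHandler(socketserver.StreamRequestHandler):
    """Serve one client connection: read one JSON request line, write one JSON reply line."""

    def handle(self):
        request = json.loads(self.rfile.readline())
        namespace = self.server.namespace  # shared by all connections, so state persists
        code = request["code"]
        try:
            try:
                value = eval(code, namespace)  # expression: return its value
            except SyntaxError:
                exec(code, namespace)          # statement(s): run for side effects
                value = None
            reply = {"ok": True, "result": None if value is None else repr(value)}
        except Exception as exc:
            reply = {"ok": False, "error": repr(exc)}
        self.wfile.write((json.dumps(reply) + "\n").encode())


def run_in_kernel(socket_path: str, code: str) -> dict:
    """Ephemeral client: connect, send one request, read one reply, disconnect."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall((json.dumps({"code": code}) + "\n").encode())
        with sock.makefile("r") as reader:
            return json.loads(reader.readline())


# Start the long-lived "kernel" in a background thread.
socket_path = os.path.join(tempfile.mkdtemp(), "kernel.sock")
server = socketserver.UnixStreamServer(socket_path, KernelHandler)
server.namespace = {}
threading.Thread(target=server.serve_forever, daemon=True).start()

first = run_in_kernel(socket_path, "x = 41")  # one short-lived client defines x
second = run_in_kernel(socket_path, "x + 1")  # a separate client still sees x
print(first, second)

server.shutdown()
server.server_close()
```

Because the namespace lives in the server process, each client connection can be as short-lived as a single docker exec call while variables survive between executions.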

Testing

# Unit tests (no Docker required)
uv run pytest tests/test_cli.py tests/test_http_app.py -v

# Integration tests (requires Docker)
uv run pytest tests/test_manager.py tests/test_tools.py tests/test_export_files.py tests/test_mcp.py -v

# LangGraph debug trace (requires Docker + LLM API key)
uv run pytest tests/test_langgraph_debug.py -v -s

# API integration tests (requires Docker + running API: uv run sandbox-agent api dev)
uv run pytest tests/test_api.py -v -s

# Full suite
uv run pytest tests/ -v

Production Deployment

A production Dockerfile and docker-compose.yml are included:

# Start PostgreSQL + API
docker compose up -d

# Or build and run manually
docker build -t sandbox-agent-api .
docker run -p 8000:8000 --env-file .env sandbox-agent-api

The production image uses aegra serve with a non-root app user.

License

MIT — Eduardo Ramon Resser
