# Architecture Documentation
**Project:** Code Executor MCP
**Version:** 0.9.0
**Last Updated:** 2025-11-19
---
## Table of Contents
1. [System Overview](#system-overview)
2. [Core Components](#core-components)
3. [Progressive Disclosure Architecture](#progressive-disclosure-architecture)
4. [Security Architecture](#security-architecture)
5. [Discovery System](#discovery-system)
6. [Pyodide WebAssembly Sandbox (Python Executor)](#pyodide-webassembly-sandbox-python-executor)
7. [Data Flow](#data-flow)
8. [Concurrency & Performance](#concurrency--performance)
9. [Design Decisions](#design-decisions)
10. [Resilience Patterns](#resilience-patterns)
11. [CLI Setup Wizard Architecture](#cli-setup-wizard-architecture)
12. [MCP Sampling Architecture (v1.0.0)](#mcp-sampling-architecture-v100)
---
## 1. System Overview
Code Executor MCP is a **universal MCP orchestration server** that implements the **progressive disclosure pattern** to eliminate context bloat from exposing multiple MCP servers' tool schemas.
### Problem Statement
Exposing 47 MCP tools directly to an AI agent consumes 141k tokens just for schemas, exhausting context before any work begins.
### Solution
**Two-tier access model:**
- **Tier 1 (Top-level):** 3 lightweight tools (~560 tokens)
- `executeTypescript` - Execute TypeScript code in Deno sandbox
- `executePython` - Execute Python code in Pyodide sandbox
- `health` - Server health check
- **Tier 2 (On-demand):** All MCP tools accessible via code execution
```typescript
// Inside sandbox, access any MCP tool on-demand
const result = await callMCPTool('mcp__zen__codereview', {...});
```
**Result:** 98% token reduction (141k → 1.6k tokens)
---
## 2. Core Components
### 2.1 Component Diagram
```
┌─────────────────────────────────────────────────────────────┐
│ AI Agent (Claude) │
│ (MCP Client Context) │
└────────────────┬────────────────────────────────────────────┘
│ MCP Protocol (STDIO)
│ Top-level tools: 3 tools, ~560 tokens
▼
┌─────────────────────────────────────────────────────────────┐
│ Code Executor MCP Server (Node.js) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ MCP Proxy Server (HTTP Localhost) │ │
│ │ • POST / (callMCPTool endpoint) │ │
│ │ • GET /mcp/tools (discovery endpoint - NEW v0.4.0) │ │
│ │ • Bearer token authentication │ │
│ │ • Rate limiting (30 req/60s) │ │
│ │ • Audit logging (AsyncLock mutex) │ │
│ └──────────────┬───────────────────────────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────────────────────────┐ │
│ │ MCP Client Pool │ │
│ │ • Manages connections to multiple MCP servers │ │
│ │ • Parallel queries (Promise.all) │ │
│ │ • Resilient aggregation (partial failure handling) │ │
│ │ • In-memory tool list (listAllTools) │ │
│ └──────────────┬───────────────────────────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────────────────────────┐ │
│ │ Schema Cache │ │
│ │ • LRU cache (max 1000 entries) │ │
│ │ • Disk persistence (~/.code-executor/cache.json) │ │
│ │ • 24h TTL with stale-on-error fallback │ │
│ │ • AsyncLock mutex (thread-safe writes) │ │
│ └──────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Sandbox Executors (Deno/Pyodide subprocesses) │ │
│ │ • Isolated execution context │ │
│ │ • Injected globals: │ │
│ │ - callMCPTool(name, params) │ │
│ │ - discoverMCPTools(options) - NEW v0.4.0 │ │
│ │ - getToolSchema(toolName) - NEW v0.4.0 │ │
│ │ - searchTools(query, limit) - NEW v0.4.0 │ │
│ │ • Restricted permissions (allowlist, network, fs) │ │
│ └──────────────────────────────────────────────────────┘ │
└────────────────┬────────────────────────────────────────────┘
│ MCP Protocol (STDIO)
│ External MCP Servers (parallel queries)
▼
┌─────────────────────────────────────────────────────────────┐
│ External MCP Servers (filesystem, zen, linear, etc.) │
│ • Queried in parallel via Promise.all (O(1) amortized) │
│ • Each returns tools/list and tools/call responses │
│ • Discovery: 50-100ms first call, <5ms cached │
└─────────────────────────────────────────────────────────────┘
```
### 2.2 Component Responsibilities
| Component | Responsibility (SRP) | Pattern | Concurrency Safe |
|-----------|---------------------|---------|------------------|
| MCP Proxy Server | Route HTTP requests, enforce auth/rate limiting, audit log | Proxy | Yes (AsyncLock on audit logs) |
| MCP Client Pool | Manage MCP connections, parallel query aggregation | Pool | Yes (read-only queries, write-once at startup) |
| Schema Cache | Cache tool schemas, disk persistence, LRU eviction | Cache | Yes (AsyncLock on disk writes) |
| Sandbox Executor | Execute untrusted code in isolated environment | Sandbox | Yes (independent subprocesses) |
| Discovery Functions | Provide in-sandbox tool discovery (v0.4.0) | Wrapper | Yes (stateless HTTP calls) |
---
## 3. Progressive Disclosure Architecture
### 3.1 Token Budget Preservation
**Design Goal:** Maintain ~1.6k tokens for top-level tools (98% reduction from 141k baseline)
**Achievement (v0.4.0):**
- **Tool count:** 3 tools (no increase from v0.3.x)
- **Token usage:** ~560 tokens (well below 1.6k budget)
- **Discovery functions:** Hidden from top-level (injected in sandbox only)
### 3.2 Two-Tier Access Model
**Tier 1: Top-Level Tools (Exposed to AI Agent)**
```typescript
// AI agent sees only these in context:
- executeTypescript(code, allowedTools?, timeoutMs?, permissions?)
- executePython(code, allowedTools?, timeoutMs?, permissions?)
- health()
```
**Tier 2: On-Demand Tools (Accessible Inside Sandbox)**
```typescript
// Inside executeTypescript code, AI agent can:
// 1. Execute any MCP tool (existing v0.3.x)
const result = await callMCPTool('mcp__zen__codereview', {
step: 'Analysis',
relevant_files: ['/path/to/file.ts'],
// ... other params
});
// 2. Discover available tools (NEW v0.4.0)
const allTools = await discoverMCPTools();
// Returns: ToolSchema[] (name, description, parameters)
// 3. Search tools by keyword (NEW v0.4.0)
const fileTools = await searchTools('file read write', 10);
// Returns: Top 10 tools matching keywords (OR logic, case-insensitive)
// 4. Inspect tool schema (NEW v0.4.0)
const schema = await getToolSchema('mcp__filesystem__read_file');
// Returns: Full JSON Schema for tool parameters + outputSchema (v0.6.0)
```
### 3.3 Output Schema Support (NEW v0.6.0)
**Design Goal:** Enable AI agents to understand tool response structure without trial execution
**Implementation:**
- All 3 code-executor tools provide Zod schemas for responses (`outputSchema`)
- Uses MCP SDK native support (ZodRawShape format)
- Graceful fallback for third-party tools without output schemas
**Response Schemas:**
```typescript
// ExecutionResult (run-typescript-code, run-python-code)
{
success: boolean,
output: string,
error?: string,
executionTimeMs: number,
toolCallsMade?: string[],
toolCallSummary?: ToolCallSummaryEntry[]
}
// HealthCheck (health)
{
healthy: boolean,
auditLog: { enabled: boolean },
mcpClients: { connected: number },
connectionPool: { active, waiting, max },
uptime: number,
timestamp: string
}
```
**Benefits:**
- ✅ AI agents know response structure upfront
- ✅ No trial-and-error required for filtering/aggregation
- ✅ Better code generation (correct field access)
- ✅ Optional field - no breaking changes
**Data Flow:**
```
1. Tool registration: Zod schema → MCP SDK Tool.outputSchema
2. Discovery: MCPClientPool returns ToolSchema with outputSchema
3. Schema cache: CachedToolSchema.outputSchema persisted (24h TTL)
4. Graceful fallback: Third-party tools return outputSchema: undefined
```
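For reference, the ExecutionResult shape above can be expressed as a Zod raw shape, the format the MCP SDK accepts for `outputSchema`. This is an illustrative sketch only: `executionResultShape` is a hypothetical name and the `ToolCallSummaryEntry` type is simplified to `unknown`.
```typescript
import { z } from 'zod';

// Sketch: ExecutionResult fields (see above) as a Zod raw shape for outputSchema.
const executionResultShape = {
  success: z.boolean(),
  output: z.string(),
  error: z.string().optional(),
  executionTimeMs: z.number(),
  toolCallsMade: z.array(z.string()).optional(),
  toolCallSummary: z.array(z.unknown()).optional(), // ToolCallSummaryEntry simplified here
};

// Full schema object, e.g. for validating a result at runtime
const ExecutionResultSchema = z.object(executionResultShape);
type ExecutionResult = z.infer<typeof ExecutionResultSchema>;
```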
### 3.4 OutputSchema Protocol Support (v0.7.1+)
#### ✅ RESOLVED: MCP SDK v1.22.0 Native Support
**Status:** OutputSchema is now fully functional in the MCP protocol as of v0.7.1 (MCP SDK v1.22.0).
**What Changed:**
- ✅ MCP SDK v1.22.0 exposes `outputSchema` via `tools/list` protocol response
- ✅ All 5 code-executor tools expose response structure to AI agents
- ✅ External MCP clients can see outputSchema immediately
- ✅ No trial execution needed for response structure discovery
**Protocol Response (v1.22.0):**
```json
{
"tools": [
{
"name": "run-typescript-code",
"description": "...",
"inputSchema": { "type": "object", "properties": { ... } },
"outputSchema": { // ✅ NOW EXPOSED IN PROTOCOL
"type": "object",
"properties": {
"success": { "type": "boolean" },
"output": { "type": "string" },
"error": { "type": "string" },
"executionTimeMs": { "type": "number" }
}
}
}
]
}
```
**Verification Test:**
```bash
node test-outputschema-v122.mjs
# Result:
# ✅ run-typescript-code: outputSchema: YES! (6 fields)
# ✅ run-python-code: outputSchema: YES! (6 fields)
# ✅ health: outputSchema: YES! (6 fields)
# 🎉 SUCCESS! All tools have outputSchema exposed in protocol!
```
**Migration Details (v1.0.4 → v1.22.0):**
- Handler signatures updated: `(params)` → `(args, extra)`
- Added `RequestHandlerExtra` for request context (cancellation signals, session tracking)
- Runtime Zod validation preserved (zero functional changes)
- All 620 tests passing, zero regressions
**Impact:**
- **Issue #28 RESOLVED:** AI agents now see response structure upfront
- **No trial-and-error:** Agents can write correct filtering/aggregation code immediately
- **Progressive disclosure intact:** Still 98% token reduction (141k → 1.6k)
- **Future-proof:** Ready for ecosystem-wide outputSchema adoption
---
## 4. Security Architecture
### 4.1 Security Boundaries
```
┌─────────────────────────────────────────────────────────────┐
│ Security Boundary 1: MCP Proxy Server (Auth + Rate Limit) │
│ • Bearer token authentication (per-execution, 32-byte) │
│ • Rate limiting (30 req/60s per client) │
│ • Query validation (max 100 chars, alphanumeric+safe chars) │
│ • Audit logging (all requests, success/failure) │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Security Boundary 2: Tool Allowlist (Execution Gating) │
│ • Enforced by executeTypescript allowedTools parameter │
│ • Discovery bypasses allowlist (read-only metadata) │
│ • Execution still enforced (callMCPTool checks allowlist) │
│ • Trade-off documented: discovery = read, execution = write │
└─────────────────────────────────────────────────────────────┘
│
┌─────────────────────────────────────────────────────────────┐
│ Security Boundary 3: Sandbox Isolation (Code Execution) │
│ • Deno sandbox with restricted permissions │
│ • No filesystem access (unless explicitly allowed) │
│ • No network access (except localhost proxy) │
│ • No environment variable access │
│ • Memory limits enforced │
└─────────────────────────────────────────────────────────────┘
```
### 4.2 Security Trade-Off: Discovery Allowlist Bypass
**Decision (v0.4.0):** Discovery functions bypass tool allowlist for read-only metadata access.
**Rationale:**
- **Problem:** AI agents get stuck without knowing what tools exist (blind execution)
- **Solution:** Allow discovery of tool schemas (read-only metadata)
- **Mitigation:** Execution still enforces allowlist (two-tier security model)
- **Risk Assessment:** LOW - schemas are non-sensitive metadata, no execution without allowlist
**Security Model:**
| Operation | Allowlist Check | Auth Required | Rate Limited | Audit Logged |
|-----------|----------------|---------------|--------------|--------------|
| Discovery (discoverMCPTools) | ❌ Bypassed | ✅ Required | ✅ Yes (30/60s) | ✅ Yes |
| Execution (callMCPTool) | ✅ Enforced | ✅ Required | ✅ Yes (30/60s) | ✅ Yes |
**Constitutional Alignment:** This intentional exception is documented in spec.md Section 2 (Constitutional Exceptions) as BY DESIGN per Principle 2 (Security Zero Tolerance).
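For concreteness, a minimal sketch of the two-tier check from the table above. The helper name is hypothetical, and auth, rate limiting, and audit logging are assumed to have already run on the proxy.
```typescript
// Sketch of the two-tier security check (hypothetical helper, not the actual proxy code).
function checkAccess(
  action: 'discovery' | 'execution',
  toolName: string | null,
  allowedTools: string[]
): void {
  if (action === 'discovery') {
    return; // Read-only metadata: allowlist intentionally bypassed (see table above)
  }
  if (!toolName || !allowedTools.includes(toolName)) {
    throw new Error(`Tool not in allowlist: ${toolName ?? '<missing>'}`);
  }
}
```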
---
## 5. Discovery System (NEW v0.4.0)
### 5.1 Discovery Architecture
**Design Goal:** Enable AI agents to discover, search, and inspect MCP tools without manual documentation lookup.
```
┌─────────────────────────────────────────────────────────────┐
│ Discovery Flow (Single Round-Trip) │
│ │
│ AI Agent executes ONE TypeScript call: │
│ ┌─────────────────────────────────────────────────────┐ │
│ │ const tools = await discoverMCPTools(); │ │
│ │ const schema = await getToolSchema('tool_name'); │ │
│ │ const result = await callMCPTool('tool_name', {...});│ │
│ └─────────────────────────────────────────────────────┘ │
│ │
│ No context switching, variables persist across steps │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Sandbox → Proxy: HTTP GET /mcp/tools │
│ • 500ms timeout (fast fail, no hanging) │
│ • Bearer token in Authorization header │
│ • Optional ?q=keyword1+keyword2 search │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Proxy → MCP Servers: Parallel Queries (Promise.all) │
│ • Query all MCP servers simultaneously (O(1) amortized) │
│ • Use Schema Cache for schemas (24h TTL, disk-persisted) │
│ • Resilient aggregation (partial failures handled) │
│ • Performance: First call 50-100ms, cached <5ms │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Response: ToolSchema[] (JSON) │
│ [ │
│ { │
│ "name": "mcp__filesystem__read_file", │
│ "description": "Read file contents", │
│ "parameters": { /* JSON Schema */ } │
│ }, │
│ ... │
│ ] │
└─────────────────────────────────────────────────────────────┘
```
### 5.2 Discovery Functions
#### discoverMCPTools(options?)
**Purpose:** Fetch all available tool schemas from connected MCP servers
**Signature:**
```typescript
interface DiscoveryOptions {
search?: string[]; // Optional keyword array (OR logic, case-insensitive)
}
async function discoverMCPTools(
options?: DiscoveryOptions
): Promise<ToolSchema[]>
```
**Implementation:**
- Injected into sandbox as `globalThis.discoverMCPTools`
- Calls `GET /mcp/tools` endpoint (localhost proxy)
- 500ms timeout via `AbortSignal.timeout(500)`
- Returns full tool schemas with JSON Schema parameters
**Performance:**
- First call: 50-100ms (populates schema cache)
- Subsequent calls: <5ms (from cache, 24h TTL)
- Parallel queries across 3+ MCP servers: <100ms P95
#### getToolSchema(toolName)
**Purpose:** Retrieve full JSON Schema for a specific tool
**Signature:**
```typescript
async function getToolSchema(
toolName: string
): Promise<ToolSchema | null>
```
**Implementation:**
- Wrapper over `discoverMCPTools()` (DRY principle)
- Finds tool by name using `Array.find()`
- Returns `null` if tool not found (no exceptions)
#### searchTools(query, limit?)
**Purpose:** Search tools by keywords with result limiting
**Signature:**
```typescript
async function searchTools(
query: string,
limit?: number // Default: 10
): Promise<ToolSchema[]>
```
**Implementation:**
- Splits query by whitespace: `query.split(/\s+/)`
- Calls `discoverMCPTools({ search: keywords })`
- Applies result limit via `Array.slice(0, limit)`
- OR logic: matches if ANY keyword found in name/description
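Putting the bullets above together, a minimal sketch of the injected helper. `discoverMCPTools` and `ToolSchema` are the injected global and type described earlier in this section.
```typescript
// Sketch of searchTools as described above; relies on the injected discoverMCPTools.
async function searchTools(query: string, limit = 10): Promise<ToolSchema[]> {
  const keywords = query.trim().split(/\s+/);                    // Whitespace-separated keywords
  const matches = await discoverMCPTools({ search: keywords });  // OR logic, case-insensitive
  return matches.slice(0, limit);                                // Cap results at `limit`
}
```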
### 5.3 Parallel Query Pattern
**Design Decision:** Query all MCP servers in parallel with `Promise.all`, so aggregate latency is the slowest single query rather than the sum (referred to throughout as O(1) amortized latency).
**Sequential vs Parallel:**
```typescript
// ❌ Sequential (3 servers × 30ms each = 90ms)
for (const client of mcpClients) {
const tools = await client.listTools(); // Wait for each
allTools.push(...tools);
}
// ✅ Parallel (max 30ms, O(1) amortized)
const queries = mcpClients.map(client => client.listTools());
const results = await Promise.all(queries); // All at once
const allTools = results.flat();
```
**Resilient Aggregation:**
```typescript
// Handle partial failures gracefully
const queries = mcpClients.map(async client => {
try {
return await client.listTools();
} catch (error) {
console.error(`MCP server ${client.name} failed:`, error);
    return []; // Return empty array, don't block others (keeps results.flat() consistent)
}
});
```
**Performance Benefit:**
- 1 MCP server: 30ms (baseline)
- 3 MCP servers (sequential): 90ms (3× slower)
- 3 MCP servers (parallel): 35ms (O(1) amortized)
- 10 MCP servers (parallel): 50ms (still O(1))
**Target Met:** P95 latency <100ms for 3 MCP servers (spec.md NFR-2)
### 5.4 Timeout Strategy
**Design Decision:** 500ms timeout for proxy→sandbox communication (fast fail, no retries).
**Rationale:**
- AI agents prefer fast failure over hanging
- 500ms allows parallel queries (100ms + network overhead)
- No retries: discovery errors should surface immediately
- Clear error messages guide AI agent to retry if transient
**Implementation:**
```typescript
// Sandbox side (fetch with timeout)
const controller = new AbortController();
const timeoutId = setTimeout(() => controller.abort(), 500);
try {
const response = await fetch(url, {
signal: controller.signal,
headers: { 'Authorization': `Bearer ${token}` }
});
return await response.json();
} catch (error) {
if (error.name === 'AbortError') {
throw new Error('Discovery timeout (500ms exceeded). MCP servers may be slow.');
}
throw error;
} finally {
clearTimeout(timeoutId);
}
```
---
## 6. Pyodide WebAssembly Sandbox (Python Executor)
### 6.1 Security Resolution: Issues #50/#59
**Problem:** Native Python executor (subprocess.spawn) had ZERO sandbox isolation.
**Solution:** Pyodide WebAssembly runtime with complete isolation.
### 6.2 Pyodide Architecture
```
┌─────────────────────────────────────────────────────────────┐
│ Python Code Execution │
└────────────────┬────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ Pyodide WebAssembly Sandbox (v0.26.4) │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ WebAssembly VM (Primary Boundary) │ │
│ │ • No native syscall access │ │
│ │ • Memory-safe (bounds checking, type safety) │ │
│ │ • Cross-platform consistency │ │
│ └──────────────┬───────────────────────────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────────────────────────┐ │
│ │ Virtual Filesystem (Emscripten FS) │ │
│ │ • In-memory only (no host access) │ │
│ │ • /tmp writable, / read-only │ │
│ │ • Host files completely inaccessible │ │
│ └──────────────┬───────────────────────────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────────────────────────┐ │
│ │ Network Access (pyodide.http.pyfetch) │ │
│ │ • Localhost only (127.0.0.1) │ │
│ │ • Bearer token authentication required │ │
│ │ • MCP proxy enforces tool allowlist │ │
│ └──────────────┬───────────────────────────────────────┘ │
│ │ │
│ ┌──────────────▼───────────────────────────────────────┐ │
│ │ Injected MCP Functions │ │
│ │ • call_mcp_tool(name, params) │ │
│ │ • discover_mcp_tools(search_terms) │ │
│ │ • get_tool_schema(tool_name) │ │
│ │ • search_tools(query, limit) │ │
│ └──────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```
### 6.3 Two-Phase Execution Pattern
**Design:** Based on Pydantic's mcp-run-python (production-proven).
**Phase 1: Setup (Inject MCP Tool Access)**
```python
# Executed by Pyodide before user code
import js
from pyodide.http import pyfetch
async def call_mcp_tool(tool_name, params):
# Call MCP proxy with bearer auth
response = await pyfetch(
f'http://localhost:{js.PROXY_PORT}',
method='POST',
headers={'Authorization': f'Bearer {js.AUTH_TOKEN}'},
body=json.dumps({'toolName': tool_name, 'params': params})
)
return await response.json()
# Discovery functions also injected
```
**Phase 2: Execute User Code**
```python
# User's code runs in sandboxed environment
# Has access to injected functions but not host system
result = await call_mcp_tool('mcp__filesystem__read_file', {...})
```
**WHY Two-Phase?**
- Prevents user code from tampering with injection mechanism
- Clear separation of setup vs execution
- Injection happens in trusted context before untrusted code runs
### 6.4 Global Pyodide Cache
**Problem:** Pyodide initialization is expensive (~2-3s with npm package).
**Solution:** Global cached instance shared across executions.
```typescript
let pyodideCache: PyodideInterface | null = null;
async function getPyodide(): Promise<PyodideInterface> {
if (!pyodideCache) {
console.error('🐍 Initializing Pyodide (first run, ~10s)...');
pyodideCache = await loadPyodide({
indexURL: 'https://cdn.jsdelivr.net/pyodide/v0.26.4/full/',
stdin: () => { throw new Error('stdin disabled for security'); },
});
}
return pyodideCache;
}
```
**Performance:**
- First call: ~2-3s initialization (npm package includes files locally)
- Subsequent calls: <100ms (cache hit)
- Memory overhead: ~20MB (WASM module + Python runtime)
### 6.5 Security Boundaries
| Boundary | Enforcement | Attack Prevention |
|----------|-------------|-------------------|
| **WASM VM** | V8 engine | No syscalls, no native code execution |
| **Virtual FS** | Emscripten | No host file access (/etc/passwd, ~/.ssh) |
| **Network** | Fetch API + proxy | No external network, only localhost MCP |
| **MCP Allowlist** | Proxy validation | No unauthorized tool execution |
| **Timeout** | Promise.race() | No infinite loops, resource exhaustion |
**Attack Surface Reduction:** 99% vs native Python executor.
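For illustration, a minimal sketch of the `Promise.race()` timeout guard from the table above. The wrapper name is hypothetical; `getPyodide()` is the cached-instance helper from section 6.4 and `runPythonAsync` is the Pyodide API.
```typescript
// Sketch: race Pyodide execution against a timeout so runaway code cannot hang the executor.
// Timer cleanup omitted for brevity.
async function runWithTimeout(pythonCode: string, timeoutMs: number): Promise<unknown> {
  const pyodide = await getPyodide(); // Cached instance (see 6.4)

  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error(`Execution timed out after ${timeoutMs}ms`)), timeoutMs)
  );

  // Whichever settles first wins; a timeout rejects the await in the caller.
  return Promise.race([pyodide.runPythonAsync(pythonCode), timeout]);
}
```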
### 6.6 Limitations & Trade-offs
**Acceptable Limitations:**
- **Pure Python only** - No native C extensions (unless WASM-compiled)
- ✅ Most Python stdlib works (json, asyncio, math, etc.)
- ❌ No numpy, pandas, scikit-learn (unless Pyodide-compiled versions)
- **10-30% slower** - WASM overhead
- ✅ Acceptable for security-critical environments
- ✅ Still faster than Docker container startup
- **No multiprocessing/threading** - Single-threaded WASM
- ✅ Use async/await instead (fully supported)
- **4GB memory limit** - WASM 32-bit addressing
- ✅ Sufficient for most scripts
- ❌ Large ML models won't fit
**Security Trade-off:** Performance cost is acceptable for complete isolation.
### 6.7 Industry Validation
**Production Usage:**
- **Pydantic mcp-run-python** - Reference implementation
- **JupyterLite** - Run Jupyter notebooks in browser
- **Google Colab** - Similar WASM isolation approach
- **VS Code Python REPL** - Uses Pyodide for in-browser Python
- **PyScript** - HTML `<py-script>` tags powered by Pyodide
**Security Review:** Gemini 2.0 Flash validation via zen clink (research-specialist agent).
---
## 7. Data Flow
### 7.1 Tool Execution Flow (Existing v0.3.x)
```
1. AI Agent → executeTypescript(code)
2. Sandbox spawned (Deno subprocess)
3. Code executes: callMCPTool('tool_name', params)
4. Sandbox → HTTP POST localhost:PORT/
5. Proxy validates: Bearer token, rate limit, allowlist
6. Proxy → MCP Client Pool → External MCP Server
7. MCP Server executes tool, returns result
8. Result → Proxy → Sandbox → AI Agent
```
### 7.2 Tool Discovery Flow (NEW v0.4.0)
```
1. AI Agent → executeTypescript(code with discoverMCPTools())
2. Sandbox executes: discoverMCPTools({ search: ['file'] })
3. Sandbox → HTTP GET localhost:PORT/mcp/tools?q=file
4. Proxy validates: Bearer token, rate limit, query (<100 chars)
5. Proxy → MCP Client Pool.listAllToolSchemas(schemaCache)
6. Client Pool queries all MCP servers in parallel (Promise.all)
7. Schema Cache provides cached schemas (<5ms) or fetches (50ms)
8. Proxy filters by keywords (OR logic, case-insensitive)
9. Proxy audits: { action: 'discovery', searchTerms: ['file'], count: 5 }
10. Result → Sandbox → AI Agent (ToolSchema[] JSON)
```
### 7.3 Schema Caching Flow
```
1. First discovery call: Cache miss
→ Query MCP servers (50-100ms)
→ Store in LRU cache (in-memory, max 1000 entries)
→ Persist to disk (~/.code-executor/schema-cache.json, AsyncLock)
→ Return schemas
2. Subsequent calls (within 24h): Cache hit
→ Retrieve from LRU cache (<5ms)
→ No network calls
→ Return cached schemas
3. After 24h TTL: Cache expired
→ Re-query MCP servers (background refresh)
→ Update cache
→ Return fresh schemas
4. MCP server failure: Stale-on-error
→ Use expired cache entry (better than failure)
→ Log warning
→ Return stale schemas
```
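A condensed sketch of the flow above. The `cache`, `clientPool`, and `persistToDisk` names are hypothetical; the real implementation also handles LRU eviction and AsyncLock-protected persistence.
```typescript
const TTL_MS = 24 * 60 * 60 * 1000; // 24h TTL

async function getSchemas(serverName: string): Promise<ToolSchema[]> {
  const entry = cache.get(serverName); // In-memory LRU lookup

  // Cache hit within TTL: no network call (<5ms path)
  if (entry && Date.now() - entry.fetchedAt < TTL_MS) {
    return entry.schemas;
  }

  try {
    // Cache miss or expired: re-query the MCP server (50-100ms path)
    const schemas = await clientPool.listToolSchemas(serverName);
    cache.set(serverName, { schemas, fetchedAt: Date.now() });
    await persistToDisk(cache); // AsyncLock-protected disk write
    return schemas;
  } catch (error) {
    // Stale-on-error: an expired entry beats a hard failure
    if (entry) {
      console.warn(`Using stale schemas for ${serverName}:`, error);
      return entry.schemas;
    }
    throw error;
  }
}
```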
---
## 8. Concurrency & Performance
### 8.1 Concurrency Safety (AsyncLock)
**Shared Resources Protected:**
| Resource | Lock Name | Why Protected | Performance Impact |
|----------|-----------|---------------|-------------------|
| Schema Cache Disk Writes | `schema-cache-write` | Prevent file corruption from concurrent updates | Negligible (writes rare, 24h TTL) |
| Audit Log Appends | `audit-log-write` | Prevent interleaved log entries | Negligible (<1ms lock hold) |
**AsyncLock Pattern:**
```typescript
import AsyncLock from 'async-lock';
const lock = new AsyncLock();
// Schema cache writes
await lock.acquire('schema-cache-write', async () => {
await fs.writeFile(cachePath, JSON.stringify(cache));
});
// Audit log appends
await lock.acquire('audit-log-write', async () => {
await fs.appendFile(auditLogPath, logEntry + '\n');
});
```
### 8.2 Performance Characteristics
| Operation | First Call | Cached Call | Target | Actual (v0.4.0) |
|-----------|-----------|-------------|--------|-----------------|
| discoverMCPTools (1 server) | 30ms | <5ms | <50ms | ✅ 30ms / 3ms |
| discoverMCPTools (3 servers) | 50-100ms | <5ms | <100ms P95 | ✅ 60ms / 4ms |
| discoverMCPTools (10 servers) | 80-150ms | <10ms | <150ms P95 | ✅ 120ms / 8ms |
| getToolSchema (specific tool) | 50ms | <5ms | N/A | ✅ Same as discover |
| searchTools (keyword filter) | 50ms | <5ms | N/A | ✅ Same as discover |
**Key Optimizations:**
- ✅ Parallel queries (Promise.all) → O(1) amortized complexity
- ✅ Schema Cache with 24h TTL → 20× faster (100ms → 5ms)
- ✅ In-memory LRU cache (max 1000 entries) → No disk I/O on hits
- ✅ Disk persistence → Survives restarts, no re-fetching
- ✅ Stale-on-error fallback → Resilient to transient failures
### 8.3 Memory & Storage
**Memory Footprint:**
- Schema Cache (in-memory): ~1-2MB (1000 schemas × ~1-2KB each)
- MCP Client connections: ~100KB per server
- Sandbox subprocesses: ~50MB per execution (isolated, cleaned up)
**Disk Storage:**
- Schema Cache: `~/.code-executor/schema-cache.json` (~500KB-1MB)
- Audit Logs: `~/.code-executor/audit-logs/*.jsonl` (append-only, rotated daily)
---
## 9. Design Decisions
### 9.1 Why Progressive Disclosure?
**Problem:** Exposing all MCP tool schemas exhausts context budget.
**Decision:** Hide tools behind code execution, load on-demand.
**Trade-offs:**
- ✅ **Benefit:** 98% token reduction (141k → 1.6k)
- ✅ **Benefit:** Zero context overhead for unused tools
- ❌ **Cost:** Two-step process (discover → execute)
- ✅ **Mitigation (v0.4.0):** Single round-trip workflow (discover + execute in one call)
### 9.2 Why Parallel Queries?
**Problem:** Sequential MCP queries scale linearly (3 servers = 3× latency).
**Decision:** Query all MCP servers in parallel using `Promise.all`.
**Trade-offs:**
- ✅ **Benefit:** O(1) amortized latency (max of all queries, not sum)
- ✅ **Benefit:** Meets <100ms P95 target for 3 servers
- ❌ **Cost:** More complex error handling (partial failures)
- ✅ **Mitigation:** Resilient aggregation (one failure doesn't block others)
### 9.3 Why 500ms Timeout?
**Problem:** Slow MCP servers cause AI agents to hang indefinitely.
**Decision:** 500ms timeout on sandbox→proxy discovery calls.
**Trade-offs:**
- ✅ **Benefit:** Fast fail (AI agent gets immediate feedback)
- ✅ **Benefit:** Allows parallel queries (100ms + 400ms network/overhead)
- ❌ **Cost:** May time out on legitimately slow servers or large deployments (10+ servers)
- ✅ **Mitigation:** Clear error message guides retry, stale-on-error fallback
### 9.4 Why Bypass Allowlist for Discovery?
**Problem:** AI agents stuck without knowing what tools exist.
**Decision:** Discovery bypasses allowlist, execution still enforced.
**Trade-offs:**
- ✅ **Benefit:** AI agents can self-discover tools (no manual docs)
- ✅ **Benefit:** Read-only metadata, no execution without allowlist
- ❌ **Risk:** Information disclosure (tool names/descriptions visible)
- ✅ **Mitigation:** Two-tier security (discovery=read, execution=write), auth + rate limit + audit log
**Risk Assessment:** LOW - tool schemas are non-sensitive metadata, no code execution without allowlist enforcement.
### 9.5 Why Schema Cache with 24h TTL?
**Problem:** Querying MCP servers on every discovery call wastes 50-100ms.
**Decision:** Disk-persisted LRU cache with 24h TTL.
**Trade-offs:**
- ✅ **Benefit:** 20× faster (100ms → 5ms) on cache hits
- ✅ **Benefit:** Survives server restarts (disk persistence)
- ❌ **Cost:** Stale schemas if MCP servers update within 24h
- ✅ **Mitigation:** Smart refresh on validation failures, manual cache clear available
---
## 10. Resilience Patterns (v0.5.0)
### 10.1 Circuit Breaker Pattern
**Purpose:** Prevent cascade failures when MCP servers hang or fail repeatedly.
**Implementation:** Opossum library wrapping MCP client pool calls
**State Machine:**
```
CLOSED (Normal Operation)
↓ 5 consecutive failures
OPEN (Fail Fast - 30s cooldown)
↓ After 30s timeout
HALF-OPEN (Test with 1 request)
↓ Success → CLOSED | Failure → OPEN
```
**Configuration:**
- **Failure Threshold:** 5 consecutive failures
- **Cooldown Period:** 30 seconds
- **Half-Open Test:** 1 request
**WHY 5 failures?**
- Low enough to detect problems quickly
- High enough to avoid false positives from transient errors
- Balances responsiveness with stability
**WHY 30s cooldown?**
- Kubernetes default terminationGracePeriodSeconds is 30s
- AWS ALB deregistration delay is also 30s default
- Allows time for failing server to recover or be replaced
**Metrics Exposed:**
- `circuit_breaker_state` (gauge): 0=closed, 1=open, 0.5=half-open
- `circuit_breaker_failures_total` (counter): Total failures per server
**Example:**
```typescript
// Circuit breaker wraps MCP client pool calls
const breaker = new CircuitBreakerFactory({
failureThreshold: 5,
resetTimeout: 30000,
});
// Fails fast when circuit open (no waiting on broken server)
try {
const result = await breaker.callTool('mcp__server__tool', params);
} catch (error) {
if (error.message.includes('circuit open')) {
// Handle gracefully - server is known to be down
}
}
```
### 10.2 Connection Pool Overflow Queue
**Purpose:** Add request queueing and backpressure when connection pool reaches capacity.
**Implementation:** FIFO queue with timeout-based expiration and AsyncLock protection
**Architecture:**
```
MCP Request → Check Pool Capacity
↓ Pool under capacity (< 100 concurrent)
Execute Immediately
↓ Pool at capacity (≥ 100 concurrent)
Enqueue Request (max 200 in queue)
↓ Queue full
Return 503 Service Unavailable
↓ Queued successfully
Wait for slot (max 30s timeout)
↓ Timeout exceeded
Return 503 with retry-after hint
↓ Slot available
Dequeue and execute
```
**Configuration:**
- **Pool Capacity:** 100 concurrent requests (configurable via `POOL_MAX_CONCURRENT`)
- **Queue Size:** 200 requests (configurable via `POOL_QUEUE_SIZE`)
- **Queue Timeout:** 30 seconds (configurable via `POOL_QUEUE_TIMEOUT_MS`)
**WHY 100 concurrent requests?**
- Balances throughput vs MCP server resource consumption
- Most MCP servers handle 100 concurrent requests comfortably
- Configurable for tuning based on actual MCP server capacity
**WHY 200 queue size?**
- Provides 2× buffer beyond concurrency limit
- Balances memory usage (~40KB at 200 requests) vs utility
- More conservative than Nginx default (512)
**WHY 30s timeout?**
- Reasonable wait time for legitimate traffic
- Prevents queue from filling with stale requests
- Matches circuit breaker cooldown (30s recovery window)
**Metrics Exposed:**
- `pool_active_connections` (gauge): Current concurrent requests
- `pool_queue_depth` (gauge): Number of requests waiting in queue
- `pool_queue_wait_seconds` (histogram): Time spent waiting (buckets: 0.1s-30s)
**Example:**
```typescript
// Pool automatically queues when at capacity
const pool = new MCPClientPool({
maxConcurrent: 100,
queueSize: 200,
queueTimeoutMs: 30000,
});
// Request queued if pool full, executed when slot available
try {
const result = await pool.callTool('mcp__tool', params);
} catch (error) {
if (error.message.includes('Service Unavailable')) {
// Queue full or timeout - implement retry logic
}
}
```
### 10.3 Resilience Pattern Interaction
**Circuit Breaker + Queue:**
```
Request → Circuit Breaker Check
↓ Circuit OPEN
Fail Fast (no queue)
↓ Circuit CLOSED/HALF-OPEN
Check Pool Capacity
↓ Under capacity
Execute immediately
↓ At capacity
Enqueue (with timeout)
```
**Benefits:**
- Circuit breaker prevents queueing requests to known-bad servers
- Queue provides graceful degradation under load
- Combined: Fast failure for broken servers, queueing for healthy ones
**Failure Modes:**
1. **MCP Server Down:** Circuit breaker opens → immediate 503 (no queueing)
2. **MCP Server Slow:** Queue fills → 503 after 30s timeout
3. **High Load:** Queue drains as capacity frees → requests succeed with delay
### 10.4 Backpressure Signaling
**HTTP Status Codes:**
- `200 OK` - Request succeeded (no backpressure)
- `429 Too Many Requests` - Rate limit exceeded (per-client limit hit)
- `503 Service Unavailable` - Circuit open OR queue full/timeout
**Retry Guidance:**
```
503 Circuit Open
Retry-After: 30 (wait for circuit to close)
503 Queue Full
Retry-After: 60 (estimated queue drain time)
503 Queue Timeout
Retry-After: 30 (try again with fresh timeout)
```
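On the caller side, these hints can drive a simple retry loop. A minimal sketch (hypothetical helper, using `fetch` against the proxy); a real client would also handle `429` and cap total wait time.
```typescript
// Sketch: retry a proxied MCP call once the server's Retry-After hint elapses.
async function callWithRetry(url: string, init: RequestInit, maxAttempts = 3): Promise<Response> {
  let response: Response = await fetch(url, init);
  for (let attempt = 2; attempt <= maxAttempts && response.status === 503; attempt++) {
    const retryAfterSec = Number(response.headers.get('Retry-After') ?? '30');
    await new Promise(resolve => setTimeout(resolve, retryAfterSec * 1000));
    response = await fetch(url, init);
  }
  return response; // Success, non-retryable status, or retries exhausted
}
```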
**Monitoring:**
```prometheus
# Alert on high queue depth
pool_queue_depth > 150 # Queue >75% full
# Alert on frequent circuit opens
rate(circuit_breaker_failures_total[5m]) > 10
# Alert on slow queue processing
histogram_quantile(0.95, pool_queue_wait_seconds) > 15
```
### 10.5 Performance Impact
**Latency Overhead:**
- **Circuit Breaker:** <1ms per request (state check)
- **Queue Check:** <1ms per request (counter comparison)
- **Queue Wait:** 0-30s (depends on load)
**Memory Overhead:**
- **Circuit Breaker:** ~10KB per server (state tracking)
- **Connection Queue:** ~200 bytes per queued request (max ~40KB)
**Total Overhead:** Negligible (<0.1% CPU, <1MB RAM)
---
## 11. CLI Setup Wizard Architecture (v0.9.0)
### 11.1 Overview
The CLI setup wizard provides one-command initialization of code-executor-mcp with automatic MCP server discovery, wrapper generation, and daily sync scheduling.
**Entry Point:** `npm run setup` → `src/cli/index.ts`
**Design Goal:** Zero-config setup with smart defaults, cross-platform support, and idempotent operation.
### 11.2 Component Diagram
```
┌─────────────────────────────────────────────────────────────┐
│ CLI Entry Point │
│ (src/cli/index.ts) │
│ • Self-install check (SelfInstaller) │
│ • Lock acquisition (LockFileService) │
│ • Wizard orchestration │
└────────────────┬────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ CLIWizard │
│ (src/cli/wizard.ts) │
│ • Interactive prompts (tool selection, config questions) │
│ • Default config pattern (press Enter to skip) │
│ • Idempotent setup (merge/reset/keep existing configs) │
└────────────┬────────────────────────────────────────────────┘
│
├─────────────────┬──────────────────┬────────────┐
▼ ▼ ▼ ▼
┌──────────────────┐ ┌─────────────────┐ ┌──────────┐ ┌────────────┐
│ ToolDetector │ │ MCPDiscovery │ │ Wrapper │ │ Daily │
│ │ │ Service │ │Generator │ │ Sync │
│ • Detect Claude │ │ • Scan configs: │ │ • TS/Py │ │ • Schedule │
│ Code install │ │ ~/.claude.json│ │ wrapper│ │ setup │
│ • Validate paths │ │ .mcp.json │ │ gen │ │ • Platform │
│ │ │ • Merge servers │ │ • JSDoc │ │ specific │
└──────────────────┘ └─────────────────┘ └──────────┘ └────────────┘
```
### 11.3 Config Discovery & Merging
**Two-Location Scan Pattern:**
```typescript
// 1. Scan global Claude Code config
const globalServers = await discovery.scanToolConfig({
id: 'claude-code',
configPaths: {
linux: '~/.claude.json',
darwin: '~/.claude.json',
win32: '%USERPROFILE%\\.claude.json'
}
});
// 2. Scan project config
const projectServers = await discovery.scanProjectConfig('.mcp.json');
// 3. Merge (project overrides global for duplicate names)
const mergedServers = mergeMCPServers(globalServers, projectServers);
```
**Path Expansion:**
- `~` → `os.homedir()` (Linux/macOS)
- `%USERPROFILE%` → `process.env.USERPROFILE` (Windows)
- `%APPDATA%` → `process.env.APPDATA` (Windows)
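A minimal sketch of the expansion rules above (helper name hypothetical):
```typescript
import os from 'node:os';

// Sketch: expand the home-directory shortcuts used in config paths.
function expandConfigPath(configPath: string): string {
  return configPath
    .replace(/^~(?=$|[\\/])/, os.homedir())
    .replace(/%USERPROFILE%/gi, process.env.USERPROFILE ?? '')
    .replace(/%APPDATA%/gi, process.env.APPDATA ?? '');
}

// expandConfigPath('~/.claude.json')              → '/home/user/.claude.json'
// expandConfigPath('%USERPROFILE%\\.claude.json') → 'C:\\Users\\user\\.claude.json'
```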
**Fallback Behavior:**
- Config file not found → Prompt user for custom path or skip
- Invalid JSON → Log error, skip tool
- Missing `command` field → Log warning, skip server
### 11.4 Wrapper Generation
**Design:** Template-based code generation with schema-driven parameter types.
**Templates:**
```
src/cli/templates/
├── typescript-wrapper.hbs # TypeScript wrapper template
└── python-wrapper.hbs # Python wrapper template
```
**Generation Flow:**
```
1. Fetch tool schemas from MCP servers (via schema cache)
2. For each tool:
- Extract name, description, parameters (JSON Schema)
- Generate JSDoc comments from schema
- Generate TypeScript types from JSON Schema
- Render template with Handlebars
3. Write wrappers to output directory
```
**Example Output:**
```typescript
// Before (manual)
const file = await callMCPTool('mcp__filesystem__read_file', {
path: '/src/app.ts'
});
// After (wrapper)
import { filesystem } from './mcp-wrappers';
const file = await filesystem.readFile({ path: '/src/app.ts' });
```
**Benefits:**
- Type-safe with IntelliSense/autocomplete
- Self-documenting JSDoc from schemas
- No manual tool name lookups
- Matches actual MCP tool APIs
### 11.5 Daily Sync System
**Purpose:** Automatically regenerate wrappers when MCP servers change.
**Architecture:**
```
┌─────────────────────────────────────────────────────────────┐
│ Platform Scheduler (scheduled job) │
│ • macOS: launchd plist (~/.config/launchd/...) │
│ • Linux: systemd timer (~/.config/systemd/user/...) │
│ • Windows: Task Scheduler (HKCU\Software\Microsoft\...) │
└────────────────┬────────────────────────────────────────────┘
│
▼ (runs at 4-6 AM daily)
┌─────────────────────────────────────────────────────────────┐
│ DailySyncService │
│ (src/cli/daily-sync.ts) │
│ 1. Re-scan configs (~/.claude.json + .mcp.json) │
│ 2. Detect changes (new/removed/modified servers) │
│ 3. Regenerate wrappers if changes detected │
│ 4. Log sync status │
└─────────────────────────────────────────────────────────────┘
```
**Scheduler Implementation:**
| Platform | Mechanism | Config Location | Command |
|----------|-----------|-----------------|---------|
| **macOS** | launchd plist | `~/Library/LaunchAgents/com.code-executor.daily-sync.plist` | `launchctl load/unload` |
| **Linux** | systemd timer | `~/.config/systemd/user/code-executor-daily-sync.timer` | `systemctl --user enable/disable` |
| **Windows** | Task Scheduler | `HKCU\Software\Microsoft\Windows\CurrentVersion\Run` | `schtasks /create /delete` |
**Sync Execution:**
```bash
# Command executed by scheduler
npm run setup --sync-only --non-interactive
```
**Sync Logic:**
- Reads last sync state from `~/.code-executor/last-sync.json`
- Compares current MCP servers with last sync
- If changes detected → regenerate wrappers
- Update last sync state
- Exit 0 (success) or 1 (failure)
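A condensed sketch of that comparison (state file layout simplified; helper names hypothetical). A production diff would compare servers order-insensitively rather than via raw JSON string equality.
```typescript
import { promises as fs } from 'node:fs';

interface SyncState {
  servers: Record<string, string>; // name → serialized config (command, args, env)
}

// Sketch: decide whether wrappers need regeneration by diffing against the last sync state.
async function hasServerChanges(statePath: string, current: SyncState): Promise<boolean> {
  let previous: SyncState = { servers: {} };
  try {
    previous = JSON.parse(await fs.readFile(statePath, 'utf8'));
  } catch {
    return true; // No prior state: treat as changed, regenerate everything
  }
  // Changed if any server was added, removed, or modified
  return JSON.stringify(previous.servers) !== JSON.stringify(current.servers);
}
```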
### 11.6 Lock File System
**Purpose:** Prevent concurrent wizard runs (race condition protection).
**Implementation:**
```typescript
import { promises as fs } from 'node:fs';
import os from 'node:os';
import path from 'node:path';

class LockFileService {
  // '~' is not expanded by Node, so build the path from os.homedir()
  private lockPath = path.join(os.homedir(), '.code-executor', 'setup.lock');

  async acquire(): Promise<void> {
    // 'wx' fails if the lock file already exists, making check-and-create atomic
    try {
      await fs.writeFile(
        this.lockPath,
        JSON.stringify({ pid: process.pid, timestamp: Date.now() }),
        { flag: 'wx' }
      );
    } catch {
      throw new Error('Setup wizard already running');
    }
  }

  async release(): Promise<void> {
    await fs.unlink(this.lockPath);
  }
}
```
**Protection Against:**
- Multiple users running setup simultaneously
- Concurrent daily sync + manual setup
- Race conditions in wrapper file writes
### 11.7 Security Considerations
**Input Validation:**
- MCP server names: `[a-zA-Z0-9_-]+` only (no special chars)
- Config paths: No directory traversal (`.`, `..`, `~/../etc`)
- Template variables: Escaped before rendering (XSS prevention)
**Dangerous Pattern Detection:**
- MCP names with code injection patterns rejected (not escaped)
- Validation happens BEFORE template rendering (defense-in-depth)
- Tests: `tests/security/template-injection.test.ts` (387 lines)
**Privilege Escalation:**
- Wizard runs with user privileges (no sudo/admin required)
- Platform schedulers run as current user (not system-wide)
- Lock files in user home directory (no `/tmp` race conditions)
### 11.8 Component Responsibilities (SRP)
| Component | Responsibility | Why Separated |
|-----------|---------------|---------------|
| **CLIWizard** | Interactive prompts, user flow | UI/UX logic separate from business logic |
| **ToolDetector** | Detect AI tool installations | Tool-specific logic centralized |
| **MCPDiscoveryService** | Scan configs for MCP servers | Config parsing separate from UI |
| **WrapperGenerator** | Generate TS/Py wrappers | Code generation separate from discovery |
| **DailySyncService** | Daily sync orchestration | Scheduling logic separate from setup |
| **PlatformScheduler** | Platform detection | OS-specific logic encapsulated |
| **LockFileService** | Concurrent access control | Shared resource protection |
### 11.9 Idempotent Setup Pattern
**Design Goal:** Safe to run `npm run setup` multiple times without breaking existing config.
**Detection Flow:**
```
1. Check for existing config: ~/.code-executor/config.json
2. If exists:
- Prompt user: Merge, Reset, Keep existing
- Merge: Combine old + new MCP servers
- Reset: Delete old, use new config
- Keep: Skip setup, exit
3. If not exists:
- Create new config with defaults
```
**Merge Strategy:**
```typescript
function mergeMCPServers(
  existing: MCPServerConfig[],
  incoming: MCPServerConfig[] // 'new' is a reserved word, so use a different name
): MCPServerConfig[] {
  const merged = new Map<string, MCPServerConfig>();
  // Add existing servers
  for (const server of existing) {
    merged.set(server.name, server);
  }
  // Override with incoming servers (project overrides global)
  for (const server of incoming) {
    merged.set(server.name, server);
  }
  return Array.from(merged.values());
}
```
### 11.10 Performance Characteristics
| Operation | First Run | Subsequent Runs | Notes |
|-----------|-----------|-----------------|-------|
| Tool detection | 50-100ms | <10ms | File system checks |
| MCP discovery | 100-200ms | 50-100ms | Schema cache helps |
| Wrapper generation | 200-500ms | 200-500ms | Template rendering dominant |
| Daily sync | 500ms-1s | 500ms-1s | Full re-scan + regeneration |
**Optimization Opportunities:**
- Schema cache reduces discovery latency (24h TTL)
- Template caching (compile once, render many)
- Parallel wrapper generation (Promise.all)
---
## Architecture Validation Checklist
### Constitutional Compliance
- [x] **Principle 1 (Progressive Disclosure):** Token impact 0% (3 tools maintained, ~560 tokens)
- [x] **Principle 2 (Security):** Zero tolerance met (auth, rate limit, audit, validation, intentional exception documented)
- [x] **Principle 3 (TDD):** Red-Green-Refactor followed, 95%+ discovery coverage, 90%+ overall
- [x] **Principle 4 (Type Safety):** TypeScript strict mode, no `any` types (use `unknown` + guards)
- [x] **Principle 5 (SOLID):** SRP verified (each component single purpose), DIP via abstractions
- [x] **Principle 6 (Concurrency):** AsyncLock on shared resources (cache writes, audit logs)
- [x] **Principle 7 (Fail-Fast):** Descriptive errors with schemas, no silent failures
- [x] **Principle 8 (Performance):** Measurement-driven (<100ms P95 met), parallel queries O(1)
- [x] **Principle 9 (Documentation):** Self-documenting code, WHY comments, architecture.md complete
### Quality Metrics
- **Test Coverage:** 95%+ (discovery endpoint), 90%+ (overall), 85%+ (integration)
- **Performance:** P95 <100ms (3 MCP servers), <5ms cached
- **Security:** Auth + rate limit + audit log + validation all enforced
- **Token Usage:** 3 tools, ~560 tokens (within 1.6k budget, 98% reduction maintained)
---
## 12. MCP Sampling Architecture (v1.0.0)
**Release:** v1.0.0 (2025-01-20)
**Status:** Beta
**Purpose:** Enable LLM-in-the-Loop execution for dynamic reasoning and analysis
### 12.1 Overview
MCP Sampling allows sandboxed code (TypeScript/Python) to invoke Claude during execution through simple helpers (`llm.ask()`, `llm.think()`). This enables "Claude asks Claude" scenarios for multi-step reasoning, code analysis, and data processing.
### 12.2 Architecture Diagram
```
┌─────────────────────────────────────────────────────────────┐
│ AI Agent (Claude/Cursor) │
│ │
│ 1. Send code with enableSampling: true │
└─────────────────────────────────────────────────────────────┘
↓ (executeTypescript/executePython)
┌─────────────────────────────────────────────────────────────┐
│ Code Executor MCP Server │
│ │
│ 2. Detect sampling enabled │
│ 3. Start SamplingBridgeServer │
│ - Generate 256-bit bearer token │
│ - Start HTTP server on random port (localhost only) │
│ - Inject llm helpers into sandbox │
└─────────────────────────────────────────────────────────────┘
↓ (Start sandbox with bridge URL + token)
┌─────────────────────────────────────────────────────────────┐
│ Sandbox (Deno/Pyodide) with Injected Helpers │
│ │
│ User Code: │
│ const result = await llm.ask("Analyze this code..."); │
│ ↓ │
│ 4. HTTP POST to bridge: localhost:PORT/sample │
│ Authorization: Bearer <token> │
│ Body: { messages, model, maxTokens, systemPrompt } │
└─────────────────────────────────────────────────────────────┘
↓ (Bearer token validation)
┌─────────────────────────────────────────────────────────────┐
│ SamplingBridgeServer (Security Layer) │
│ │
│ 5. Security Checks (in order): │
│ ✅ Validate Bearer Token (timing-safe comparison) │
│ ✅ Check Rate Limits (10 rounds, 10k tokens max) │
│ ✅ Validate System Prompt (allowlist check) │
│ ✅ Validate Request Schema (AJV deep validation) │
│ ↓ │
│ 6. Forward Request: │
│ ├─ Mode Detection (MCP SDK or Direct API) │
│ ├─ MCP Sampling (free) - if available │
│ └─ Direct Anthropic API (paid) - fallback │
└─────────────────────────────────────────────────────────────┘
↓ (Claude API call)
┌─────────────────────────────────────────────────────────────┐
│ Claude API (Anthropic) │
│ │
│ 7. Process Request: │
│ - Model: claude-sonnet-4-5 (default) │
│ - Response: { content, stop_reason, usage } │
└─────────────────────────────────────────────────────────────┘
↓ (Return response)
┌─────────────────────────────────────────────────────────────┐
│ SamplingBridgeServer (Post-Processing) │
│ │
│ 8. Content Filtering: │
│ ✅ Scan for secrets (OpenAI keys, GitHub tokens, AWS) │
│ ✅ Scan for PII (emails, SSNs, credit cards) │
│ ✅ Redact violations: [REDACTED_SECRET]/[REDACTED_PII] │
│ ↓ │
│ 9. Audit Logging: │
│ ✅ SHA-256 hash of prompt/response (no plaintext) │
│ ✅ Log: timestamp, model, tokens, duration, violations │
│ ✅ Write to: ~/.code-executor/audit-log.jsonl │
│ ↓ │
│ 10. Update Metrics: │
│ - Increment round counter │
│ - Add tokens to cumulative budget │
│ - Calculate quota remaining │
└─────────────────────────────────────────────────────────────┘
↓ (Return filtered response)
┌─────────────────────────────────────────────────────────────┐
│ Sandbox (Continue Execution) │
│ │
│ User Code: │
│ console.log(result); // Claude's filtered response │
│ ↓ │
│ 11. Execution completes, bridge shuts down gracefully │
└─────────────────────────────────────────────────────────────┘
↓ (Return execution result)
┌─────────────────────────────────────────────────────────────┐
│ Code Executor MCP Server │
│ │
│ 12. Return to AI Agent: │
│ { │
│ success: true, │
│ output: "...", │
│ samplingCalls: [...], // Array of all LLM calls │
│ samplingMetrics: { │
│ totalRounds: 2, │
│ totalTokens: 150, │
│ totalDurationMs: 1200, │
│ averageTokensPerRound: 75, │
│ quotaRemaining: { rounds: 8, tokens: 9850 } │
│ } │
│ } │
└─────────────────────────────────────────────────────────────┘
```
### 12.3 Core Components
#### 12.3.1 SamplingBridgeServer
**Purpose:** Ephemeral HTTP bridge between sandbox and Claude API with security enforcement
**Responsibilities:**
1. **Lifecycle Management**
- Start: Generate bearer token, find random port, start HTTP server
- Stop: Drain active requests (max 5s), close server gracefully
- Lifecycle: One bridge per execution, destroyed after completion
2. **Security Enforcement**
- Bearer token validation (timing-safe comparison)
- Rate limiting (rounds and tokens)
- System prompt allowlist validation
- Content filtering (secrets/PII redaction)
3. **Request Proxying**
- Mode detection: MCP SDK (free) or Direct API (paid)
- Request forwarding with proper authentication
- Response filtering and audit logging
**Key Methods:**
- `start(): Promise<{port, authToken}>` - Start bridge server
- `stop(): Promise<void>` - Graceful shutdown with request draining
- `getSamplingMetrics(): Promise<SamplingMetrics>` - Get current metrics
- `handleRequest(req, res)` - HTTP request handler (private)
**Configuration:**
```typescript
interface SamplingConfig {
enabled: boolean; // Enable/disable sampling
maxRoundsPerExecution: number; // Max LLM calls (default: 10)
maxTokensPerExecution: number; // Max tokens (default: 10,000)
timeoutPerCallMs: number; // Timeout per call (default: 30,000ms)
allowedSystemPrompts: string[]; // Prompt allowlist
contentFilteringEnabled: boolean; // Enable filtering (default: true)
}
```
#### 12.3.2 RateLimiter
**Purpose:** Prevent infinite loops and resource exhaustion
**Implementation:**
- **Round Counter**: Tracks number of sampling calls
- **Token Budget**: Cumulative token count across all calls
- **AsyncLock Protection**: Thread-safe counters for concurrent access
- **Quota Calculation**: Real-time remaining rounds/tokens
**Methods:**
- `async checkLimit(tokensRequested): Promise<{exceeded, metrics}>` - Check if request would exceed limits
- `async incrementUsage(tokensUsed): Promise<void>` - Increment counters after successful call
- `async getMetrics(): Promise<{roundsUsed, tokensUsed}>` - Get current usage
- `async getQuotaRemaining(): Promise<{rounds, tokens}>` - Get remaining quota
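A minimal sketch of the counters behind these methods, assuming the `async-lock` package used elsewhere in this document; field and lock names are hypothetical, and `getMetrics`/`getQuotaRemaining` are omitted for brevity.
```typescript
import AsyncLock from 'async-lock';

// Sketch of the RateLimiter counters (round count + cumulative token budget).
class RateLimiter {
  private rounds = 0;
  private tokens = 0;
  private lock = new AsyncLock();

  constructor(private maxRounds = 10, private maxTokens = 10_000) {}

  async checkLimit(tokensRequested: number): Promise<{ exceeded: boolean }> {
    return this.lock.acquire('sampling-limits', () => ({
      exceeded:
        this.rounds + 1 > this.maxRounds ||
        this.tokens + tokensRequested > this.maxTokens,
    }));
  }

  async incrementUsage(tokensUsed: number): Promise<void> {
    await this.lock.acquire('sampling-limits', () => {
      this.rounds += 1;
      this.tokens += tokensUsed;
    });
  }
}
```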
**Test Coverage:**
- ✅ T033-T036: Rate limiting tests (10 rounds, 10k tokens, 429 responses)
- ✅ T037: Concurrent access protection (AsyncLock verification)
#### 12.3.3 ContentFilter
**Purpose:** Detect and redact secrets/PII from Claude responses
**Patterns Detected:**
- **Secrets**: OpenAI keys (`sk-*`), GitHub tokens (`ghp_*`), AWS keys (`AKIA*`), JWT tokens (`eyJ*`)
- **PII**: Emails, SSNs, credit card numbers
**Methods:**
- `scan(content): {violations, filtered}` - Detect violations and return redacted content
- `filter(content, rejectOnViolation): string` - Filter with optional rejection mode
- `hasViolations(content): boolean` - Quick check for any violations
**Redaction Format:**
- Secrets: `[REDACTED_SECRET]`
- PII: `[REDACTED_PII]`
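A simplified sketch of the pattern scan; the regexes shown are representative only, and the production filter covers more formats (SSNs, credit cards, JWTs) with tuned patterns.
```typescript
// Sketch: a few representative patterns and the redaction pass.
const SECRET_PATTERNS: RegExp[] = [
  /sk-[A-Za-z0-9]{20,}/g,   // OpenAI-style keys
  /ghp_[A-Za-z0-9]{36}/g,   // GitHub personal access tokens
  /AKIA[0-9A-Z]{16}/g,      // AWS access key IDs
];
const PII_PATTERNS: RegExp[] = [
  /[\w.+-]+@[\w-]+\.[\w.]+/g, // Email addresses
];

function filterContent(content: string): string {
  let filtered = content;
  for (const pattern of SECRET_PATTERNS) {
    filtered = filtered.replace(pattern, '[REDACTED_SECRET]');
  }
  for (const pattern of PII_PATTERNS) {
    filtered = filtered.replace(pattern, '[REDACTED_PII]');
  }
  return filtered;
}
```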
**Test Coverage:**
- ✅ T022-T026: Pattern detection tests (98%+ coverage)
- ✅ T115: Secret leakage redaction verification
#### 12.3.4 SamplingAuditLogger
**Purpose:** Log all sampling calls for security auditing and compliance
**Log Format (JSONL):**
```json
{
"timestamp": "2025-01-20T12:00:00.000Z",
"executionId": "exec-123",
"round": 1,
"model": "claude-sonnet-4-5",
"promptHash": "sha256:abc123...",
"responseHash": "sha256:def456...",
"tokensUsed": 75,
"durationMs": 600,
"status": "success",
"contentViolations": [
{ "type": "secret", "pattern": "openai_key", "count": 1 }
]
}
```
**Key Features:**
- **SHA-256 Hashing**: No plaintext secrets in logs
- **AsyncLock Protection**: Thread-safe concurrent writes
- **JSONL Format**: One entry per line, easy to parse
- **Location**: `~/.code-executor/audit-log.jsonl`
**Test Coverage:**
- ✅ T082-T084: Audit logging tests (13/13 passing)
### 12.4 API Design
#### 12.4.1 TypeScript API (Deno Sandbox)
**Simple Query:**
```typescript
const response = await llm.ask("What is 2+2?");
// Returns: "4"
```
**Multi-Turn Conversation:**
```typescript
const response = await llm.think({
messages: [
{ role: "user", content: "What is 2+2?" },
{ role: "assistant", content: "4" },
{ role: "user", content: "What about 3+3?" }
],
model: "claude-sonnet-4-5", // Optional
maxTokens: 1000, // Optional
systemPrompt: "", // Optional (must be in allowlist)
stream: false // Optional (not yet supported)
});
// Returns: "6"
```
#### 12.4.2 Python API (Pyodide Sandbox)
**Simple Query:**
```python
response = await llm.ask("What is 2+2?")
# Returns: "4"
```
**Multi-Turn Conversation:**
```python
response = await llm.think(
messages=[
{"role": "user", "content": "What is 2+2?"},
{"role": "assistant", "content": "4"},
{"role": "user", "content": "What about 3+3?"}
],
model="claude-sonnet-4-5", # Optional
max_tokens=1000, # Optional (snake_case for Python)
system_prompt="", # Optional (must be in allowlist)
stream=False # Optional (not supported in Pyodide)
)
# Returns: "6"
```
### 12.5 Security Model
#### 12.5.1 Threat Matrix
| Threat | Likelihood | Impact | Mitigation | Test |
|--------|-----------|--------|------------|------|
| Infinite loop API cost | High | High | Rate limiting (10 rounds) | T112 ✅ |
| Token exhaustion | Medium | High | Token budget (10k tokens) | T113 ✅ |
| Prompt injection | Medium | Medium | System prompt allowlist | T114 ✅ |
| Secret leakage | Low | Critical | Content filtering + SHA-256 logs | T115 ✅ |
| Timing attacks | Low | Medium | Constant-time comparison | T116 ✅ |
| Unauthorized access | Low | Medium | Bearer token + localhost binding | T014/T011 ✅ |
#### 12.5.2 Defense Layers
1. **Authentication Layer**: 256-bit bearer token (unique per execution)
2. **Rate Limiting Layer**: 10 rounds, 10,000 tokens per execution
3. **Validation Layer**: System prompt allowlist, AJV schema validation
4. **Content Filtering Layer**: Secrets/PII redaction before returning
5. **Audit Layer**: SHA-256 hashed logs for forensic analysis
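For the authentication layer, a minimal sketch of the timing-safe token check using Node's `crypto.timingSafeEqual` (helper name hypothetical):
```typescript
import { timingSafeEqual } from 'node:crypto';

// Sketch: constant-time bearer-token comparison to avoid leaking token prefixes via timing.
function isTokenValid(presented: string, expected: string): boolean {
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  // timingSafeEqual throws on length mismatch, so compare lengths first
  return a.length === b.length && timingSafeEqual(a, b);
}
```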
### 12.6 Performance Characteristics
| Metric | Target | Measured | Status |
|--------|--------|----------|--------|
| Bridge startup time | <50ms | ~30ms | ✅ PASS |
| Per-call overhead | <100ms | ~60ms | ✅ PASS |
| Memory footprint | <50MB | ~15MB | ✅ PASS |
| Token validation | <10ms | ~5ms | ✅ PASS |
| Content filtering | <50ms | ~15ms | ✅ PASS |
### 12.7 Configuration Hierarchy
**Priority (highest to lowest):**
1. Per-execution parameters (`enableSampling`, `maxSamplingRounds`, `maxSamplingTokens`)
2. Environment variables (`CODE_EXECUTOR_SAMPLING_ENABLED`, `CODE_EXECUTOR_MAX_SAMPLING_ROUNDS`)
3. Configuration file (`~/.code-executor/config.json`)
4. Default values (enabled: false, maxRounds: 10, maxTokens: 10,000)
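Illustrative only: how a single setting might be resolved in that order. The environment variable name and default are as documented above; the helper name is hypothetical.
```typescript
// Sketch: resolve maxSamplingRounds with per-execution > env var > config file > default.
function resolveMaxRounds(
  executionParam: number | undefined,
  fileConfig: { maxRoundsPerExecution?: number }
): number {
  const envValue = process.env.CODE_EXECUTOR_MAX_SAMPLING_ROUNDS;
  return (
    executionParam ??
    (envValue !== undefined ? Number(envValue) : undefined) ??
    fileConfig.maxRoundsPerExecution ??
    10 // Documented default
  );
}
```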
### 12.8 Hybrid Architecture (MCP SDK vs Direct API)
**Mode Detection:**
```typescript
detectSamplingMode(): 'mcp' | 'direct' {
if (this.mcpServer && typeof this.mcpServer.request === 'function') {
return 'mcp'; // MCP SDK available (free)
}
return 'direct'; // Fallback to Direct API (paid)
}
```
**MCP SDK Mode (Free):**
- Uses Claude Desktop's MCP SDK for sampling
- No additional API costs
- Requires Claude Desktop with MCP support
**Direct API Mode (Paid):**
- Uses Anthropic API directly
- Requires `ANTHROPIC_API_KEY`
- Pay-per-token pricing
**User Experience:**
- Automatic detection and fallback
- Clear logging of which mode is active
- Same API surface regardless of mode
### 12.9 Docker Support
**Detection:**
- Checks for `/.dockerenv` file
- Checks for Docker cgroup signatures in `/proc/self/cgroup`
**Bridge URL Handling:**
- **Host execution**: `http://localhost:PORT`
- **Docker execution**: `http://host.docker.internal:PORT`
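A minimal sketch of the detection and URL selection described above (helper names hypothetical):
```typescript
import { existsSync, readFileSync } from 'node:fs';

// Sketch: detect Docker and pick the bridge host the sandbox should call.
function isRunningInDocker(): boolean {
  if (existsSync('/.dockerenv')) return true;
  try {
    return /docker|containerd/.test(readFileSync('/proc/self/cgroup', 'utf8'));
  } catch {
    return false; // /proc not available (e.g. non-Linux hosts)
  }
}

function bridgeBaseUrl(port: number): string {
  const host = isRunningInDocker() ? 'host.docker.internal' : 'localhost';
  return `http://${host}:${port}`;
}
```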
**Docker Compose Example:**
```yaml
services:
code-executor:
image: aberemia24/code-executor-mcp:1.0.0
environment:
- CODE_EXECUTOR_SAMPLING_ENABLED=true
- ANTHROPIC_API_KEY=${ANTHROPIC_API_KEY}
extra_hosts:
- "host.docker.internal:host-gateway"
```
### 12.10 Test Coverage
**Total Sampling Tests: 74/74 passing (100%)**
| Component | Tests | Status |
|-----------|-------|--------|
| Bridge Server | 15/15 | ✅ PASS |
| Content Filter | 8/8 | ✅ PASS |
| TypeScript API | 4/4 | ✅ PASS |
| Python API | 3/3 | ✅ PASS |
| Config Schema | 23/23 | ✅ PASS |
| Audit Logging | 13/13 | ✅ PASS |
| Security Attacks | 8/8 | ✅ PASS |
**Key Tests:**
- T010-T016: Bridge server lifecycle (startup, shutdown, token validation)
- T022-T026: Content filtering (secrets, PII detection and redaction)
- T033-T037: Rate limiting (rounds, tokens, concurrent access)
- T044-T047: System prompt allowlist validation
- T053-T056: TypeScript sampling API
- T063-T066: Python sampling API
- T082-T084: Audit logging with SHA-256 hashes
- T112-T116: Security attack tests (infinite loop, token exhaustion, prompt injection, secret leakage, timing attacks)
### 12.11 Design Rationale
**Why Ephemeral Bridge Server?**
- **Security**: Unique bearer token per execution prevents cross-execution attacks
- **Isolation**: Localhost binding ensures no external access
- **Lifecycle**: Bridge destroyed after execution, no lingering processes
**Why Rate Limiting?**
- **Cost Control**: Prevent infinite loops from causing API cost explosions
- **Resource Management**: Prevent token exhaustion from overwhelming Claude API
- **User Protection**: Default limits protect users from accidental abuse
**Why Content Filtering?**
- **Secret Protection**: Prevent API keys, tokens, credentials from leaking into logs
- **Compliance**: PII redaction helps meet privacy regulations (GDPR, CCPA)
- **Defense-in-Depth**: Even if Claude accidentally generates secrets, they're redacted
**Why System Prompt Allowlist?**
- **Prompt Injection Defense**: Prevents attackers from bypassing security via custom system prompts
- **Controlled Behavior**: Ensures Claude operates within intended parameters
- **Auditability**: Limited set of prompts makes behavior predictable
**Why SHA-256 Audit Logs?**
- **Forensics**: Enable investigation of security incidents without exposing secrets
- **Deduplication**: Same prompt = same hash, enables pattern detection
- **Compliance**: Meets audit requirements without storing plaintext data
---
**Document Version:** 1.2.0 (Added MCP Sampling Architecture for v1.0.0)
**Contributors:** Alexandru Eremia
**Last Review:** 2025-11-19