Code Executor MCP Server

sampling-implementation-plan.md•42.8 KiB

# Code Executor MCP: Sampling Feature + Monetization Strategy **Version:** 0.4.0 (MVP) **Status:** In Development **Target:** 3-week implementation **Owner:** Alexandru Eremia --- ## Executive Summary This document outlines the complete technical implementation and business strategy for adding **MCP Sampling support** to code-executor-mcp. Sampling enables recursive LLM calls within sandboxed code, transforming the tool from a simple executor into a powerful agentic runtime. **Key Decisions:** - ✅ **Launch Strategy:** Community tier (100 calls/month) in open source - ✅ **Timeline:** 3 weeks for technical MVP - ✅ **Monetization:** Extract to `@code-executor/pro` package after validation (Month 3) - ✅ **License Model:** JWT + offline validation + 7-day phone-home for enterprises - ✅ **Pricing:** Free → $99/mo → $499/mo → Custom --- ## Part 1: Technical Implementation (Open Source MVP) ### Architecture Overview ``` User Code (Deno/Pyodide) ↓ sampleLLM() / llm.ask() ↓ HTTP Request → Sampling Bridge Server (localhost:random_port) ↓ Bearer Token Validation + Rate Limiting ↓ MCP SDK → Claude (sampling/createMessage) ↓ SSE Stream → Sandbox ``` ### Phase 1: Core Infrastructure #### 1.1 Sampling Bridge Server **File:** `src/sampling-bridge-server.ts` (NEW) **Responsibilities:** - HTTP server on localhost with random port (ephemeral) - Bearer token authentication (per-execution tokens) - Rate limiting (max rounds + max tokens per execution) - Forward sampling requests to Claude via MCP SDK - SSE streaming support for real-time responses - Graceful shutdown with request draining **Key Methods:** ```typescript class SamplingBridgeServer { constructor( private mcpServer: McpServer, private config: SamplingConfig ); async start(): Promise<{ port: number; authToken: string }>; async stop(): Promise<void>; // Internal private async handleSamplingRequest(req, res): Promise<void>; private validateToken(token: string): boolean; private enforceRateLimit(executionId: string): void; private validateSystemPrompt(prompt: string): void; getSamplingMetrics(executionId: string): SamplingMetrics; } ``` **Routes:** - `POST /sample` - Main sampling endpoint (SSE streaming) - `GET /health` - Health check for monitoring **Security Features:** 1. Token validation (401 if invalid) 2. Rate limiting (429 if quota exceeded) 3. System prompt allowlist (403 if not allowed) 4. Timeout protection (408 after 30s default) 5. Content filtering (redact secrets/PII in responses) #### 1.2 Configuration Schema **File:** `src/config-types.ts` (MODIFY) **Add:** ```typescript export const SamplingConfigSchema = z.object({ enabled: z.boolean().default(false).describe( 'Enable MCP Sampling globally (can be overridden per execution)' ), maxRoundsPerExecution: z.number().int().min(1).max(100).default(10).describe( 'Maximum sampling calls per execution (prevents infinite loops)' ), maxTokensPerExecution: z.number().int().min(100).max(100000).default(10000).describe( 'Maximum tokens consumed across all sampling calls' ), timeoutPerCallMs: z.number().int().min(1000).max(300000).default(30000).describe( 'Timeout for each individual sampling call' ), allowedSystemPrompts: z.array(z.string()).default([ '', 'You are a helpful assistant', 'You are a code analysis expert' ]).describe( 'Whitelist of allowed system prompts (security measure)' ), contentFilteringEnabled: z.boolean().default(true).describe( 'Enable content filtering to redact secrets/PII from responses' ) }); export type SamplingConfig = z.infer<typeof SamplingConfigSchema>; // Extend main config export const ConfigSchema = z.object({ // ... existing fields sampling: SamplingConfigSchema.optional() }); ``` **Environment Variable Overrides:** - `CODE_EXECUTOR_SAMPLING_ENABLED=true` - `CODE_EXECUTOR_MAX_SAMPLING_ROUNDS=20` - `CODE_EXECUTOR_MAX_SAMPLING_TOKENS=20000` - `CODE_EXECUTOR_SAMPLING_TIMEOUT_MS=60000` #### 1.3 Tool Schema Extensions **File:** `src/index.ts` (MODIFY - lines 225-316) **Extend `ExecuteTypescriptInputSchema`:** ```typescript export const ExecuteTypescriptInputSchema = z.object({ // ... existing fields enableSampling: z.boolean().optional().describe( 'Enable MCP Sampling for this execution (overrides global config)' ), maxSamplingRounds: z.number().int().min(1).max(100).optional().describe( 'Override global max sampling rounds for this execution' ), maxSamplingTokens: z.number().int().min(100).max(100000).optional().describe( 'Override global max tokens for this execution' ), samplingSystemPrompt: z.string().optional().describe( 'System prompt for sampling calls (must be in allowlist)' ) }); ``` **Same for `ExecutePythonInputSchema`.** #### 1.4 Execution Result Types **File:** `src/types.ts` (MODIFY) **Add:** ```typescript export interface SamplingCall { model: string; messages: Array<{ role: 'user' | 'assistant' | 'system'; content: any; }>; response: { content: any; stopReason?: string; }; durationMs: number; tokensUsed: number; timestamp: string; } export interface SamplingMetrics { totalRounds: number; totalTokens: number; totalDurationMs: number; averageTokensPerRound: number; quotaRemaining: { rounds: number; tokens: number; }; } export interface ExecutionResult { // ... existing fields samplingCalls?: SamplingCall[]; samplingMetrics?: SamplingMetrics; } ``` --- ### Phase 2: Executor Integration #### 2.1 TypeScript Executor (Deno) **File:** `src/sandbox-executor.ts` (MODIFY - lines 36-433) **Changes:** 1. **Accept sampling config in options:** ```typescript interface SandboxOptions { // ... existing fields samplingConfig?: { enabled: boolean; maxRounds: number; maxTokens: number; systemPrompt?: string; }; } ``` 2. **Start bridge server if enabled:** ```typescript async execute(options: SandboxOptions): Promise<ExecutionResult> { let samplingBridge: SamplingBridgeServer | null = null; try { // Start MCP proxy (existing) const mcpProxy = new MCPProxyServer(...); await mcpProxy.start(); // Start sampling bridge (new) if (options.samplingConfig?.enabled) { samplingBridge = new SamplingBridgeServer( this.mcpServer, options.samplingConfig ); const { port, authToken } = await samplingBridge.start(); // Inject into sandbox wrappedCode = injectSamplingHelpers( wrappedCode, port, authToken, options.samplingConfig ); } // ... execute code } finally { if (samplingBridge) { await samplingBridge.stop(); } } } ``` 3. **Inject sampling helper function:** ```typescript function injectSamplingHelpers( userCode: string, bridgePort: number, authToken: string, config: SamplingConfig ): string { return ` // Sampling Bridge Configuration globalThis.SAMPLING_BRIDGE_URL = 'http://localhost:${bridgePort}/sample'; globalThis.SAMPLING_AUTH_TOKEN = '${authToken}'; globalThis.SAMPLING_CONFIG = ${JSON.stringify(config)}; // Sampling Helper Function globalThis.sampleLLM = async ( messages: Array<{ role: string; content: any }>, options?: { model?: string; maxTokens?: number; systemPrompt?: string; stream?: boolean; } ): Promise<any> => { const response = await fetch(globalThis.SAMPLING_BRIDGE_URL, { method: 'POST', headers: { 'Authorization': \`Bearer \${globalThis.SAMPLING_AUTH_TOKEN}\`, 'Content-Type': 'application/json' }, body: JSON.stringify({ messages, model: options?.model || 'claude-sonnet-4-5', maxTokens: options?.maxTokens || 1024, systemPrompt: options?.systemPrompt || '', stream: options?.stream || false }) }); if (!response.ok) { const error = await response.json(); throw new Error(\`Sampling failed: \${error.message}\`); } // Handle streaming if (response.headers.get('content-type') === 'text/event-stream') { const reader = response.body.getReader(); const decoder = new TextDecoder(); let accumulated = ''; while (true) { const { done, value } = await reader.read(); if (done) break; const chunk = decoder.decode(value); const lines = chunk.split('\\n'); for (const line of lines) { if (line.startsWith('data: ')) { const data = line.slice(6); if (data === '[DONE]') { return JSON.parse(accumulated); } const parsed = JSON.parse(data); if (parsed.content) { accumulated = parsed.content; console.log('[Sampling Stream]', accumulated); } } } } } return await response.json(); }; // User code starts here ${userCode} `; } ``` #### 2.2 Python Executor (Pyodide) **File:** `src/pyodide-executor.ts` (MODIFY - lines 78-341) **Same bridge lifecycle as TypeScript.** **Inject Python sampling helper:** ```python import json from pyodide.http import pyfetch SAMPLING_BRIDGE_URL = '${bridgeUrl}' SAMPLING_AUTH_TOKEN = '${authToken}' async def sample_llm( messages: list, model: str = 'claude-sonnet-4-5', max_tokens: int = 1024, system_prompt: str = '', stream: bool = False ) -> dict: """ Call Claude via MCP Sampling bridge. Args: messages: List of message dicts with 'role' and 'content' model: Model identifier max_tokens: Max tokens in response system_prompt: System prompt (must be in allowlist) stream: Enable streaming (beta - limited support) Returns: Response dict with 'content', 'stopReason', etc. """ response = await pyfetch( SAMPLING_BRIDGE_URL, method='POST', headers={ 'Authorization': f'Bearer {SAMPLING_AUTH_TOKEN}', 'Content-Type': 'application/json' }, body=json.dumps({ 'messages': messages, 'model': model, 'maxTokens': max_tokens, 'systemPrompt': system_prompt, 'stream': stream }) ) if response.status != 200: error = await response.json() raise RuntimeError(f"Sampling failed: {error.get('message', 'Unknown error')}") # Note: Pyodide streaming support is limited # For now, return full response only return await response.json() ``` #### 2.3 Docker Executor Networking **File:** `src/sandbox-executor.ts` (Docker section) **Handle Docker-to-host networking:** ```typescript if (this.isDockerEnvironment) { // Replace localhost with Docker host const dockerBridgeUrl = bridgeUrl.replace( '127.0.0.1', 'host.docker.internal' ); // Add Docker networking args (Linux requires explicit host gateway) const networkArgs = process.platform === 'linux' ? ['--add-host', 'host.docker.internal:host-gateway'] : []; // ... spawn Docker container with networkArgs } ``` --- ### Phase 3: Security Implementation #### 3.1 Content Filter **File:** `src/security/content-filter.ts` (NEW) **Purpose:** Scan sampling responses for secrets and PII before returning to sandbox. ```typescript export interface ContentFilterConfig { enabled: boolean; redactSecrets: boolean; redactPII: boolean; rejectOnViolation: boolean; } export class ContentFilter { private readonly secretPatterns: RegExp[]; private readonly piiPatterns: RegExp[]; constructor(private config: ContentFilterConfig) { this.secretPatterns = [ /sk-[a-zA-Z0-9]{48}/g, // OpenAI keys /ghp_[a-zA-Z0-9]{36}/g, // GitHub tokens /xoxb-[0-9]{11}-[0-9]{11}-[a-zA-Z0-9]{24}/g, // Slack tokens /ya29\.[a-zA-Z0-9_-]{100,}/g, // Google OAuth /AKIA[0-9A-Z]{16}/g, // AWS access keys /eyJ[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+/g // JWT tokens ]; this.piiPatterns = [ /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}\b/g, // Emails /\b\d{3}-\d{2}-\d{4}\b/g, // SSN /\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b/g // Credit cards ]; } scan(content: string): { violations: Array<{ type: string; pattern: string; count: number }>; filtered: string; } { let filtered = content; const violations: Array<{ type: string; pattern: string; count: number }> = []; // Scan for secrets if (this.config.redactSecrets) { for (const pattern of this.secretPatterns) { const matches = content.match(pattern); if (matches && matches.length > 0) { violations.push({ type: 'secret', pattern: pattern.source, count: matches.length }); filtered = filtered.replace(pattern, '[REDACTED_SECRET]'); } } } // Scan for PII if (this.config.redactPII) { for (const pattern of this.piiPatterns) { const matches = content.match(pattern); if (matches && matches.length > 0) { violations.push({ type: 'pii', pattern: pattern.source, count: matches.length }); filtered = filtered.replace(pattern, '[REDACTED_PII]'); } } } return { violations, filtered }; } filter(content: string): string { if (!this.config.enabled) return content; const { violations, filtered } = this.scan(content); if (violations.length > 0) { if (this.config.rejectOnViolation) { throw new Error( `Content filter violation: ${violations.length} issues found. ` + `Types: ${violations.map(v => v.type).join(', ')}` ); } // Log violations console.warn('[ContentFilter] Violations detected:', violations); } return filtered; } } ``` #### 3.2 Audit Logging **File:** `src/audit-log.ts` (MODIFY) **Add sampling audit entries:** ```typescript export interface SamplingAuditEntry { timestamp: string; executionId: string; round: number; model: string; promptHash: string; // SHA-256 of messages responseHash: string; // SHA-256 of response tokensUsed: number; durationMs: number; status: 'success' | 'error' | 'rate_limited' | 'timeout'; errorMessage?: string; contentViolations?: Array<{ type: string; count: number }>; } export function logSamplingCall(entry: SamplingAuditEntry): void { const logEntry = { ...entry, type: 'sampling', timestamp: new Date().toISOString() }; // Write to audit log file (existing mechanism) appendToAuditLog(logEntry); // Also log to console in dev mode if (process.env.NODE_ENV === 'development') { console.log('[Sampling Audit]', logEntry); } } ``` --- ### Phase 4: Streaming Support #### 4.1 SSE Response Handling **In `src/sampling-bridge-server.ts`:** ```typescript private async handleSamplingRequest(req: IncomingMessage, res: ServerResponse) { // ... token validation, rate limiting const body = await this.readRequestBody(req); const { messages, model, maxTokens, systemPrompt, stream } = body; // Check if Claude supports streaming const supportsStreaming = this.checkMCPCapabilities('sampling.stream'); if (stream && supportsStreaming) { // Set SSE headers res.setHeader('Content-Type', 'text/event-stream'); res.setHeader('Cache-Control', 'no-cache'); res.setHeader('Connection', 'keep-alive'); try { // Request streaming from Claude const streamResponse = await this.mcpServer.request({ method: 'sampling/createMessage', params: { messages, modelPreferences: { hints: [{ name: model }] }, maxTokens, systemPrompt, includeContext: 'none' } }, { stream: true }); // Forward chunks to client for await (const chunk of streamResponse) { res.write(`data: ${JSON.stringify(chunk)}\n\n`); } res.write('data: [DONE]\n\n'); res.end(); } catch (error) { res.write(`data: {"error": "${error.message}"}\n\n`); res.end(); } } else { // Non-streaming response (default) const response = await this.mcpServer.request({ method: 'sampling/createMessage', params: { messages, modelPreferences: { hints: [{ name: model }] }, maxTokens, systemPrompt } }); res.writeHead(200, { 'Content-Type': 'application/json' }); res.end(JSON.stringify(response)); } } ``` --- ### Phase 5: Wrapper Generation #### 5.1 TypeScript Wrapper Template **File:** `templates/typescript-wrapper.hbs` (MODIFY or CREATE) **Add to generated wrappers:** ```typescript /** * LLM Sampling Interface (requires enableSampling: true) */ export interface LLMMessage { role: 'user' | 'assistant' | 'system'; content: { type: 'text' | 'image'; text?: string; source?: { type: string; data: string }; }; } export interface LLMResponse { content: Array<{ type: 'text'; text: string }>; stopReason?: 'end_turn' | 'max_tokens' | 'stop_sequence'; model: string; } export const llm = { /** * Advanced sampling with full control over messages */ async think(options: { messages: LLMMessage[]; model?: string; maxTokens?: number; systemPrompt?: string; stream?: boolean; }): Promise<LLMResponse> { if (typeof globalThis.sampleLLM === 'undefined') { throw new Error( 'Sampling not enabled for this execution. ' + 'Pass enableSampling: true to executeTypescript/executePython' ); } return await globalThis.sampleLLM(options.messages, { model: options.model || 'claude-sonnet-4-5', maxTokens: options.maxTokens || 1024, systemPrompt: options.systemPrompt, stream: options.stream || false }); }, /** * Simple text query (convenience wrapper) */ async ask(prompt: string, options?: { model?: string; maxTokens?: number; systemPrompt?: string; }): Promise<string> { const result = await this.think({ messages: [{ role: 'user', content: { type: 'text', text: prompt } }], ...options }); return result.content[0]?.text || ''; } }; ``` #### 5.2 Python Wrapper Template **File:** `templates/python-wrapper.hbs` (CREATE) ```python from typing import List, Dict, Optional, TypedDict class LLMMessage(TypedDict): role: str # 'user' | 'assistant' | 'system' content: Dict[str, any] class LLMResponse(TypedDict): content: List[Dict[str, str]] stopReason: Optional[str] model: str class LLM: """ LLM Sampling Interface (requires enableSampling=True) """ @staticmethod async def think( messages: List[LLMMessage], model: str = 'claude-sonnet-4-5', max_tokens: int = 1024, system_prompt: str = '', stream: bool = False ) -> LLMResponse: """ Advanced sampling with full control over messages """ if 'sample_llm' not in globals(): raise RuntimeError( 'Sampling not enabled for this execution. ' 'Pass enableSampling=True to executeTypescript/executePython' ) return await sample_llm( messages, model=model, max_tokens=max_tokens, system_prompt=system_prompt, stream=stream ) @staticmethod async def ask( prompt: str, model: str = 'claude-sonnet-4-5', max_tokens: int = 1024, system_prompt: str = '' ) -> str: """ Simple text query (convenience wrapper) """ result = await LLM.think( messages=[{ 'role': 'user', 'content': {'type': 'text', 'text': prompt} }], model=model, max_tokens=max_tokens, system_prompt=system_prompt ) return result['content'][0]['text'] if result['content'] else '' # Global instance for convenience llm = LLM() ``` --- ### Phase 6: Testing #### 6.1 Unit Tests **File:** `tests/sampling-bridge-server.test.ts` (NEW) Test coverage: - ✅ Server starts on random port and returns auth token - ✅ Token validation (valid token accepted, invalid rejected with 401) - ✅ Rate limiting enforcement (max rounds, max tokens, 429 response) - ✅ Timeout enforcement (30s default, 408 response) - ✅ System prompt allowlist (allowed prompts pass, others 403) - ✅ Graceful shutdown (drains active requests) - ✅ SSE streaming (chunks forwarded correctly) - ✅ Error handling (network errors, Claude API failures) **File:** `tests/content-filter.test.ts` (NEW) Test coverage: - ✅ Detect OpenAI API keys (sk-...) - ✅ Detect GitHub tokens (ghp_...) - ✅ Detect AWS keys (AKIA...) - ✅ Detect JWT tokens - ✅ Detect emails, SSNs, credit card numbers - ✅ Redaction mode (replace with [REDACTED]) - ✅ Rejection mode (throw error on violation) - ✅ False positive handling (legitimate code samples) **File:** `tests/sampling-executor-integration.test.ts` (NEW) Test coverage: - ✅ TypeScript: `llm.ask()` returns mocked response - ✅ TypeScript: `llm.think()` with multi-turn conversation - ✅ Python: `llm.ask()` via Pyodide - ✅ Python: `llm.think()` with messages array - ✅ Streaming: receive chunks incrementally (TypeScript) - ✅ Error handling: network errors, timeouts, rate limits - ✅ Concurrent: sampling + tool calls in same execution - ✅ Config override: global disabled, execution enables #### 6.2 Security Tests **File:** `tests/security/sampling-attacks.test.ts` (NEW) Test attack scenarios: - ✅ **Infinite loop:** Script calls `llm.ask()` in while loop → rate limit triggers at 10 rounds - ✅ **Token exhaustion:** Exceed `maxSamplingTokens` → 429 error with quota remaining - ✅ **Prompt injection:** Malicious system prompt → rejected by allowlist (403) - ✅ **Secret leakage:** Claude returns API key → content filter redacts it - ✅ **Timing attack:** Measure response times → no sensitive info leaked - ✅ **Resource exhaustion:** Large messages → handled gracefully with limits #### 6.3 Integration Tests **File:** `tests/integration/sampling-e2e.test.ts` (NEW) Test end-to-end workflows: - ✅ Multi-turn conversation (5 rounds): code analysis → follow-up questions - ✅ Tool calls + sampling: read file → ask Claude to analyze → use results - ✅ Config override: global disabled, per-execution enabled - ✅ Streaming: accumulate chunks, verify final response - ✅ Error recovery: Claude API down → graceful fallback - ✅ Metrics tracking: verify `samplingMetrics` in result #### 6.4 Mock Setup **File:** `tests/mocks/claude-sampling-server.ts` (NEW) Mock MCP server for testing: ```typescript export class MockClaudeSamplingServer { private responses: Map<string, any> = new Map(); // Pre-configure responses for tests addResponse(promptHash: string, response: any) { this.responses.set(promptHash, response); } // Simulate sampling request async handleSamplingRequest(params: any): Promise<any> { const hash = this.hashMessages(params.messages); return this.responses.get(hash) || { content: [{ type: 'text', text: 'Mock response' }] }; } // Simulate streaming async* streamResponse(params: any): AsyncGenerator<any> { const response = await this.handleSamplingRequest(params); const text = response.content[0].text; // Chunk by words const words = text.split(' '); for (const word of words) { yield { content: [{ type: 'text', text: word + ' ' }] }; await this.delay(10); } } } ``` --- ### Phase 7: Documentation #### 7.1 Feature Documentation **File:** `docs/sampling.md` (CREATE) **Contents:** 1. What is MCP Sampling? 2. Use cases (agentic workflows, code analysis, multi-step reasoning) 3. Quick start (enable sampling, first llm.ask() call) 4. Configuration options (global + per-execution) 5. Security considerations (rate limits, content filtering, allowlists) 6. Examples (TypeScript + Python) 7. Troubleshooting (common errors, quota exceeded, timeouts) #### 7.2 API Reference **File:** `README.md` (MODIFY) Add section: ```markdown ## MCP Sampling (Beta) Execute recursive LLM calls within sandboxed code for agentic workflows. ### Enable Sampling \`\`\`typescript const result = await client.callTool({ name: 'executeTypescript', arguments: { code: \` const analysis = await llm.ask('Analyze this code for bugs'); console.log(analysis); \`, enableSampling: true, // Enable sampling for this execution maxSamplingRounds: 5, // Limit to 5 LLM calls allowedTools: ['mcp__*'] } }); \`\`\` ### API - **llm.ask(prompt)** - Simple text query - **llm.think({ messages, model, maxTokens, systemPrompt, stream })** - Advanced sampling ### Limits - **Community Tier:** 100 sampling calls/month - **Pro Tier:** Unlimited (coming soon) ### Security - Rate limiting: 10 rounds per execution (configurable) - Token budget: 10,000 tokens per execution (configurable) - Content filtering: Automatically redacts secrets/PII - System prompt allowlist: Only pre-approved prompts allowed ``` #### 7.3 Examples **File:** `examples/sampling-demo.ts` (CREATE) ```typescript // Example: Multi-turn code analysis with sampling import { callMCPTool, llm } from './mcp-wrappers'; async function main() { // 1. Read code file const code = await callMCPTool('mcp__filesystem__read_file', { path: '/src/index.ts' }); // 2. Initial analysis const initialAnalysis = await llm.ask( `Analyze this TypeScript code for potential bugs:\n\n${code}` ); console.log('Initial Analysis:', initialAnalysis); // 3. Follow-up on specific issues const securityAnalysis = await llm.ask( `Based on your previous analysis, focus specifically on security vulnerabilities:\n\n${initialAnalysis}` ); console.log('\nSecurity Analysis:', securityAnalysis); // 4. Generate recommendations const recommendations = await llm.think({ messages: [ { role: 'user', content: { type: 'text', text: code } }, { role: 'assistant', content: { type: 'text', text: initialAnalysis } }, { role: 'user', content: { type: 'text', text: 'Provide 3 actionable recommendations to fix these issues' } } ], model: 'claude-sonnet-4-5', maxTokens: 2048 }); console.log('\nRecommendations:', recommendations.content[0].text); } main(); ``` --- ### Phase 8: Implementation Timeline #### Week 1: Core Infrastructure - **Day 1:** `SamplingBridgeServer` class (no streaming) - HTTP server setup - Token validation - Rate limiting - Basic request forwarding to Claude - **Day 2:** Config schema + tool schema updates - `SamplingConfigSchema` in `config-types.ts` - Extend `ExecuteTypescriptInputSchema` - Type definitions in `types.ts` - **Day 3:** TypeScript executor integration - Bridge lifecycle management - Inject `sampleLLM()` helper - Test basic sampling call - **Day 4:** Python executor integration - Bridge lifecycle (same as TS) - Inject `sample_llm()` helper - Test Python sampling - **Day 5:** Unit tests for bridge server - Token validation tests - Rate limiting tests - Timeout tests - System prompt allowlist tests #### Week 2: Security & Streaming - **Day 1:** Content filtering implementation - Create `ContentFilter` class - Secret detection patterns - PII detection patterns - Redaction vs rejection modes - **Day 2:** Token budget + rate limiting - Track tokens per execution - Enforce `maxSamplingTokens` - Return quota in error responses - **Day 3:** Streaming support (SSE) - Check MCP capabilities - Forward SSE chunks - Sandbox stream consumption - **Day 4:** Security tests (attacks, exploits) - Infinite loop test - Token exhaustion test - Prompt injection test - Secret leakage test - **Day 5:** Integration tests (e2e scenarios) - Multi-turn conversation test - Concurrent sampling + tool calls - Streaming test - Config override test #### Week 3: Polish & Documentation - **Day 1:** Wrapper generation updates - TypeScript template (`llm.think()`, `llm.ask()`) - Python template (`LLM` class) - Update generator logic - **Day 2:** Audit logging + metrics - `SamplingAuditEntry` in `audit-log.ts` - Log all sampling calls - Track metrics per execution - **Day 3:** Documentation (feature guide, API ref) - `docs/sampling.md` (complete guide) - README updates - JSDoc for new APIs - **Day 4:** Examples + migration guide - `examples/sampling-demo.ts` - Migration guide (if breaking changes) - Tutorial video/blog post - **Day 5:** Code review, final testing - Run full test suite - Check 90%+ coverage - Fix any edge cases - Prepare release notes --- ### Success Criteria **Functional Requirements:** - [x] TypeScript scripts can call `llm.ask()` and receive responses - [x] Python scripts can use `llm.think()` with message arrays - [x] Streaming works in TypeScript (SSE chunks received incrementally) - [x] Rate limiting prevents infinite loops (max 10 rounds default) - [x] Content filtering blocks secrets/PII in responses - [x] Config overrides work (per-execution > global > defaults) **Security Requirements:** - [x] 100% test coverage on security features (content filter, rate limiting) - [x] All sampling calls audited to log with SHA-256 hashes - [x] Token budget enforcement working (429 when quota exceeded) - [x] System prompt allowlist prevents injection (403 if not allowed) - [x] Sandbox isolation maintained (no privilege escalation) **Quality Requirements:** - [x] 90%+ overall test coverage - [x] No TypeScript errors (strict mode enabled) - [x] Documentation complete (feature guide + API ref + examples) - [x] Zero regressions in existing tests - [x] Performance: <100ms overhead for sampling setup --- ## Part 2: Business Strategy (Post-MVP) ### Monetization Model #### Tier Structure | Tier | Price | Target | Sampling Limit | Key Features | |------|-------|--------|----------------|--------------| | **Community** | Free | Hobbyists, OSS | 100 calls/month | All current GitHub features + basic sampling | | **Pro** | $99/mo | Startups, small teams | Unlimited | Advanced wrappers, HTTP transport, Redis cache | | **Team** | $499/mo | Growing companies | Unlimited | SSO, audit logs, 50 seats, priority support | | **Enterprise** | Custom | Large orgs | Unlimited | Multi-tenancy, on-premise, SLA, compliance | #### Usage-Based Add-ons - **Sampling Credits:** $0.01 per call (for Community tier overages) - **Additional Seats:** $10/seat/month (Team/Enterprise) - **Premium Support:** $2,000/mo (24/7, <1hr response) ### License Validation Architecture **JWT-Based Offline Validation:** ```typescript // License file structure { "license": "eyJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9...", "decoded": { "orgId": "enterprise-corp-uuid", "tier": "enterprise", "features": ["sampling", "multi_tenancy", "sso"], "expires": "2025-12-31T23:59:59Z", "seats": 100, "maxSamplingCallsPerMonth": -1 // -1 = unlimited } } ``` **Validation Flow:** 1. **Startup:** Validate JWT signature offline (no internet required) 2. **Every 7 days:** Phone home to license server (graceful failure if offline) 3. **Usage Tracking:** Track sampling calls locally, sync when online 4. **Grace Period:** 30 days if license server unreachable (enterprise-friendly) **Security:** - RSA-2048 signature (private key on license server only) - Org UUID binding (prevents license sharing) - Feature flags (granular control) - Expiry enforcement with 7-day warning ### Distribution Strategy **Dual Package Model:** ``` @code-executor/core (Open Source - npm public) ├── MIT License ├── Full source on GitHub ├── All current features └── Community sampling (100 calls/month) @code-executor/pro (Proprietary - npm auth required) ├── Commercial License ├── Compiled .js + .d.ts only (no source in npm) ├── Private GitHub repo (source available under NDA for security audits) └── Premium features: ├── Unlimited sampling ├── Advanced wrapper generation (all languages) ├── HTTP/SSE transport ├── Redis caching └── Extended timeouts ``` **Feature Gate Example:** ```typescript // In @code-executor/core (open source) if (samplingCallsThisMonth >= 100) { try { const pro = await import('@code-executor/pro'); const license = await pro.validateLicense(); if (!license.features.includes('unlimited_sampling')) { throw new Error( 'Community tier: 100 sampling calls/month limit reached. ' + 'Upgrade to Pro for unlimited: https://code-executor.dev/pricing' ); } } catch (importError) { throw new Error( '@code-executor/pro package not found. ' + 'Install with: npm install @code-executor/pro --auth-token=YOUR_LICENSE_KEY' ); } } ``` ### Implementation Timeline **Month 1-2: Build & Validate MVP (Current Plan)** - [x] Implement sampling in open source (3 weeks) - [ ] Launch community tier (100 calls/month) - [ ] Gather feedback from 50+ beta users - [ ] Measure engagement: % of users hitting 100-call limit - [ ] Validate product-market fit (surveys, interviews) **Month 3: Extract to Pro Package** - [ ] Create private GitHub repo: `code-executor-pro` - [ ] Move unlimited sampling to pro package - [ ] Build JWT license validation system - [ ] Set up license server (Stripe webhook integration) - [ ] Launch Pro tier ($99/mo, unlimited sampling) **Month 4-6: Team Features** - [ ] SSO integration (SAML 2.0, OIDC) - [ ] Advanced audit logging (Elasticsearch export) - [ ] Team management portal (invite users, manage seats) - [ ] Launch Team tier ($499/mo, 50 seats) - [ ] Target: 10 Pro customers + 2 Team customers ($2k MRR) **Month 7-12: Enterprise Sales** - [ ] Multi-tenancy architecture (isolated execution pools) - [ ] Compliance certifications (SOC2 Type 1, ISO 27001) - [ ] On-premise deployment option (Docker/Kubernetes) - [ ] First enterprise pilot ($10k/year contract) - [ ] Scale to $50k+ MRR ### Competitive Positioning | Tool | Model | Price | Our Differentiation | |------|-------|-------|---------------------| | Docker Enterprise | Per-seat | $75/seat/mo | We're cheaper for small teams | | HashiCorp Terraform | Tiered + usage | Free → $20 → Custom | Similar model, but we focus on LLM orchestration | | Elastic Cloud | Infrastructure | $95/mo starter | We're developer-focused, not infrastructure | | **Code Executor MCP** | **Tiered** | **Free → $99 → $499 → Custom** | **Only MCP orchestration server with sampling** | **Unique Value Proposition:** - ✅ **Only MCP server** with recursive LLM sampling (no competition) - ✅ **Open core model** builds trust + community - ✅ **Progressive disclosure** reduces Claude API costs by 98% - ✅ **Enterprise-ready** (air-gap support, compliance, SSO) ### Risk Mitigation **Risk 1: Token Cost Explosion** - **Mitigation:** Strict defaults (10 rounds, 10k tokens per execution) - **Monitoring:** Alert if user exceeds $10/day in Claude API costs - **Fallback:** Global kill switch via config **Risk 2: Claude API Changes** - **Mitigation:** Version check MCP SDK, graceful degradation - **Testing:** Integration tests against real Claude API (monthly) - **Fallback:** Disable sampling if `sampling/createMessage` unsupported **Risk 3: Piracy (Pro Package)** - **Mitigation:** Obfuscated code + license validation - **Acceptance:** Some piracy inevitable, focus on enterprise (80% revenue) - **Enforcement:** DMCA takedowns for public license key leaks **Risk 4: Community Backlash (Paywall)** - **Mitigation:** 100 calls/month free tier is generous (most users never hit it) - **Communication:** Transparent pricing, clear value prop for Pro - **Fallback:** Increase free tier limit to 200 calls/month if needed --- ## Files Summary ### New Files (10 implementation + 4 business) **Implementation:** 1. `src/sampling-bridge-server.ts` - Core bridge server 2. `src/security/content-filter.ts` - Secret/PII detection 3. `templates/typescript-wrapper.hbs` - TS wrapper with `llm` export 4. `templates/python-wrapper.hbs` - Python wrapper with `LLM` class 5. `tests/sampling-bridge-server.test.ts` - Bridge unit tests 6. `tests/content-filter.test.ts` - Content filter tests 7. `tests/sampling-executor-integration.test.ts` - Executor integration tests 8. `tests/security/sampling-attacks.test.ts` - Security attack tests 9. `tests/mocks/claude-sampling-server.ts` - Mock MCP server 10. `docs/sampling.md` - Feature documentation **Business (Post-MVP):** 11. `src/licensing/license-manager.ts` - JWT validation 12. `src/licensing/license-types.ts` - License schemas 13. `docs/pricing.md` - Pricing tiers documentation 14. `docs/enterprise.md` - Enterprise feature guide ### Modified Files (9 implementation + 3 business) **Implementation:** 1. `src/config-types.ts` - Add `SamplingConfigSchema` 2. `src/types.ts` - Add `SamplingCall`, `SamplingMetrics` interfaces 3. `src/index.ts` - Extend tool schemas with sampling params 4. `src/sandbox-executor.ts` - Inject sampling helpers (Deno) 5. `src/pyodide-executor.ts` - Inject Python sampling helpers 6. `src/audit-log.ts` - Log sampling calls with SHA-256 hashes 7. `src/wrapper-generator.ts` - Generate sampling helpers in wrappers 8. `README.md` - Document sampling feature + API 9. `CHANGELOG.md` - Version 0.4.0 release notes **Business (Post-MVP):** 10. `package.json` - Add `@code-executor/pro` peer dependency 11. `.npmignore` - Exclude business docs from open source package 12. `docs/roadmap.md` - Update with monetization timeline ### Total LOC Estimate **Implementation:** ~2,500 lines - Core: 800 lines (`sampling-bridge-server.ts`, configs, types) - Executors: 400 lines (injection logic, helpers) - Security: 300 lines (content filter, audit logging) - Tests: 800 lines (unit, integration, security, e2e) - Documentation: 200 lines (feature guide, examples) **Business (Post-MVP):** ~1,000 lines - Licensing: 400 lines (JWT validation, license server client) - Feature gates: 200 lines (tier enforcement) - Tests: 300 lines (license validation, feature gate tests) - Documentation: 100 lines (pricing, enterprise) **Total:** ~3,500 lines (implementation + business) --- ## Next Steps ### Immediate Actions (Week 1, Day 1) 1. **Create tracking document** ✅ (this file) 2. **Set up development branch:** ```bash git checkout -b feature/sampling-mvp ``` 3. **Install dependencies** (if any new ones needed): ```bash npm install --save-dev @types/node ``` 4. **Begin Phase 1:** Create `src/sampling-bridge-server.ts` ### Questions to Resolve Before full implementation, please confirm: 1. **MCP SDK Version:** Which version supports `sampling/createMessage`? - Check: https://github.com/modelcontextprotocol/specification - Action: Update `package.json` if newer version needed 2. **Claude Model Defaults:** Which model for sampling? - Recommendation: `claude-sonnet-4-5` (balance of speed + quality) - Alternative: `claude-opus-4` (enterprise tier only, higher quality) 3. **Community Tier Limit:** 100 calls/month generous enough? - Analysis: Average user makes 10-20 sampling calls per script - Recommendation: Start with 100, increase to 200 if too restrictive 4. **Pricing Validation:** $99 Pro / $499 Team / Custom Enterprise correct? - Benchmark: Terraform Cloud ($20/user), Docker Enterprise ($75/seat) - Recommendation: Start with $99, A/B test $79 vs $99 after 3 months ### Communication Plan **Internal (Development Team):** - Daily standups during Week 1-3 - Code reviews via GitHub PR (review within 24h) - Blocker discussions in project Slack channel **External (Community):** - Announce sampling feature in GitHub Discussions (Month 2) - Beta program invitation (50 users, Month 2) - Blog post: "How We Built Recursive LLM Sampling" (Month 3) - Product Hunt launch: Code Executor MCP Pro (Month 3) **Enterprise (Sales):** - Create enterprise deck (Month 3) - Outreach to 20 target companies (Month 4) - Pilot program: 3-month free trial for early adopters (Month 4-6) --- ## Success Metrics ### Technical Metrics **Performance:** - [x] Sampling overhead: <100ms per call - [x] Bridge server startup: <50ms - [x] Memory footprint: <50MB for bridge server - [x] Concurrent executions: 100+ without degradation **Quality:** - [x] Test coverage: 90%+ overall, 100% security - [x] TypeScript strict mode: zero errors - [x] Linting: zero warnings - [x] Documentation: 100% API coverage **Security:** - [x] Zero critical vulnerabilities (npm audit) - [x] Content filter: 99%+ secret detection rate - [x] Rate limiting: prevents all infinite loop attacks - [x] Audit logging: 100% sampling calls logged ### Business Metrics **Month 1-2 (MVP Launch):** - [ ] GitHub stars: 1,000+ (from current 500) - [ ] Community users: 50+ active (using sampling) - [ ] Beta feedback: 8+ NPS score - [ ] Conversion interest: 20%+ willing to pay **Month 3 (Pro Launch):** - [ ] Pro customers: 10 ($1k MRR) - [ ] Community retention: 80%+ monthly active - [ ] Churn rate: <5% monthly - [ ] Support tickets: <10/week **Month 6 (Team Launch):** - [ ] Pro customers: 30 ($3k MRR) - [ ] Team customers: 5 ($2.5k MRR) - [ ] Total MRR: $5.5k - [ ] CAC: <$500 (organic growth) **Month 12 (Enterprise):** - [ ] Enterprise customers: 2 ($20k ARR each) - [ ] Pro+Team: 50 customers ($10k MRR) - [ ] Total ARR: $160k ($13k MRR) - [ ] Team size: 3 (founder + 2 engineers) --- ## Appendix ### A. MCP Sampling Specification **Method:** `sampling/createMessage` **Request:** ```json { "method": "sampling/createMessage", "params": { "messages": [ { "role": "user", "content": { "type": "text", "text": "Analyze this code for bugs" } } ], "modelPreferences": { "hints": [{ "name": "claude-sonnet-4-5" }] }, "systemPrompt": "You are a code analysis expert", "maxTokens": 1024, "includeContext": "none" } } ``` **Response:** ```json { "model": "claude-sonnet-4-5", "stopReason": "end_turn", "role": "assistant", "content": { "type": "text", "text": "Analysis: I found 3 potential issues..." } } ``` ### B. Environment Variables Reference **Sampling Configuration:** - `CODE_EXECUTOR_SAMPLING_ENABLED=true` - Enable sampling globally - `CODE_EXECUTOR_MAX_SAMPLING_ROUNDS=20` - Override max rounds - `CODE_EXECUTOR_MAX_SAMPLING_TOKENS=20000` - Override max tokens - `CODE_EXECUTOR_SAMPLING_TIMEOUT_MS=60000` - Override timeout - `CODE_EXECUTOR_SAMPLING_CONTENT_FILTER=true` - Enable content filtering **Licensing (Post-MVP):** - `CODE_EXECUTOR_LICENSE_FILE=/path/to/license.json` - License file path - `CODE_EXECUTOR_LICENSE_SERVER=https://license.code-executor.dev` - License server URL - `CODE_EXECUTOR_TIER=pro|team|enterprise` - Override tier (dev/test only) ### C. Resources **Documentation:** - MCP Specification: https://spec.modelcontextprotocol.io/ - Claude API Docs: https://docs.anthropic.com/claude/reference - Deno Security Model: https://deno.com/manual/basics/permissions **Tools:** - GitHub: https://github.com/aberemia24/code-executor-MCP - npm: https://www.npmjs.com/package/code-executor-mcp - Docker Hub: https://hub.docker.com/r/aberemia24/code-executor-mcp **Community:** - Discussions: https://github.com/aberemia24/code-executor-MCP/discussions - Issues: https://github.com/aberemia24/code-executor-MCP/issues - Discord: [TBD - create after 1k stars] --- **Document Version:** 1.0 **Last Updated:** 2025-01-20 **Next Review:** After Week 1 completion

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/aberemia24/code-executor-MCP'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

sampling-implementation-plan.md•42.8 KiB