Skip to main content
Glama
PROCESS_POOL.md30.4 kB
# Process Pool Documentation **Location:** `src/process-pool/` **Purpose:** Manage Claude process lifecycle with LRU eviction and health monitoring **Performance:** 52% faster than cold starts through intelligent process reuse --- ## Table of Contents 1. [Overview](#overview) 2. [Architecture](#architecture) 3. [Component Details](#component-details) 4. [LRU Eviction Algorithm](#lru-eviction-algorithm) 5. [Health Check System](#health-check-system) 6. [Process Lifecycle](#process-lifecycle) 7. [Integration Points](#integration-points) 8. [Performance Analysis](#performance-analysis) --- ## Overview The Process Pool manages a **bounded pool** of Claude CLI processes with: - **LRU (Least Recently Used) eviction** when pool limit reached - **Automated health checks** every 30 seconds - **Session isolation** via poolKey (`fromTeam->toTeam`) - **Event-driven lifecycle** management **Key Innovation:** Process pooling delivers 52% performance improvement by reusing warm processes instead of spawning new ones for every request. --- ## Architecture ### Two-Layer Structure ``` ┌─────────────────────────────────────────────────────────────────┐ │ ClaudeProcessPool (pool-manager.ts) │ │ Pool Manager │ │ ┌───────────────────────────────────────────────────────────┐ │ │ │ processes: Map<poolKey, ClaudeProcess> │ │ │ │ sessionToProcess: Map<sessionId, poolKey> │ │ │ │ accessOrder: string[] (LRU tracking) │ │ │ │ │ │ │ │ getOrCreateProcess(team, sessionId, fromTeam) │ │ │ │ evictLRU() │ │ │ │ performHealthCheck() │ │ │ │ terminateAll() │ │ │ └───────────────────────────────────────────────────────────┘ │ └──────┬──────────────────────────────────────────────────────────┘ │ manages ▼ ┌─────────────────────────────────────────────────────────────────┐ │ ClaudeProcess (claude-process.ts) │ │ Dumb Pipe │ │ ┌───────────────────────────────────────────────────────────┐ │ │ │ teamName: string │ │ │ │ sessionId: string │ │ │ │ currentCacheEntry: CacheEntry | null │ │ │ │ isReady: boolean │ │ │ │ │ │ │ │ spawn(spawnCacheEntry) │ │ │ │ executeTell(cacheEntry) │ │ │ │ terminate() │ │ │ │ getBasicMetrics() │ │ │ └───────────────────────────────────────────────────────────┘ │ └─────────────────────────────────────────────────────────────────┘ ``` --- ## Component Details ### ClaudeProcessPool (pool-manager.ts) **Responsibility:** Central coordinator for all Claude processes **State:** ```typescript class ClaudeProcessPool extends EventEmitter { // Core process storage private processes = new Map<string, ClaudeProcess>(); // Session → poolKey mapping for fast lookup private sessionToProcess = new Map<string, string>(); // LRU tracking (least recent at front) private accessOrder: string[] = []; // Health check timer private healthCheckInterval: NodeJS.Timeout | null; } ``` **Pool Key Format:** ``` fromTeam → toTeam ──────────────── iris → alpha (Iris team to alpha team) alpha → beta (Alpha team to beta team) frontend → backend (Frontend team to backend team) ``` **Key Insight:** Pool key creates **conversation isolation**. Each team pair gets its own process, preventing context mixing. --- ### Method: getOrCreateProcess() **Purpose:** Get existing process or create new one (with LRU eviction if needed) **Signature:** ```typescript async getOrCreateProcess( teamName: string, sessionId: string, fromTeam: string = null ): Promise<ClaudeProcess> ``` **Flow:** ``` ┌─────────────────────────────────────────────────────────────────┐ │ 1. Generate Pool Key │ │ poolKey = `${fromTeam}->${teamName}` │ └────────────────────┬────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ 2. Check if Process Exists │ │ existing = processes.get(poolKey) │ │ if (existing && existing.isReady && !existing.isBusy): │ │ Update LRU access order │ │ return existing ←── Fast path (52% faster!) │ └────────────────────┬────────────────────────────────────────────┘ │ Process not found or unhealthy ▼ ┌─────────────────────────────────────────────────────────────────┐ │ 3. Check Pool Limit │ │ if (processes.size >= maxProcesses): │ │ await evictLRU() ←── Make space │ └────────────────────┬────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ 4. Create New Process │ │ process = new ClaudeProcess(teamName, config, sessionId) │ │ Set up event listeners (terminated, exited, error) │ └────────────────────┬────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ 5. Spawn with Temporary Cache Entry │ │ spawnCacheEntry = new CacheEntryImpl(SPAWN, "ping") │ │ await process.spawn(spawnCacheEntry) │ │ ⚠️ If spawn fails → cleanup zombie process, throw error │ └────────────────────┬────────────────────────────────────────────┘ │ ▼ ┌─────────────────────────────────────────────────────────────────┐ │ 6. Add to Pool │ │ processes.set(poolKey, process) │ │ sessionToProcess.set(sessionId, poolKey) │ │ accessOrder.push(poolKey) ←── Most recently used │ │ return process │ └─────────────────────────────────────────────────────────────────┘ ``` **Zombie Process Cleanup (Critical):** ```typescript try { await process.spawn(spawnCacheEntry); // Success - add to pool } catch (error) { // CRITICAL: Clean up the zombie process await process.terminate().catch((termError) => { logger.warn({ error: termError }, "Failed to terminate zombie process"); }); // Re-throw the original error throw error; } ``` **Why This Matters:** If spawn fails, the process object exists in a corrupted state. Must terminate to free resources before propagating error. --- ## LRU Eviction Algorithm **Purpose:** When pool limit reached, remove least recently used idle process **Trigger:** `processes.size >= maxProcesses` (default: 10) **Algorithm:** ``` ┌─────────────────────────────────────────────────────────────────┐ │ evictLRU() Method │ └─────────────────────────────────────────────────────────────────┘ Step 1: Find first IDLE process in accessOrder (Iterate from front - least recent first) accessOrder: [ "iris->alpha", ← Least recent (check first) "alpha->beta", "frontend->gamma" ← Most recent (check last) ] Step 2: Check process status if (process.getBasicMetrics().isBusy === false): Select as victim Step 3: If no idle processes found Evict oldest anyway (front of accessOrder) (All processes busy - need to make space) Step 4: Terminate victim process await terminateProcess(victimPoolKey) - Send SIGTERM - Event handlers clean up maps - Remove from accessOrder Step 5: Space available for new process getOrCreateProcess() can now add new process ``` **Example Scenario:** ``` Pool State (maxProcesses: 3): processes: { "iris->alpha": ClaudeProcess (isBusy: false, lastUsed: 10s ago), "frontend->backend": ClaudeProcess (isBusy: true, lastUsed: 5s ago), "alpha->gamma": ClaudeProcess (isBusy: false, lastUsed: 2s ago) } accessOrder: ["iris->alpha", "frontend->backend", "alpha->gamma"] New Request: getOrCreateProcess("delta", ..., fromTeam="mobile") Action: 1. Pool full (3/3) 2. evictLRU() called 3. Check "iris->alpha" → NOT busy → SELECT AS VICTIM 4. Terminate "iris->alpha" 5. Pool now (2/3), space for "mobile->delta" ``` **Access Order Maintenance:** ```typescript private updateAccessOrder(poolKey: string): void { // Remove if exists this.removeFromAccessOrder(poolKey); // Add to end (most recently used) this.accessOrder.push(poolKey); } private removeFromAccessOrder(poolKey: string): void { const index = this.accessOrder.indexOf(poolKey); if (index > -1) { this.accessOrder.splice(index, 1); } } ``` **Updated On:** - Process created - Process accessed (getOrCreateProcess) - Process used (sendCommandToSession) --- ## Health Check System **Purpose:** Detect and remove unhealthy processes every 30 seconds **Configuration:** ```typescript healthCheckInterval: 30000 // ms (from config.yaml) ``` **Implementation:** ```typescript private startHealthCheck(): void { this.healthCheckInterval = setInterval(() => { this.performHealthCheck(); }, this.config.healthCheckInterval); } private performHealthCheck(): void { const processesToRemove: string[] = []; for (const [poolKey, process] of this.processes) { const metrics = process.getBasicMetrics(); // Remove stopped processes if (metrics.status === 'stopped') { processesToRemove.push(poolKey); continue; } // Log metrics for monitoring this.logger.debug('Process health check', { poolKey, status: metrics.status, uptime: metrics.uptime, isBusy: metrics.isBusy, }); } // Clean up stopped processes for (const poolKey of processesToRemove) { this.logger.info('Removing stopped process from pool', { poolKey }); this.processes.delete(poolKey); this.removeFromAccessOrder(poolKey); } // Emit health check event this.emit('health-check', this.getStatus()); } ``` **Health Indicators:** - `status === 'stopped'` → Remove from pool - `isReady === false && isSpawning === false` → Zombie state - Process crashes → Caught by event handlers, cleaned up immediately **Event-Driven Cleanup:** ```typescript // Set up event listeners when creating process process.on("terminated", (data) => { this.emit("process-terminated", data); this.processes.delete(poolKey); this.sessionToProcess.delete(sessionId); this.removeFromAccessOrder(poolKey); }); process.on("exited", (data) => { this.emit("process-exited", data); this.processes.delete(poolKey); this.sessionToProcess.delete(sessionId); this.removeFromAccessOrder(poolKey); }); ``` **Result:** Process crashes trigger immediate cleanup, health checks catch any missed cases. --- ## Process Lifecycle ### Creation Flow ``` ┌────────────────────────────────────────────────────────┐ │ Iris.sendMessage() → Need process for team "alpha" │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ ProcessPool.getOrCreateProcess("alpha", sessionId, │ │ fromTeam="iris") │ │ 1. Check pool: processes.get("iris->alpha") │ │ 2. Not found → Create new │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ new ClaudeProcess("alpha", config, sessionId) │ │ - Constructor initializes state │ │ - Sets up logging │ │ - NO child process yet │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ process.spawn(spawnCacheEntry) │ │ 1. Create spawnCacheEntry (type=SPAWN, msg="ping") │ │ 2. Spawn child process with claude CLI │ │ 3. Set currentCacheEntry = spawnCacheEntry │ │ 4. Write "ping" to stdin │ │ 5. Wait for system/init message │ │ 6. isReady = true │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ Pool adds process │ │ - processes.set("iris->alpha", process) │ │ - sessionToProcess.set(sessionId, "iris->alpha") │ │ - accessOrder.push("iris->alpha") │ │ - Emit 'process-spawned' event │ └────────────────────────────────────────────────────────┘ ``` ### Usage Flow ``` ┌────────────────────────────────────────────────────────┐ │ Iris.sendMessage("alpha", "What is 2+2?") │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ Get process from pool │ │ process = pool.getOrCreateProcess("alpha", ...) │ │ - Already exists → Return existing (fast!) │ │ - Update accessOrder (move to end) │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ Iris creates tell cache entry │ │ tellEntry = cacheSession.createEntry(TELL, "2+2?") │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ Execute tell │ │ process.executeTell(tellEntry) │ │ 1. Check isBusy → Throw if already processing │ │ 2. Set currentCacheEntry = tellEntry │ │ 3. Write message to stdin (JSON) │ │ 4. stdout → handleStdoutData() → pipe to cache │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ Iris waits for completion │ │ - Subscribes to tellEntry.messages$ (RxJS) │ │ - Receives stream of messages │ │ - Detects 'result' message → Complete │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ Process returns to idle state │ │ - currentCacheEntry = null │ │ - isReady = true, isBusy = false │ │ - Ready for next tell │ └────────────────────────────────────────────────────────┘ ``` ### Termination Flow ``` ┌────────────────────────────────────────────────────────┐ │ Trigger: Idle timeout / LRU eviction / Manual kill │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ process.terminate() │ │ 1. Send SIGTERM to child process │ │ 2. Set 5s force kill timer (SIGKILL fallback) │ │ 3. Wait for 'exit' event │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ Child process exits │ │ - Emit 'process-terminated' event │ │ - Clear force kill timer │ └────────────────────┬───────────────────────────────────┘ │ ▼ ┌────────────────────────────────────────────────────────┐ │ Event handler in Pool Manager │ │ process.on("terminated", () => { │ │ processes.delete(poolKey) │ │ sessionToProcess.delete(sessionId) │ │ accessOrder.remove(poolKey) │ │ }); │ └────────────────────────────────────────────────────────┘ ``` --- ## Integration Points ### With Iris Orchestrator **Iris creates and manages processes:** ```typescript class IrisOrchestrator { private processPool: ClaudeProcessPool; async sendMessage(fromTeam, toTeam, message) { // Get or create process const process = await this.processPool.getOrCreateProcess( toTeam, session.sessionId, fromTeam ); // Check if ready const metrics = process.getBasicMetrics(); if (!metrics.isReady) { // Should never happen - spawn is part of getOrCreateProcess throw new Error("Process not ready"); } // Execute tell process.executeTell(tellEntry); } } ``` ### With Cache System **Pool creates temporary cache entries for spawn:** ```typescript // In getOrCreateProcess() const spawnCacheEntry = new CacheEntryImpl(CacheEntryType.SPAWN, "ping"); await process.spawn(spawnCacheEntry); ``` **Why?** Spawn needs to capture init messages. Iris doesn't manage spawn cache entries - they're internal to the pool. ### With Session Manager **Pool uses session IDs for process isolation:** ```typescript // Each session gets its own process (or shares via pool key) const process = await pool.getOrCreateProcess( teamName, session.sessionId, // Used for --resume flag fromTeam ); ``` ### With Configuration **Pool respects configuration limits:** ```typescript interface ProcessPoolConfig { maxProcesses: number; // LRU eviction threshold healthCheckInterval: number; // Health check frequency idleTimeout: number; // Per-process idle timeout } ``` --- ## Performance Analysis ### Cold Start (No Pool) **Steps:** 1. Session file creation: ~30s (with 30s timeout) 2. Process spawn: ~2-5s 3. Init message wait: ~1s 4. Total: ~33-36s **Every Request:** Full cold start cost **3 Sequential Messages:** 3 × 36s = **108s** ### Warm Start (With Pool) **First Message (Cold):** 1. Session file creation: ~30s 2. Process spawn + cache: ~3s 3. Total: ~33s **Subsequent Messages (Warm):** 1. Get from pool: <1ms 2. Execute tell: ~2-3s (API latency) 3. Total: **~2-3s** **3 Sequential Messages:** 33s + 2s + 2s = **37s** **Improvement:** (108s - 37s) / 108s = **66% faster!** **Caveat:** First message always pays cold start cost. Pool benefits subsequent messages. ### Adjusted Analysis (Session File Pre-Created) If session files are pre-initialized (eager initialization at startup): **Cold Start (No Pool):** - Process spawn: ~2-5s per message - 3 messages: **6-15s** **Warm Start (With Pool):** - First spawn: ~3s - Next 2 messages: ~2s each - 3 messages: **7s** **Improvement:** (11s - 7s) / 11s = **36-52% faster** (varies by spawn time) ### Memory Trade-offs **Per Process:** - ClaudeProcess object: ~1 KB - Child process (Claude CLI): ~150 MB - Stdio buffers: ~1 MB **Pool of 10 Processes:** ~1.5 GB RAM **Trade-off Decision:** - Small pool (3-5): Lower memory, more evictions - Large pool (15-20): Higher memory, fewer evictions - Default (10): Balanced for typical usage --- ## API Reference ### ClaudeProcessPool ```typescript class ClaudeProcessPool extends EventEmitter { constructor( configManager: TeamsConfigManager, config: ProcessPoolConfig ); // Get or create process (main method) async getOrCreateProcess( teamName: string, sessionId: string, fromTeam: string ): Promise<ClaudeProcess>; // Get process by session ID getProcessBySessionId(sessionId: string): ClaudeProcess | undefined; // Terminate specific process async terminateProcess(teamName: string): Promise<void>; // Terminate all processes (shutdown) async terminateAll(): Promise<void>; // Get pool status getStatus(): ProcessPoolStatus; // Send command to session (e.g., /compact) async sendCommandToSession( sessionId: string, command: string ): Promise<string | null>; } ``` ### ClaudeProcess (Dumb Pipe) ```typescript class ClaudeProcess extends EventEmitter { constructor( teamName: string, irisConfig: IrisConfig, sessionId: string | null ); // Spawn process with cache entry for init messages async spawn(spawnCacheEntry: CacheEntry): Promise<void>; // Execute tell (throws ProcessBusyError if busy) executeTell(cacheEntry: CacheEntry): void; // Get basic metrics (no business logic) getBasicMetrics(): BasicProcessMetrics; // Terminate process async terminate(): Promise<void>; } ``` ### BasicProcessMetrics ```typescript interface BasicProcessMetrics { teamName: string; pid: number | null; uptime: number; isReady: boolean; isSpawning: boolean; isBusy: boolean; } ``` **Key Point:** NO `status` or `currentProcessingId` - that's managed by Iris in SessionManager, not by ClaudeProcess. --- ## Testing Strategy **Unit Tests:** - LRU eviction logic - Access order updates - Pool key generation - Zombie process cleanup **Integration Tests:** - Process spawn and reuse - LRU eviction under load - Health check detection - Event propagation **Load Tests:** - 100 concurrent tells to same team - 20 teams with 5 processes each - Pool limit with sustained load - Memory leak detection --- ## Future Enhancements ### 1. Intelligent Pre-warming **Current:** Processes created on-demand **Enhancement:** Pre-spawn processes for frequently used teams ```typescript async prewarmTeam(teamName: string): Promise<void> { // Spawn process in background // Ready for instant use on first request } ``` ### 2. Dynamic Pool Sizing **Current:** Fixed maxProcesses limit **Enhancement:** Adjust based on memory pressure ```typescript if (process.memoryUsage().heapUsed > threshold) { maxProcesses = Math.max(3, maxProcesses - 1); } ``` ### 3. Process Affinity **Current:** Random process assignment (via pool key) **Enhancement:** Pin specific callers to specific processes --- **Document Version:** 1.0 **Last Updated:** October 2025

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/jenova-marie/iris-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server