# Iris MCP - System Architecture
**Version:** 3.0 (Major Update)
**Date:** October 18, 2025
**Status:** Production-ready with Dashboard, Transport Abstraction, and Reverse MCP
## Table of Contents
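1. [Executive Summary](#executive-summary)
2. [Architectural Principles](#architectural-principles)
3. [System Overview](#system-overview)
4. [Component Hierarchy](#component-hierarchy)
5. [Transport Abstraction Layer](#transport-abstraction-layer)
6. [Two-Timeout Architecture](#two-timeout-architecture)
7. [Data Flow](#data-flow)
8. [State Management](#state-management)
9. [Event-Driven Communication](#event-driven-communication)
10. [Future Phases](#future-phases)
11. [Configuration](#configuration)
12. [Performance Characteristics](#performance-characteristics)
13. [Security Considerations](#security-considerations)
14. [Logging](#logging)
15. [Testing Strategy](#testing-strategy)
16. [Deployment](#deployment)
17. [Conclusion](#conclusion)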
---
## Executive Summary
Iris MCP is a Model Context Protocol server that enables **cross-project Claude Code coordination**. Multiple Claude instances running in different project directories can communicate and collaborate through MCP tools, coordinated by a central Iris orchestrator.
**Key Innovation:** The refactored architecture implements a **"dumb pipe, smart brain"** pattern where:
- **ClaudeProcess** = Pure I/O pipe (no business logic)
- **Iris** = Central orchestrator (all business logic)
- **Cache** = Event-driven storage with RxJS observables
- **Two Timeouts** = Separate concerns for process health vs. caller patience
**Performance:** Process pooling with LRU eviction cuts three sequential messages from ~21s (cold) to ~11s (warm), about 48% less total time.
---
## Architectural Principles
### 1. Separation of Concerns
```
┌─────────────────────────────────────────────┐
│  BUSINESS LOGIC LAYER                       │
│  (Iris Brain)                               │
│  - Completion detection                     │
│  - Timeout orchestration                    │
│  - Process state management                 │
│  - Cache coordination                       │
└─────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│  TRANSPORT LAYER                            │
│  (ClaudeProcess - Dumb Pipe)                │
│  - Spawn processes                          │
│  - Write stdin                              │
│  - Read stdout/stderr                       │
│  - Pipe to cache (NO decisions)             │
└─────────────────────────────────────────────┘
                      │
                      ▼
┌─────────────────────────────────────────────┐
│  STORAGE LAYER                              │
│  (Cache with RxJS Observables)              │
│  - Store protocol messages                  │
│  - Emit events on new data                  │
│  - Survive process recreation               │
└─────────────────────────────────────────────┘
```
**Why This Matters:**
- ClaudeProcess can be restarted without losing business logic state
- Cache survives process crashes, preserving partial responses
- Iris can orchestrate multiple processes with centralized intelligence
### 2. Event-Driven Architecture
The system uses **RxJS observables** for reactive programming:
```typescript
// Cache emits events when messages arrive
cacheEntry.messages$.subscribe(message => {
  // Iris reacts to new data
  if (message.type === 'result') {
    iris.handleCompletion();
  }
});
```
**Benefits:**
- Decoupled components
- Real-time reactivity
- Easy to extend with new observers
- Foundation for Phase 5 Intelligence Layer
### 3. Process Isolation
Each **fromTeam → toTeam** pair gets its own:
- Session record (SQLite)
- Claude process (isolated conversation)
- Cache session (message history)
```
team-iris  → team-alpha ──► Session A ──► Process A ──► Cache A
team-iris  → team-beta  ──► Session B ──► Process B ──► Cache B
team-alpha → team-beta  ──► Session C ──► Process C ──► Cache C
```
### 4. Graceful Degradation
System handles failures gracefully:
- Process crashes → Cache preserved, process recreated
- Response timeout → Process restarted, partial results available
- Pool limit reached → LRU eviction with warning
- Configuration errors → Clear error messages with remediation steps
---
## System Overview
### High-Level Architecture
```
┌───────────────────────────────────────────────────────────────┐
│ MCP CLIENT                                                    │
│ (Claude Code Instance)                                        │
└────────────────┬──────────────────────────────────────────────┘
                 │ MCP Protocol (stdio/HTTP)
                 ▼
┌───────────────────────────────────────────────────────────────┐
│ MCP SERVER (index.ts)                                         │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Tool Registration (18 tools)                              │ │
│ │ - send_message     - team_wake        - list_teams        │ │
│ │ - ask_message      - team_launch      - get_logs          │ │
│ │ - quick_message    - team_wake_all    - get_date          │ │
│ │ - session_reboot   - team_sleep       - get_agent         │ │
│ │ - session_delete   - team_status      - permissions__appr │ │
│ │ - session_fork     - session_report                       │ │
│ │ - session_cancel                                          │ │
│ └───────────────────────────────────────────────────────────┘ │
└────────────────┬──────────────────────────────────────────────┘
                 │ Tool Invocation
                 ▼
┌───────────────────────────────────────────────────────────────┐
│ IRIS ORCHESTRATOR (iris.ts)                                   │
│ THE BRAIN                                                     │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ • Completion detection (watches for 'result' messages)    │ │
│ │ • responseTimeout (120s default, resets on each message)  │ │
│ │ • mcpTimeout (-1=async, 0=forever, N=partial after Nms)   │ │
│ │ • Process state management (spawning/idle/processing)     │ │
│ │ • Cache coordination (creates entries, subscribes)        │ │
│ └───────────────────────────────────────────────────────────┘ │
└──┬───────────────────┬────────────────────┬───────────────────┘
   │                   │                    │
   │ manages           │ coordinates        │ queries/updates
   ▼                   ▼                    ▼
┌────────────────┐ ┌─────────────────┐ ┌──────────────────────┐
│ PROCESS POOL   │ │ CACHE MANAGER   │ │ SESSION MANAGER      │
│ (pool-manager) │ │ (cache-manager) │ │ (session-manager)    │
├────────────────┤ ├─────────────────┤ ├──────────────────────┤
│ LRU Eviction   │ │ MessageCaches   │ │ SQLite Storage       │
│ Health Checks  │ │ RxJS Observables│ │ Process State        │
│ Max 10 Process │ │ Message History │ │ Usage Statistics     │
└──┬─────────────┘ └───┬─────────────┘ └────┬─────────────────┘
   │                   │                    │
   │ contains          │ contains           │ persists
   ▼                   ▼                    ▼
┌────────────────┐ ┌─────────────────┐ ┌──────────────────────┐
│ CLAUDE PROCESS │ │ MESSAGE CACHE   │ │ SQLite Database      │
│ (coordinator)  │ │ (per team pair) │ │ team-sessions.db     │
├────────────────┤ ├─────────────────┤ ├──────────────────────┤
│ transport      │ │ createEntry()   │ │ id (PK)              │
│ spawn()        │ │ getAllEntries() │ │ from_team            │
│ executeTell()  │ │ getStats()      │ │ to_team              │
└────────────────┘ └───┬─────────────┘ │ session_id (UNIQUE)  │
                       │               │ created_at           │
                       │ contains      │ last_used_at         │
                       │               │ message_count        │
                       │               │ status               │
                       │               │ process_state        │
                       │               │ current_cache_id     │
                       │               │ last_response_at     │
                       │               │ launch_command       │
                       │               │ team_config_snapshot │
                       │               └──────────────────────┘
                       ▼
                   ┌─────────────────┐
                   │ CACHE ENTRY     │
                   │ (per tell/spawn)│
                   ├─────────────────┤
                   │ messages[]      │
                   │ messages$ (RxJS)│
                   │ status          │
                   │ complete()      │
                   │ terminate()     │
                   └───┬─────────────┘
                       │
                       │ contains
                       ▼
                   ┌─────────────────┐
                   │ CACHE MESSAGE   │
                   │ (protocol msg)  │
                   ├─────────────────┤
                   │ timestamp       │
                   │ type            │
                   │ data (raw JSON) │
                   └─────────────────┘
```
### Component Responsibilities
| Component | Responsibility | Business Logic? |
|-----------|----------------|-----------------|
| **MCP Server** | Register tools, validate inputs | Minimal |
| **Iris Orchestrator** | ALL business logic | ✅ YES |
| **Process Pool** | Manage process lifecycle, LRU | Limited (lifecycle) |
| **ClaudeProcess** | Coordinate transport (spawning and stdio piping delegated to Transport) | ❌ NO |
| **Cache Manager** | Manage message caches | Minimal |
| **MessageCache** | Store entries for team pair | No |
| **Cache Entry** | Store messages, emit events | No (just storage) |
| **Session Manager** | Persist session metadata | No (just CRUD) |
| **Config Manager** | Load/validate config | No (just I/O) |
---
## Component Hierarchy
### Cache Hierarchy
```
CacheManager (singleton)
│
├── MessageCache (sessionId: "uuid-1", fromTeam: null, toTeam: "alpha")
│   │
│   ├── CacheEntry (type: SPAWN, tellString: "ping")
│   │   └── CacheMessage[] (system/init, assistant, result)
│   │
│   ├── CacheEntry (type: TELL, tellString: "What is 2+2?")
│   │   └── CacheMessage[] (user, assistant, stream_event, result)
│   │
│   └── CacheEntry (type: TELL, tellString: "Explain quantum physics")
│       └── CacheMessage[] (user, assistant, assistant, result)
│
└── MessageCache (sessionId: "uuid-2", fromTeam: "alpha", toTeam: "beta")
    └── CacheEntry (type: TELL, tellString: "Review this PR")
        └── CacheMessage[] (...)
```
**Lifetime:**
- `CacheManager`: Lives for entire Iris process lifetime
- `MessageCache`: Lives until explicitly destroyed (survives process crashes)
- `CacheEntry`: Lives until completed/terminated
- `CacheMessage`: Immutable once added
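The lifetimes above can be illustrated with a minimal, dependency-free sketch. The class names (`CacheManagerSketch`, `MessageCacheSketch`, `CacheEntrySketch`) and method names are hypothetical stand-ins, not the Iris API, and RxJS is omitted. Messages are frozen on insert to model "immutable once added", and message caches stay in the manager's map until explicitly destroyed, regardless of whether a process exists.

```typescript
type EntryType = "spawn" | "tell";

interface CacheMessage {
  readonly timestamp: number;
  readonly type: string;
  readonly data: unknown;
}

class CacheEntrySketch {
  readonly type: EntryType;
  readonly tellString: string;
  private readonly messages: CacheMessage[] = [];

  constructor(type: EntryType, tellString: string) {
    this.type = type;
    this.tellString = tellString;
  }

  addMessage(type: string, data: unknown): void {
    // Freeze each message so it cannot change once stored
    this.messages.push(Object.freeze({ timestamp: Date.now(), type, data }));
  }

  getMessages(): readonly CacheMessage[] {
    return this.messages;
  }
}

class MessageCacheSketch {
  readonly entries: CacheEntrySketch[] = [];
  readonly sessionId: string;
  readonly fromTeam: string | null;
  readonly toTeam: string;

  constructor(sessionId: string, fromTeam: string | null, toTeam: string) {
    this.sessionId = sessionId;
    this.fromTeam = fromTeam;
    this.toTeam = toTeam;
  }

  createEntry(type: EntryType, tellString: string): CacheEntrySketch {
    const entry = new CacheEntrySketch(type, tellString);
    this.entries.push(entry);
    return entry;
  }
}

// Caches are keyed by sessionId and live until destroy(),
// independent of whether a Claude process currently exists.
class CacheManagerSketch {
  private readonly caches = new Map<string, MessageCacheSketch>();

  getOrCreate(sessionId: string, fromTeam: string | null, toTeam: string): MessageCacheSketch {
    let cache = this.caches.get(sessionId);
    if (cache === undefined) {
      cache = new MessageCacheSketch(sessionId, fromTeam, toTeam);
      this.caches.set(sessionId, cache);
    }
    return cache;
  }

  destroy(sessionId: string): boolean {
    return this.caches.delete(sessionId);
  }
}
```

Calling `getOrCreate` again after a process restart returns the same cache object, which is what lets a replacement process append new entries next to the old ones.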
### Process Pool Hierarchy
```
ClaudeProcessPool
│
├── ClaudeProcess (poolKey: "iris->alpha", sessionId: "uuid-1")
│     - teamName: "alpha"
│     - isReady: true
│     - isBusy: false
│     - currentCacheEntry: null
│
├── ClaudeProcess (poolKey: "frontend->backend", sessionId: "uuid-2")
│     - teamName: "backend"
│     - isReady: true
│     - isBusy: true
│     - currentCacheEntry: <pointer to cache entry>
│
└── ClaudeProcess (poolKey: "alpha->beta", sessionId: "uuid-3")
      - teamName: "beta"
      - isReady: false
      - isBusy: false
      - currentCacheEntry: null
```
**Pool Key Format:** `fromTeam->toTeam` (e.g., `"iris->alpha"`, `"alpha->beta"`, `"frontend->backend"`)
**LRU Tracking:** Array of pool keys ordered by access time (least recent first)
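A minimal sketch of the key format and LRU ordering, assuming an array kept least-recent first and a cap equal to `maxProcesses` (default 10). `poolKey` and `LruTracker` are hypothetical helpers; the sketch only tracks ordering and does not consider whether the evicted process is busy.

```typescript
// Builds the documented "fromTeam->toTeam" key.
function poolKey(fromTeam: string, toTeam: string): string {
  return `${fromTeam}->${toTeam}`;
}

// Tracks access order: index 0 is least recently used.
class LruTracker {
  private order: string[] = [];
  private readonly maxSize: number;

  constructor(maxSize: number) {
    this.maxSize = maxSize;
  }

  // Marks a key as just used; returns the evicted key, if any.
  touch(key: string): string | undefined {
    const idx = this.order.indexOf(key);
    if (idx !== -1) this.order.splice(idx, 1);
    this.order.push(key); // most recent goes to the end
    if (this.order.length > this.maxSize) {
      return this.order.shift(); // least recent sits at the front
    }
    return undefined;
  }

  keys(): readonly string[] {
    return this.order;
  }
}
```

With a cap of 2, touching `iris->alpha`, `alpha->beta`, `iris->alpha` again and then `frontend->backend` evicts `alpha->beta`, because it is the least recently accessed key at that point.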
---
## Transport Abstraction Layer
### Critical Architectural Component
**ClaudeProcess does NOT directly spawn processes.** It delegates to a **Transport abstraction layer** that handles local and remote execution transparently.
### Architecture Diagram
```
Iris Orchestrator
    ↓
ClaudeProcessPool
    ↓
ClaudeProcess (wrapper/coordinator)
    ↓
Transport (abstraction interface)
    ├── LocalTransport → child_process.spawn()
    └── SSHTransport   → OpenSSH client (ssh command)
```
### Transport Interface
**Location:** `src/transport/transport.interface.ts`
```typescript
import type { Observable } from "rxjs";

interface Transport {
  // RxJS reactive streams
  status$: Observable<TransportStatus>; // STOPPED → CONNECTING → SPAWNING → READY → BUSY
  errors$: Observable<Error>;           // Error stream

  // Core operations
  spawn(
    spawnCacheEntry: CacheEntry,
    commandInfo: CommandInfo, // Pre-built command (executable, args, cwd)
    spawnTimeout?: number     // Timeout in ms (default: 20000)
  ): Promise<void>;
  executeTell(cacheEntry: CacheEntry): void;
  terminate(): Promise<void>;

  // State queries
  isReady(): boolean;
  isBusy(): boolean;
  getPid(): number | null; // Local only, null for remote

  // Metrics & debugging
  getMetrics(): TransportMetrics;
  getLaunchCommand?(): string | null;      // Debug: Get full launch command
  getTeamConfigSnapshot?(): string | null; // Debug: Get team config JSON
  cancel?(): void;                         // Send ESC to stdin (attempt cancel)
}
```
### Implementations
#### 1. LocalTransport ✅
**Location:** `src/transport/local-transport.ts`
**Purpose:** Execute Claude CLI on the local machine
**Mechanism:**
- Uses Node.js `child_process.spawn()`
- Direct stdio piping to cache
- Process runs in team's project directory
**Key Features:**
- Fast startup (~2s warm, ~7s cold)
- Direct process control
- Native stdio handling
- PID tracking
#### 2. SSHTransport ✅
**Location:** `src/transport/ssh-transport.ts`
**Purpose:** Execute Claude CLI on remote hosts via SSH
**Mechanism:**
- Uses OpenSSH client (`ssh` command)
- Tunnels stdio over SSH connection
- Supports all SSH features (agent forwarding, ProxyJump, etc.)
**Key Features:**
- Automatic SSH config integration (`~/.ssh/config`)
- Keepalive support (ServerAliveInterval, ServerAliveCountMax)
- Reverse MCP tunneling (`ssh -R` for remote → local calls)
- Session MCP configuration (bidirectional communication)
- Remote MCP config file deployment
**Configuration:**
```yaml
teams:
  team-remote:
    remote: ssh inanna        # OpenSSH command
    path: /opt/containers     # Remote path
    enableReverseMcp: true    # SSH tunnel for callbacks
    sessionMcpEnabled: true   # Deploy MCP config files
```
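To make the configuration concrete, here is a hypothetical sketch of how such a team entry could be turned into an argument list for `ssh`. `buildSshArgs`, the keepalive values, and the `reverseMcpPort` parameter are assumptions for illustration only; the OpenSSH options themselves (`-o ServerAliveInterval=…`, `-o ServerAliveCountMax=…`, `-R port:host:hostport`) are standard.

```typescript
interface RemoteTeamConfig {
  remote: string;             // e.g. "ssh inanna"
  path: string;               // remote project directory
  enableReverseMcp?: boolean;
}

// Hypothetical: returns argv for spawning "ssh" with these arguments.
function buildSshArgs(
  team: RemoteTeamConfig,
  remoteCommand: string,
  reverseMcpPort: number, // assumed port where Iris listens locally
): string[] {
  const [program, ...hostArgs] = team.remote.trim().split(/\s+/);
  if (program !== "ssh" || hostArgs.length === 0) {
    throw new Error(`Unsupported remote: ${team.remote}`);
  }
  const args = [
    "-o", "ServerAliveInterval=30", // keepalive probe every 30s (assumed value)
    "-o", "ServerAliveCountMax=3",  // give up after 3 missed probes (assumed value)
  ];
  if (team.enableReverseMcp) {
    // Remote port N forwards back to localhost:N on the Iris host
    args.push("-R", `${reverseMcpPort}:localhost:${reverseMcpPort}`);
  }
  // Single-quote the path so spaces survive the remote shell
  const quotedPath = "'" + team.path.replace(/'/g, "'\\''") + "'";
  args.push(...hostArgs, `cd ${quotedPath} && ${remoteCommand}`);
  return args;
}
```

Keeping the extra words from `remote` (for example `ssh -p 2222 inanna`) ahead of the remote command lets users pass ordinary ssh options through the config.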
#### 3. TransportFactory ✅
**Location:** `src/transport/transport-factory.ts`
**Purpose:** Select appropriate transport based on team configuration
**Logic:**
```typescript
class TransportFactory {
  static create(teamName: string, config: IrisConfig, sessionId: string): Transport {
    // Teams with a `remote` setting (e.g. "ssh inanna") run over SSH
    if (config.remote) {
      return new SSHTransport(teamName, config, sessionId);
    }
    // Everything else runs as a local child process
    return new LocalTransport(teamName, config, sessionId);
  }
}
```
### ClaudeProcess Integration
**ClaudeProcess is now a thin coordinator:**
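```typescript
import { EventEmitter } from "events";

class ClaudeProcess extends EventEmitter {
  private transport: Transport; // Abstraction: LocalTransport or SSHTransport

  constructor(teamName: string, config: IrisConfig, sessionId: string) {
    super(); // required before `this` is used in a derived class
    this.transport = TransportFactory.create(teamName, config, sessionId);
  }

  async spawn(cacheEntry: CacheEntry): Promise<void> {
    // commandInfo (executable, args, cwd) and timeout are prepared from the
    // team config before this call; their construction is omitted here.
    return this.transport.spawn(cacheEntry, commandInfo, timeout);
  }

  executeTell(cacheEntry: CacheEntry): void {
    this.transport.executeTell(cacheEntry);
  }
}
```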
**ClaudeProcess responsibilities:**
- ✅ Coordinate transport lifecycle
- ✅ Bridge transport events to ProcessPool
- ✅ Maintain status observables
- ✅ Track metrics
- ❌ Does NOT spawn processes directly
- ❌ Does NOT manage stdio (delegated to transport)
### Benefits of Transport Abstraction
1. **Remote Execution:** Teams can run on any SSH-accessible host
2. **Transparency:** Iris treats local/remote identically
3. **Extensibility:** Easy to add new transports (Docker, Kubernetes, WebSocket)
4. **Testability:** Mock transports for unit testing
5. **Separation of Concerns:** Process orchestration vs. execution mechanism
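As an illustration of the testability point, here is a sketch of a mock transport. `MiniTransport` is a hypothetical, reduced version of the `Transport` interface: plain callbacks replace the RxJS observables so the example has no dependencies, and `finishTell()` is a test hook standing in for the arrival of a `result` message.

```typescript
type TransportStatus = "STOPPED" | "SPAWNING" | "READY" | "BUSY";

interface MiniTransport {
  onStatus(listener: (s: TransportStatus) => void): void;
  spawn(): Promise<void>;
  executeTell(message: string): void;
  isReady(): boolean;
  isBusy(): boolean;
}

class MockTransport implements MiniTransport {
  readonly sent: string[] = [];
  private status: TransportStatus = "STOPPED";
  private listeners: Array<(s: TransportStatus) => void> = [];

  onStatus(listener: (s: TransportStatus) => void): void {
    this.listeners.push(listener);
  }

  private setStatus(s: TransportStatus): void {
    this.status = s;
    for (const l of this.listeners) l(s);
  }

  async spawn(): Promise<void> {
    this.setStatus("SPAWNING");
    this.setStatus("READY"); // no real process, so ready immediately
  }

  executeTell(message: string): void {
    if (this.status !== "READY") throw new Error(`Cannot tell while ${this.status}`);
    this.sent.push(message);
    this.setStatus("BUSY");
  }

  // Test hook: simulate the 'result' message arriving
  finishTell(): void {
    this.setStatus("READY");
  }

  isReady(): boolean {
    return this.status === "READY";
  }

  isBusy(): boolean {
    return this.status === "BUSY";
  }
}
```

Because code above the transport layer only sees the interface, a test can swap this in and assert on `sent` and on the observed status sequence without launching Claude.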
### RxJS Reactive Streams
Both LocalTransport and SSHTransport emit reactive status updates:
```typescript
transport.status$.subscribe(status => {
  // STOPPED → CONNECTING → SPAWNING → READY → BUSY → READY
  console.log('Transport status changed:', status);
});

transport.errors$.subscribe(error => {
  // Handle transport-level errors
  console.error('Transport error:', error);
});
```
**Integration:** ClaudeProcess subscribes to transport observables and forwards to ProcessPool.
---
## Two-Timeout Architecture
The refactored system separates two distinct timeout concerns:
### 1. Response Timeout (Process Health Monitor)
**Source:** `config.yaml` → `settings.responseTimeout` (default: 120000ms = 2 minutes)
**Managed By:** Iris
**Purpose:** Detect stalled Claude processes
**Behavior:** Timer resets on EVERY message received from Claude
```
┌─────────────────────────────────────────────┐
│  Response Timeout Lifecycle                 │
└─────────────────────────────────────────────┘

Tell sent to ClaudeProcess
     │
     ▼
[Start responseTimeout timer: 120s]
     │
     ├──► Message received → [Reset timer to 120s]
     ├──► Message received → [Reset timer to 120s]
     ├──► Message received → [Reset timer to 120s]
     ├──► 'result' message → [Complete successfully, clear timer]
     │
     └──► 120s elapsed with NO messages
               │
               ▼
          [RESPONSE TIMEOUT!]
               │
               ├─► Terminate cache entry (reason: RESPONSE_TIMEOUT)
               ├─► Kill process
               ├─► Update session state to 'stopped'
               └─► Cache preserved for retrieval
```
**Key Points:**
- Timer measures **inactivity**, not total duration (it resets on each message)
- Claude streaming many messages = timer keeps resetting
- Claude hangs/crashes = timer expires after 120s of silence
- Process recreated, cache preserved
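The reset-on-every-message behavior is a sliding inactivity deadline. The sketch below shows it two ways, under stated assumptions: a pure function over message arrival times (milliseconds since the tell was sent), and a hypothetical `ResponseWatchdog` class using `setTimeout`/`clearTimeout`. Neither is the Iris implementation.

```typescript
// Deadline = last activity + timeout; with no messages, last activity is t = 0.
function responseDeadline(messageTimes: number[], timeoutMs: number): number {
  const lastActivity = messageTimes.length > 0 ? Math.max(...messageTimes) : 0;
  return lastActivity + timeoutMs;
}

function isStalled(now: number, messageTimes: number[], timeoutMs: number): boolean {
  // Only messages that have already arrived count as activity
  const seen = messageTimes.filter(t => t <= now);
  return now >= responseDeadline(seen, timeoutMs);
}

// Live counterpart: call reset() when the tell starts and on every message,
// stop() on 'result' or termination.
class ResponseWatchdog {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private readonly timeoutMs: number;
  private readonly onTimeout: () => void;

  constructor(timeoutMs: number, onTimeout: () => void) {
    this.timeoutMs = timeoutMs;
    this.onTimeout = onTimeout;
  }

  reset(): void {
    this.stop();
    this.timer = setTimeout(this.onTimeout, this.timeoutMs);
  }

  stop(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
  }

  isArmed(): boolean {
    return this.timer !== undefined;
  }
}
```

For messages at 10s, 20s and 45s with a 120s timeout, `responseDeadline` yields 165s, matching the interaction example later in this section.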
### 2. MCP Timeout (Caller Patience)
**Source:** Tool call parameter `timeout: number`
**Managed By:** Iris (but honors caller's wishes)
**Purpose:** Control how long the MCP caller waits for a response
**Behavior:** Fixed duration, does NOT reset
```
┌─────────────────────────────────────────────┐
│  MCP Timeout Modes                          │
└─────────────────────────────────────────────┘

timeout: -1 → ASYNC MODE
    Return immediately: { status: "async", sessionId }
    Process continues running
    Caller retrieves results later from the session cache

timeout: 0 → WAIT FOREVER
    Wait until 'result' message or responseTimeout
    No partial results returned
    Only returns on completion or error

timeout: N → PARTIAL MODE (N milliseconds)
    Return the full response if 'result' arrives within N ms;
    otherwise, after N ms, return:
    {
      status: "mcp_timeout",
      partialResponse: "extracted text so far...",
      rawMessages: [...all messages received]
    }
    Process continues running in background
```
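The three modes can be sketched as a small dispatcher that races completion against the caller's deadline. `planMcpWait`, `waitForCaller`, and the `getPartial` accessor are hypothetical names, and the result shapes are trimmed to the fields shown above (no `rawMessages`).

```typescript
type McpWaitPlan =
  | { kind: "async" }
  | { kind: "forever" }
  | { kind: "partial"; afterMs: number };

// Maps the caller's `timeout` parameter onto the three modes.
function planMcpWait(timeout: number): McpWaitPlan {
  if (timeout === -1) return { kind: "async" };
  if (timeout === 0) return { kind: "forever" };
  if (Number.isInteger(timeout) && timeout > 0) return { kind: "partial", afterMs: timeout };
  throw new RangeError(`Invalid timeout: ${timeout}`);
}

type CallerResult =
  | { status: "async" }
  | { status: "completed"; response: string }
  | { status: "mcp_timeout"; partialResponse: string };

// The tell keeps running in every mode; only what the caller receives differs.
async function waitForCaller(
  timeout: number,
  completion: Promise<string>,
  getPartial: () => string, // hypothetical: text extracted so far
): Promise<CallerResult> {
  const plan = planMcpWait(timeout);
  if (plan.kind === "async") return { status: "async" };
  const done = completion.then(response => ({ status: "completed" as const, response }));
  if (plan.kind === "forever") return done;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const expired = new Promise<CallerResult>(resolve => {
    timer = setTimeout(
      () => resolve({ status: "mcp_timeout", partialResponse: getPartial() }),
      plan.afterMs,
    );
  });
  try {
    return await Promise.race([done, expired]);
  } finally {
    clearTimeout(timer); // no stray timer if completion won the race
  }
}
```

Note that the MCP timer here never touches the response-timeout watchdog, which is what keeps the two concerns independent.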
### Interaction Between Timeouts
```
Example: timeout=30000 (30s MCP), responseTimeout=120000 (120s response)

Time   Event
─────  ──────────────────────────────────────────────────────────────
0s     Tell sent, both timers start (response deadline: 120s)
10s    Message received → responseTimeout resets (deadline now 130s)
20s    Message received → responseTimeout resets (deadline now 140s)
30s    ⚠️ MCP TIMEOUT! → Return partial results to caller
       Process STILL RUNNING in background
       responseTimeout STILL ACTIVE (deadline still 140s)
45s    Message received → responseTimeout resets (deadline now 165s)
50s    'result' message → Process completes successfully
       ✅ Cache entry marked complete
       ℹ️ Caller already got partial response at 30s
```
**Key Insight:** The two timeouts are **orthogonal**:
- MCP timeout controls **caller behavior**
- Response timeout controls **process health**
---
## Data Flow
### Complete Tell Flow (Successful Case)
```
┌────────────────────────────────────────────────────────────────────┐
│ 1. MCP Tool Call                                                   │
│ send_message(toTeam: "alpha", message: "Hello", timeout: 30000)    │
└────────────────────┬───────────────────────────────────────────────┘
                     │
                     ▼
┌────────────────────────────────────────────────────────────────────┐
│ 2. Iris Orchestrator (THE BRAIN)                                   │
│    a. Get/create session → SessionManager                          │
│    b. Check if busy → session.processState === "processing"?       │
│    c. Get/create MessageCache → CacheManager                       │
│    d. Get/create ClaudeProcess → ProcessPool                       │
│    e. Spawn if needed (with SPAWN cache entry)                     │
│    f. Create TELL cache entry                                      │
│    g. Update session.processState = "processing"                   │
│    h. Start responseTimeout timer (120s, resets on messages)       │
│    i. Subscribe to cacheEntry.messages$ (RxJS observable)          │
│    j. Execute: process.executeTell(cacheEntry)                     │
│    k. Start MCP timeout promise (30s fixed)                        │
└────────────────────┬───────────────────────────────────────────────┘
                     │
                     ▼
┌────────────────────────────────────────────────────────────────────┐
│ 3. ClaudeProcess (DUMB PIPE)                                       │
│    a. Check: currentCacheEntry === null? (or throw ProcessBusy)    │
│    b. Set: currentCacheEntry = cacheEntry                          │
│    c. Write to stdin: JSON.stringify({ type: "user", message })    │
└────────────────────┬───────────────────────────────────────────────┘
                     │
                     ▼
┌────────────────────────────────────────────────────────────────────┐
│ 4. Claude Code Process (External - Black Box)                      │
│    - Receives user message via stdin                               │
│    - Thinks, uses tools, generates response                        │
│    - Writes newline-delimited JSON to stdout                       │
└────────────────────┬───────────────────────────────────────────────┘
                     │ stdout (stream)
                     ▼
┌────────────────────────────────────────────────────────────────────┐
│ 5. ClaudeProcess.handleStdoutData() (DUMB PIPE)                    │
│    FOR EACH line in stdout:                                        │
│      - Parse JSON                                                  │
│      - currentCacheEntry.addMessage(json) ← THAT'S IT!             │
│      - IF json.type === "result":                                  │
│          currentCacheEntry = null (clear for next tell)            │
└────────────────────┬───────────────────────────────────────────────┘
                     │
                     ▼
┌────────────────────────────────────────────────────────────────────┐
│ 6. CacheEntry (EVENT EMITTER)                                      │
│    addMessage(json):                                               │
│      - messages.push({ timestamp, type, data: json })              │
│      - messagesSubject.next(message) ← Emit to RxJS observable     │
└────────────────────┬───────────────────────────────────────────────┘
                     │ RxJS subscription
                     ▼
┌────────────────────────────────────────────────────────────────────┐
│ 7. Iris RxJS Subscription (BUSINESS LOGIC)                         │
│    cacheEntry.messages$.subscribe(msg => {                         │
│      sessionManager.updateLastResponse(sessionId);                 │
│      resetResponseTimeout();  ← Reset 120s timer                   │
│                                                                    │
│      IF msg.type === "result":                                     │
│        handleTellCompletion():                                     │
│          - cacheEntry.complete()                                   │
│          - sessionManager.updateProcessState("idle")               │
│          - sessionManager.incrementMessageCount()                  │
│          - subscription.unsubscribe()                              │
│          - clearTimeout(responseTimeout)                           │
│          - Resolve MCP promise with full response                  │
│    });                                                             │
└────────────────────────────────────────────────────────────────────┘
```
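Step 5 glosses over one detail: stdout arrives in arbitrary chunks, so a single JSON line can be split across two reads. A minimal sketch of the dumb-pipe handler with line buffering follows. `StdoutPipe` and `EntryLike` are hypothetical names, and the sketch assumes chunks are already decoded strings (for example after `stdout.setEncoding("utf8")`), so multi-byte characters are not split.

```typescript
interface EntryLike {
  addMessage(json: unknown): void;
}

class StdoutPipe {
  private buffer = "";
  currentCacheEntry: EntryLike | null = null;

  handleStdoutData(chunk: string): void {
    this.buffer += chunk;
    const lines = this.buffer.split("\n");
    this.buffer = lines.pop() ?? ""; // last piece may be an incomplete line
    for (const line of lines) {
      if (line.trim() === "") continue;
      let parsed: unknown;
      try {
        parsed = JSON.parse(line);
      } catch {
        continue; // non-JSON noise is skipped in this sketch
      }
      if (typeof parsed !== "object" || parsed === null) continue;
      const json = parsed as { type?: unknown };
      this.currentCacheEntry?.addMessage(json); // forward only, no decisions
      if (json.type === "result") {
        this.currentCacheEntry = null; // ready for the next tell
      }
    }
  }
}
```

All interpretation beyond "is this the `result` message" stays in Iris, which keeps the pipe consistent with the separation described earlier.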
### Error Flow (Response Timeout)
```
┌────────────────────────────────────────────────────────────────────┐
│ Claude Stops Responding (Hung/Crashed)                             │
└────────────────────┬───────────────────────────────────────────────┘
                     │ 120s with NO messages
                     ▼
┌────────────────────────────────────────────────────────────────────┐
│ Iris.handleResponseTimeout() (RECOVERY LOGIC)                      │
│ 1. cacheEntry.terminate(RESPONSE_TIMEOUT)                          │
│    - Sets status = "terminated"                                    │
│    - Sets terminationReason                                        │
│    - Completes RxJS observable (messagesSubject.complete())        │
│                                                                    │
│ 2. Get MessageCache (still alive in CacheManager!)                 │
│                                                                    │
│ 3. Terminate old process                                           │
│    - oldProcess.terminate() → SIGTERM/SIGKILL                      │
│    - Process removed from pool                                     │
│                                                                    │
│ 4. Update session state                                            │
│    - sessionManager.updateProcessState("stopped")                  │
│    - sessionManager.setCurrentCacheSessionId(null)                 │
│                                                                    │
│ 5. Cache preserved for retrieval!                                  │
│    - MessageCache still contains all entries                       │
│    - Partial responses remain readable from the cache              │
│                                                                    │
│ 6. Next tell will create new process                               │
│    - Same MessageCache reused                                      │
│    - New cache entry added to same cache                           │
└────────────────────────────────────────────────────────────────────┘
```
---
## State Management
### Process State Machine
```
     ┌──────────┐
     │ stopped  │ ← Initial state, no process
     └────┬─────┘
          │ getOrCreateProcess()
          │ spawn(spawnCacheEntry)
          ▼
     ┌──────────┐
┌───►│ spawning │ ← Process starting, waiting for init
│    └────┬─────┘
│         │ init message received
│         │ isReady = true
│         ▼
│    ┌──────────┐
│    │   idle   │ ← Ready, not processing
│    └────┬─────┘
│         │ executeTell(cacheEntry)
│         ▼
│    ┌────────────┐
│    │ processing │ ← Actively processing a tell
│    └────┬───────┘
│         │
│         ├──► 'result' message → idle
│         │
│         ├──► responseTimeout → terminating
│         │
│         ├──► process.terminate() → terminating
│         │
│         ▼
│    ┌─────────────┐
└────┤ terminating │ ← Shutting down
     └──────┬──────┘
            │ process exit
            ▼
     ┌──────────┐
     │ stopped  │
     └──────────┘
```
**Stored In:** `SessionManager` → SQLite → `team_sessions.process_state`
**Managed By:** Iris (updates via `sessionManager.updateProcessState()`)
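The diagram can be encoded as a transition table so that illegal updates fail loudly. This is a sketch read directly off the diagram above, not the SessionManager's actual validation; transitions that are not drawn (for example, whatever `team_sleep` does to an idle process) are deliberately rejected here.

```typescript
type ProcessState = "stopped" | "spawning" | "idle" | "processing" | "terminating";

// Edges exactly as drawn in the state diagram
const TRANSITIONS: Record<ProcessState, readonly ProcessState[]> = {
  stopped: ["spawning"],
  spawning: ["idle"],
  idle: ["processing"],
  processing: ["idle", "terminating"],
  terminating: ["stopped", "spawning"],
};

function canTransition(from: ProcessState, to: ProcessState): boolean {
  return TRANSITIONS[from].includes(to);
}

function transition(from: ProcessState, to: ProcessState): ProcessState {
  if (!canTransition(from, to)) {
    throw new Error(`Illegal process state transition: ${from} -> ${to}`);
  }
  return to;
}
```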
### Cache Entry Status
```
        ┌────────┐
        │ active │ ← Receiving messages, observable open
        └───┬────┘
            │
    ┌───────┴──────────┐
    │                  │
    ▼                  ▼
┌───────────┐   ┌────────────┐
│ completed │   │ terminated │
└───────────┘   └────────────┘
      │                │
      │                │
'result' msg    responseTimeout / crash / manual
```
**Transitions:**
- `active → completed`: Normal completion (result message)
- `active → terminated`: Error condition (timeout, crash, manual kill)
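A dependency-free sketch of these status rules follows. A listener array stands in for the RxJS `Subject`, and closing the entry plays the role of `messagesSubject.complete()`. `CacheEntryStatusSketch` is a hypothetical name; the point is that both end states are final and a closed entry accepts no further messages.

```typescript
type CacheMessage = { timestamp: number; type: string; data: unknown };
type EntryStatus = "active" | "completed" | "terminated";

class CacheEntryStatusSketch {
  status: EntryStatus = "active";
  terminationReason: string | null = null;
  readonly messages: CacheMessage[] = [];
  private listeners: Array<(m: CacheMessage) => void> = [];

  subscribe(listener: (m: CacheMessage) => void): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter(l => l !== listener);
    };
  }

  addMessage(type: string, data: unknown): void {
    if (this.status !== "active") return; // closed entries accept nothing
    const msg: CacheMessage = { timestamp: Date.now(), type, data };
    this.messages.push(msg);
    for (const l of [...this.listeners]) l(msg); // copy: listeners may close the entry
  }

  complete(): void {
    this.close("completed", null);
  }

  terminate(reason: string): void {
    this.close("terminated", reason);
  }

  private close(next: EntryStatus, reason: string | null): void {
    if (this.status !== "active") return; // both end states are final
    this.status = next;
    this.terminationReason = reason;
    this.listeners = []; // like completing the subject
  }
}
```

A late timeout after a normal completion is therefore a no-op, which avoids marking a finished tell as terminated.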
---
## Event-Driven Communication
### RxJS Observable Flow
```
┌───────────────────────────────────────────────────────────────┐
│ CacheEntry (Publisher)                                        │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ private messagesSubject = new Subject<CacheMessage>();    │ │
│ │ public messages$: Observable<CacheMessage>;               │ │
│ │                                                           │ │
│ │ addMessage(data: any): void {                             │ │
│ │   const msg = { timestamp, type, data };                  │ │
│ │   this.messages.push(msg);                                │ │
│ │   this.messagesSubject.next(msg);  ← Emit to subscribers  │ │
│ │ }                                                         │ │
│ └───────────────────────────────────────────────────────────┘ │
└───────────────────────────────┬───────────────────────────────┘
                                │ Observable stream
                                │
            ┌───────────────────┼───────────────────┐
            │                   │                   │
            ▼                   ▼                   ▼
    ┌───────────────┐   ┌───────────────┐   ┌───────────────┐
    │ Subscriber    │   │ Subscriber    │   │ Subscriber    │
    │ (Iris)        │   │ (Iris)        │   │ (Future)      │
    ├───────────────┤   ├───────────────┤   ├───────────────┤
    │ Response      │   │ Completion    │   │ Analytics     │
    │ Timeout       │   │ Detection     │   │ Dashboard     │
    │ Timer Reset   │   │ ('result')    │   │ Monitoring    │
    └───────────────┘   └───────────────┘   └───────────────┘
```
**Benefits for Later Phases:**
- Phase 2 Dashboard subscribes for real-time updates (via DashboardStateBridge)
- Phase 3 API can stream events to WebSocket clients
- Phase 5 Intelligence can observe patterns for meta-cognition
### Event Types Emitted
| Source | Event Name | Data | Purpose |
|--------|-----------|------|---------|
| ClaudeProcess | `process-spawned` | `{ teamName, pid }` | Process started |
| ClaudeProcess | `process-exited` | `{ teamName, code, signal }` | Process ended |
| ClaudeProcess | `process-error` | `{ teamName, error }` | Process error |
| ClaudeProcess | `process-terminated` | `{ teamName }` | Manual termination |
| ClaudeProcessPool | `process-spawned` | (forwarded) | Pool awareness |
| ClaudeProcessPool | `health-check` | `{ status }` | Periodic health |
| CacheEntry | `messages$` | `CacheMessage` | New message (RxJS) |
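The "(forwarded)" row means the pool re-emits process events so that pool listeners see every process. A sketch with Node's built-in `EventEmitter` is below; `ProcessSketch`, `PoolSketch`, and `simulateSpawn` are hypothetical names, and the payload shapes follow the table.

```typescript
import { EventEmitter } from "events";

const FORWARDED_EVENTS = [
  "process-spawned",
  "process-exited",
  "process-error",
  "process-terminated",
] as const;

class ProcessSketch extends EventEmitter {
  readonly teamName: string;

  constructor(teamName: string) {
    super();
    this.teamName = teamName;
  }

  // Test hook standing in for a real spawn completing
  simulateSpawn(pid: number): void {
    this.emit("process-spawned", { teamName: this.teamName, pid });
  }
}

class PoolSketch extends EventEmitter {
  add(proc: ProcessSketch): void {
    // Re-emit each process-level event on the pool with the same payload
    for (const event of FORWARDED_EVENTS) {
      proc.on(event, (payload: unknown) => this.emit(event, payload));
    }
  }
}
```

The event names avoid the special `"error"` event, which `EventEmitter` throws on when no listener is attached.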
---
## Future Phases
Iris is designed for **five progressive phases**:
### Phase 1: Core MCP Server ✅ (CURRENT)
- Process pooling with LRU eviction
- MCP tools for team coordination
- Two-timeout architecture
- Event-driven cache with RxJS
- SQLite session persistence
**Status:** Complete (refactored Oct 2025)
### Phase 2: React Dashboard ✅ **IMPLEMENTED**
**Location:** `src/dashboard/`
**Tech Stack:** React 18 + Vite + Express + Socket.io
**Purpose:** Web UI for monitoring teams, processes, and permissions
**Status:** Production-ready, fully functional
**Server-Side (src/dashboard/server/):**
- `index.ts` - Express server with WebSocket support
- `state-bridge.ts` - State synchronization with Iris core
- `routes/processes.ts` - Process management API
- `routes/config.ts` - Configuration management API
**Client-Side (src/dashboard/client/):**
- `ProcessMonitor.tsx` - Real-time process status monitoring
- `LogViewer.tsx` - Live log streaming with filtering
- `ConfigEditor.tsx` - Visual configuration editor
- `PermissionApprovalModal.tsx` - Manual permission approval UI
- `useWebSocket.ts` - WebSocket integration hook
**Features Implemented:**
- ✅ Real-time process status with WebSocket updates
- ✅ Permission approval system with modal dialogs
- ✅ Real-time log streaming from wonder-logger
- ✅ Session history and statistics
- ✅ Manual process control (wake/sleep/terminate)
- ✅ Configuration editor with validation
- ✅ Health metrics visualization
- ✅ Debug info display (launch commands, config snapshots)
**Integration:** Subscribes to RxJS observables via DashboardStateBridge, forwards events via Socket.io
### Phase 3: HTTP/WebSocket API ⚠️ **PARTIALLY IMPLEMENTED**
**Location:** `src/mcp_server.ts` (integrated) + `src/api/` (planned separate module)
**Tech Stack:** Express + StreamableHTTPServerTransport
**Purpose:** HTTP transport for MCP + external integrations
**Status:** HTTP/WS functionality exists, separate REST API module pending
**Currently Implemented (in MCP server):**
- ✅ HTTP transport mode (`run("http", port)`)
- ✅ `/mcp` - General MCP HTTP endpoint (JSON-RPC over HTTP)
- ✅ `/mcp/:sessionId` - Session-specific endpoint for Reverse MCP
- ✅ Express server with JSON middleware
- ✅ WebSocket support via Dashboard server
- ✅ StreamableHTTPServerTransport integration
**Planned (separate src/api/ module):**
- 🔮 RESTful API wrapper around MCP tools
- 🔮 `POST /api/teams/tell` - HTTP version of send_message
- 🔮 `GET /api/teams/:name/status` - Team status endpoint
- 🔮 `WS /api/stream` - Dedicated real-time event stream
**Note:** HTTP/WebSocket capabilities are fully functional for Dashboard and Reverse MCP, but a dedicated REST API module is still planned.
### Phase 4: CLI Interface ⚠️ **PARTIALLY IMPLEMENTED**
**Location:** `src/cli/`
**Tech Stack:** Plain TypeScript commands (Ink integration planned)
**Purpose:** Terminal commands for installation and management
**Status:** Basic commands implemented, interactive TUI pending
**Currently Implemented (src/cli/commands/):**
- ✅ `install.ts` - Install Iris MCP and register with Claude CLI
- ✅ `uninstall.ts` - Uninstall and cleanup
- ✅ `add-team.ts` - Add team to configuration
**Planned (Ink-based Terminal UI):**
- 🔮 `iris teams list` - Show all teams with status
- 🔮 `iris tell <team> <message>` - Interactive tell with autocomplete
- 🔮 `iris monitor` - Live dashboard in terminal (Ink-based)
- 🔮 `iris cache inspect <sessionId>` - Interactive cache viewer
**Note:** Current CLI uses plain TypeScript. Ink 5 (React for terminals) integration is planned to reuse Dashboard components for TUI.
### Phase 5: Intelligence Layer 🔮
**Location:** `src/intelligence/`
**Tech Stack:** TBD (ML/AI integration)
**Purpose:** Autonomous coordination
**Capabilities:**
- Pattern recognition from event streams
- Proactive team coordination
- Load balancing decisions
- Self-healing infrastructure
- Meta-cognitive reflection
**Foundation:** All events already emitted, observables already in place
---
## Configuration
**File:** `$IRIS_HOME/config.yaml` (or `~/.iris/config.yaml`)
**Key Settings:**
```yaml
settings:
  sessionInitTimeout: 30000     # 30s for session file creation
  responseTimeout: 120000       # 2min for process health (resets)
  idleTimeout: 30000000         # 8.3hr before idle process cleanup
  maxProcesses: 10              # LRU eviction limit
  healthCheckInterval: 30000    # 30s health check frequency

teams:
  team-name:
    path: /absolute/path/to/project
    description: Human-readable description
    idleTimeout: 30000000       # Optional override
    grantPermission: yes        # Permission mode: yes/no/ask/forward
    color: "#FF6B9D"            # Hex color for future UI
```
**Hot-Reload:** Config watched with `fs.watchFile()`, reloads on changes
---
## Performance Characteristics
**Cold Start (No Pool):**
- Session file creation: ~7s
- Process spawn: ~7s per process
- 3 sequential messages: ~21s total
**Warm Start (With Pool):**
- Process reuse: ~2s per message
- 3 messages: ~11s total
- **~48% less total time** (the warm run takes ~52% as long as the cold run)
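A worked version of the arithmetic, assuming the warm run still pays one cold spawn for the first message (7 + 2 + 2 s). That breakdown is an inference from the figures above, not a separately measured result:

```math
T_{\text{cold}} = 3 \times 7\,\text{s} = 21\,\text{s}, \qquad
T_{\text{warm}} = 7\,\text{s} + 2 \times 2\,\text{s} = 11\,\text{s}

\frac{T_{\text{warm}}}{T_{\text{cold}}} = \frac{11}{21} \approx 0.52, \qquad
1 - \frac{11}{21} = \frac{10}{21} \approx 0.48, \qquad
\frac{T_{\text{cold}}}{T_{\text{warm}}} = \frac{21}{11} \approx 1.9
```

So 52% is the warm-to-cold time ratio; the saving is about 48%, a speedup of roughly 1.9×.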
**Memory:**
- ~150MB per Claude process
- ~10MB for cache per session
- SQLite database < 1MB for 1000 sessions
**Scalability:**
- Max processes limited by config (default 10)
- LRU eviction prevents runaway memory
- SQLite handles 100K+ sessions easily
---
## Security Considerations
**Input Validation:**
- Team names validated against path traversal
- Messages sanitized (null bytes removed, length limits)
- Timeouts bounded (1s to 1hr) for partial mode; `-1` (async) and `0` (wait forever) remain valid special values
**Process Isolation:**
- Each team runs in its own directory
- No shared state between processes
- Environment isolation via child_process
**Configuration:**
- Absolute paths required (no relative path traversal)
- Team paths validated on load
- Hot-reload with validation
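The validation rules listed in this section can be sketched as small pure functions. Everything here is hypothetical: the allowed team-name character set, the function names, and treating `-1`/`0` as exempt from the 1s to 1hr bound (following the MCP timeout modes) are assumptions. `path.isAbsolute` is the standard Node.js check.

```typescript
import * as path from "path";

// Assumed allowed character set: no dots or slashes, so no traversal
const TEAM_NAME_PATTERN = /^[A-Za-z0-9][A-Za-z0-9_-]*$/;

function validateTeam(name: string, teamPath: string): string[] {
  const errors: string[] = [];
  if (!TEAM_NAME_PATTERN.test(name)) {
    errors.push(`Invalid team name "${name}": use letters, digits, '-' or '_'`);
  }
  if (!path.isAbsolute(teamPath)) {
    errors.push(`Team "${name}": path must be absolute (got "${teamPath}")`);
  }
  return errors;
}

// Strip null bytes, then cap the length (the limit is a caller-chosen assumption)
function sanitizeMessage(message: string, maxLength: number): string {
  return message.replace(/\0/g, "").slice(0, maxLength);
}

function isValidTimeout(timeout: number): boolean {
  if (timeout === -1 || timeout === 0) return true; // async / wait-forever modes
  return Number.isInteger(timeout) && timeout >= 1000 && timeout <= 3600000;
}
```

Returning a list of errors rather than throwing on the first one makes it easier to report every problem in a config file at once, which fits the "clear error messages" goal.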
---
## Logging
**Format:** Structured JSON to stderr
**Levels:** debug, info, warn, error
**Context:** Each logger scoped (e.g., `process:alpha`, `cache-manager`)
**Example:**
```json
{
  "level": "info",
  "context": "iris",
  "message": "Tell completed successfully",
  "sessionId": "uuid-123",
  "cacheEntryType": "tell",
  "messageCount": 5,
  "timestamp": "2025-10-12T22:00:00.000Z"
}
```
**Why stderr?** Stdout reserved for MCP protocol (stdio transport)
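A minimal sketch of a scoped logger that produces the JSON shape above and writes only to stderr. `formatLogLine` and `createLogger` are hypothetical names, and the sketch does not stop caller-supplied fields from overriding the core keys.

```typescript
type Level = "debug" | "info" | "warn" | "error";

function formatLogLine(
  level: Level,
  context: string,
  message: string,
  fields: Record<string, unknown> = {},
): string {
  return JSON.stringify({ level, context, message, ...fields, timestamp: new Date().toISOString() });
}

// One logger per scope, e.g. createLogger("process:alpha")
function createLogger(context: string) {
  const write = (level: Level) => (message: string, fields?: Record<string, unknown>): void => {
    // stderr only: stdout carries MCP protocol frames in stdio mode
    process.stderr.write(formatLogLine(level, context, message, fields) + "\n");
  };
  return { debug: write("debug"), info: write("info"), warn: write("warn"), error: write("error") };
}
```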
---
## Testing Strategy
**Test Structure:**
- `tests/unit/` - Process pool, cache, validation
- `tests/integration/` - End-to-end MCP communication
- `tests/fixtures/` - Mock configurations
**Key Test Cases:**
- Two-timeout interaction (30s MCP, 120s response)
- Process recreation with cache preservation
- LRU eviction under load
- RxJS subscription cleanup
- SQLite schema migration
---
## Deployment
**Installation:**
```bash
npm install -g @iris-mcp/server
iris install # Creates config, registers with Claude CLI
```
**Running:**
```bash
iris start # Starts MCP server (stdio transport)
iris start --http # Future: HTTP transport
```
**Integration with Claude Code:**
- Registered in `~/.claude/config.yaml` as MCP server
- Auto-started by Claude CLI when tools invoked
- Process lifetime managed by Claude CLI
---
## Conclusion
The refactored Iris MCP architecture achieves:
- ✅ **Clean separation** - Dumb pipe vs. smart brain
- ✅ **Event-driven** - RxJS observables for reactivity
- ✅ **Resilient** - Cache survives process failures
- ✅ **Performant** - ~48% less total time with process pooling (21s → 11s for three messages)
- ✅ **Observable** - Rich event streams for monitoring
- ✅ **Extensible** - Foundation for 5 phases
**Next Steps:**
1. Complete integration testing
2. Production deployment
3. Finish the remaining Phase 3 and Phase 4 work (dedicated REST API module, Ink-based TUI)
---
## Tech Writer Notes
**Coverage Areas:**
- System architecture and component interaction patterns
- Two-timeout architecture (responseTimeout vs mcpTimeout)
- Data flow diagrams and state machines
- Event-driven communication with RxJS observables
- Cache hierarchy and process pool management
- Future phases (Dashboard, API, CLI, Intelligence Layer)
**Keywords:** architecture, system design, components, data flow, state machine, event-driven, RxJS, observables, cache hierarchy, process pool, two-timeout, responseTimeout, mcpTimeout, Iris orchestrator, ClaudeProcess, business logic layer, transport layer, storage layer
**Last Updated:** 2025-10-19
**Change Context:** MAJOR ARCHITECTURE DOCUMENTATION UPDATE (v3.0). Corrected Phase 2 status - Dashboard is fully implemented and production-ready (not future). Added comprehensive Transport Abstraction Layer section documenting LocalTransport/SSHTransport split. Fixed tool registration diagram (removed non-existent team_cache_read/team_cache_clear tools). Updated database schema diagram with all actual fields (launch_command, team_config_snapshot, etc.). Corrected Phase 3 & 4 status (HTTP/WS partially implemented, CLI partially implemented). Document now accurately reflects actual implementation state vs. planned features. Minor update 2025-10-19: Added get_agent tool to registration diagram (18 tools total).
**Changes from v2.1 โ v3.0:**
- ✅ Added Transport Abstraction Layer section (fundamental architecture, was completely undocumented)
- ✅ Updated Phase 2 status: Dashboard fully implemented (not future)
- ✅ Updated Phase 3 status: HTTP/WS functionality exists in MCP server
- ✅ Updated Phase 4 status: Basic CLI commands implemented (Ink integration pending)
- ✅ Fixed tool registration: Removed team_cache_read/team_cache_clear (don't exist)
- ✅ Updated database schema: Added all missing fields
- ✅ Updated ClaudeProcess description: Now coordinator, not direct spawner
- ✅ Added RxJS reactive streams documentation throughout
- ✅ Added debug tooling documentation (getLaunchCommand, getTeamConfigSnapshot)
**Related Files:** ACTIONS.md (tool API), FEATURES.md (features), NOMENCLATURE.md (concepts), REMOTE.md (transport details), SESSION.md (session mgmt), DASHBOARD.md (dashboard docs), PERMISSIONS.md (permission system)
---
**Document Version:** 3.0
**Last Updated:** October 19, 2025
**Author:** Jenova (with Claude Code)