# System Architecture - Agent Orchestration Platform
## Overview
The **Agent Orchestration Platform** coordinates multiple Claude Code agents through a FastMCP server and iTerm2 integration, with security isolation between agents and persistent state management.
## Core Architecture Components
### **1. FastMCP Server Layer**
```
┌─────────────────────────────────────────────────────────────┐
│ FastMCP Server Core │
├─────────────────────────────────────────────────────────────┤
│ • Tool Registration & Validation │
│ • Authentication & Authorization │
│ • Request Routing & Response Handling │
│ • Error Handling & Recovery │
│ • Logging & Audit Trail │
└─────────────────────────────────────────────────────────────┘
```
**Key Responsibilities**:
- Expose 8 core MCP tools to Claude Desktop
- Validate all incoming requests with comprehensive security checks
- Route commands to appropriate system components
- Maintain audit logs with cryptographic integrity
- Handle authentication and authorization for agent operations
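
One way to give the audit log tamper evidence is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is an illustration only: `append_entry`, `verify_chain`, and the entry layout are hypothetical names, not part of the platform. A hash chain on its own only detects edits made without recomputing the chain, so a real deployment would presumably also sign entries with the audit signing key described under State Encryption.

```python
import hashlib
import json
from typing import Dict, List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(event: str, detail: Dict, prev_hash: str) -> str:
    """Hash the entry content together with the previous entry's hash."""
    payload = json.dumps(
        {"event": event, "detail": detail, "prev": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(log: List[Dict], event: str, detail: Dict) -> Dict:
    """Append an audit entry linked to the current tail of the log."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    entry = {
        "event": event,
        "detail": detail,
        "prev": prev_hash,
        "hash": _digest(event, detail, prev_hash),
    }
    log.append(entry)
    return entry


def verify_chain(log: List[Dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS_HASH
    for entry in log:
        expected = _digest(entry["event"], entry["detail"], prev_hash)
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Calling `verify_chain` after modifying any stored entry returns `False`, which is the property the audit trail relies on.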
### **2. Agent Management Layer**
```
┌─────────────────────────────────────────────────────────────┐
│ Agent Lifecycle Manager │
├─────────────────────────────────────────────────────────────┤
│ • Agent Creation & Destruction │
│ • State Persistence & Recovery │
│ • Health Monitoring & Auto-restart │
│ • Resource Allocation & Limits │
│ • Inter-Agent Communication Coordination │
└─────────────────────────────────────────────────────────────┘
```
**Agent State Model**:
```python
@dataclass(frozen=True)
class AgentState:
    agent_id: AgentId
    session_id: SessionId
    name: str  # Agent_#
    process_id: Optional[ProcessId]
    iterm_tab_id: Optional[str]
    status: AgentStatus  # CREATED, STARTING, ACTIVE, IDLE, ERROR, TERMINATED
    specialization: Optional[str]
    system_prompt_suffix: str
    claude_config: ClaudeConfig
    last_heartbeat: datetime
    conversation_history: List[Message]
    resource_usage: ResourceMetrics
```
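
Because the state model is a frozen dataclass, a status change produces a new object rather than mutating the old one. The following sketch shows that pattern with `dataclasses.replace`. The transition table is an assumption made for illustration (the document lists the statuses but not which moves are allowed), and `MiniAgentState` is a trimmed, hypothetical stand-in for `AgentState`.

```python
from dataclasses import dataclass, replace
from enum import Enum


class AgentStatus(Enum):
    CREATED = "created"
    STARTING = "starting"
    ACTIVE = "active"
    IDLE = "idle"
    ERROR = "error"
    TERMINATED = "terminated"


# Assumed transition table: not specified by the architecture itself.
ALLOWED_TRANSITIONS = {
    AgentStatus.CREATED: {AgentStatus.STARTING, AgentStatus.TERMINATED},
    AgentStatus.STARTING: {AgentStatus.ACTIVE, AgentStatus.ERROR, AgentStatus.TERMINATED},
    AgentStatus.ACTIVE: {AgentStatus.IDLE, AgentStatus.ERROR, AgentStatus.TERMINATED},
    AgentStatus.IDLE: {AgentStatus.STARTING, AgentStatus.ACTIVE, AgentStatus.ERROR, AgentStatus.TERMINATED},
    AgentStatus.ERROR: {AgentStatus.STARTING, AgentStatus.TERMINATED},
    AgentStatus.TERMINATED: set(),
}


@dataclass(frozen=True)
class MiniAgentState:
    """Trimmed stand-in for AgentState, kept small for the example."""
    name: str
    status: AgentStatus = AgentStatus.CREATED


def transition(state: MiniAgentState, new_status: AgentStatus) -> MiniAgentState:
    """Return a copy with the new status, or raise if the move is not allowed."""
    if new_status not in ALLOWED_TRANSITIONS[state.status]:
        raise ValueError(f"{state.status.name} -> {new_status.name} is not allowed")
    return replace(state, status=new_status)
```

The original object stays untouched, so an earlier snapshot can still be persisted or compared after the transition.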
### **3. Session Management Layer**
```
┌─────────────────────────────────────────────────────────────┐
│ Session Coordinator │
├─────────────────────────────────────────────────────────────┤
│ • Session Creation & Destruction │
│ • Codebase Association & Security Boundaries │
│ • Agent Assignment & Load Balancing │
│ • Task File Monitoring & Synchronization │
│ • Performance Metrics & Optimization │
└─────────────────────────────────────────────────────────────┘
```
**Session State Model**:
```python
@dataclass(frozen=True)
class SessionState:
    session_id: SessionId
    name: str
    root_path: Path
    created_at: datetime
    agents: Dict[AgentId, AgentState]
    security_context: SecurityContext
    task_files: TaskFileTracker
    git_integration: GitState
    performance_metrics: SessionMetrics
```
### **4. iTerm2 Integration Layer**
```
┌─────────────────────────────────────────────────────────────┐
│ iTerm2 API Manager │
├─────────────────────────────────────────────────────────────┤
│ • Tab Creation & Management │
│ • Session Monitoring & Health Checks │
│ • Text Injection & Output Capture │
│ • Window/Tab Layout Optimization │
│ • Crash Recovery & Restart Logic │
└─────────────────────────────────────────────────────────────┘
```
**iTerm2 Integration Patterns**:
- **Async Event Loop**: All iTerm2 operations via asyncio for performance
- **Tab-per-Agent**: Each agent gets dedicated iTerm2 tab
- **Health Monitoring**: Continuous tab and process health checks with `/debug` diagnostics
- **Recovery Logic**: Automatic tab recreation on iTerm crashes using `/logs` for debugging
- **Security Isolation**: Each tab runs in isolated directory context
- **Troubleshooting Integration**: Built-in debugging commands (`/debug`, `/logs`, `/mcp-debug`, `/config`)
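
The health-monitoring pattern above can be sketched with plain asyncio. In this example each agent is represented by a hypothetical async probe that returns `True` when its tab and process respond. A real probe would call into the iTerm2 Python API, which is deliberately left out here. A probe that hangs is treated as unhealthy after a timeout.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Probe = Callable[[], Awaitable[bool]]  # hypothetical per-agent health probe


async def check_agents(probes: Dict[str, Probe], timeout: float = 2.0) -> Dict[str, bool]:
    """Run all probes concurrently; a timeout or OS error counts as unhealthy."""

    async def run_one(name: str, probe: Probe):
        try:
            healthy = await asyncio.wait_for(probe(), timeout)
        except (asyncio.TimeoutError, OSError):
            healthy = False
        return name, healthy

    results = await asyncio.gather(*(run_one(n, p) for n, p in probes.items()))
    return dict(results)
```

Agents reported as unhealthy would then be candidates for the tab recreation described under Recovery Logic.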
### **5. Claude Code Process Management**
```
┌─────────────────────────────────────────────────────────────┐
│ Claude Code Orchestrator │
├─────────────────────────────────────────────────────────────┤
│ • Process Spawning & Configuration │
│ • ADDER+ Prompt Injection │
│ • CLI Command Integration │
│ • Output Parsing & Status Extraction │
│ • Resource Monitoring & Limits │
└─────────────────────────────────────────────────────────────┘
```
**Claude Code Integration**:
```python
import shlex

@dataclass(frozen=True)
class ClaudeConfig:
    model: str = "sonnet-3.5"
    no_color: bool = True
    skip_permissions: bool = False  # Security-conscious default
    verbose: bool = False
    output_format: str = "text"
    working_directory: Optional[Path] = None  # Session root path for "claude" activation
    custom_commands: List[str] = field(default_factory=list)
    resource_limits: ResourceLimits = field(default_factory=ResourceLimits)

    def get_activation_command(self) -> str:
        """Generate Claude Code activation command for session directory."""
        if self.working_directory is None:
            raise ValueError("working_directory must be set before activation")
        # Quote the path so spaces or shell metacharacters cannot break the command
        cmd = f"cd {shlex.quote(str(self.working_directory))} && claude"
        if self.model != "sonnet-3.5":
            cmd += f" --model {shlex.quote(self.model)}"
        if self.no_color:
            cmd += " --no-color"
        return cmd
```
## MCP Tool Implementation Architecture
### **Tool 1: create_agent**
```python
@mcp.tool()
async def create_agent(
    session_id: SessionId,
    agent_name: str,  # Must be Agent_#
    specialization: Optional[str] = None,
    system_prompt_suffix: str = "",
    claude_config: Optional[ClaudeConfig] = None
) -> AgentCreationResult:
    """
    Creates new Claude Code agent instance with iTerm2 tab and process.

    Security:
    - Validates agent_name format (Agent_#)
    - Ensures unique agent numbers within session
    - Applies resource limits and security constraints

    Process:
    1. Validate session exists and has capacity
    2. Generate unique agent ID and verify naming
    3. Create iTerm2 tab in session context
    4. Spawn Claude Code process with configuration
    5. Inject ADDER+ system prompt with agent name
    6. Initialize agent state and health monitoring
    7. Register agent in session and persist state
    """
```
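
The two naming checks in the docstring (format and uniqueness within the session) could look roughly like this. The exact pattern is an assumption: the document only gives `Agent_#`, so this sketch reads `#` as a positive integer without leading zeros, and `validate_agent_name` is a hypothetical helper name.

```python
import re
from typing import Iterable

# Assumed reading of "Agent_#": a positive integer with no leading zeros.
AGENT_NAME_RE = re.compile(r"Agent_([1-9]\d*)")


def validate_agent_name(name: str, existing_names: Iterable[str]) -> int:
    """Return the agent number, or raise ValueError if the name is malformed or taken."""
    match = AGENT_NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"agent name must look like Agent_<number>: {name!r}")
    number = int(match.group(1))
    # Compare by number so the check does not depend on string formatting.
    taken = {
        int(m.group(1))
        for m in (AGENT_NAME_RE.fullmatch(n) for n in existing_names)
        if m is not None
    }
    if number in taken:
        raise ValueError(f"agent number {number} is already in use in this session")
    return number
```

For example, `validate_agent_name("Agent_3", ["Agent_1", "Agent_2"])` returns `3`, while `"agent_3"` or a second `"Agent_1"` raises `ValueError`.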
### **Tool 2: delete_agent**
```python
@mcp.tool()
async def delete_agent(
    agent_name: str,
    force: bool = False
) -> AgentDeletionResult:
    """
    Removes agent from system with cleanup and state persistence.

    Security:
    - Requires elevated permissions for force deletion
    - Ensures clean shutdown and resource cleanup
    - Maintains audit trail of agent lifecycle

    Process:
    1. Locate agent by name across all sessions
    2. Send graceful shutdown signal to Claude Code process
    3. Close associated iTerm2 tab
    4. Clean up temporary files and resources
    5. Remove agent from session state
    6. Persist updated session state
    7. Log deletion event with full audit trail
    """
```
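
Step 2 and the `force` flag can be illustrated with asyncio subprocesses: ask the process to exit, wait for a grace period, and kill it only if it does not stop. This is a sketch under the assumption that the agent process is managed as an `asyncio.subprocess.Process`. `stop_process`, `grace_seconds`, and `demo` are hypothetical names, and the demo uses a sleeping Python child in place of a Claude Code process.

```python
import asyncio
import sys


async def stop_process(
    proc: asyncio.subprocess.Process,
    grace_seconds: float = 5.0,
    force: bool = False,
) -> int:
    """Terminate gracefully (or kill immediately when force=True); return the exit code."""
    try:
        if force:
            proc.kill()
        else:
            proc.terminate()
    except ProcessLookupError:
        pass  # the process had already exited
    try:
        return await asyncio.wait_for(proc.wait(), timeout=grace_seconds)
    except asyncio.TimeoutError:
        # Grace period expired: escalate to a hard kill.
        proc.kill()
        return await proc.wait()


async def demo() -> int:
    """Start a long-sleeping child process and stop it gracefully."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "import time; time.sleep(60)"
    )
    return await stop_process(proc, grace_seconds=5.0)
```

Running `asyncio.run(demo())` returns a non-zero exit code because the child was stopped by a signal rather than finishing on its own.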
### **Tool 3: create_session**
```python
@mcp.tool()
async def create_session(
    root_path: Path,
    session_name: str,
    security_level: SecurityLevel = SecurityLevel.HIGH
) -> SessionCreationResult:
    """
    Creates session tied to specific root filepath with security boundaries.

    Security:
    - Validates root_path accessibility and permissions
    - Establishes filesystem boundaries for session
    - Creates encrypted state storage

    Process:
    1. Validate root_path exists and is accessible
    2. Create session directory structure if needed
    3. Initialize Git integration and project analysis
    4. Establish security context and file boundaries
    5. Create session state storage with encryption
    6. Initialize task file monitoring system
    7. Register session in global state manager
    """
```
### **Tool 4: get_session_status**
```python
@mcp.tool()
async def get_session_status(
    session_id: Optional[SessionId] = None
) -> SessionStatusResult:
    """
    Returns comprehensive status of all agents in session(s).

    Information Provided:
    - Agent health and activity status
    - Current task assignments and progress
    - Resource usage and performance metrics
    - iTerm2 tab status and connectivity
    - Error states and recovery actions needed

    Process:
    1. Query all sessions (or specific session)
    2. Poll agent health and process status
    3. Check iTerm2 tab connectivity
    4. Analyze task file states and progress
    5. Calculate performance metrics
    6. Aggregate status information
    7. Return comprehensive status report
    """
```
### **Tool 5: delete_session**
```python
@mcp.tool()
async def delete_session(
    session_id: SessionId,
    cleanup_agents: bool = True,
    preserve_work: bool = True
) -> SessionDeletionResult:
    """
    Removes entire session and optionally all associated agents.

    Security:
    - Requires confirmation for destructive operations
    - Preserves work products by default
    - Maintains audit trail of session lifecycle

    Process:
    1. Validate session exists and permissions
    2. Optionally shutdown all session agents gracefully
    3. Close all associated iTerm2 tabs
    4. Clean up session-specific resources
    5. Preserve or clean work files based on flags
    6. Remove session from global state
    7. Log session destruction with full audit
    """
```
### **Tool 6: send_message_to_agent**
```python
@mcp.tool()
async def send_message_to_agent(
    agent_name: str,
    message: str,
    prepend_adder: bool = True,
    wait_for_response: bool = False
) -> MessageResult:
    """
    Sends message to specific agent with automatic ADDER+ prepending.

    Message Processing:
    - Automatically prepends ADDER+ system prompt
    - Injects agent name into system prompt
    - Handles message formatting and escaping
    - Manages iTerm2 text injection

    Process:
    1. Locate agent and validate it's active (use `/debug` if issues)
    2. Construct full message with ADDER+ prompt
    3. Inject agent name into system prompt
    4. Send formatted message via iTerm2 text injection to session directory
    5. Monitor agent response if requested (use `/logs` for debugging)
    6. Log interaction with full message content
    7. Return delivery confirmation and response
    """
```
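
Steps 2 and 3, plus the escaping mentioned under Message Processing, might be sketched as follows. The ADDER+ prompt text is not given in this document, so `ADDER_PROMPT_TEMPLATE` is a placeholder, and `build_agent_message` is a hypothetical helper. The escaping shown here only normalizes newlines and drops non-printable control characters (such as the ESC byte that starts terminal escape sequences). The real rules may be stricter.

```python
# Placeholder only: the actual ADDER+ prompt is not reproduced in this document.
ADDER_PROMPT_TEMPLATE = "You are {agent_name}. <ADDER+ system prompt goes here>"


def build_agent_message(
    agent_name: str,
    message: str,
    prepend_adder: bool = True,
    template: str = ADDER_PROMPT_TEMPLATE,
) -> str:
    """Build the text to inject into the agent's terminal tab."""
    # Normalize line endings so the terminal receives consistent newlines.
    body = message.replace("\r\n", "\n").replace("\r", "\n")
    # Drop control characters (other than newline and tab) before injection.
    body = "".join(ch for ch in body if ch in "\n\t" or ch.isprintable())
    if not prepend_adder:
        return body
    return template.format(agent_name=agent_name) + "\n\n" + body
```

With `prepend_adder=False` the function returns only the sanitized message body, which mirrors the tool's flag of the same name.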
### **Tool 7: clear_agent_conversation**
```python
@mcp.tool()
async def clear_agent_conversation(
    agent_name: str,
    preserve_state: bool = True
) -> ConversationClearResult:
    """
    Closes current iTerm2 tab for agent (clears conversation).

    State Management:
    - Preserves agent configuration and task assignments
    - Clears conversation history but maintains context
    - Gracefully shuts down Claude Code process

    Process:
    1. Locate agent and verify it's active
    2. Send shutdown signal to Claude Code process
    3. Close associated iTerm2 tab
    4. Clear conversation history from agent state
    5. Preserve task context and configuration
    6. Update agent status to IDLE
    7. Log conversation clear event
    """
```
### **Tool 8: start_new_agent_conversation**
```python
@mcp.tool()
async def start_new_agent_conversation(
    agent_name: str,
    restore_context: bool = True
) -> ConversationStartResult:
    """
    Opens new iTerm2 tab for agent (starts fresh conversation).

    Context Management:
    - Restores agent configuration and specialization
    - Optionally restores recent task context
    - Reinitializes Claude Code with preserved settings

    Process:
    1. Locate agent and verify it exists
    2. Create new iTerm2 tab in session context
    3. Spawn new Claude Code process with preserved config
    4. Inject ADDER+ prompt with agent context
    5. Optionally restore recent conversation context
    6. Update agent state with new process/tab IDs
    7. Log new conversation start event
    """
```
## Security Architecture
### **Multi-Layer Security Model**
```
┌─────────────────────────────────────────────────────────────┐
│ Network Layer │
│ • MCP Protocol Encryption │
│ • Request Authentication & Authorization │
│ • Rate Limiting & DDoS Protection │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Application Layer │
│ • Input Validation & Sanitization │
│ • Session Isolation & Boundaries │
│ • Agent Process Isolation │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ System Layer │
│ • Process Sandboxing │
│ • Filesystem Access Control │
│ • Resource Usage Limits │
└─────────────────────────────────────────────────────────────┘
```
### **Agent Isolation Model**
- **Process Separation**: Each agent runs in separate Claude Code process
- **Directory Jailing**: Agents restricted to session root directory
- **Resource Limits**: CPU, memory, and file descriptor limits per agent
- **Network Isolation**: No direct network access between agents
- **Communication Constraints**: Only through shared task files
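
The directory-jailing rule can be checked at the path level by resolving a candidate path and testing whether it still sits under the session root. The sketch below assumes a hypothetical helper `is_within_root`. It validates paths only and does not by itself sandbox the agent process.

```python
import tempfile
from pathlib import Path
from typing import Tuple


def is_within_root(candidate: Path, root: Path) -> bool:
    """True if candidate, resolved against root, stays inside root."""
    root_resolved = root.resolve()
    # Relative paths are taken relative to the root; absolute ones stand alone.
    candidate_resolved = (root_resolved / candidate).resolve()
    try:
        candidate_resolved.relative_to(root_resolved)
    except ValueError:
        return False
    return True


def demo() -> Tuple[bool, bool, bool]:
    """Check an in-root path, a '..' escape, and an unrelated absolute path."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        return (
            is_within_root(Path("src/main.py"), root),
            is_within_root(Path("../outside.txt"), root),
            is_within_root(Path("/etc/passwd"), root),
        )
```

Resolving before comparing is what defeats `..` segments and symlinks that point outside the root. A plain string-prefix comparison would miss both.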
### **State Encryption**
```python
@dataclass(frozen=True)
class SecurityContext:
    session_encryption_key: bytes
    agent_state_encryption: AESKey
    audit_signing_key: ECDSAKey
    filesystem_boundaries: Set[Path]
    resource_limits: ResourceLimits
    permission_model: PermissionSet
```
## Performance Architecture
### **Concurrency Model**
- **Event-Driven Architecture**: Full asyncio implementation
- **Non-Blocking I/O**: All iTerm2 and file operations async
- **Resource Pooling**: Shared iTerm2 connections and process pools
- **Intelligent Scheduling**: Load balancing across available agents
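
A common asyncio pattern for capping in-flight work is a semaphore around each operation. The sketch below illustrates that idea with a hypothetical `run_bounded` helper. The default of 64 mirrors `max_concurrent_operations` from the scaling limits, but whether the platform enforces the cap this way is an assumption.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")


async def run_bounded(
    operations: Iterable[Callable[[], Awaitable[T]]],
    limit: int = 64,
) -> List[T]:
    """Run all operations concurrently, with at most `limit` in flight at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(op: Callable[[], Awaitable[T]]) -> T:
        async with semaphore:
            return await op()

    # gather preserves the input order of results.
    return await asyncio.gather(*(guarded(op) for op in operations))
```

Operations beyond the limit wait on the semaphore without blocking the event loop, so other coroutines keep running.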
### **Scaling Constraints**
```python
@dataclass(frozen=True)
class SystemLimits:
    max_sessions: int = 16
    max_agents_per_session: int = 8
    max_total_agents: int = 32
    max_memory_per_agent: int = 512 * 1024 * 1024  # 512 MB, expressed in bytes
    max_cpu_per_agent: float = 0.25  # 25% of one core
    max_file_descriptors: int = 1024
    max_concurrent_operations: int = 64
```
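
These limits interact: 16 sessions of 8 agents each would allow 128 agents, so the global cap of 32 is reached first once four sessions are full. A capacity check therefore has to test both the per-session and the global limit. In the sketch below, `Limits` is a trimmed, hypothetical copy of `SystemLimits`, and `can_create_agent` is an illustrative helper.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Limits:
    """Subset of SystemLimits needed for the capacity check."""
    max_sessions: int = 16
    max_agents_per_session: int = 8
    max_total_agents: int = 32


def can_create_agent(
    agents_per_session: Dict[str, int],
    session_id: str,
    limits: Limits = Limits(),
) -> bool:
    """True if one more agent fits under both the session and the global cap."""
    if session_id not in agents_per_session:
        return False  # unknown session
    if agents_per_session[session_id] >= limits.max_agents_per_session:
        return False  # session is full
    return sum(agents_per_session.values()) < limits.max_total_agents
```

An empty fifth session is still refused when four other sessions already hold eight agents each, because the total has reached 32.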
### **Performance Monitoring**
- **Real-time Metrics**: Agent CPU, memory, and I/O usage
- **Health Checks**: Continuous process and iTerm2 connectivity monitoring
- **Adaptive Throttling**: Dynamic rate limiting based on system load
- **Predictive Scaling**: Proactive resource allocation
## Integration Points
### **External Dependencies**
```
FastMCP Framework → High-performance MCP server
iTerm2 Python API → Terminal session management
Claude Code CLI → AI agent process orchestration
Asyncio Runtime → Event-driven concurrency
Cryptography Library → State encryption and audit signing
```
### **File System Architecture**
```
SESSION_ROOT/
├── .claude_session/ # Session metadata (encrypted)
│ ├── agents.json # Agent configurations
│ ├── security.json # Security context
│ └── audit.log # Encrypted audit trail
├── development/ # ADDER+ workflow files
│ ├── TODO.md # Master task tracker
│ ├── tasks/ # Individual task files
│ └── protocols/ # Development protocols
├── .git/ # Git integration
└── [codebase files] # Project source code
```
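
As a rough illustration, the directory skeleton above could be created idempotently with `pathlib`. `init_session_layout` is a hypothetical helper, the TODO.md seed content is a placeholder, and encryption of `.claude_session/` is out of scope for this sketch.

```python
import tempfile
from pathlib import Path
from typing import List

# Directories from the layout above that the platform itself would own.
SESSION_DIRS = (".claude_session", "development/tasks", "development/protocols")


def init_session_layout(root: Path) -> List[str]:
    """Create missing directories and TODO.md; return all paths under root."""
    for rel in SESSION_DIRS:
        (root / rel).mkdir(parents=True, exist_ok=True)
    todo = root / "development" / "TODO.md"
    if not todo.exists():
        todo.write_text("# TODO\n", encoding="utf-8")  # placeholder content
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*"))


def demo() -> List[str]:
    """Build the layout inside a throwaway temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        return init_session_layout(Path(tmp))
```

Because every step checks for existing paths first, calling the function on an already initialized session root leaves existing files untouched.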
## Deployment Architecture
### **Development Setup**
```bash
# Install dependencies
uv sync
# Configure iTerm2 integration
python scripts/setup/setup_iterm.py
# Initialize MCP server
python src/main.py --mode development
# Register with Claude Desktop
python scripts/setup/register_mcp.py
```
### **Production Deployment**
- **Systemd Service**: Auto-restart and dependency management
- **Log Rotation**: Automated log cleanup and archival
- **Health Monitoring**: External health check endpoints
- **Backup Strategy**: Automated state backup and recovery
Together, these layers give multi-agent Claude Code orchestration strict security boundaries between sessions and agents, persistent state that survives restarts, and a complete audit trail of agent and session lifecycles.