---
layout: modern
title: Architecture
---
# Architecture
A technical deep dive into the MCP-ComfyUI-FLUX system architecture.
## System Overview
```
┌─────────────────┐     ┌──────────────────┐     ┌─────────────────┐
│  Claude Desktop │────▶│    MCP Server    │────▶│     ComfyUI     │
│    (Client)     │◀────│    (Node.js)     │◀────│    (Python)     │
└─────────────────┘     └──────────────────┘     └─────────────────┘
         │                        │                       │
         │                        │                       │
    User Input                WebSocket               GPU/CUDA
         │                    Protocol                    │
         ▼                        ▼                       ▼
    Text Prompt             JSON-RPC 2.0             FLUX Model
                                                     (11GB fp8)
```
## Component Architecture
### 1. MCP Server Container
**Base Image**: `node:20-alpine`
**Key Components**:
- MCP protocol implementation
- WebSocket client for ComfyUI
- Queue management system
- Auto-reconnection logic (sketched after the file structure below)
**File Structure**:
```
/app/
├── dist/
│   └── index.js              # Compiled MCP server
├── src/
│   ├── index.js              # Main entry point
│   ├── comfyui-client.js     # WebSocket client
│   ├── flux-workflow.js      # Workflow templates
│   └── workflows/
│       ├── background-removal.js
│       └── upscaling.js
└── package.json
```
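A minimal sketch of the auto-reconnection logic, assuming the `ws` package and a capped exponential backoff; `connectWithRetry` is an illustrative name, not necessarily the API in `comfyui-client.js`:

```javascript
import WebSocket from 'ws';

// Reconnect with capped exponential backoff. A successful open resets
// the attempt counter; each failure doubles the delay up to 30s.
function connectWithRetry(url, onMessage, attempt = 0) {
  const ws = new WebSocket(url);

  ws.on('open', () => { attempt = 0; });
  ws.on('message', onMessage);
  ws.on('error', () => { /* 'close' fires next; retry happens there */ });
  ws.on('close', () => {
    const delay = Math.min(1000 * 2 ** attempt, 30_000);
    setTimeout(() => connectWithRetry(url, onMessage, attempt + 1), delay);
  });

  return ws;
}
```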
### 2. ComfyUI Container
**Base Image**: `nvidia/cuda:12.1.0-cudnn8-runtime-ubuntu22.04`
**Multi-stage Build**:
```dockerfile
# Stage 1: Python dependencies
FROM base AS python-deps
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Stage 2: Model downloads
FROM python-deps AS models
RUN --mount=type=cache,target=/models/cache \
    ./download_models.sh

# Stage 3: Runtime
FROM models AS runtime
CMD ["python", "main.py", "--highvram"]
```
**Optimization Features**:
- BuildKit cache mounts for pip/apt
- Layer caching for model downloads
- Final image size of 10.9GB
- PyTorch 2.5.1 with CUDA 12.1
## Data Flow Architecture
### 1. Request Flow
```
User Prompt → Claude Desktop
        ↓
MCP Protocol (JSON-RPC)
        ↓
MCP Server (index.js)
        ↓
Workflow Generation (flux-workflow.js)
        ↓
WebSocket → ComfyUI API
        ↓
Queue System → Execution
```
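In stock ComfyUI, the queue step itself is an HTTP `POST /prompt` (the WebSocket then carries execution events). A sketch of that submission, with `queuePrompt` as an assumed wrapper name:

```javascript
import { randomUUID } from 'node:crypto';

// Queue a workflow on ComfyUI; returns the prompt_id used to track it.
async function queuePrompt(workflow, clientId = randomUUID()) {
  const res = await fetch('http://comfyui:8188/prompt', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt: workflow, client_id: clientId }),
  });
  if (!res.ok) throw new Error(`ComfyUI rejected workflow: ${res.status}`);
  const { prompt_id } = await res.json();
  return prompt_id;
}
```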
### 2. Response Flow
```
GPU Generation → Image File
        ↓
ComfyUI Output Directory
        ↓
File System Mount
        ↓
MCP Server Retrieval
        ↓
File Path + Metadata
        ↓
Claude Desktop Display
```
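A sketch of the retrieval step: once execution finishes, ComfyUI's `GET /history/{prompt_id}` lists the files each node saved. The flattened return shape here is illustrative:

```javascript
// Look up the files a finished prompt produced. The history response
// maps node IDs to their outputs, including saved images.
async function getOutputImages(promptId) {
  const res = await fetch(`http://comfyui:8188/history/${promptId}`);
  const history = await res.json();
  const outputs = history[promptId]?.outputs ?? {};

  // Flatten every node's saved images into { filename, subfolder } entries.
  return Object.values(outputs)
    .flatMap((node) => node.images ?? [])
    .map(({ filename, subfolder }) => ({ filename, subfolder }));
}
```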
## Network Architecture
### Docker Network
```yaml
networks:
  mcp-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16
```
### Service Communication
```
mcp-server → comfyui:8188 (internal)
host → localhost:8188 (web UI)
```
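A small sketch of how those two endpoints might be selected at runtime; the `COMFYUI_HOST` variable name is an assumption, not necessarily what the project uses:

```javascript
// Inside Docker Compose the MCP server reaches ComfyUI by service name;
// outside it (local dev), it falls back to the published localhost port.
const COMFYUI_HOST = process.env.COMFYUI_HOST ?? 'comfyui:8188';
const API_URL = `http://${COMFYUI_HOST}`;
const WS_URL = `ws://${COMFYUI_HOST}/ws`;
```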
### WebSocket Protocol
```javascript
// Connection establishment
const ws = new WebSocket('ws://comfyui:8188/ws');

// Message format
const message = {
  type: 'execute',
  data: {
    prompt_id: 'uuid-v4',
    workflow: { /* ComfyUI nodes */ }
  }
};

// Response event types (one `type` per message):
//   'execution_start', 'execution_cached', 'executing',
//   'progress', 'executed'
```
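Putting those event types to work, a sketch of client-side progress reporting; in stock ComfyUI, `progress` messages carry `value`/`max` step counters:

```javascript
// Dispatch on the event types listed above and surface sampling progress.
ws.on('message', (raw) => {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case 'progress': {
      const { value, max } = msg.data;
      console.log(`Sampling: ${value}/${max} steps`);
      break;
    }
    case 'executed':
      console.log('Node finished:', msg.data.node);
      break;
  }
});
```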
## Model Architecture
### FLUX Schnell FP8
**Model Files**:
```
models/
├── unet/
│   └── flux1-schnell-fp8-e4m3fn.safetensors (11GB)
├── clip/
│   ├── clip_l.safetensors (235MB)
│   └── t5xxl_fp8_e4m3fn_scaled.safetensors (4.9GB)
└── vae/
    └── ae.safetensors (320MB)
```
**Quantization**:
- FP8 E4M3FN weight format (~12B parameters at one byte each ≈ 11GB)
- ~50% memory reduction vs. FP16
- ~95% quality retention
- Native PyTorch 2.5.1 support
### Workflow Templates
**FLUX Generation**:
```javascript
{
  "4": {
    "class_type": "CLIPTextEncode",
    "inputs": {
      "text": prompt,
      "clip": ["30", 1]   // DualCLIPLoader output
    }
  },
  "13": {
    "class_type": "UNETLoader",
    "inputs": {
      "unet_name": "flux1-schnell-fp8-e4m3fn.safetensors"
    }
  },
  "17": {
    "class_type": "BasicScheduler",
    "inputs": {
      "scheduler": "simple",
      "steps": 4,
      "denoise": 1
    }
  }
}
```
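A sketch of how `flux-workflow.js` might parameterize that template; `buildFluxWorkflow` and `FLUX_TEMPLATE` are illustrative assumptions, while node IDs `"4"`, `"5"`, and `"17"` mirror the snippets shown in this document:

```javascript
// Build a FLUX Schnell workflow from user parameters by splicing the
// varying fields into a cloned base node graph (FLUX_TEMPLATE is an
// assumed object holding the full template).
function buildFluxWorkflow(prompt, { width = 1024, height = 1024, steps = 4, batchSize = 1 } = {}) {
  const workflow = structuredClone(FLUX_TEMPLATE);
  workflow['4'].inputs.text = prompt;          // CLIPTextEncode
  workflow['5'].inputs.width = width;          // EmptyLatentImage
  workflow['5'].inputs.height = height;
  workflow['5'].inputs.batch_size = batchSize;
  workflow['17'].inputs.steps = steps;         // BasicScheduler
  return workflow;
}
```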
## Memory Management
### GPU Memory Allocation
```python
# PyTorch allocator configuration (set before CUDA is initialized)
import os
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:512'

# ComfyUI launch flag (shell): --highvram keeps models resident in VRAM
```
### Memory Usage Profile
| Component | VRAM Usage |
|-----------|------------|
| Base models loaded | ~10GB |
| Single generation | +1GB |
| Batch of 4 | +4GB |
| Upscaling | +2GB |
| Background removal | +1GB |
| Peak usage | ~16GB |
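Those numbers suggest a simple guard before batching: with roughly 1GB of headroom needed per image on top of the resident models, batch size can be capped by free VRAM. A sketch against ComfyUI's `/system_stats` endpoint (the per-image figure comes from the table above):

```javascript
// Cap a requested batch size by currently free VRAM (~1GB per image).
async function safeBatchSize(requested) {
  const res = await fetch('http://comfyui:8188/system_stats');
  const stats = await res.json();
  const freeBytes = stats.devices?.[0]?.vram_free ?? 0;
  const perImageBytes = 1024 ** 3;   // ≈1GB per queued image (see table)
  return Math.max(1, Math.min(requested, Math.floor(freeBytes / perImageBytes)));
}
```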
### Docker Memory
```yaml
services:
  comfyui:
    shm_size: "16g"   # Shared memory
    deploy:
      resources:
        limits:
          memory: 20G
```
## Security Architecture
### Container Isolation
```yaml
security_opt:
  - no-new-privileges:true
  - seccomp:unconfined   # Required for CUDA
```
### Volume Permissions
```dockerfile
# Non-root user
RUN useradd -m -u 1000 comfyui
USER comfyui
```

```yaml
# Read-only mounts where possible (docker-compose)
volumes:
  - ./models:/models:ro
  - ./output:/output:rw
```
### Network Security
```yaml
# Network stays bridged so the MCP server can reach the host/Claude
networks:
  mcp-network:
    internal: false

# Port exposure
ports:
  - "127.0.0.1:8188:8188"   # Localhost only
```
## Performance Optimizations
### BuildKit Optimizations
```dockerfile
# Cache mount for pip
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Cache mount for apt
RUN --mount=type=cache,target=/var/cache/apt \
    apt-get update && apt-get install -y ...
```
### Model Loading
```python
# Lazy loading via ComfyUI internals (return shape simplified)
from comfy.sd import load_checkpoint_guess_config
from comfy.model_patcher import ModelPatcher
import comfy.model_management as mm

model = load_checkpoint_guess_config(
    ckpt_path,
    output_vae=True,
    output_clip=True,
    embedding_directory=None
)

# Memory-efficient loading: weights sit on the offload device until used
model_patcher = ModelPatcher(
    model,
    load_device=mm.get_torch_device(),
    offload_device=mm.unet_offload_device()
)
```
### Batch Processing
```javascript
// Native ComfyUI batching
"5": {
"class_type": "EmptyLatentImage",
"inputs": {
"batch_size": 4, // Process 4 images in parallel
"width": 1024,
"height": 1024
}
}
```
## Monitoring & Logging
### Container Logs
```bash
# MCP Server logs
docker logs mcp-comfyui-flux-mcp-server-1 -f
# ComfyUI logs
docker logs mcp-comfyui-flux-comfyui-1 -f
```
### Health Checks
```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:8188/system_stats"]
  interval: 30s
  timeout: 10s
  retries: 3
```
### Performance Metrics
```javascript
// Execution timing
const startTime = Date.now();
await client.queuePrompt(workflow);
const executionTime = Date.now() - startTime;
// Memory monitoring
const stats = await client.getSystemStats();
console.log(`VRAM: ${stats.vram_used}/${stats.vram_total}`);
```
## Scaling Considerations
### Horizontal Scaling
```yaml
# Multiple ComfyUI instances
services:
  comfyui-1:
    image: mcp-comfyui-comfyui
    devices:
      - /dev/nvidia0
  comfyui-2:
    image: mcp-comfyui-comfyui
    devices:
      - /dev/nvidia1
```
### Load Balancing
```javascript
// Round-robin server selection
const servers = ['comfyui-1:8188', 'comfyui-2:8188'];
let currentServer = 0;

function getNextServer() {
  const server = servers[currentServer];
  currentServer = (currentServer + 1) % servers.length;
  return server;
}
```
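Plain round-robin keeps handing work to an instance that has died; one hedged refinement probes each candidate's `/system_stats` before committing:

```javascript
// Walk the rotation until a server answers its health probe;
// give up after one full cycle.
async function getNextHealthyServer() {
  for (let i = 0; i < servers.length; i++) {
    const server = getNextServer();
    try {
      const res = await fetch(`http://${server}/system_stats`, {
        signal: AbortSignal.timeout(2000),
      });
      if (res.ok) return server;
    } catch {
      // Probe failed or timed out; try the next server in rotation.
    }
  }
  throw new Error('No healthy ComfyUI instance available');
}
```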
### Queue Management
```javascript
// Priority queue implementation
class PriorityQueue {
  constructor() {
    this.high = [];    // Premium users
    this.normal = [];  // Regular users
  }

  enqueue(prompt, priority = 'normal') {
    this[priority].push(prompt);
  }

  dequeue() {
    // High-priority prompts drain first
    return this.high.shift() ?? this.normal.shift();
  }
}
```
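Usage is straightforward: premium prompts jump the line, everything else is FIFO:

```javascript
const queue = new PriorityQueue();
queue.enqueue({ prompt: 'a watercolor fox' });            // normal lane
queue.enqueue({ prompt: 'storefront banner' }, 'high');   // premium lane
queue.dequeue();   // → { prompt: 'storefront banner' }
```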
## Development Workflow
### Local Development
```bash
# Mount source code for live reload (docker run requires an absolute path)
docker run -v "$(pwd)/src:/app/src" \
  mcp-comfyui-mcp-server npm run dev
```
### Testing
```bash
# Unit tests
npm test
# Integration tests
docker exec mcp-comfyui-flux-mcp-server-1 \
npm run test:integration
# Load testing
docker exec mcp-comfyui-flux-mcp-server-1 \
npm run test:load
```
### Debugging
```javascript
// Enable debug logging (run from a shell):
//   DEBUG=mcp:* node dist/index.js

// Inspect WebSocket traffic
ws.on('message', (data) => {
  console.log('WS:', JSON.parse(data));
});
```
## Future Enhancements
### Planned Features
1. **Model Management**
   - Dynamic model loading
   - Model versioning
   - A/B testing frameworks
2. **Performance**
   - Response caching
   - Predictive pre-warming
   - Adaptive batch sizing
3. **Scalability**
   - Kubernetes deployment
   - Auto-scaling policies
   - Multi-GPU scheduling
4. **Monitoring**
   - Prometheus metrics
   - Grafana dashboards
   - Distributed tracing
## Resources
- [ComfyUI Documentation](https://github.com/comfyanonymous/ComfyUI)
- [MCP Specification](https://modelcontextprotocol.io)
- [FLUX Models (Black Forest Labs)](https://github.com/black-forest-labs/flux)
- [Docker Best Practices](https://docs.docker.com/develop/dev-best-practices/)