# MCP Docker Hub - Architecture Overview
## System Architecture
```
          +----------------------------------------+
          |       Internet / External Access       |
          +-------------------+--------------------+
                              |
          +-------------------v--------------------+
          |          NGINX Reverse Proxy           |
          |   - Load Balancing                     |
          |   - SSL Termination                    |
          |   - Rate Limiting                      |
          |   - Gzip Compression                   |
          +----+--------------+---------------+----+
               |              |               |
     +---------+              |               +---------+
     |                        |                         |
+----v--------------+   +-----v-------------+   +-------v-----------+
|    MCP SERVER     |   |      GRAFANA      |   |     ANALYTICS     |
|    Port: 3001     |   |    Port: 3000     |   |    Port: 3002     |
|                   |   |                   |   |                   |
| +---------------+ |   | +---------------+ |   | +---------------+ |
| | HTTP Server   | |   | | Dashboards    | |   | | Metrics API   | |
| | MCP Protocol  | |   | | Visualizations| |   | | Tracking      | |
| | 13 Tools      | |   | | Alerts        | |   | | Real-time     | |
| +---------------+ |   | +---------------+ |   | +---------------+ |
+---------+---------+   +---------+---------+   +---------+---------+
          |                       |                       |
+---------v-----------------------v-----------------------v---------+
|                             DATA LAYER                            |
+---------+-------------------+-------------------+-----------------+
          |                   |                   |
+---------v---------+   +-----v----------+   +----v--------------+
|       REDIS       |   |   PROMETHEUS   |   |    PERSISTENT     |
|    Port: 6379     |   |   Port: 9090   |   |      VOLUMES      |
|                   |   |                |   |                   |
| +---------------+ |   | +------------+ |   | +---------------+ |
| | Cache         | |   | | Metrics    | |   | | redis-data    | |
| | Session Store | |   | | Scraping   | |   | | prometheus-   | |
| | Real-time     | |   | | Alerting   | |   | |   data        | |
| | Analytics     | |   | | Storage    | |   | | grafana-data  | |
| +---------------+ |   | +------------+ |   | | mcp-data      | |
+-------------------+   +-------+--------+   | +---------------+ |
                                |            +-------------------+
                +---------------v---------------+
                |    MONITORING & EXPORTERS     |
                +------+-----------------+------+
                       |                 |
              +--------v------+   +------v---------+
              | NODE EXPORTER |   |    CADVISOR    |
              |  Port: 9100   |   |   Port: 8080   |
              |               |   |                |
              |    System     |   |   Container    |
              |    Metrics    |   |    Metrics     |
              +---------------+   +----------------+
```
## Component Breakdown
### 1. **Nginx Reverse Proxy**
- **Purpose**: Entry point for all traffic
- **Responsibilities**:
- Route traffic to appropriate services
- Load balance across MCP server instances
- Rate limiting (100 req/s)
- Gzip compression
- SSL/TLS termination
- Health checks
- **Technology**: Nginx Alpine
- **Port**: 80 (HTTP), 443 (HTTPS)
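The proxying behavior above can be sketched as an nginx configuration fragment. This is a minimal illustration, not the shipped config: the upstream name, zone sizes, and burst/connection values beyond the stated 100 req/s and 10 concurrent limits are assumptions.

```nginx
# Illustrative sketch only - upstream name and zone sizes are assumptions
limit_req_zone  $binary_remote_addr zone=api:10m  rate=100r/s;
limit_conn_zone $binary_remote_addr zone=addr:10m;

upstream mcp_backend {
    server mcp-server:3001;   # Docker DNS resolves each replica
}

server {
    listen 80;

    gzip on;
    gzip_types application/json text/plain;

    location / {
        limit_req  zone=api burst=20;  # rate limiting
        limit_conn addr 10;            # concurrent connection cap
        proxy_pass http://mcp_backend; # load balance across replicas
    }
}
```

Because `mcp-server` is a Compose service name, nginx resolves it through Docker's embedded DNS, which is how scaled replicas receive traffic without config changes.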
### 2. **MCP Server**
- **Purpose**: Core application server
- **Responsibilities**:
- Execute MCP tools
- Serve HTTP API
- Provide stdio MCP protocol
- Track analytics
- **Technology**: Node.js 18 Alpine
- **Port**: 3001
- **Scaling**: Horizontal (3 replicas in prod)
### 3. **Analytics Service**
- **Purpose**: Real-time metrics and tracking
- **Responsibilities**:
- Track tool executions
- Monitor errors
- Calculate health scores
- Provide Prometheus metrics
- Store analytics in Redis
- **Technology**: Node.js 18 Alpine
- **Port**: 3002
- **Key Metrics**:
- `mcp_tool_executions_total`
- `mcp_tool_execution_duration_seconds`
- `mcp_active_users`
- `mcp_error_rate`
- `mcp_system_health`
- `mcp_tool_popularity`
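In the standard Prometheus text exposition format, the metrics above would be scraped as something like the following; the label names and sample values are illustrative assumptions, not the service's actual output:

```
# HELP mcp_tool_executions_total Total number of tool executions
# TYPE mcp_tool_executions_total counter
mcp_tool_executions_total{tool="example_tool",status="success"} 42

# HELP mcp_error_rate Current error rate
# TYPE mcp_error_rate gauge
mcp_error_rate 0.02
```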
### 4. **Grafana**
- **Purpose**: Visualization and dashboards
- **Responsibilities**:
- Display metrics dashboards
- Configure alerts
- Provide drill-down analytics
- User management
- **Technology**: Grafana Latest
- **Port**: 3000
- **Features**:
- Pre-configured Prometheus datasource
- MCP Overview dashboard
- Redis datasource support
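A "pre-configured Prometheus datasource" is typically delivered through Grafana's file-based provisioning. A hedged sketch, assuming the conventional provisioning path and datasource name:

```yaml
# Sketch of provisioning/datasources/prometheus.yml - names are assumptions
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090   # service DNS name on mcp-network
    access: proxy
    isDefault: true
```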
### 5. **Prometheus**
- **Purpose**: Metrics storage and querying
- **Responsibilities**:
- Scrape metrics from all services
- Store time-series data
- Evaluate alert rules
- Provide query API
- **Technology**: Prometheus Latest
- **Port**: 9090
- **Configuration**:
- 15s scrape interval
- 30-90 day retention (environment-dependent)
- Alert rules for critical metrics
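The scrape configuration described above can be sketched as a `prometheus.yml` fragment. Job names are assumptions derived from the service DNS names listed later in this document:

```yaml
# Sketch of prometheus.yml - job names are illustrative assumptions
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: mcp-server
    static_configs:
      - targets: ['mcp-server:3001']
  - job_name: analytics
    static_configs:
      - targets: ['analytics-service:3002']
  - job_name: node-exporter
    static_configs:
      - targets: ['node-exporter:9100']
  - job_name: cadvisor
    static_configs:
      - targets: ['cadvisor:8080']
```

Retention is set on the Prometheus process itself (e.g. the `--storage.tsdb.retention.time=30d` flag), not in `prometheus.yml`.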
### 6. **Redis**
- **Purpose**: Cache and data store
- **Responsibilities**:
- Cache tool execution data
- Store analytics events
- Session management
- Real-time data
- **Technology**: Redis 7 Alpine
- **Port**: 6379
- **Configuration**:
- 256-512MB memory limit
- LRU eviction policy
- AOF persistence
- 30-day TTL on analytics
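The memory and persistence settings above map onto standard `redis.conf` directives; this sketch picks one value inside the stated 256-512MB range:

```conf
# Sketch of the Redis settings described above - values illustrative
maxmemory 512mb
maxmemory-policy allkeys-lru   # LRU eviction when the limit is hit
appendonly yes                 # AOF persistence
appendfsync everysec
```

The 30-day TTL is not a server-wide setting; it would be applied per key by the analytics service (e.g. via `EXPIRE`) when events are written.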
### 7. **Node Exporter**
- **Purpose**: System metrics collection
- **Responsibilities**:
- Collect host system metrics
- CPU, memory, disk, network stats
- Expose to Prometheus
- **Technology**: Prometheus Node Exporter
- **Port**: 9100
### 8. **cAdvisor**
- **Purpose**: Container metrics
- **Responsibilities**:
- Monitor container performance
- CPU, memory, network per container
- Resource usage tracking
- **Technology**: Google cAdvisor
- **Port**: 8080
## Data Flow
### Tool Execution Flow
```
User Request
     |
Nginx (Load Balance)
     |
MCP Server
     |
Tool Execution
     |
Analytics Tracking ----> Redis (Store)
     |                        |
Return Result           Prometheus (Scrape)
     |                        |
User Response           Grafana (Visualize)
```
### Metrics Collection Flow
```
Application Metrics
        |
Prometheus Scrape (15s interval)
        |
Time-Series Storage (30-90 days)
        |
Grafana Query & Display
        |
User Dashboard
```
### Health Check Flow
```
Docker Health Check (30s)
        |
HTTP Health Endpoint
        |
Service Status Check
        |
Restart if Failed (3 retries)
```
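In Compose terms, the flow above corresponds to a per-service `healthcheck` plus a restart policy. A hedged sketch, assuming a `/health` endpoint (the path is not confirmed by this document) and the BusyBox `wget` available in Alpine images:

```yaml
# Sketch of a Compose healthcheck - the /health path is an assumption
services:
  mcp-server:
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:3001/health"]
      interval: 30s
      timeout: 5s
      retries: 3
```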
## Network Architecture
### Network: `mcp-network`
- **Type**: Bridge network
- **Subnet**: 172.20.0.0/16
- **Purpose**: Internal service communication
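The bridge network described above would be declared in Compose roughly as follows:

```yaml
# Sketch of the mcp-network definition
networks:
  mcp-network:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
```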
### Service Discovery
All services use DNS names for communication:
- `mcp-server` → MCP Server instances
- `redis` → Redis cache
- `prometheus` → Prometheus server
- `grafana` → Grafana instance
- `analytics-service` → Analytics service
## Volume Architecture
### Persistent Volumes
```
mcp-data           → Application data
└── /app/data

redis-data         → Redis persistence
└── /data
    ├── dump.rdb (snapshots)
    └── appendonly.aof (append-only file)

prometheus-data    → Metrics storage
└── /prometheus
    └── [time-series blocks]

grafana-data       → Dashboards & config
└── /var/lib/grafana
    ├── dashboards/
    └── grafana.db
```
## Scaling Strategy
### Horizontal Scaling
```bash
# Scale MCP servers
docker-compose up -d --scale mcp-server=5
# Nginx automatically distributes load
```
### Vertical Scaling
Edit `docker-compose.prod.yml`:
```yaml
deploy:
  resources:
    limits:
      cpus: '2'      # Increase CPU
      memory: 2G     # Increase memory
```
## Security Layers
1. **Network Isolation**
- Services communicate only within `mcp-network`
- Only Nginx exposed to public
2. **Resource Limits**
- CPU and memory limits per service
- Prevents resource exhaustion
3. **Health Checks**
- Automatic restart on failure
- 3 retries before failure
4. **Rate Limiting**
- Nginx limits: 100 req/s
- Connection limits: 10 concurrent
5. **Non-root Users**
- All services run as non-root
- UID 1001 for applications
## Performance Characteristics
### Latency Targets
- **Tool Execution**: < 1s (p95)
- **Analytics Tracking**: < 100ms
- **Metrics Scraping**: < 5s
- **Dashboard Loading**: < 2s
### Throughput
- **MCP Server**: 100-500 req/s per instance
- **Redis**: 10,000+ ops/s
- **Prometheus**: 1M+ samples/s
- **Analytics**: 1,000+ events/s
### Resource Usage (Typical)
```
Service CPU Memory Disk
-----------------------------------------
MCP Server 10-30% 128-256M 100M
Analytics 5-15% 64-128M 50M
Grafana 5-10% 128-256M 200M
Prometheus 10-20% 256-512M 1-10G
Redis 5-15% 128-256M 100M-1G
Nginx 2-5% 32-64M 10M
Node Exporter 1-2% 16-32M 5M
cAdvisor 2-5% 64-128M 10M
```
## Monitoring & Observability
### Metrics Collection
- **Application**: Custom Prometheus metrics
- **System**: Node Exporter metrics
- **Container**: cAdvisor metrics
- **Service**: Health check endpoints
### Logging
- **Format**: JSON structured logging
- **Rotation**: 10MB max, 3 files
- **Aggregation**: Docker log driver
- **Retention**: 7 days default
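The rotation policy above corresponds to the Docker `json-file` log driver options, which would be set per service in Compose along these lines:

```yaml
# Sketch of the log rotation settings described above
services:
  mcp-server:
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
```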
### Tracing
- Request tracking via Nginx logs
- Tool execution timing
- Error tracking with stack traces
## Disaster Recovery
### Backup Strategy
```bash
# Automated daily backups
make backup
# Creates:
#   prometheus-YYYYMMDD-HHMMSS.tar.gz
#   grafana-YYYYMMDD-HHMMSS.tar.gz
#   redis-YYYYMMDD-HHMMSS.tar.gz
```
### Recovery
```bash
# Stop services
make down
# Restore volumes
docker run --rm -v mcp_prometheus-data:/data \
  -v "$(pwd)/backups":/backup alpine \
  sh -c 'tar xzf /backup/prometheus-*.tar.gz -C /data'
# Restart
make up
```
### High Availability
- Multiple MCP server instances
- Redis AOF + RDB persistence
- Prometheus local storage
- Grafana persistent database
## Deployment Patterns
### Development
```bash
make dev
# Features: Hot reload, debug logs, local mounts
```
### Staging
```bash
docker-compose up -d
# Features: Production-like, testing environment
```
### Production
```bash
docker-compose -f docker-compose.yml \
-f docker-compose.prod.yml up -d
# Features: Replicas, resource limits, optimized
```
## Future Enhancements
### Planned Features
- [ ] Distributed tracing with Jaeger
- [ ] Log aggregation with ELK stack
- [ ] Message queue with RabbitMQ
- [ ] Database with PostgreSQL
- [ ] Service mesh with Istio
- [ ] Auto-scaling with Kubernetes
- [ ] Multi-region deployment
- [ ] CDN integration
---
**Architecture Version**: 1.0.0
**Last Updated**: 2024
**Maintainer**: MCP Team