Skip to main content
Glama
DEPLOYMENT.md35 kB
# Production Deployment Guide Complete guide for deploying Aerospace MCP in production environments with security, scalability, and reliability best practices. ## 📋 Table of Contents - [Deployment Overview](#deployment-overview) - [Environment Configuration](#environment-configuration) - [Deployment Strategies](#deployment-strategies) - [Security Best Practices](#security-best-practices) - [Monitoring & Observability](#monitoring--observability) - [Scaling & Performance](#scaling--performance) - [Backup & Disaster Recovery](#backup--disaster-recovery) - [Maintenance & Updates](#maintenance--updates) - [Troubleshooting](#troubleshooting) ## 🏗️ Deployment Overview ### Architecture Options ```mermaid graph TB subgraph "Option 1: Single Server" LB1[Load Balancer] Server1[Aerospace MCP Server] DB1[(SQLite/File)] LB1 --> Server1 Server1 --> DB1 end subgraph "Option 2: Multi-Server" LB2[Load Balancer] Server2A[Server A] Server2B[Server B] Server2C[Server N] Cache[Redis Cache] DB2[(PostgreSQL)] LB2 --> Server2A LB2 --> Server2B LB2 --> Server2C Server2A --> Cache Server2B --> Cache Server2C --> Cache Server2A --> DB2 Server2B --> DB2 Server2C --> DB2 end subgraph "Option 3: Containerized" K8S[Kubernetes Cluster] Pods[Multiple Pods] PersistentStorage[Persistent Storage] K8S --> Pods Pods --> PersistentStorage end ``` ### Deployment Checklist **Pre-deployment:** - [ ] Environment variables configured - [ ] Dependencies installed and tested - [ ] SSL certificates obtained - [ ] Monitoring setup completed - [ ] Backup strategy implemented - [ ] Security review completed - [ ] Load testing performed **Deployment:** - [ ] Application deployed - [ ] Health checks passing - [ ] HTTPS configured - [ ] Rate limiting enabled - [ ] Authentication implemented - [ ] Logging configured - [ ] Monitoring active **Post-deployment:** - [ ] Smoke tests passed - [ ] Performance metrics baseline - [ ] Alerts configured - [ ] Documentation updated - [ ] Team notified ## ⚙️ Environment Configuration ### Environment Variables Create a comprehensive `.env` file for production: ```bash # Application Configuration APP_NAME=aerospace-mcp APP_VERSION=0.1.0 APP_ENV=production DEBUG=false # Server Configuration HOST=0.0.0.0 PORT=8000 WORKERS=4 WORKER_CLASS=uvicorn.workers.UvicornWorker MAX_REQUESTS=1000 MAX_REQUESTS_JITTER=50 # Security SECRET_KEY=your-super-secret-key-change-this API_KEY_HEADER=X-API-Key ALLOWED_HOSTS=yourdomain.com,api.yourdomain.com CORS_ORIGINS=https://yourdomain.com,https://app.yourdomain.com # Database (if using external database) DATABASE_URL=postgresql://user:password@localhost:5432/aerospace_mcp REDIS_URL=redis://localhost:6379/0 # OpenAP Configuration OPENAP_CACHE_SIZE=1000 OPENAP_TIMEOUT=30 # Rate Limiting RATE_LIMIT_ENABLED=true RATE_LIMIT_PER_MINUTE=60 RATE_LIMIT_BURST=10 # Monitoring LOG_LEVEL=info LOG_FORMAT=json METRICS_ENABLED=true METRICS_PORT=9090 # External Services WEATHER_API_KEY=your-weather-api-key NOTAM_API_URL=https://api.example.com/notam # TLS Configuration SSL_CERT_FILE=/etc/ssl/certs/aerospace-mcp.crt SSL_KEY_FILE=/etc/ssl/private/aerospace-mcp.key ``` ### Configuration Management **Using python-decouple:** ```python # config.py from decouple import config from typing import List class Settings: # Application APP_NAME: str = config("APP_NAME", default="aerospace-mcp") APP_ENV: str = config("APP_ENV", default="development") DEBUG: bool = config("DEBUG", default=False, cast=bool) # Server HOST: str = config("HOST", default="127.0.0.1") PORT: int = config("PORT", default=8000, cast=int) WORKERS: int = config("WORKERS", default=1, cast=int) # Security SECRET_KEY: str = config("SECRET_KEY") ALLOWED_HOSTS: List[str] = config("ALLOWED_HOSTS", default="", cast=lambda v: [i.strip() for i in v.split(',')] if v else []) # Database DATABASE_URL: str = config("DATABASE_URL", default="sqlite:///./aerospace.db") REDIS_URL: str = config("REDIS_URL", default="redis://localhost:6379/0") # Rate Limiting RATE_LIMIT_ENABLED: bool = config("RATE_LIMIT_ENABLED", default=True, cast=bool) RATE_LIMIT_PER_MINUTE: int = config("RATE_LIMIT_PER_MINUTE", default=60, cast=int) settings = Settings() ``` ## 🚀 Deployment Strategies ### 1. Docker Deployment **Dockerfile (Production):** ```dockerfile # Multi-stage build for smaller production image FROM python:3.11-slim as builder # Install system dependencies RUN apt-get update && apt-get install -y \ build-essential \ curl \ && rm -rf /var/lib/apt/lists/* # Install UV for fast dependency management RUN pip install uv # Copy dependency files WORKDIR /app COPY pyproject.toml uv.lock* ./ # Install dependencies RUN uv export --no-dev > requirements.txt RUN pip install --no-cache-dir -r requirements.txt # Production stage FROM python:3.11-slim # Create non-root user RUN groupadd -r appuser && useradd -r -g appuser appuser # Install runtime dependencies RUN apt-get update && apt-get install -y \ curl \ && rm -rf /var/lib/apt/lists/* # Copy dependencies from builder COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages COPY --from=builder /usr/local/bin /usr/local/bin # Set working directory WORKDIR /app # Copy application code COPY main.py ./ COPY aerospace_mcp/ ./aerospace_mcp/ COPY app/ ./app/ # Change ownership to non-root user RUN chown -R appuser:appuser /app USER appuser # Health check HEALTHCHECK --interval=30s --timeout=30s --start-period=5s --retries=3 \ CMD curl -f http://localhost:8000/health || exit 1 # Expose port EXPOSE 8000 # Use Gunicorn for production CMD ["gunicorn", "main:app", "-w", "4", "-k", "uvicorn.workers.UvicornWorker", "-b", "0.0.0.0:8000", "--preload", "--log-level", "info"] ``` **docker-compose.yml (Production):** ```yaml version: '3.8' services: aerospace-mcp: build: . ports: - "8000:8000" environment: - APP_ENV=production - DATABASE_URL=postgresql://aerospace:${DB_PASSWORD}@db:5432/aerospace_mcp - REDIS_URL=redis://redis:6379/0 - SECRET_KEY=${SECRET_KEY} depends_on: - db - redis restart: unless-stopped volumes: - ./logs:/app/logs healthcheck: test: ["CMD", "curl", "-f", "http://localhost:8000/health"] interval: 30s timeout: 10s retries: 3 start_period: 40s deploy: resources: limits: memory: 1G cpus: '0.5' reservations: memory: 512M nginx: image: nginx:alpine ports: - "80:80" - "443:443" volumes: - ./nginx.conf:/etc/nginx/nginx.conf:ro - ./ssl:/etc/ssl:ro - ./logs/nginx:/var/log/nginx depends_on: - aerospace-mcp restart: unless-stopped db: image: postgres:15-alpine environment: - POSTGRES_DB=aerospace_mcp - POSTGRES_USER=aerospace - POSTGRES_PASSWORD=${DB_PASSWORD} volumes: - postgres_data:/var/lib/postgresql/data - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro restart: unless-stopped healthcheck: test: ["CMD-SHELL", "pg_isready -U aerospace"] interval: 30s timeout: 10s retries: 5 redis: image: redis:7-alpine command: redis-server --appendonly yes volumes: - redis_data:/data restart: unless-stopped healthcheck: test: ["CMD", "redis-cli", "ping"] interval: 30s timeout: 10s retries: 3 prometheus: image: prom/prometheus:latest ports: - "9090:9090" volumes: - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro - prometheus_data:/prometheus command: - '--config.file=/etc/prometheus/prometheus.yml' - '--storage.tsdb.path=/prometheus' - '--web.console.libraries=/usr/share/prometheus/console_libraries' - '--web.console.templates=/usr/share/prometheus/consoles' restart: unless-stopped volumes: postgres_data: redis_data: prometheus_data: ``` ### 2. Kubernetes Deployment **k8s-deployment.yaml:** ```yaml apiVersion: apps/v1 kind: Deployment metadata: name: aerospace-mcp labels: app: aerospace-mcp spec: replicas: 3 selector: matchLabels: app: aerospace-mcp template: metadata: labels: app: aerospace-mcp spec: containers: - name: aerospace-mcp image: aerospace-mcp:latest ports: - containerPort: 8000 env: - name: APP_ENV value: "production" - name: DATABASE_URL valueFrom: secretKeyRef: name: aerospace-secrets key: database-url - name: SECRET_KEY valueFrom: secretKeyRef: name: aerospace-secrets key: secret-key resources: requests: memory: "512Mi" cpu: "250m" limits: memory: "1Gi" cpu: "500m" livenessProbe: httpGet: path: /health port: 8000 initialDelaySeconds: 30 periodSeconds: 30 readinessProbe: httpGet: path: /health port: 8000 initialDelaySeconds: 5 periodSeconds: 10 volumeMounts: - name: config-volume mountPath: /app/config readOnly: true volumes: - name: config-volume configMap: name: aerospace-config --- apiVersion: v1 kind: Service metadata: name: aerospace-mcp-service spec: selector: app: aerospace-mcp ports: - protocol: TCP port: 80 targetPort: 8000 type: LoadBalancer --- apiVersion: networking.k8s.io/v1 kind: Ingress metadata: name: aerospace-mcp-ingress annotations: kubernetes.io/ingress.class: nginx cert-manager.io/cluster-issuer: letsencrypt-prod nginx.ingress.kubernetes.io/rate-limit: "100" spec: tls: - hosts: - api.aerospace-mcp.com secretName: aerospace-tls rules: - host: api.aerospace-mcp.com http: paths: - path: / pathType: Prefix backend: service: name: aerospace-mcp-service port: number: 80 ``` ### 3. Traditional Server Deployment **systemd service file (/etc/systemd/system/aerospace-mcp.service):** ```ini [Unit] Description=Aerospace MCP API Server After=network.target Requires=network.target [Service] Type=exec User=aerospace Group=aerospace WorkingDirectory=/opt/aerospace-mcp Environment=PATH=/opt/aerospace-mcp/.venv/bin EnvironmentFile=/opt/aerospace-mcp/.env ExecStart=/opt/aerospace-mcp/.venv/bin/gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000 ExecReload=/bin/kill -s HUP $MAINPID Restart=always RestartSec=3 StandardOutput=journal StandardError=journal SyslogIdentifier=aerospace-mcp [Install] WantedBy=multi-user.target ``` **Nginx configuration (/etc/nginx/sites-available/aerospace-mcp):** ```nginx upstream aerospace_backend { server 127.0.0.1:8000; keepalive 32; } # Rate limiting limit_req_zone $binary_remote_addr zone=api:10m rate=10r/s; limit_req_zone $binary_remote_addr zone=plan:10m rate=2r/s; server { listen 80; server_name api.aerospace-mcp.com; return 301 https://$server_name$request_uri; } server { listen 443 ssl http2; server_name api.aerospace-mcp.com; # SSL Configuration ssl_certificate /etc/ssl/certs/aerospace-mcp.crt; ssl_certificate_key /etc/ssl/private/aerospace-mcp.key; ssl_protocols TLSv1.2 TLSv1.3; ssl_ciphers ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384; ssl_prefer_server_ciphers off; ssl_session_cache shared:SSL:10m; ssl_session_timeout 24h; # Security Headers add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always; add_header X-Content-Type-Options nosniff always; add_header X-Frame-Options DENY always; add_header X-XSS-Protection "1; mode=block" always; add_header Referrer-Policy "strict-origin-when-cross-origin" always; # Gzip compression gzip on; gzip_vary on; gzip_min_length 1024; gzip_types text/plain text/css text/xml text/javascript application/json application/javascript application/xml+rss application/atom+xml; # Rate limiting location /health { limit_req zone=api burst=20 nodelay; proxy_pass http://aerospace_backend; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; } location /plan { limit_req zone=plan burst=5 nodelay; proxy_pass http://aerospace_backend; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; proxy_connect_timeout 60s; proxy_read_timeout 60s; } location / { limit_req zone=api burst=10 nodelay; proxy_pass http://aerospace_backend; proxy_set_header Host $host; proxy_set_header X-Real-IP $remote_addr; proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for; proxy_set_header X-Forwarded-Proto $scheme; proxy_connect_timeout 30s; proxy_read_timeout 30s; } # Logging access_log /var/log/nginx/aerospace-mcp.access.log; error_log /var/log/nginx/aerospace-mcp.error.log; } ``` ## 🔒 Security Best Practices ### Authentication & Authorization **API Key Authentication:** ```python # security.py from fastapi import HTTPException, Security from fastapi.security import HTTPBearer, HTTPAuthorizationCredentials from typing import Optional import hashlib import hmac import time security = HTTPBearer() class APIKeyAuth: def __init__(self, valid_keys: dict): self.valid_keys = valid_keys # {key_id: key_secret} def verify_api_key(self, credentials: HTTPAuthorizationCredentials = Security(security)): """Verify API key from Authorization header.""" try: # Expected format: "Bearer key_id:signature:timestamp" token = credentials.credentials parts = token.split(':') if len(parts) != 3: raise HTTPException(status_code=401, detail="Invalid token format") key_id, signature, timestamp = parts # Check if key exists if key_id not in self.valid_keys: raise HTTPException(status_code=401, detail="Invalid API key") # Check timestamp (prevent replay attacks) current_time = int(time.time()) request_time = int(timestamp) if abs(current_time - request_time) > 300: # 5 minutes raise HTTPException(status_code=401, detail="Token expired") # Verify signature secret = self.valid_keys[key_id] expected_sig = hmac.new( secret.encode(), f"{key_id}:{timestamp}".encode(), hashlib.sha256 ).hexdigest() if not hmac.compare_digest(signature, expected_sig): raise HTTPException(status_code=401, detail="Invalid signature") return {"key_id": key_id, "timestamp": timestamp} except Exception as e: raise HTTPException(status_code=401, detail="Authentication failed") # Usage in main.py from fastapi import Depends from config import settings api_auth = APIKeyAuth(settings.API_KEYS) @app.post("/plan", dependencies=[Depends(api_auth.verify_api_key)]) def plan_flight(req: PlanRequest): # Protected endpoint pass ``` **JWT Authentication:** ```python # jwt_auth.py from fastapi import HTTPException, Depends from fastapi.security import HTTPBearer from jose import JWTError, jwt from datetime import datetime, timedelta from typing import Optional security = HTTPBearer() SECRET_KEY = "your-secret-key" ALGORITHM = "HS256" ACCESS_TOKEN_EXPIRE_MINUTES = 30 def create_access_token(data: dict, expires_delta: Optional[timedelta] = None): """Create JWT access token.""" to_encode = data.copy() if expires_delta: expire = datetime.utcnow() + expires_delta else: expire = datetime.utcnow() + timedelta(minutes=15) to_encode.update({"exp": expire}) encoded_jwt = jwt.encode(to_encode, SECRET_KEY, algorithm=ALGORITHM) return encoded_jwt def verify_token(token: str = Depends(security)): """Verify JWT token.""" try: payload = jwt.decode(token.credentials, SECRET_KEY, algorithms=[ALGORITHM]) user_id: str = payload.get("sub") if user_id is None: raise HTTPException(status_code=401, detail="Invalid token") return {"user_id": user_id} except JWTError: raise HTTPException(status_code=401, detail="Invalid token") ``` ### Input Validation & Sanitization ```python # Enhanced request models with strict validation from pydantic import BaseModel, Field, validator import re class SecurePlanRequest(BaseModel): depart_city: str = Field(..., min_length=2, max_length=100, regex=r'^[a-zA-Z\s\-\.]+$') arrive_city: str = Field(..., min_length=2, max_length=100, regex=r'^[a-zA-Z\s\-\.]+$') depart_country: Optional[str] = Field(None, regex=r'^[A-Z]{2}$') arrive_country: Optional[str] = Field(None, regex=r'^[A-Z]{2}$') ac_type: str = Field(..., regex=r'^[A-Z0-9]{3,4}$') # ICAO format cruise_alt_ft: int = Field(35000, ge=8000, le=45000) mass_kg: Optional[float] = Field(None, gt=1000, lt=1000000) # Reasonable aircraft mass range route_step_km: float = Field(25.0, gt=1.0, le=1000.0) @validator('depart_city', 'arrive_city') def validate_city_names(cls, v): # Additional sanitization v = v.strip() if len(v) < 2: raise ValueError('City name too short') # Remove any potential injection attempts if any(char in v for char in ['<', '>', '"', "'"]): raise ValueError('Invalid characters in city name') return v ``` ### Rate Limiting ```python # rate_limiter.py from slowapi import Limiter, _rate_limit_exceeded_handler from slowapi.util import get_remote_address from slowapi.errors import RateLimitExceeded from fastapi import Request import redis # Redis-backed rate limiter for distributed deployments redis_client = redis.Redis(host='localhost', port=6379, decode_responses=True) def get_client_id(request: Request): """Get client identifier for rate limiting.""" # Try API key first, then IP auth_header = request.headers.get("authorization") if auth_header: # Extract key ID from token try: token = auth_header.replace("Bearer ", "") key_id = token.split(':')[0] return f"api_key:{key_id}" except: pass # Fall back to IP address forwarded_for = request.headers.get("X-Forwarded-For") if forwarded_for: return f"ip:{forwarded_for.split(',')[0].strip()}" return f"ip:{request.client.host}" limiter = Limiter(key_func=get_client_id, storage_uri="redis://localhost:6379") # Different limits for different endpoints @app.get("/health") @limiter.limit("100/minute") def health_check(request: Request): pass @app.post("/plan") @limiter.limit("10/minute") # More restrictive for compute-intensive operations def plan_flight(request: Request, req: PlanRequest): pass ``` ### SSL/TLS Configuration **Let's Encrypt with Certbot:** ```bash # Install certbot sudo apt-get install certbot python3-certbot-nginx # Obtain certificate sudo certbot --nginx -d api.aerospace-mcp.com # Test automatic renewal sudo certbot renew --dry-run # Set up automatic renewal cron job echo "0 12 * * * /usr/bin/certbot renew --quiet" | sudo crontab - ``` ## 📊 Monitoring & Observability ### Structured Logging ```python # logging_config.py import logging import sys from pythonjsonlogger import jsonlogger from config import settings def setup_logging(): """Configure structured JSON logging.""" # Create custom formatter logHandler = logging.StreamHandler(sys.stdout) formatter = jsonlogger.JsonFormatter( fmt='%(asctime)s %(name)s %(levelname)s %(message)s', datefmt='%Y-%m-%d %H:%M:%S' ) logHandler.setFormatter(formatter) # Configure root logger logger = logging.getLogger() logger.addHandler(logHandler) logger.setLevel(getattr(logging, settings.LOG_LEVEL.upper())) return logger # Usage in main.py import logging from contextlib import asynccontextmanager logger = logging.getLogger(__name__) @asynccontextmanager async def lifespan(app: FastAPI): # Startup logger.info("Starting Aerospace MCP server", extra={ "event": "startup", "version": settings.APP_VERSION, "environment": settings.APP_ENV }) yield # Shutdown logger.info("Shutting down Aerospace MCP server", extra={ "event": "shutdown" }) app = FastAPI(lifespan=lifespan) @app.middleware("http") async def log_requests(request: Request, call_next): """Log all HTTP requests.""" start_time = time.time() response = await call_next(request) process_time = time.time() - start_time logger.info("HTTP request processed", extra={ "method": request.method, "url": str(request.url), "status_code": response.status_code, "process_time": process_time, "user_agent": request.headers.get("user-agent"), "remote_addr": request.client.host }) return response ``` ### Metrics Collection ```python # metrics.py from prometheus_client import Counter, Histogram, Gauge, generate_latest from fastapi import Response import time # Define metrics REQUEST_COUNT = Counter('http_requests_total', 'Total HTTP requests', ['method', 'endpoint', 'status']) REQUEST_DURATION = Histogram('http_request_duration_seconds', 'HTTP request duration', ['method', 'endpoint']) FLIGHT_PLANS_TOTAL = Counter('flight_plans_total', 'Total flight plans generated', ['aircraft_type', 'status']) OPENAP_CALCULATIONS = Histogram('openap_calculation_seconds', 'OpenAP calculation time') ACTIVE_CONNECTIONS = Gauge('active_connections', 'Active connections') @app.middleware("http") async def metrics_middleware(request: Request, call_next): """Collect request metrics.""" start_time = time.time() response = await call_next(request) # Record metrics duration = time.time() - start_time REQUEST_COUNT.labels( method=request.method, endpoint=request.url.path, status=response.status_code ).inc() REQUEST_DURATION.labels( method=request.method, endpoint=request.url.path ).observe(duration) return response @app.get("/metrics") def get_metrics(): """Prometheus metrics endpoint.""" return Response(generate_latest(), media_type="text/plain") ``` ### Health Checks ```python # health.py from fastapi import HTTPException import psutil import time from typing import Dict, Any async def detailed_health_check() -> Dict[str, Any]: """Comprehensive health check.""" start_time = time.time() health_data = { "status": "ok", "timestamp": int(time.time()), "version": settings.APP_VERSION, "environment": settings.APP_ENV, "checks": {} } # Check OpenAP availability try: import openap health_data["checks"]["openap"] = {"status": "ok", "version": openap.__version__} except ImportError: health_data["checks"]["openap"] = {"status": "unavailable", "error": "Module not installed"} # Check database connectivity try: # Test database connection # db_check = await test_database_connection() health_data["checks"]["database"] = {"status": "ok", "response_time": 0.001} except Exception as e: health_data["checks"]["database"] = {"status": "error", "error": str(e)} health_data["status"] = "degraded" # Check Redis connectivity try: # Test Redis connection # redis_check = await test_redis_connection() health_data["checks"]["redis"] = {"status": "ok", "response_time": 0.001} except Exception as e: health_data["checks"]["redis"] = {"status": "error", "error": str(e)} health_data["status"] = "degraded" # System metrics health_data["system"] = { "cpu_percent": psutil.cpu_percent(), "memory_percent": psutil.virtual_memory().percent, "disk_percent": psutil.disk_usage('/').percent, "load_average": psutil.getloadavg() if hasattr(psutil, 'getloadavg') else None } # Response time health_data["response_time"] = time.time() - start_time # Overall status determination if any(check.get("status") == "error" for check in health_data["checks"].values()): health_data["status"] = "error" raise HTTPException(status_code=503, detail=health_data) return health_data @app.get("/health/detailed") async def detailed_health(): """Detailed health check endpoint.""" return await detailed_health_check() ``` ### Alerting Configuration **Prometheus alerts (alerts.yml):** ```yaml groups: - name: aerospace-mcp rules: - alert: HighErrorRate expr: rate(http_requests_total{status=~"5.."}[5m]) > 0.1 for: 5m labels: severity: warning annotations: summary: "High error rate detected" description: "Error rate is {{ $value }} per second" - alert: HighResponseTime expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 2 for: 5m labels: severity: warning annotations: summary: "High response time" description: "95th percentile response time is {{ $value }}s" - alert: ServiceDown expr: up{job="aerospace-mcp"} == 0 for: 1m labels: severity: critical annotations: summary: "Service is down" description: "Aerospace MCP service is not responding" - alert: HighMemoryUsage expr: (process_resident_memory_bytes / 1024 / 1024) > 1000 for: 10m labels: severity: warning annotations: summary: "High memory usage" description: "Memory usage is {{ $value }}MB" ``` ## ⚡ Scaling & Performance ### Horizontal Scaling **Load Balancer Configuration (HAProxy):** ```haproxy global daemon maxconn 4096 defaults mode http timeout connect 5000ms timeout client 50000ms timeout server 50000ms option httplog frontend aerospace_frontend bind *:80 bind *:443 ssl crt /etc/ssl/certs/aerospace-mcp.pem redirect scheme https if !{ ssl_fc } default_backend aerospace_backend backend aerospace_backend balance roundrobin option httpchk GET /health http-check expect status 200 server app1 10.0.1.10:8000 check inter 30s rise 2 fall 3 server app2 10.0.1.11:8000 check inter 30s rise 2 fall 3 server app3 10.0.1.12:8000 check inter 30s rise 2 fall 3 ``` ### Caching Strategy ```python # caching.py import redis import json import hashlib from typing import Any, Optional from functools import wraps redis_client = redis.Redis(host='localhost', port=6379, decode_responses=True) def cache_response(ttl: int = 3600): """Decorator to cache API responses.""" def decorator(func): @wraps(func) async def wrapper(*args, **kwargs): # Generate cache key from function name and arguments cache_key = generate_cache_key(func.__name__, args, kwargs) # Try to get from cache cached_result = redis_client.get(cache_key) if cached_result: return json.loads(cached_result) # Execute function result = await func(*args, **kwargs) # Cache the result redis_client.setex(cache_key, ttl, json.dumps(result, default=str)) return result return wrapper return decorator def generate_cache_key(func_name: str, args: tuple, kwargs: dict) -> str: """Generate a cache key from function name and arguments.""" key_data = { 'func': func_name, 'args': args, 'kwargs': kwargs } key_string = json.dumps(key_data, sort_keys=True) return f"cache:{hashlib.md5(key_string.encode()).hexdigest()}" # Usage @cache_response(ttl=1800) # Cache for 30 minutes async def cached_airport_search(city: str, country: Optional[str] = None): """Cached airport search.""" return _find_city_airports(city, country) ``` ### Database Optimization ```sql -- PostgreSQL schema for external database CREATE TABLE airports ( id SERIAL PRIMARY KEY, iata VARCHAR(3) UNIQUE NOT NULL, icao VARCHAR(4) UNIQUE, name VARCHAR(255) NOT NULL, city VARCHAR(100) NOT NULL, country VARCHAR(2) NOT NULL, latitude DECIMAL(10, 8) NOT NULL, longitude DECIMAL(11, 8) NOT NULL, timezone VARCHAR(50), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Indexes for fast lookups CREATE INDEX idx_airports_city ON airports (city); CREATE INDEX idx_airports_country ON airports (country); CREATE INDEX idx_airports_city_country ON airports (city, country); CREATE INDEX idx_airports_coordinates ON airports (latitude, longitude); -- Full-text search index CREATE INDEX idx_airports_search ON airports USING gin(to_tsvector('english', name || ' ' || city)); ``` ## 💾 Backup & Disaster Recovery ### Backup Strategy ```bash #!/bin/bash # backup.sh - Daily backup script DATE=$(date +%Y%m%d_%H%M%S) BACKUP_DIR="/backups/aerospace-mcp" LOG_FILE="$BACKUP_DIR/backup_$DATE.log" mkdir -p $BACKUP_DIR echo "Starting backup at $(date)" >> $LOG_FILE # Backup database if [ "$DATABASE_URL" != "" ]; then pg_dump $DATABASE_URL | gzip > $BACKUP_DIR/database_$DATE.sql.gz echo "Database backup completed" >> $LOG_FILE fi # Backup configuration tar -czf $BACKUP_DIR/config_$DATE.tar.gz /opt/aerospace-mcp/.env /opt/aerospace-mcp/config/ echo "Configuration backup completed" >> $LOG_FILE # Backup logs tar -czf $BACKUP_DIR/logs_$DATE.tar.gz /opt/aerospace-mcp/logs/ echo "Logs backup completed" >> $LOG_FILE # Upload to S3 (optional) if command -v aws &> /dev/null; then aws s3 sync $BACKUP_DIR s3://aerospace-mcp-backups/$(date +%Y/%m/%d)/ echo "S3 upload completed" >> $LOG_FILE fi # Clean up old backups (keep 30 days) find $BACKUP_DIR -name "*.gz" -mtime +30 -delete find $BACKUP_DIR -name "*.log" -mtime +30 -delete echo "Backup completed at $(date)" >> $LOG_FILE ``` ### Disaster Recovery Plan **Recovery Runbook:** 1. **Assessment** - Determine scope of failure - Check backup integrity - Notify stakeholders 2. **Infrastructure Recovery** ```bash # Restore from backup cd /opt/aerospace-mcp # Restore database gunzip -c /backups/database_latest.sql.gz | psql $DATABASE_URL # Restore configuration tar -xzf /backups/config_latest.tar.gz -C / # Restart services sudo systemctl restart aerospace-mcp sudo systemctl restart nginx ``` 3. **Verification** - Test health endpoints - Verify core functionality - Monitor error rates - Update status page ## 🔧 Maintenance & Updates ### Zero-Downtime Deployment ```bash #!/bin/bash # deploy.sh - Blue-green deployment script NEW_VERSION=$1 CURRENT_PORT=8000 NEW_PORT=8001 if [ -z "$NEW_VERSION" ]; then echo "Usage: $0 <version>" exit 1 fi echo "Deploying version $NEW_VERSION" # Build new version docker build -t aerospace-mcp:$NEW_VERSION . # Start new container on different port docker run -d -p $NEW_PORT:8000 --name aerospace-mcp-new aerospace-mcp:$NEW_VERSION # Wait for health check sleep 30 while ! curl -f http://localhost:$NEW_PORT/health; do echo "Waiting for new version to start..." sleep 5 done # Update load balancer to point to new version # (This depends on your load balancer configuration) update_load_balancer $NEW_PORT # Wait for connections to drain sleep 60 # Stop old container docker stop aerospace-mcp-current docker rm aerospace-mcp-current # Rename containers docker rename aerospace-mcp-new aerospace-mcp-current echo "Deployment completed successfully" ``` ### Update Procedures **Regular Updates:** ```bash # Weekly maintenance window sudo systemctl stop aerospace-mcp # Update application cd /opt/aerospace-mcp git pull origin main source .venv/bin/activate pip install --upgrade -r requirements.txt # Update system packages sudo apt update && sudo apt upgrade -y # Restart services sudo systemctl start aerospace-mcp sudo systemctl restart nginx # Verify deployment curl -f http://localhost/health ``` ## 🐛 Troubleshooting ### Common Production Issues #### High CPU Usage ```bash # Investigate high CPU top -p $(pgrep -f aerospace-mcp) htop # Check for infinite loops or inefficient queries strace -p $(pgrep -f aerospace-mcp) # Analyze with profiling py-spy top -p $(pgrep -f aerospace-mcp) ``` #### Memory Leaks ```bash # Monitor memory usage ps aux | grep aerospace-mcp cat /proc/$(pgrep -f aerospace-mcp)/status | grep VmRSS # Use memory profiler memory_profiler -p $(pgrep -f aerospace-mcp) ``` #### Database Connection Issues ```bash # Check connection pool psql $DATABASE_URL -c "SELECT count(*) FROM pg_stat_activity WHERE datname='aerospace_mcp';" # Monitor slow queries psql $DATABASE_URL -c "SELECT query, mean_time, calls FROM pg_stat_statements ORDER BY mean_time DESC LIMIT 10;" ``` ### Emergency Procedures **Service Recovery:** ```bash # Quick service restart sudo systemctl restart aerospace-mcp # Force kill if unresponsive sudo pkill -f aerospace-mcp sudo systemctl start aerospace-mcp # Rollback to previous version docker run -d -p 8000:8000 aerospace-mcp:previous-stable # Check logs for errors journalctl -u aerospace-mcp -f tail -f /var/log/aerospace-mcp/app.log ``` **Performance Issues:** ```bash # Temporary rate limiting iptables -A INPUT -p tcp --dport 8000 -m limit --limit 10/min -j ACCEPT iptables -A INPUT -p tcp --dport 8000 -j DROP # Scale up quickly (if using containers) docker-compose up --scale aerospace-mcp=3 ``` This deployment guide provides comprehensive coverage for running Aerospace MCP in production environments. Adapt the configurations based on your specific infrastructure requirements and compliance needs.

Latest Blog Posts

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/cheesejaguar/aerospace-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server