---
pdf-engine: lualatex
mainfont: "DejaVu Serif"
monofont: "DejaVu Sans Mono"
header-includes: |
  \usepackage{fontspec}
  \directlua{
    luaotfload.add_fallback("emojifallback", {"NotoColorEmoji:mode=harf;"})
  }
  \setmainfont[
    RawFeature={fallback=emojifallback}
  ]{DejaVu Serif}
---
# CodeGraph Operations Manual
**Complete operational procedures, troubleshooting, and maintenance guide**
## Quick Navigation
- [Installation Guide](#installation-guide) - System installation procedures
- [Configuration Reference](#configuration-reference) - Complete configuration options
- [Troubleshooting Guide](#troubleshooting-guide) - Common issues and solutions
- [Scaling Strategies](#scaling-strategies) - Performance and capacity planning
- [Maintenance Procedures](#maintenance-procedures) - Routine maintenance tasks
- [Monitoring & Alerting](#monitoring-and-alerting) - Operational visibility setup
## Installation Guide
### System Requirements
#### Minimum Requirements
| Component | Specification |
|-----------|---------------|
| **CPU** | 2 cores @ 2.4GHz |
| **RAM** | 4GB |
| **Storage** | 20GB SSD |
| **Network** | 100Mbps |
| **OS** | Ubuntu 20.04+, CentOS 8+, RHEL 8+ |
#### Production Requirements
| Component | Specification |
|-----------|---------------|
| **CPU** | 8+ cores @ 3.0GHz (16 threads recommended) |
| **RAM** | 16GB (32GB+ for high throughput) |
| **Storage** | 100GB+ NVMe SSD with 10,000+ IOPS |
| **Network** | 1Gbps+ with low latency |
| **OS** | Latest Ubuntu LTS, CentOS Stream, or RHEL |
#### Storage Sizing Guide
**Database Growth Estimates**:
- **Small project** (< 50k LOC): 100MB - 500MB
- **Medium project** (50k - 500k LOC): 500MB - 5GB
- **Large project** (500k - 5M LOC): 5GB - 50GB
- **Enterprise** (5M+ LOC): 50GB+
**Vector Index Sizing**:
- **768-dimensional embeddings**: ~3KB per entity
- **1M entities**: ~3GB vector storage
- **Index overhead**: 20-30% of embedding size
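As a sanity check on these numbers (assuming float32 embeddings, so 4 bytes per dimension, and taking ~25% as a midpoint of the overhead range above):
```bash
# Rough vector storage estimate: float32 embeddings + index overhead
DIM=768; ENTITIES=1000000
BYTES_PER_VEC=$((DIM * 4))                     # 768 * 4 bytes = 3072 B ≈ 3 KB
RAW_GB=$(echo "$BYTES_PER_VEC * $ENTITIES / 1024^3" | bc -l)
TOTAL_GB=$(echo "$RAW_GB * 1.25" | bc -l)      # assume ~25% index overhead
printf "raw: %.2f GB, with overhead: %.2f GB\n" "$RAW_GB" "$TOTAL_GB"
```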
### Binary Installation
#### Option 1: Pre-built Binaries
**Linux x86_64**:
```bash
# Download latest release
wget https://github.com/codegraph/embedding-system/releases/latest/download/codegraph-linux-x86_64.tar.gz
# Extract and install
tar -xzf codegraph-linux-x86_64.tar.gz
sudo mv codegraph-api /usr/local/bin/
sudo chmod +x /usr/local/bin/codegraph-api
# Verify installation
codegraph-api --version
```
**macOS (Apple Silicon)**:
```bash
# Download for ARM64
wget https://github.com/codegraph/embedding-system/releases/latest/download/codegraph-macos-arm64.tar.gz
# Extract and install
tar -xzf codegraph-macos-arm64.tar.gz
sudo mv codegraph-api /usr/local/bin/
sudo chmod +x /usr/local/bin/codegraph-api
# For Intel Macs, use codegraph-macos-x86_64.tar.gz
```
**Windows**:
```powershell
# Download and extract
Invoke-WebRequest -Uri "https://github.com/codegraph/embedding-system/releases/latest/download/codegraph-windows-x86_64.zip" -OutFile "codegraph.zip"
Expand-Archive -Path "codegraph.zip" -DestinationPath "C:\Program Files\CodeGraph"
# Add to PATH
$env:PATH += ";C:\Program Files\CodeGraph"
```
#### Option 2: Package Managers
**Ubuntu/Debian**:
```bash
# Add repository
curl -fsSL https://packages.codegraph.dev/gpg | sudo gpg --dearmor -o /usr/share/keyrings/codegraph.gpg
echo "deb [signed-by=/usr/share/keyrings/codegraph.gpg] https://packages.codegraph.dev/ubuntu $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/codegraph.list
# Install
sudo apt update
sudo apt install codegraph-api
```
**RHEL/CentOS**:
```bash
# Add repository
sudo tee /etc/yum.repos.d/codegraph.repo << EOF
[codegraph]
name=CodeGraph Repository
baseurl=https://packages.codegraph.dev/rhel/\$releasever/\$basearch/
gpgcheck=1
gpgkey=https://packages.codegraph.dev/gpg
EOF
# Install
sudo dnf install codegraph-api
```
**macOS with Homebrew**:
```bash
# Add tap
brew tap codegraph/tap
# Install
brew install codegraph-api
```
#### Option 3: Docker Installation
**Single Container**:
```bash
# Pull and run
docker run -d \
--name codegraph-api \
-p 8000:8000 \
-v codegraph-data:/app/data \
-e CODEGRAPH_LOG_LEVEL=info \
codegraph/api:latest
# Verify
curl http://localhost:8000/health
```
**Docker Compose**:
```bash
# Create docker-compose.yml (see deployment guide)
docker-compose up -d
# Check status
docker-compose ps
docker-compose logs -f codegraph-api
```
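A minimal `docker-compose.yml` to start from, mirroring the single-container flags above; the full production file with healthchecks and resource limits is in the deployment guide:
```bash
# Minimal docker-compose.yml mirroring the single-container example above
cat > docker-compose.yml <<'EOF'
version: '3.8'
services:
  codegraph-api:
    image: codegraph/api:latest
    ports:
      - "8000:8000"
    volumes:
      - codegraph-data:/app/data
    environment:
      - CODEGRAPH_LOG_LEVEL=info
    restart: unless-stopped
volumes:
  codegraph-data:
EOF
docker-compose up -d
```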
### Source Installation
**Prerequisites**:
```bash
# Ubuntu/Debian
sudo apt update
sudo apt install -y build-essential clang cmake pkg-config libssl-dev curl
# RHEL/CentOS
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y clang cmake pkg-config openssl-devel curl
# macOS
xcode-select --install
brew install cmake pkg-config
```
**Rust Installation**:
```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
# Verify version (requires 1.75+)
rustc --version
```
**Build from Source**:
```bash
# Clone repository
git clone https://github.com/codegraph/embedding-system.git
cd embedding-system
# Build release binary
cargo build --release --locked
# Install globally
sudo cp target/release/codegraph-api /usr/local/bin/
sudo chmod +x /usr/local/bin/codegraph-api
# Verify installation
codegraph-api --version
```
### Service Installation
#### systemd Service (Linux)
**Create Service File**:
```ini
# /etc/systemd/system/codegraph-api.service
[Unit]
Description=CodeGraph API Server
Documentation=https://docs.codegraph.dev
After=network.target
Wants=network-online.target
[Service]
Type=exec
User=codegraph
Group=codegraph
WorkingDirectory=/opt/codegraph
ExecStart=/usr/local/bin/codegraph-api --config /etc/codegraph/config.toml
ExecReload=/bin/kill -HUP $MAINPID
Restart=on-failure
RestartSec=5
StartLimitInterval=60
StartLimitBurst=3
# Security settings
NoNewPrivileges=yes
ProtectSystem=strict
ProtectHome=yes
ReadWritePaths=/opt/codegraph/data /var/log/codegraph
PrivateTmp=yes
ProtectKernelTunables=yes
ProtectKernelModules=yes
ProtectControlGroups=yes
# Resource limits
LimitNOFILE=65536
LimitNPROC=4096
# Environment
Environment=RUST_LOG=info
Environment=RUST_BACKTRACE=1
EnvironmentFile=-/etc/codegraph/environment
[Install]
WantedBy=multi-user.target
```
**Setup User and Directories**:
```bash
# Create user
sudo useradd -r -s /bin/false -d /opt/codegraph codegraph
# Create directories
sudo mkdir -p /opt/codegraph/{data,logs}
sudo mkdir -p /etc/codegraph
sudo mkdir -p /var/log/codegraph
# Set ownership
sudo chown -R codegraph:codegraph /opt/codegraph
sudo chown -R codegraph:codegraph /var/log/codegraph
```
**Enable and Start Service**:
```bash
# Reload systemd
sudo systemctl daemon-reload
# Enable service
sudo systemctl enable codegraph-api
# Start service
sudo systemctl start codegraph-api
# Check status
sudo systemctl status codegraph-api
# View logs
sudo journalctl -u codegraph-api -f
```
## Configuration Reference
### Complete Configuration File
```toml
# /etc/codegraph/config.toml - Production Configuration
[server]
# Network binding
host = "0.0.0.0" # Bind address (0.0.0.0 for all interfaces)
port = 8000 # HTTP port
workers = 8 # Worker threads (num_cpus recommended)
max_connections = 1000 # Maximum concurrent connections
timeout = "30s" # Request timeout
keep_alive = "75s" # TCP keep-alive timeout
# CORS settings
cors_enabled = true # Enable CORS
cors_origins = ["*"] # Allowed origins (* for all, or specific domains)
cors_methods = ["GET", "POST", "PUT", "DELETE"] # Allowed HTTP methods
cors_headers = ["Content-Type", "Authorization"] # Allowed headers
cors_max_age = 3600 # Preflight cache duration (seconds)
[database]
# RocksDB configuration
path = "/opt/codegraph/data/rocks.db" # Database directory
cache_size = 2048 # Block cache size (MB)
write_buffer_size = 256 # Memtable size (MB)
max_write_buffer_number = 6 # Number of memtables
max_open_files = 2000 # OS file handle limit
enable_statistics = true # Enable performance statistics
# Compaction settings
max_background_jobs = 8 # Background compaction threads
level0_file_num_compaction_trigger = 4 # L0->L1 compaction trigger
level0_slowdown_writes_trigger = 20 # Write slowdown trigger
level0_stop_writes_trigger = 36 # Write stop trigger
target_file_size_base = 67108864 # Target SST file size (64MB)
max_bytes_for_level_base = 268435456 # L1 size limit (256MB)
# Compression
compression_type = "zstd" # Compression algorithm (none, snappy, lz4, zstd)
compression_level = 6 # Compression level (1-9)
bottommost_compression_type = "zstd" # Bottom level compression
# Backup settings
backup_enabled = true # Enable automated backups
backup_interval = "24h" # Backup frequency
backup_retention = "30d" # Backup retention period
backup_path = "/opt/codegraph/backups" # Backup directory
[vector]
# Vector search configuration
enabled = true # Enable vector search
index_type = "hnsw" # Index type: hnsw, ivf, flat
dimension = 768 # Embedding dimension
metric = "cosine" # Distance metric: cosine, l2, inner_product
# HNSW-specific settings (when index_type = "hnsw")
hnsw_m = 16 # Number of connections per node
hnsw_ef_construction = 200 # Build-time search parameter
hnsw_ef_search = 64 # Query-time search parameter
hnsw_max_elements = 10000000 # Maximum number of vectors
# IVF-specific settings (when index_type = "ivf")
ivf_nlist = 1024 # Number of clusters
ivf_nprobe = 32 # Number of clusters to search
# Embedding model settings
embedding_model = "sentence-transformers" # Model type
model_cache_dir = "/opt/codegraph/models" # Model cache directory
embedding_batch_size = 32 # Batch size for embedding generation
[parsing]
# Language support
languages = [
"rust", "python", "javascript", "typescript",
"go", "java", "cpp", "c", "csharp", "kotlin",
"php", "ruby", "swift", "scala", "dart"
]
# File processing limits
max_file_size = "50MB" # Maximum file size to process
max_files_per_project = 100000 # Maximum files per project
max_line_length = 10000 # Maximum line length
encoding_detection = true # Auto-detect file encoding
# Global ignore patterns
ignore_patterns = [
# Build and output directories
"target/", "build/", "dist/", "out/", ".output/", "bin/", "obj/",
# Dependency directories
"node_modules/", "vendor/", ".cargo/", "venv/", "env/",
# Cache and temporary files
"__pycache__/", ".cache/", ".tmp/", "tmp/", "temp/",
"*.pyc", "*.pyo", "*.pyd", "*.so", "*.dll", "*.dylib",
# Version control
".git/", ".svn/", ".hg/", ".bzr/",
# IDE and editor files
".vscode/", ".idea/", ".vs/", "*.swp", "*.swo", "*~",
# OS files
".DS_Store", "Thumbs.db", "desktop.ini",
# Log files
"*.log", "logs/", "log/"
]
# Language-specific settings
[parsing.rust]
parse_tests = true # Parse test functions
parse_benchmarks = true # Parse benchmark functions
parse_examples = true # Parse example code
extract_docs = true # Extract documentation comments
[parsing.python]
parse_notebooks = true # Parse Jupyter notebooks (.ipynb)
parse_stubs = true # Parse type stub files (.pyi)
extract_docstrings = true # Extract function/class docstrings
[parsing.javascript]
parse_jsx = true # Parse JSX syntax
parse_vue = true # Parse Vue.js single file components
parse_typescript = true # Parse TypeScript in .js files
[mcp]
# Model Context Protocol server
enabled = true # Enable MCP server
endpoint = "/mcp" # MCP endpoint path
max_request_size = "10MB" # Maximum request size
max_concurrent_requests = 100 # Maximum concurrent MCP requests
request_timeout = "60s" # Request timeout
rate_limit = 100 # Requests per minute per client
# MCP-specific features
enable_streaming = true # Enable streaming responses
enable_tools = true # Enable MCP tools
enable_resources = true # Enable MCP resources
enable_prompts = true # Enable MCP prompts
[security]
# Authentication and authorization
api_key_required = true # Require API key for requests
api_key_header = "Authorization" # Header name for API key
api_key_prefix = "Bearer " # Expected prefix for API key
admin_api_key_required = true # Require separate admin API key
# JWT configuration (if using JWT tokens)
jwt_enabled = false # Enable JWT authentication
jwt_secret = "your-256-bit-secret" # JWT signing secret (change this!)
jwt_algorithm = "HS256" # Signing algorithm
jwt_expiration = "24h" # Token expiration time
jwt_refresh_enabled = true # Enable token refresh
# Rate limiting
rate_limiting = true # Enable rate limiting
rate_limit_global = 1000 # Global requests per minute
rate_limit_per_client = 100 # Requests per minute per client
rate_limit_burst = 50 # Burst allowance
rate_limit_window = "1m" # Rate limiting window
# IP allowlist/blocklist
ip_allowlist = [] # Allowed IP addresses/CIDR blocks
ip_blocklist = [] # Blocked IP addresses/CIDR blocks
[logging]
# Log configuration
level = "info" # Log level: trace, debug, info, warn, error
format = "json" # Format: json, pretty, compact
output = "stdout" # Output: stdout, stderr, file, or file path
# Structured logging fields
include_timestamp = true # Include timestamp in logs
include_level = true # Include log level
include_target = true # Include log target (module)
include_thread_id = false # Include thread ID
include_request_id = true # Include request correlation ID
include_user_agent = true # Include user agent in request logs
include_response_time = true # Include response time
# File logging (when output is a file path)
log_file_path = "/var/log/codegraph/api.log" # Log file path
max_file_size = "100MB" # Maximum log file size
max_files = 10 # Number of log files to retain
compress_rotated = true # Compress rotated log files
# Syslog (alternative to file logging)
syslog_enabled = false # Enable syslog output
syslog_facility = "daemon" # Syslog facility
syslog_ident = "codegraph-api" # Syslog identifier
[metrics]
# Prometheus metrics
enabled = true # Enable metrics collection
endpoint = "/metrics" # Metrics endpoint path
include_system_metrics = true # Include system metrics (CPU, memory)
include_custom_metrics = true # Include application-specific metrics
# Metric collection settings
collection_interval = "15s" # Metric collection frequency
histogram_buckets = [ # Response time histogram buckets
0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0
]
# Metric labels
include_method_label = true # Include HTTP method in request metrics
include_endpoint_label = true # Include endpoint path in metrics
include_status_label = true # Include status code in metrics
[performance]
# Performance and resource management
query_timeout = "30s" # Default query timeout
search_timeout = "10s" # Search operation timeout
index_timeout = "5m" # Indexing operation timeout
backup_timeout = "1h" # Backup operation timeout
# Concurrency limits
max_concurrent_queries = 100 # Maximum concurrent queries
max_concurrent_searches = 50 # Maximum concurrent searches
max_concurrent_indexing = 5 # Maximum concurrent indexing jobs
max_concurrent_backups = 2 # Maximum concurrent backup operations
# Memory management
max_memory_usage = "4GB" # Maximum memory usage
memory_check_interval = "30s" # Memory usage check frequency
gc_interval = "5m" # Garbage collection trigger interval
memory_pressure_threshold = 0.85 # Memory pressure warning threshold
# Caching settings
enable_query_cache = true # Enable query result caching
query_cache_size = 1000 # Maximum cached queries
query_cache_ttl = "1h" # Query cache time-to-live
enable_result_cache = true # Enable API result caching
result_cache_size = 500 # Maximum cached results
result_cache_ttl = "15m" # Result cache time-to-live
# Batch processing
batch_size = 1000 # Default batch size for operations
max_batch_size = 10000 # Maximum allowed batch size
batch_timeout = "30s" # Batch processing timeout
[health]
# Health check configuration
enabled = true # Enable health checks
endpoint = "/health" # Health check endpoint
detailed_endpoint = "/api/v1/health" # Detailed health check endpoint
# Component health checks
check_database = true # Check database connectivity
check_vector_search = true # Check vector search availability
check_file_system = true # Check file system access
check_memory = true # Check memory usage
check_external_services = false # Check external service dependencies
# Health check thresholds
memory_threshold = 0.9 # Memory usage warning threshold
disk_space_threshold = 0.85 # Disk space warning threshold
response_time_threshold = "1s" # Response time warning threshold
[development]
# Development and debugging settings (disable in production)
debug_mode = false # Enable debug features
enable_playground = false # Enable GraphQL playground
enable_profiling = false # Enable built-in profiler
log_requests = false # Log all requests (verbose)
log_responses = false # Log all responses (very verbose)
cors_permissive = false # Allow all origins in CORS
```
### Environment Variable Overrides
Any configuration option can be overridden using environment variables with the pattern `CODEGRAPH_<SECTION>_<KEY>`:
```bash
# Server configuration
export CODEGRAPH_SERVER_HOST=0.0.0.0
export CODEGRAPH_SERVER_PORT=8000
export CODEGRAPH_SERVER_WORKERS=8
# Database configuration
export CODEGRAPH_DATABASE_PATH=/opt/codegraph/data/rocks.db
export CODEGRAPH_DATABASE_CACHE_SIZE=2048
# Security settings
export CODEGRAPH_SECURITY_API_KEY_REQUIRED=true
export CODEGRAPH_SECURITY_RATE_LIMITING=true
# Performance tuning
export CODEGRAPH_PERFORMANCE_MAX_MEMORY_USAGE=4GB
export CODEGRAPH_PERFORMANCE_QUERY_TIMEOUT=30s
# Logging
export CODEGRAPH_LOGGING_LEVEL=info
export CODEGRAPH_LOGGING_FORMAT=json
# Special environment variables
export RUST_LOG=codegraph=info,tower_http=warn
export RUST_BACKTRACE=1
```
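These overrides pair naturally with the `EnvironmentFile=-/etc/codegraph/environment` line in the systemd unit shown earlier; a sketch of such a file (values are examples):
```bash
# Write overrides where the systemd unit's EnvironmentFile directive picks them up
sudo tee /etc/codegraph/environment > /dev/null <<'EOF'
CODEGRAPH_SERVER_PORT=8000
CODEGRAPH_DATABASE_CACHE_SIZE=2048
CODEGRAPH_LOGGING_LEVEL=info
RUST_LOG=codegraph=info,tower_http=warn
EOF
sudo systemctl restart codegraph-api
```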
## Troubleshooting Guide
### Common Issues and Solutions
#### 1. Service Won't Start
**Symptoms**:
- Service fails to start
- "Address already in use" error
- Permission denied errors
**Diagnostics**:
```bash
# Check if port is in use
sudo netstat -tulpn | grep :8000
sudo lsof -i :8000
# Check service status
sudo systemctl status codegraph-api
# View service logs
sudo journalctl -u codegraph-api --since "1 hour ago"
# Check configuration file
codegraph-api --check-config /etc/codegraph/config.toml
# Test configuration
codegraph-api --config /etc/codegraph/config.toml --dry-run
```
**Solutions**:
```bash
# Change port if in use
export CODEGRAPH_SERVER_PORT=8001
# Fix permissions
sudo chown -R codegraph:codegraph /opt/codegraph
sudo chmod 755 /opt/codegraph
sudo chmod 644 /etc/codegraph/config.toml
# Check SELinux (RHEL/CentOS)
sudo setsebool -P httpd_can_network_connect 1
sudo semanage port -a -t http_port_t -p tcp 8000
# Check firewall
sudo ufw allow 8000/tcp
sudo firewall-cmd --permanent --add-port=8000/tcp
sudo firewall-cmd --reload
```
#### 2. High Memory Usage
**Symptoms**:
- Memory usage constantly increasing
- Out of memory errors
- System becomes unresponsive
**Diagnostics**:
```bash
# Monitor memory usage
top -p $(pgrep codegraph-api)
ps aux | grep codegraph-api
# Check memory configuration
curl -s http://localhost:8000/api/v1/health | jq '.components.memory'
# Memory profiling
sudo perf record -g ./target/release/codegraph-api
valgrind --tool=memcheck --leak-check=full ./target/release/codegraph-api
```
**Solutions**:
```bash
# Reduce cache sizes
export CODEGRAPH_DATABASE_CACHE_SIZE=1024
export CODEGRAPH_PERFORMANCE_MAX_MEMORY_USAGE=2GB
# Restart service to free memory
sudo systemctl restart codegraph-api
# Tune garbage collection
export RUST_MIN_STACK=8388608
export MALLOC_CONF="dirty_decay_ms:5000,muzzy_decay_ms:5000"
# Add swap space (temporary solution)
sudo dd if=/dev/zero of=/swapfile bs=1024 count=2097152
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
```
#### 3. Slow Query Performance
**Symptoms**:
- API requests timeout
- High response times
- Database operations are slow
**Diagnostics**:
```bash
# Check query performance
curl -w "@curl-format.txt" -o /dev/null -s "http://localhost:8000/api/v1/search?q=test"
# Database statistics
curl -s http://localhost:8000/api/v1/admin/stats | jq '.database'
# Monitor active queries
curl -s http://localhost:8000/api/v1/admin/queries
# System resource usage
iostat -x 1
iotop -o
```
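The `curl-format.txt` referenced above is a standard curl write-out template; a minimal version that breaks the request into timing phases:
```bash
cat > curl-format.txt <<'EOF'
     time_namelookup:  %{time_namelookup}s\n
        time_connect:  %{time_connect}s\n
     time_appconnect:  %{time_appconnect}s\n
    time_pretransfer:  %{time_pretransfer}s\n
  time_starttransfer:  %{time_starttransfer}s\n
          time_total:  %{time_total}s\n
EOF
```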
**Solutions**:
```bash
# Optimize RocksDB settings
export CODEGRAPH_DATABASE_MAX_BACKGROUND_JOBS=4
export CODEGRAPH_DATABASE_LEVEL0_FILE_NUM_COMPACTION_TRIGGER=2
# Increase query timeout
export CODEGRAPH_PERFORMANCE_QUERY_TIMEOUT=60s
# Enable query caching
export CODEGRAPH_PERFORMANCE_ENABLE_QUERY_CACHE=true
# Compact database
curl -X POST http://localhost:8000/api/v1/admin/compact
# Add read replicas (for heavy read loads)
# See scaling strategies section
```
#### 4. Vector Search Issues
**Symptoms**:
- Vector search returns no results
- FAISS index corruption errors
- Embedding generation failures
**Diagnostics**:
```bash
# Check vector search health
curl -s http://localhost:8000/api/v1/health | jq '.components.vector_search'
# Test vector search
curl -X POST http://localhost:8000/api/v1/similar \
-H "Content-Type: application/json" \
-d '{"code": "function test() {}", "threshold": 0.8}'
# Check index statistics
curl -s http://localhost:8000/api/v1/admin/vector/stats
```
**Solutions**:
```bash
# Rebuild vector index
curl -X POST http://localhost:8000/api/v1/admin/vector/rebuild
# Reduce vector dimension for testing
export CODEGRAPH_VECTOR_DIMENSION=384
# Switch to simpler index type
export CODEGRAPH_VECTOR_INDEX_TYPE=flat
# Clear vector cache
rm -rf /opt/codegraph/data/vector/cache/
```
#### 5. Network Connectivity Issues
**Symptoms**:
- Cannot connect to API
- Intermittent connection failures
- Timeout errors from clients
**Diagnostics**:
```bash
# Test local connectivity
curl -v http://localhost:8000/health
# Test external connectivity
curl -v http://YOUR_SERVER_IP:8000/health
# Check network configuration
ss -tulpn | grep :8000
netstat -rn
# Test DNS resolution
nslookup your-domain.com
dig your-domain.com
```
**Solutions**:
```bash
# Check firewall rules
sudo iptables -L -n
sudo ufw status verbose
# Test with different bind address
export CODEGRAPH_SERVER_HOST=127.0.0.1 # Local only
export CODEGRAPH_SERVER_HOST=0.0.0.0 # All interfaces
# Check load balancer configuration
curl -H "Host: api.codegraph.dev" http://load-balancer-ip/health
# Verify SSL certificates
openssl s_client -connect api.codegraph.dev:443 -servername api.codegraph.dev
```
### Error Code Reference
#### HTTP Status Codes
| Code | Description | Common Causes | Solutions |
|------|-------------|---------------|-----------|
| 400 | Bad Request | Invalid JSON, missing parameters | Validate request format |
| 401 | Unauthorized | Missing/invalid API key | Check API key configuration |
| 403 | Forbidden | Insufficient permissions | Review user permissions |
| 404 | Not Found | Resource doesn't exist | Verify resource ID |
| 408 | Request Timeout | Query took too long | Increase timeout, optimize query |
| 413 | Payload Too Large | Request body too large | Increase max request size |
| 429 | Too Many Requests | Rate limit exceeded | Implement backoff, increase limits |
| 500 | Internal Server Error | Server-side error | Check logs, restart service |
| 502 | Bad Gateway | Upstream server error | Check load balancer/proxy |
| 503 | Service Unavailable | Server overloaded | Scale up, check resources |
| 504 | Gateway Timeout | Upstream timeout | Increase upstream timeout |
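For 429 and transient 5xx responses, clients should retry with exponential backoff rather than hammering the API; a minimal shell sketch (the endpoint and retry count are illustrative):
```bash
# Retry a request with exponential backoff on 429/5xx responses
url="http://localhost:8000/api/v1/search?q=test"
for attempt in 1 2 3 4 5; do
  status=$(curl -s -o /dev/null -w "%{http_code}" "$url")
  case "$status" in
    429|5??) echo "HTTP $status, retrying in $((2 ** attempt))s..."
             sleep $((2 ** attempt)) ;;
    *)       echo "HTTP $status"; break ;;
  esac
done
```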
#### Application Error Codes
| Code | Category | Description | Resolution |
|------|----------|-------------|------------|
| CG001 | Database | Connection failed | Check database path/permissions |
| CG002 | Database | Transaction failed | Retry operation, check disk space |
| CG003 | Database | Corruption detected | Run repair, restore from backup |
| CG004 | Vector | Index not found | Rebuild vector index |
| CG005 | Vector | Embedding failed | Check model availability |
| CG006 | Parser | Unsupported language | Add language support |
| CG007 | Parser | Parse error | Check file encoding |
| CG008 | Config | Invalid configuration | Validate config file |
| CG009 | Auth | Invalid credentials | Update API key |
| CG010 | Resource | Insufficient memory | Increase memory limits |
### Log Analysis
#### Common Log Patterns
**Startup Issues**:
```bash
# Look for startup errors
sudo journalctl -u codegraph-api | grep -E "(ERROR|FATAL|failed to start)"
# Check configuration loading
sudo journalctl -u codegraph-api | grep -E "(config|configuration)"
# Port binding issues
sudo journalctl -u codegraph-api | grep -E "(bind|address already in use)"
```
**Performance Issues**:
```bash
# Slow queries
sudo journalctl -u codegraph-api | grep -E "slow.*query"
# Memory warnings
sudo journalctl -u codegraph-api | grep -E "(memory|OOM|out of memory)"
# Database issues
sudo journalctl -u codegraph-api | grep -E "(database|rocksdb|compaction)"
```
**Security Issues**:
```bash
# Authentication failures
sudo journalctl -u codegraph-api | grep -E "(auth|unauthorized|forbidden)"
# Rate limiting
sudo journalctl -u codegraph-api | grep -E "rate.limit"
# Suspicious activity
sudo journalctl -u codegraph-api | grep -E "(blocked|suspicious|attack)"
```
#### Log Analysis Tools
```bash
# Install log analysis tools
sudo apt install goaccess multitail lnav
# Analyze access patterns with GoAccess
sudo goaccess /var/log/codegraph/access.log --log-format=COMBINED
# Real-time log monitoring
sudo multitail /var/log/codegraph/api.log /var/log/codegraph/error.log
# Advanced log navigation
sudo lnav /var/log/codegraph/*.log
```
## Scaling Strategies
### Vertical Scaling (Scale Up)
#### Resource Scaling Guidelines
**CPU Scaling**:
```toml
# Increase workers based on CPU cores
[server]
workers = 16 # 2x CPU cores for CPU-bound workloads
# Adjust concurrent processing
[performance]
max_concurrent_queries = 200
max_concurrent_indexing = 8
```
**Memory Scaling**:
```toml
# Increase cache sizes with more RAM
[database]
cache_size = 8192 # 8GB cache for 32GB RAM system
[performance]
max_memory_usage = "16GB"
query_cache_size = 5000
result_cache_size = 2000
```
**Storage Scaling**:
```toml
# Optimize for faster storage
[database]
max_background_jobs = 16
write_buffer_size = 512 # Larger buffers for SSD
target_file_size_base = 134217728 # 128MB files
```
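One way to derive these values from the host itself, using the environment-variable overrides described earlier (the one-quarter-of-RAM cache ratio is a rule-of-thumb assumption, not a product default):
```bash
# Derive worker count and block cache size from the host hardware
export CODEGRAPH_SERVER_WORKERS=$(nproc)
total_mb=$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)
export CODEGRAPH_DATABASE_CACHE_SIZE=$(( total_mb / 4 ))   # ~25% of RAM, in MB
echo "workers=$CODEGRAPH_SERVER_WORKERS cache=${CODEGRAPH_DATABASE_CACHE_SIZE}MB"
```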
### Horizontal Scaling (Scale Out)
#### Read Replica Setup
**Primary-Replica Architecture**:
```yaml
# docker-compose.scale.yml
version: '3.8'

services:
  codegraph-primary:
    image: codegraph/api:latest
    environment:
      - CODEGRAPH_ROLE=primary
    volumes:
      - primary-data:/app/data
    ports:
      - "8000:8000"

  codegraph-replica-1:
    image: codegraph/api:latest
    environment:
      - CODEGRAPH_ROLE=replica
      - CODEGRAPH_PRIMARY_URL=http://codegraph-primary:8000
    volumes:
      - replica1-data:/app/data
    ports:
      - "8001:8000"

  codegraph-replica-2:
    image: codegraph/api:latest
    environment:
      - CODEGRAPH_ROLE=replica
      - CODEGRAPH_PRIMARY_URL=http://codegraph-primary:8000
    volumes:
      - replica2-data:/app/data
    ports:
      - "8002:8000"

  load-balancer:
    image: nginx:alpine
    volumes:
      - ./nginx-lb.conf:/etc/nginx/nginx.conf
    ports:
      - "80:80"
    depends_on:
      - codegraph-primary
      - codegraph-replica-1
      - codegraph-replica-2

# Named volumes must be declared at the top level
volumes:
  primary-data:
  replica1-data:
  replica2-data:
```
**Load Balancer Configuration**:
```nginx
# nginx-lb.conf
# This file is mounted as the complete nginx.conf, so it needs the
# top-level events and http contexts around the upstream definitions.
events {}

http {
    upstream codegraph_read {
        server codegraph-replica-1:8000;
        server codegraph-replica-2:8000;
        server codegraph-primary:8000 backup;
    }

    upstream codegraph_write {
        server codegraph-primary:8000;
    }

    server {
        listen 80;

        # Read operations to replicas
        location ~ ^/api/v1/(search|projects/[^/]+$|entities) {
            proxy_pass http://codegraph_read;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

        # Write operations (non-GET/HEAD) to primary; reads fall through to replicas
        location ~ ^/api/v1/(projects$) {
            if ($request_method !~ ^(GET|HEAD)$) {
                proxy_pass http://codegraph_write;
            }
            proxy_pass http://codegraph_read;
        }

        # Default to primary
        location / {
            proxy_pass http://codegraph_write;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }
    }
}
```
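Once the stack is up, a quick smoke test that reads fan out to the replicas while writes reach the primary (which backend served a request can be confirmed from each container's logs):
```bash
# Reads should rotate across the replicas; repeat and compare backend logs
for i in 1 2 3 4; do
  curl -s "http://localhost/api/v1/search?q=test" -o /dev/null -w "%{http_code}\n"
done
# A non-GET to /api/v1/projects should always reach the primary
curl -s -X POST "http://localhost/api/v1/projects" \
  -H "Content-Type: application/json" -d '{"name": "smoke-test"}' \
  -o /dev/null -w "%{http_code}\n"
docker-compose -f docker-compose.scale.yml logs --tail=5 codegraph-primary
```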
#### Kubernetes Horizontal Pod Autoscaler
```yaml
# hpa-advanced.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: codegraph-api-hpa
  namespace: codegraph
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: codegraph-api
  minReplicas: 3
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "100"
    - type: External
      external:
        metric:
          name: queue_depth
        target:
          type: Value
          value: "30"
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Pods
          value: 4
          periodSeconds: 60
        - type: Percent
          value: 100
          periodSeconds: 15
```
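Note that the `Pods` and `External` metrics only resolve if a metrics adapter (such as prometheus-adapter) is installed in the cluster. Applying and observing the autoscaler:
```bash
kubectl apply -f hpa-advanced.yaml
# Watch current vs. target metrics and the replica count
kubectl -n codegraph get hpa codegraph-api-hpa --watch
# Inspect scaling events and any metric-collection errors
kubectl -n codegraph describe hpa codegraph-api-hpa
```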
### Database Scaling
#### Sharding Strategy
**Horizontal Partitioning**:
```toml
# Shard configuration based on project ID
[database]
sharding_enabled = true
shard_count = 8
shard_key = "project_id"
shard_algorithm = "consistent_hash"
# Shard mapping
[[database.shards]]
id = 0
path = "/data/shard0/rocks.db"
range_start = "00000000"
range_end = "1fffffff"
[[database.shards]]
id = 1
path = "/data/shard1/rocks.db"
range_start = "20000000"
range_end = "3fffffff"
```
**Cross-Shard Query Coordination**:
```rust
// Example cross-shard query implementation (illustrative sketch; GraphStore,
// ConsistentHash, Entity, and the Result alias are assumed to be defined
// elsewhere in the codebase).
use std::sync::Arc;

pub struct ShardedGraphStore {
    shards: Vec<Arc<GraphStore>>,
    // Routes single-key lookups to their owning shard (not shown here)
    consistent_hash: ConsistentHash,
}

impl ShardedGraphStore {
    pub async fn search_across_shards(&self, query: &str) -> Result<Vec<Entity>> {
        // Fan the query out to every shard concurrently
        let mut tasks = Vec::new();
        for shard in &self.shards {
            let query = query.to_string();
            let shard = shard.clone();
            tasks.push(tokio::spawn(async move { shard.search(&query).await }));
        }
        // Collect results, tolerating individual shard failures
        let results = futures::future::join_all(tasks).await;
        let mut all_entities = Vec::new();
        for result in results {
            match result? {
                Ok(entities) => all_entities.extend(entities),
                Err(e) => tracing::warn!("Shard query failed: {}", e),
            }
        }
        // Sort by descending score (treating NaN as equal) and limit results
        all_entities.sort_by(|a, b| {
            b.score.partial_cmp(&a.score).unwrap_or(std::cmp::Ordering::Equal)
        });
        all_entities.truncate(100);
        Ok(all_entities)
    }
}
```
### Caching Strategies
#### Multi-Level Caching
```toml
[caching]
# L1 Cache - In-memory (fastest)
l1_enabled = true
l1_size = "1GB"
l1_ttl = "5m"
# L2 Cache - Redis (shared across instances)
l2_enabled = true
l2_type = "redis"
l2_url = "redis://localhost:6379"
l2_ttl = "1h"
# L3 Cache - Database query cache
l3_enabled = true
l3_size = "500MB"
l3_ttl = "24h"
```
**Redis Cluster Setup**:
```yaml
# redis-cluster.yml
version: '3.8'

services:
  redis-node-1:
    image: redis:7-alpine
    ports:
      - "7000:6379"
    command: redis-server --port 6379 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes

  redis-node-2:
    image: redis:7-alpine
    ports:
      - "7001:6379"
    command: redis-server --port 6379 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes

  redis-node-3:
    image: redis:7-alpine
    ports:
      - "7002:6379"
    command: redis-server --port 6379 --cluster-enabled yes --cluster-config-file nodes.conf --cluster-node-timeout 5000 --appendonly yes
```
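The compose file only starts the nodes; the cluster itself must be formed once. A sketch assuming the published ports above (in containerized setups the cluster bus ports and announce addresses may also need configuring):
```bash
# Form a 3-master cluster from the nodes started above (run once)
redis-cli --cluster create \
  127.0.0.1:7000 127.0.0.1:7001 127.0.0.1:7002 \
  --cluster-replicas 0
# Verify slot coverage and node health
redis-cli -p 7000 cluster info
redis-cli -p 7000 cluster nodes
```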
## Maintenance Procedures
### Routine Maintenance Tasks
#### Daily Tasks
```bash
#!/bin/bash
# daily_maintenance.sh
echo "=== Daily CodeGraph Maintenance ==="
date
# Check service health
echo "1. Checking service health..."
systemctl is-active codegraph-api
curl -f http://localhost:8000/health || echo "Health check failed"
# Check disk space
echo "2. Checking disk space..."
df -h /opt/codegraph/data
if [ "$(df /opt/codegraph/data | tail -1 | awk '{print $5}' | sed 's/%//')" -gt 85 ]; then
  echo "WARNING: Disk usage > 85%"
fi
# Check memory usage
echo "3. Checking memory usage..."
ps aux | grep codegraph-api | head -1
# Rotate logs if needed
echo "4. Rotating logs..."
if [ -f /var/log/codegraph/api.log ] && [ "$(stat -c%s /var/log/codegraph/api.log)" -gt 104857600 ]; then
  # stat -c%s is the GNU/Linux form (-f%z is the BSD/macOS equivalent)
  systemctl reload codegraph-api
fi
# Check for errors in logs
echo "5. Checking recent errors..."
journalctl -u codegraph-api --since "24 hours ago" | grep -i error | tail -10
echo "Daily maintenance completed"
```
#### Weekly Tasks
```bash
#!/bin/bash
# weekly_maintenance.sh
echo "=== Weekly CodeGraph Maintenance ==="
date
# Update system packages
echo "1. Updating system packages..."
sudo apt update && sudo apt upgrade -y
# Database maintenance
echo "2. Running database maintenance..."
curl -X POST http://localhost:8000/api/v1/admin/maintenance \
-H "Authorization: Bearer ${ADMIN_API_KEY}"
# Clean temporary files
echo "3. Cleaning temporary files..."
find /opt/codegraph/temp -type f -mtime +7 -delete 2>/dev/null || true
# Check and clean old backups
echo "4. Managing backups..."
find /opt/codegraph/backups -name "*.tar.gz" -mtime +30 -delete
# Security updates check
echo "5. Checking for security updates..."
sudo apt list --upgradable | grep -i security || echo "No security updates"
# Performance report
echo "6. Generating performance report..."
curl -s http://localhost:8000/metrics | grep -E "(codegraph_requests_total|codegraph_request_duration)"
echo "Weekly maintenance completed"
```
#### Monthly Tasks
```bash
#!/bin/bash
# monthly_maintenance.sh
echo "=== Monthly CodeGraph Maintenance ==="
date
# Full system backup
echo "1. Creating full system backup..."
/opt/codegraph/scripts/backup.sh full
# Database optimization
echo "2. Optimizing database..."
curl -X POST http://localhost:8000/api/v1/admin/optimize \
-H "Authorization: Bearer ${ADMIN_API_KEY}"
# Security audit
echo "3. Running security audit..."
sudo lynis audit system
# Performance benchmarking
echo "4. Running performance benchmarks..."
ab -n 1000 -c 10 http://localhost:8000/health > /tmp/perf_report.txt
# Review and update configuration
echo "5. Configuration review reminder..."
echo "Review configuration for optimization opportunities"
echo "Check for new feature flags and optimizations"
echo "Monthly maintenance completed"
```
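These scripts are written for unattended runs; hypothetical cron entries, assuming the scripts live in `/opt/codegraph/scripts/` and log to `/var/log/codegraph/`:
```bash
# Schedule the maintenance scripts via cron (paths and times are assumptions)
sudo tee /etc/cron.d/codegraph-maintenance > /dev/null <<'EOF'
0 2 * * *  root  /opt/codegraph/scripts/daily_maintenance.sh   >> /var/log/codegraph/maintenance.log 2>&1
0 3 * * 0  root  /opt/codegraph/scripts/weekly_maintenance.sh  >> /var/log/codegraph/maintenance.log 2>&1
0 4 1 * *  root  /opt/codegraph/scripts/monthly_maintenance.sh >> /var/log/codegraph/maintenance.log 2>&1
EOF
```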
### Backup and Restore Procedures
#### Comprehensive Backup Script
```bash
#!/bin/bash
# comprehensive_backup.sh
set -euo pipefail
BACKUP_TYPE=${1:-incremental}
BACKUP_ROOT="/opt/codegraph/backups"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
BACKUP_NAME="codegraph_${BACKUP_TYPE}_${TIMESTAMP}"
echo "Starting ${BACKUP_TYPE} backup: ${BACKUP_NAME}"
# Create backup directory
mkdir -p "${BACKUP_ROOT}/${BACKUP_NAME}"
case $BACKUP_TYPE in
  "full")
    echo "Creating full backup..."
    # Stop indexing jobs
    curl -X POST http://localhost:8000/api/v1/admin/pause-indexing \
      -H "Authorization: Bearer ${ADMIN_API_KEY}"
    # Create database checkpoint
    curl -X POST http://localhost:8000/api/v1/admin/checkpoint \
      -H "Authorization: Bearer ${ADMIN_API_KEY}" \
      -d "{\"path\": \"${BACKUP_ROOT}/${BACKUP_NAME}/rocksdb\"}"
    # Backup vector indices
    cp -r /opt/codegraph/data/vector "${BACKUP_ROOT}/${BACKUP_NAME}/"
    # Backup configuration
    cp -r /etc/codegraph "${BACKUP_ROOT}/${BACKUP_NAME}/config"
    # Resume indexing
    curl -X POST http://localhost:8000/api/v1/admin/resume-indexing \
      -H "Authorization: Bearer ${ADMIN_API_KEY}"
    ;;
  "incremental")
    echo "Creating incremental backup..."
    # Get last backup timestamp (UTC, ISO-8601)
    LAST_BACKUP=$(find "${BACKUP_ROOT}" -name "codegraph_*" -type d | sort | tail -1)
    SINCE_TIME="1970-01-01T00:00:00Z"
    if [ -n "$LAST_BACKUP" ]; then
      # Use the backup directory's mtime, normalized to UTC
      SINCE_TIME=$(date -u -r "$LAST_BACKUP" +%Y-%m-%dT%H:%M:%SZ)
    fi
    # Backup changed files only
    curl -X POST http://localhost:8000/api/v1/admin/backup-incremental \
      -H "Authorization: Bearer ${ADMIN_API_KEY}" \
      -d "{\"path\": \"${BACKUP_ROOT}/${BACKUP_NAME}\", \"since\": \"${SINCE_TIME}\"}"
    ;;
  *)
    echo "Unknown backup type: $BACKUP_TYPE"
    exit 1
    ;;
esac
# Create metadata
cat > "${BACKUP_ROOT}/${BACKUP_NAME}/metadata.json" <<EOF
{
"timestamp": "${TIMESTAMP}",
"type": "${BACKUP_TYPE}",
"version": "$(curl -s http://localhost:8000/health | jq -r .version)",
"hostname": "$(hostname)",
"size": "$(du -sh ${BACKUP_ROOT}/${BACKUP_NAME} | cut -f1)"
}
EOF
# Compress backup
echo "Compressing backup..."
tar -czf "${BACKUP_ROOT}/${BACKUP_NAME}.tar.gz" \
-C "${BACKUP_ROOT}" "${BACKUP_NAME}"
# Verify backup
if tar -tzf "${BACKUP_ROOT}/${BACKUP_NAME}.tar.gz" >/dev/null; then
  echo "Backup verification successful"
  rm -rf "${BACKUP_ROOT}/${BACKUP_NAME}"
else
  echo "Backup verification failed"
  exit 1
fi
# Upload to cloud (if configured)
if [ "${CLOUD_BACKUP:-false}" = "true" ]; then
echo "Uploading to cloud storage..."
case "${CLOUD_PROVIDER:-}" in
"aws")
aws s3 cp "${BACKUP_ROOT}/${BACKUP_NAME}.tar.gz" \
"s3://${S3_BUCKET}/backups/"
;;
"gcp")
gsutil cp "${BACKUP_ROOT}/${BACKUP_NAME}.tar.gz" \
"gs://${GCS_BUCKET}/backups/"
;;
"azure")
az storage blob upload \
--file "${BACKUP_ROOT}/${BACKUP_NAME}.tar.gz" \
--container backups \
--name "${BACKUP_NAME}.tar.gz"
;;
esac
fi
# Clean old backups
find "${BACKUP_ROOT}" -name "codegraph_*.tar.gz" -mtime +${BACKUP_RETENTION:-30} -delete
echo "Backup completed: ${BACKUP_NAME}.tar.gz"
```
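Restores are the mirror image of the full backup above; a sketch assuming the directory layout that script produces (checkpointed database, vector indices, and config):
```bash
#!/bin/bash
# restore_backup.sh - sketch of restoring a full backup (paths assume the
# layout created by comprehensive_backup.sh above)
set -euo pipefail
ARCHIVE=${1:?usage: restore_backup.sh <backup.tar.gz>}
RESTORE_DIR=$(mktemp -d)
BACKUP_NAME=$(basename "$ARCHIVE" .tar.gz)
tar -xzf "$ARCHIVE" -C "$RESTORE_DIR"
# Stop the service before touching live data
sudo systemctl stop codegraph-api
# Replace the live database and vector indices with the checkpointed copies
sudo rm -rf /opt/codegraph/data/rocks.db /opt/codegraph/data/vector
sudo cp -r "$RESTORE_DIR/$BACKUP_NAME/rocksdb" /opt/codegraph/data/rocks.db
sudo cp -r "$RESTORE_DIR/$BACKUP_NAME/vector" /opt/codegraph/data/
sudo cp -r "$RESTORE_DIR/$BACKUP_NAME/config/." /etc/codegraph/
sudo chown -R codegraph:codegraph /opt/codegraph/data
sudo systemctl start codegraph-api
curl -f http://localhost:8000/health && echo "Restore verified"
```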
### Update Procedures
#### Rolling Update Process
```bash
#!/bin/bash
# rolling_update.sh
set -euo pipefail
NEW_VERSION=${1:-latest}
CURRENT_VERSION=$(curl -s http://localhost:8000/health | jq -r .version)
echo "Rolling update from ${CURRENT_VERSION} to ${NEW_VERSION}"
# Pre-update checks
echo "1. Pre-update validation..."
curl -f http://localhost:8000/health || exit 1
systemctl is-active codegraph-api || exit 1
# Create backup
echo "2. Creating pre-update backup..."
./comprehensive_backup.sh full
# Download new version
echo "3. Downloading new version..."
wget -O /tmp/codegraph-${NEW_VERSION}.tar.gz \
  "https://releases.codegraph.dev/v${NEW_VERSION}/codegraph-linux-x86_64.tar.gz"
# Verify checksum
echo "4. Verifying download..."
wget -O /tmp/codegraph-${NEW_VERSION}.sha256 \
  "https://releases.codegraph.dev/v${NEW_VERSION}/codegraph-linux-x86_64.tar.gz.sha256"
# The published checksum lists the release filename, so point it at the local
# download name before verifying
cd /tmp && sed "s|codegraph-linux-x86_64.tar.gz|codegraph-${NEW_VERSION}.tar.gz|" \
  codegraph-${NEW_VERSION}.sha256 | sha256sum -c -
# Extract new binary
echo "5. Extracting new binary..."
tar -xzf /tmp/codegraph-${NEW_VERSION}.tar.gz -C /tmp/
# Test new binary
echo "6. Testing new binary..."
/tmp/codegraph-api --version
/tmp/codegraph-api --check-config /etc/codegraph/config.toml
# Stop service
echo "7. Stopping service..."
systemctl stop codegraph-api
# Backup current binary
echo "8. Backing up current binary..."
cp /usr/local/bin/codegraph-api /usr/local/bin/codegraph-api.${CURRENT_VERSION}
# Install new binary
echo "9. Installing new binary..."
cp /tmp/codegraph-api /usr/local/bin/codegraph-api
chmod +x /usr/local/bin/codegraph-api
# Start service
echo "10. Starting service..."
systemctl start codegraph-api
# Wait for service to be ready
echo "11. Waiting for service to be ready..."
for i in {1..30}; do
  if curl -f http://localhost:8000/health >/dev/null 2>&1; then
    echo "Service is ready"
    break
  fi
  echo "Waiting for service... ($i/30)"
  sleep 2
done
# Verify update
echo "12. Verifying update..."
NEW_RUNNING_VERSION=$(curl -s http://localhost:8000/health | jq -r .version)
if [ "$NEW_RUNNING_VERSION" = "$NEW_VERSION" ]; then
echo "Update successful: ${CURRENT_VERSION} → ${NEW_VERSION}"
rm -f /usr/local/bin/codegraph-api.${CURRENT_VERSION}
else
echo "Update failed - rolling back..."
systemctl stop codegraph-api
mv /usr/local/bin/codegraph-api.${CURRENT_VERSION} /usr/local/bin/codegraph-api
systemctl start codegraph-api
exit 1
fi
# Cleanup
rm -f /tmp/codegraph-${NEW_VERSION}.*
rm -f /tmp/codegraph-api
echo "Rolling update completed successfully"
```
## Monitoring and Alerting
### Prometheus Configuration
```yaml
# prometheus.yml - Production configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    cluster: 'codegraph-production'
    region: 'us-east-1'

rule_files:
  - "alerts/codegraph.yml"
  - "alerts/infrastructure.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  # CodeGraph API instances
  - job_name: 'codegraph-api'
    static_configs:
      - targets:
          - 'codegraph-api-1:8000'
          - 'codegraph-api-2:8000'
          - 'codegraph-api-3:8000'
    metrics_path: '/metrics'
    scrape_interval: 15s
    scrape_timeout: 10s

  # System metrics
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'node1:9100'
          - 'node2:9100'
          - 'node3:9100'
    scrape_interval: 15s

  # Load balancer
  - job_name: 'nginx'
    static_configs:
      - targets: ['nginx-exporter:9113']
    scrape_interval: 30s

  # Database metrics (if using external monitoring)
  - job_name: 'rocksdb-exporter'
    static_configs:
      - targets: ['rocksdb-exporter:9090']
    scrape_interval: 30s
```
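Before reloading Prometheus, the configuration and rule files can be validated with `promtool`, which ships with Prometheus:
```bash
promtool check config prometheus.yml
promtool check rules alerts/codegraph.yml alerts/infrastructure.yml
# Reload without a restart (requires Prometheus to run with --web.enable-lifecycle)
curl -X POST http://localhost:9090/-/reload
```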
### Alert Rules
```yaml
# alerts/codegraph.yml
groups:
  - name: codegraph-api
    rules:
      # Service availability
      - alert: CodeGraphServiceDown
        expr: up{job="codegraph-api"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "CodeGraph API service is down"
          description: "CodeGraph API on {{ $labels.instance }} has been down for more than 1 minute"
          runbook_url: "https://runbooks.codegraph.dev/service-down"

      # High error rate
      - alert: CodeGraphHighErrorRate
        expr: rate(codegraph_requests_total{status=~"5.."}[5m]) / rate(codegraph_requests_total[5m]) > 0.05
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "CodeGraph API high error rate"
          description: "Error rate is {{ $value | humanizePercentage }} for the last 5 minutes"
          runbook_url: "https://runbooks.codegraph.dev/high-error-rate"

      # High latency
      - alert: CodeGraphHighLatency
        expr: histogram_quantile(0.95, rate(codegraph_request_duration_seconds_bucket[5m])) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CodeGraph API high latency"
          description: "95th percentile latency is {{ $value }}s over the last 5 minutes"

      # Memory usage
      - alert: CodeGraphHighMemoryUsage
        expr: process_resident_memory_bytes{job="codegraph-api"} / (1024 * 1024 * 1024) > 3.5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CodeGraph API high memory usage"
          description: "Memory usage is {{ $value }}GB on {{ $labels.instance }}"

      # Database size growth
      - alert: CodeGraphDatabaseGrowthRate
        expr: increase(codegraph_database_size_bytes[1h]) > (500 * 1024 * 1024)
        for: 0m
        labels:
          severity: info
        annotations:
          summary: "CodeGraph database growing rapidly"
          description: "Database grew by {{ $value | humanize1024 }}B in the last hour"

      # Queue depth
      - alert: CodeGraphHighQueueDepth
        expr: codegraph_indexing_queue_depth > 100
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CodeGraph indexing queue depth high"
          description: "Indexing queue has {{ $value }} pending jobs for more than 10 minutes"

      # Disk space
      - alert: CodeGraphLowDiskSpace
        expr: (node_filesystem_avail_bytes{mountpoint="/opt/codegraph"} / node_filesystem_size_bytes{mountpoint="/opt/codegraph"}) < 0.15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "CodeGraph low disk space"
          description: "Only {{ $value | humanizePercentage }} of disk space remains free on {{ $labels.instance }}"

  - name: codegraph-infrastructure
    rules:
      # CPU usage
      - alert: CodeGraphHighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on CodeGraph server"
          description: "CPU usage is {{ $value }}% on {{ $labels.instance }}"

      # Load average, normalized by per-instance core count
      - alert: CodeGraphHighLoadAverage
        expr: node_load5 / count by (instance) (count by (cpu, instance) (node_cpu_seconds_total)) > 2.0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High load average on CodeGraph server"
          description: "Load average is {{ $value }} on {{ $labels.instance }}"

      # Network errors
      - alert: CodeGraphNetworkErrors
        expr: increase(node_network_receive_errs_total[5m]) + increase(node_network_transmit_errs_total[5m]) > 10
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Network errors on CodeGraph server"
          description: "{{ $value }} network errors in the last 5 minutes on {{ $labels.instance }}"
```
### Grafana Dashboards
Grafana dashboards should be built on the same Prometheus metrics scraped above: request and error rates (`codegraph_requests_total`), latency percentiles (`codegraph_request_duration_seconds_bucket`), process memory (`process_resident_memory_bytes`), database size and growth (`codegraph_database_size_bytes`), indexing queue depth (`codegraph_indexing_queue_depth`), and the node-exporter system panels.

Together, the installation guides, configuration reference, troubleshooting procedures, scaling strategies, maintenance protocols, and monitoring setup in this manual give operations teams what they need to deploy, monitor, and maintain CodeGraph in production.