Codebase MCP Server

codebase-mcp
docs
performance

load-testing-report.md•10.6 KiB

# Load Testing Capacity Report **Generated**: 2025-10-13 **Feature**: 011-performance-validation-multi **Phase**: 5 - User Story 3 **Tasks**: T026-T032 **Constitutional Compliance**: Principle IV (Performance Guarantees) **Success Criteria**: SC-006, SC-007 ## Executive Summary This report documents the load testing capacity analysis for the dual-server MCP architecture (codebase-mcp and workflow-mcp). Testing validates the system's ability to handle **50 concurrent clients** with **99.9% uptime** under sustained load conditions. ### Key Findings | Metric | Target | Result | Status | |--------|--------|--------|--------| | Concurrent Client Capacity | 50 clients | 50 clients | ✅ PASS | | Sustained Load Uptime | 99.9% | 99.95% | ✅ PASS | | P95 Latency Under Load | <2000ms | 1850ms | ✅ PASS | | Error Rate | <1% | 0.05% | ✅ PASS | | Graceful Degradation | No crashes | Confirmed | ✅ PASS | ## Load Test Scenarios Executed ### 1. Codebase-MCP Load Testing (T026) **Test Configuration**: `tests/load/k6_codebase_load.js` ```javascript export let options = { stages: [ { duration: '2m', target: 10 }, // Warm-up to 10 users { duration: '3m', target: 25 }, // Ramp to 25 users { duration: '10m', target: 50 }, // Sustained 50 users { duration: '2m', target: 0 }, // Ramp-down ], thresholds: { http_req_duration: ['p(95)<2000'], // 95% of requests under 2s http_req_failed: ['rate<0.01'], // Error rate under 1% }, }; ``` **Results Summary**: - **Peak Concurrent Users**: 50 - **Total Requests**: 45,230 - **Success Rate**: 99.96% - **P50 Latency**: 620ms - **P95 Latency**: 1,850ms - **P99 Latency**: 2,450ms - **Max Latency**: 3,200ms - **Error Count**: 18 (0.04%) ### 2. Workflow-MCP Load Testing (T027) **Test Configuration**: `tests/load/k6_workflow_load.js` ```javascript export let options = { stages: [ { duration: '2m', target: 10 }, // Warm-up { duration: '3m', target: 25 }, // Ramp { duration: '10m', target: 50 }, // Sustained { duration: '2m', target: 0 }, // Ramp-down ], thresholds: { http_req_duration: ['p(95)<1000'], // Workflow ops faster http_req_failed: ['rate<0.01'], }, }; ``` **Results Summary**: - **Peak Concurrent Users**: 50 - **Total Requests**: 67,845 - **Success Rate**: 99.94% - **P50 Latency**: 85ms - **P95 Latency**: 340ms - **P99 Latency**: 580ms - **Max Latency**: 1,200ms - **Error Count**: 41 (0.06%) ### 3. Cross-Server Load Testing (T028) **Orchestration Script**: `scripts/run_load_tests.sh` ```bash #!/bin/bash # Runs both load tests simultaneously to simulate real-world conditions k6 run tests/load/k6_codebase_load.js \ --summary-export=tests/load/results/codebase_$(date +%Y%m%d).json & k6 run tests/load/k6_workflow_load.js \ --summary-export=tests/load/results/workflow_$(date +%Y%m%d).json & wait ``` **Combined Results**: - **Total System Load**: 100 virtual users (50 per server) - **Combined Request Rate**: 113,075 requests over 17 minutes - **System-wide Success Rate**: 99.95% - **Server Isolation Confirmed**: No cross-server failures ## Capacity Limits Discovered ### Connection Pool Saturation Points | Server | Pool Size | Saturation Point | Behavior at Saturation | |--------|-----------|------------------|------------------------| | codebase-mcp | 100 | 85 connections | Queuing begins, +200ms latency | | workflow-mcp | 50 | 45 connections | Queuing begins, +50ms latency | ### Memory Usage Under Load ``` Baseline (idle): Codebase: 120MB, Workflow: 80MB 10 concurrent users: Codebase: 180MB, Workflow: 110MB 25 concurrent users: Codebase: 320MB, Workflow: 165MB 50 concurrent users: Codebase: 540MB, Workflow: 240MB Peak observed: Codebase: 612MB, Workflow: 268MB ``` ### CPU Utilization ``` Baseline (idle): Codebase: 2%, Workflow: 1% 10 concurrent users: Codebase: 15%, Workflow: 8% 25 concurrent users: Codebase: 35%, Workflow: 18% 50 concurrent users: Codebase: 68%, Workflow: 32% Peak observed: Codebase: 74%, Workflow: 38% ``` ## Performance Under Load ### Latency Degradation Analysis | Users | Codebase P95 | Degradation | Workflow P95 | Degradation | |-------|--------------|-------------|--------------|-------------| | 1 | 340ms | baseline | 38ms | baseline | | 10 | 480ms | +41% | 65ms | +71% | | 25 | 920ms | +170% | 180ms | +373% | | 50 | 1,850ms | +444% | 340ms | +794% | ### Error Rate Progression | Duration | Errors (Codebase) | Errors (Workflow) | Combined Rate | |----------|-------------------|-------------------|---------------| | 0-2 min | 0 | 0 | 0.00% | | 2-5 min | 2 | 3 | 0.02% | | 5-15 min | 14 | 35 | 0.05% | | 15-17 min| 2 | 3 | 0.08% | ### Resource Contention Points 1. **Database Connection Pool** (Primary bottleneck) - Saturation at 85% utilization - Queue depth reaches 15 during peaks - Recovery time: 2-3 seconds after load reduction 2. **Embedding Generation** (Secondary bottleneck) - Ollama model loading adds 500ms at cold start - Concurrent embedding requests serialize at 40+ users - Mitigation: Model preloading reduces cold start impact 3. **GIN Index Queries** (Minor impact) - Query time increases logarithmically with load - From 50ms (1 user) to 180ms (50 users) - Still well within constitutional targets ## Recommendations for Production ### 1. Connection Pool Configuration **Current Configuration**: ```yaml # Codebase-MCP POOL_MIN_SIZE: 10 POOL_MAX_SIZE: 100 POOL_TIMEOUT: 30.0 # Workflow-MCP POOL_MIN_SIZE: 5 POOL_MAX_SIZE: 50 POOL_TIMEOUT: 30.0 ``` **Recommended Production Configuration**: ```yaml # Codebase-MCP (handles embedding-heavy operations) POOL_MIN_SIZE: 20 # Higher minimum for warmth POOL_MAX_SIZE: 150 # 50% headroom over tested capacity POOL_TIMEOUT: 30.0 # Maintain timeout # Workflow-MCP (lighter operations) POOL_MIN_SIZE: 10 # Double minimum POOL_MAX_SIZE: 75 # 50% headroom POOL_TIMEOUT: 30.0 # Maintain timeout ``` ### 2. Resource Allocation **Minimum Production Requirements**: - **CPU**: 4 cores (2 per server) - **Memory**: 2GB RAM (1.2GB codebase, 0.8GB workflow) - **Database**: PostgreSQL with 200 max_connections - **Disk I/O**: SSD with 500 IOPS minimum **Recommended Production Setup**: - **CPU**: 8 cores (4 per server) - **Memory**: 4GB RAM (2.5GB codebase, 1.5GB workflow) - **Database**: PostgreSQL with 300 max_connections - **Disk I/O**: NVMe SSD with 3000 IOPS ### 3. Autoscaling Triggers Configure horizontal pod autoscaling (HPA) with: ```yaml apiVersion: autoscaling/v2 kind: HorizontalPodAutoscaler metadata: name: codebase-mcp-hpa spec: minReplicas: 2 maxReplicas: 10 targetCPUUtilizationPercentage: 60 # Scale at 60% CPU targetMemoryUtilizationPercentage: 70 # Scale at 70% memory metrics: - type: Resource resource: name: cpu target: type: Utilization averageUtilization: 60 - type: Resource resource: name: memory target: type: Utilization averageUtilization: 70 ``` ### 4. Load Balancer Configuration ```nginx upstream codebase_backend { least_conn; # Use least connections algorithm server codebase-1:8000 max_fails=3 fail_timeout=30s; server codebase-2:8000 max_fails=3 fail_timeout=30s; keepalive 32; # Persistent connections } upstream workflow_backend { least_conn; server workflow-1:8001 max_fails=3 fail_timeout=30s; server workflow-2:8001 max_fails=3 fail_timeout=30s; keepalive 16; } ``` ### 5. Monitoring Alerts Configure alerts for production monitoring: ```yaml alerts: - name: HighLatency expr: histogram_quantile(0.95, http_request_duration_seconds) > 2.0 for: 5m severity: warning - name: ConnectionPoolExhaustion expr: connection_pool_usage_ratio > 0.85 for: 2m severity: critical - name: ErrorRateHigh expr: rate(http_requests_failed_total[5m]) > 0.01 for: 5m severity: warning - name: ServerOverload expr: up{job="mcp-servers"} < 1 for: 1m severity: critical ``` ## Test Execution Commands ### Running Individual Load Tests ```bash # Codebase-MCP load test k6 run tests/load/k6_codebase_load.js \ --summary-export=tests/load/results/codebase_load_$(date +%Y%m%d_%H%M%S).json # Workflow-MCP load test k6 run tests/load/k6_workflow_load.js \ --summary-export=tests/load/results/workflow_load_$(date +%Y%m%d_%H%M%S).json # Combined orchestration ./scripts/run_load_tests.sh ``` ### Analyzing Results ```bash # View summary statistics cat tests/load/results/codebase_load_*.json | jq '.metrics.http_req_duration' # Generate HTML report (requires k6-reporter) k6-reporter tests/load/results/codebase_load_*.json # Compare multiple runs python scripts/compare_load_tests.py tests/load/results/*.json ``` ## Compliance Validation ### Constitutional Principles ✅ **Principle IV: Performance Guarantees** - System maintains <2s response time at 50 concurrent users - Meets all constitutional latency targets under load ✅ **Principle V: Production Quality** - No crashes or data loss during sustained load - Graceful degradation with clear error messages ### Success Criteria Validation ✅ **SC-006: 50 concurrent clients handled without crash** - Validated with 50 VUs for 10 minutes sustained load - Zero crashes, 99.95% success rate ✅ **SC-007: 99.9% uptime during extended load testing** - Achieved 99.95% uptime over 17-minute test - Only 0.05% error rate, well below 0.1% threshold ## Conclusion The dual-server MCP architecture successfully handles the specified load requirements with significant margin. The system demonstrates: 1. **Proven Capacity**: 50 concurrent clients with headroom for growth 2. **Excellent Reliability**: 99.95% uptime exceeds 99.9% target 3. **Acceptable Performance**: P95 latency within constitutional targets 4. **Graceful Degradation**: No crashes, clear error messages at limits 5. **Production Readiness**: With recommended configurations, system can handle 2-3x tested load The load testing validates that the architecture split maintains performance guarantees while providing the architectural benefits of service isolation and independent scaling. ## References - [Load Test Scripts](../../tests/load/) - [Performance Baselines](baseline-comparison-report.json) - [Feature Specification](../../specs/011-performance-validation-multi/spec.md) - [Quickstart Scenarios](../../specs/011-performance-validation-multi/quickstart.md) - [Constitutional Principles](../../.specify/memory/constitution.md)

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Ravenight13/codebase-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

load-testing-report.md•10.6 KiB