# Performance Tuning Operations Guide
**Version**: 1.0.0
**Last Updated**: 2025-10-13
**Feature**: 011-performance-validation-multi
**Phase**: 8
**Task**: T053
**Constitutional Compliance**: Principle IV (Performance Guarantees)
## Overview
This guide provides comprehensive performance tuning strategies for the dual-server MCP architecture. It covers connection pool optimization, database indexing, query tuning, caching strategies, and production deployment recommendations to maintain constitutional performance targets.
## Connection Pool Sizing Recommendations
### Current Configuration Analysis
```yaml
# Current settings
codebase-mcp:
  POOL_MIN_SIZE: 10
  POOL_MAX_SIZE: 100
  POOL_TIMEOUT: 30.0

workflow-mcp:
  POOL_MIN_SIZE: 5
  POOL_MAX_SIZE: 50
  POOL_TIMEOUT: 30.0
```
### Optimal Pool Sizing Formula
```python
def calculate_optimal_pool_size(
    expected_qps: int,                # Queries per second
    avg_query_time_ms: float,         # Average query duration
    target_utilization: float = 0.75  # Target pool utilization
) -> dict:
    """Calculate optimal connection pool configuration."""
    # Concurrent connections needed (Little's law: arrival rate x service time)
    concurrent_queries = (expected_qps * avg_query_time_ms) / 1000
    # Add headroom so the pool runs at the target utilization
    max_pool_size = int(concurrent_queries / target_utilization)
    # Min pool should be 25-40% of max
    min_pool_size = max(5, int(max_pool_size * 0.3))
    return {
        "min_size": min_pool_size,
        "max_size": max_pool_size,
        "recommended_timeout": 30.0,
        "expected_utilization": target_utilization,
    }

# Example calculations
codebase_config = calculate_optimal_pool_size(
    expected_qps=100,
    avg_query_time_ms=340,  # From performance tests
    target_utilization=0.75,
)
# Result: min=13, max=45

workflow_config = calculate_optimal_pool_size(
    expected_qps=200,
    avg_query_time_ms=80,  # From performance tests
    target_utilization=0.75,
)
# Result: min=6, max=21
```
### Production Recommendations by Load Profile
| Load Profile | Concurrent Users | Codebase Pool | Workflow Pool | Notes |
|--------------|-----------------|---------------|---------------|--------|
| **Light** | 1-10 | min:5, max:20 | min:3, max:10 | Development/staging |
| **Medium** | 10-25 | min:10, max:50 | min:5, max:25 | Small production |
| **Heavy** | 25-50 | min:20, max:100 | min:10, max:50 | Standard production |
| **Enterprise** | 50-200 | min:50, max:200 | min:25, max:100 | Large-scale deployment |
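
If you prefer to select these values in code rather than by hand, here is a minimal sketch that maps a profile name from the table above onto `POOL_*` environment variables. The helper and its names are hypothetical, not part of the codebase; adjust the values to your deployment.

```python
from typing import TypedDict

class PoolProfile(TypedDict):
    codebase_min: int
    codebase_max: int
    workflow_min: int
    workflow_max: int

# Values mirror the load-profile table above.
LOAD_PROFILES: dict[str, PoolProfile] = {
    "light":      {"codebase_min": 5,  "codebase_max": 20,  "workflow_min": 3,  "workflow_max": 10},
    "medium":     {"codebase_min": 10, "codebase_max": 50,  "workflow_min": 5,  "workflow_max": 25},
    "heavy":      {"codebase_min": 20, "codebase_max": 100, "workflow_min": 10, "workflow_max": 50},
    "enterprise": {"codebase_min": 50, "codebase_max": 200, "workflow_min": 25, "workflow_max": 100},
}

def pool_env_for(profile: str, server: str) -> dict[str, str]:
    """Render POOL_* environment variables for one server from a load profile."""
    p = LOAD_PROFILES[profile]
    return {
        "POOL_MIN_SIZE": str(p[f"{server}_min"]),
        "POOL_MAX_SIZE": str(p[f"{server}_max"]),
        "POOL_TIMEOUT": "30.0",
    }

# Example: codebase server under the "heavy" profile
print(pool_env_for("heavy", "codebase"))
# {'POOL_MIN_SIZE': '20', 'POOL_MAX_SIZE': '100', 'POOL_TIMEOUT': '30.0'}
```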
### Connection Pool Tuning Script
```bash
#!/bin/bash
# tune_connection_pool.sh - Dynamic pool sizing based on load
# NOTE: exporting POOL_MAX_SIZE only affects processes started afterwards;
# the server must re-read its environment (e.g., on restart) to pick up the new size.

# Monitor current utilization
get_pool_stats() {
    curl -s http://localhost:8000/health/status | \
        jq '.connection_pool.pool_utilization'
}

# Adjust pool size based on utilization
adjust_pool() {
    local utilization
    utilization=$(get_pool_stats)
    local current_max=$POOL_MAX_SIZE

    if (( $(echo "$utilization > 0.8" | bc -l) )); then
        # Increase pool size by 20% (rounded to an integer)
        new_max=$(printf '%.0f' "$(echo "$current_max * 1.2" | bc)")
        export POOL_MAX_SIZE=$new_max
        echo "Increased pool to $new_max (utilization: $utilization)"
    elif (( $(echo "$utilization < 0.3" | bc -l) )); then
        # Decrease pool size by 10% (rounded to an integer)
        new_max=$(printf '%.0f' "$(echo "$current_max * 0.9" | bc)")
        export POOL_MAX_SIZE=$new_max
        echo "Decreased pool to $new_max (utilization: $utilization)"
    fi
}

# Run adjustment loop
while true; do
    adjust_pool
    sleep 60  # Check every minute
done
```
## Index Optimization Strategies
### Current Index Analysis
```sql
-- Analyze existing indexes
SELECT
    schemaname,
    tablename,
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan AS index_scans,
    idx_tup_read AS tuples_read,
    idx_tup_fetch AS tuples_fetched
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan DESC;
```
### GIN Index Optimization for JSONB
```sql
-- Optimize GIN indexes for entity queries

-- Current index
CREATE INDEX idx_entities_data_gin ON entities USING gin(data);

-- Optimized with expression indexes for common query paths
CREATE INDEX idx_entities_status ON entities USING gin((data->'status'));
CREATE INDEX idx_entities_version ON entities USING gin((data->'version'));
CREATE INDEX idx_entities_tags ON entities USING gin(tags);

-- Partial multi-column index for the common "active entities by type/name" lookup
-- (partial: excludes soft-deleted rows; multi-column: matches the filter and sort order)
CREATE INDEX idx_entities_type_name ON entities (entity_type, name)
    WHERE deleted_at IS NULL;
```
### Embedding Vector Index Optimization
```sql
-- Optimize pgvector indexes for similarity search

-- Current index
CREATE INDEX idx_chunks_embedding ON code_chunks
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- Optimized with tuned parameters
-- Heuristic: lists ≈ sqrt(number_of_rows)
SELECT COUNT(*) FROM code_chunks;  -- Example: 100,000 rows

CREATE INDEX idx_chunks_embedding_optimized ON code_chunks
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 316);  -- sqrt(100000) ≈ 316

-- Alternative: HNSW for better recall
CREATE INDEX idx_chunks_embedding_hnsw ON code_chunks
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```
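
Index build parameters only set a ceiling; recall is also tuned at query time. `ivfflat.probes` and `hnsw.ef_search` are pgvector's query-time settings; the connection handling below is an illustrative asyncpg sketch, not the project's search implementation.

```python
import asyncpg

async def similarity_search(conn: asyncpg.Connection, query_embedding, limit: int = 10):
    """Run a cosine-distance search with tuned recall."""
    async with conn.transaction():
        # ivfflat.probes: more probes = better recall, slower queries.
        # (For the HNSW index, tune hnsw.ef_search instead.)
        await conn.execute("SET LOCAL ivfflat.probes = 10")
        return await conn.fetch(
            """
            SELECT id, content, embedding <=> $1 AS distance
            FROM code_chunks
            ORDER BY embedding <=> $1
            LIMIT $2
            """,
            query_embedding,  # must be encodable as vector (e.g., via a registered pgvector codec)
            limit,
        )
```

`SET LOCAL` scopes the setting to the enclosing transaction, so other queries on the same pooled connection are unaffected.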
### Index Maintenance Schedule
```sql
-- Regular maintenance tasks
-- Run weekly during low-traffic periods

-- Update statistics
ANALYZE code_chunks;
ANALYZE entities;
ANALYZE work_items;

-- Reindex for bloat reduction (monthly)
REINDEX INDEX CONCURRENTLY idx_chunks_embedding;
REINDEX INDEX CONCURRENTLY idx_entities_data_gin;

-- Monitor index bloat (requires the pgstattuple extension;
-- pgstatindex works on B-tree indexes only, so filter by access method)
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT
    s.indexrelid::regclass AS index_name,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size,
    (pgstatindex(s.indexrelid::regclass)).avg_leaf_density AS leaf_density_pct
FROM pg_stat_user_indexes s
JOIN pg_class c ON c.oid = s.indexrelid
JOIN pg_am a ON a.oid = c.relam AND a.amname = 'btree'
WHERE pg_relation_size(s.indexrelid) > 1000000  -- Only indexes > 1MB
ORDER BY leaf_density_pct ASC;  -- Low leaf density indicates bloat
```
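
To automate the statistics portion of this schedule without cron, a minimal asyncio sketch follows; the DSN and interval are illustrative, and `ANALYZE` is safe to run online.

```python
import asyncio
import asyncpg

TABLES = ("code_chunks", "entities", "work_items")

async def weekly_analyze(dsn: str, interval_seconds: int = 7 * 24 * 3600):
    """Refresh planner statistics on a fixed interval."""
    while True:
        conn = await asyncpg.connect(dsn)
        try:
            for table in TABLES:
                await conn.execute(f"ANALYZE {table}")
        finally:
            await conn.close()
        await asyncio.sleep(interval_seconds)

# asyncio.run(weekly_analyze("postgresql://localhost/codebase_mcp"))
```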
## Caching Strategies
### 1. Embedding Cache Implementation
```python
import hashlib
import time
from collections import defaultdict
from typing import Callable

import numpy as np

class EmbeddingCache:
    """LRU cache for embedding generation results."""

    def __init__(self, max_size: int = 10000, ttl_seconds: int = 3600):
        self.cache = {}
        self.max_size = max_size
        self.ttl = ttl_seconds
        self.access_count = defaultdict(int)
        self.last_access = {}

    def get_cache_key(self, text: str, model: str) -> str:
        """Generate cache key from text and model."""
        text_hash = hashlib.sha256(text.encode()).hexdigest()[:16]
        return f"{model}:{text_hash}"

    async def get_or_generate(
        self,
        text: str,
        model: str,
        generate_func: Callable
    ) -> np.ndarray:
        """Get from cache or generate new embedding."""
        key = self.get_cache_key(text, model)

        # Check cache
        if key in self.cache:
            entry = self.cache[key]
            if time.time() - entry["timestamp"] < self.ttl:
                self.access_count[key] += 1
                self.last_access[key] = time.time()
                return entry["embedding"]

        # Generate new embedding
        embedding = await generate_func(text, model)

        # Store in cache (with LRU eviction)
        if len(self.cache) >= self.max_size:
            self._evict_lru()
        self.cache[key] = {
            "embedding": embedding,
            "timestamp": time.time()
        }
        self.last_access[key] = time.time()  # Track new entries so LRU eviction sees them
        return embedding

    def _evict_lru(self):
        """Evict least recently used entry."""
        lru_key = min(self.last_access, key=self.last_access.get)
        del self.cache[lru_key]
        del self.last_access[lru_key]
        self.access_count.pop(lru_key, None)

# Usage
embedding_cache = EmbeddingCache(max_size=10000, ttl_seconds=3600)

async def get_embedding_with_cache(text: str) -> np.ndarray:
    return await embedding_cache.get_or_generate(
        text=text,
        model="nomic-embed-text",
        generate_func=generate_embedding  # Your embedding backend
    )
```
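
The monitoring dashboard at the end of this guide charts `cache_hits / (cache_hits + cache_misses)`. One way to expose those counters is a thin instrumented subclass; the sketch below assumes the `prometheus_client` library and uses illustrative metric names (align them with your Grafana queries). It reuses the imports from the cache above.

```python
from prometheus_client import Counter

CACHE_HITS = Counter("embedding_cache_hits", "Embedding cache hits")
CACHE_MISSES = Counter("embedding_cache_misses", "Embedding cache misses")

class InstrumentedEmbeddingCache(EmbeddingCache):
    async def get_or_generate(self, text, model, generate_func):
        key = self.get_cache_key(text, model)
        # Count a hit only if the entry exists and is still within its TTL.
        hit = key in self.cache and time.time() - self.cache[key]["timestamp"] < self.ttl
        (CACHE_HITS if hit else CACHE_MISSES).inc()
        return await super().get_or_generate(text, model, generate_func)
```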
### 2. Query Result Cache
```python
import hashlib
import json
from typing import List, Optional

import aioredis

class QueryResultCache:
    """Redis-backed cache for search results."""

    def __init__(self, redis_url: str = "redis://localhost:6379"):
        self.redis = aioredis.from_url(redis_url)
        self.default_ttl = 300  # 5 minutes

    def get_cache_key(self, query: str, filters: dict) -> str:
        """Generate cache key from query and filters."""
        query_hash = hashlib.sha256(query.encode()).hexdigest()[:16]
        filter_hash = hashlib.sha256(
            json.dumps(filters, sort_keys=True).encode()
        ).hexdigest()[:16]
        return f"search:{query_hash}:{filter_hash}"

    async def get(self, query: str, filters: dict) -> Optional[List[dict]]:
        """Get cached search results."""
        key = self.get_cache_key(query, filters)
        cached = await self.redis.get(key)
        if cached:
            return json.loads(cached)
        return None

    async def set(
        self,
        query: str,
        filters: dict,
        results: List[dict],
        ttl: Optional[int] = None
    ):
        """Cache search results."""
        key = self.get_cache_key(query, filters)
        ttl = ttl or self.default_ttl
        await self.redis.setex(key, ttl, json.dumps(results))

# Integration with search endpoint
query_cache = QueryResultCache()

async def search_with_cache(query: str, filters: dict) -> List[dict]:
    # Try cache first
    cached_results = await query_cache.get(query, filters)
    if cached_results:
        return cached_results

    # Perform actual search
    results = await perform_search(query, filters)

    # Cache results
    await query_cache.set(query, filters, results)
    return results
```
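
Cached results go stale when a repository is re-indexed. A minimal invalidation sketch, assuming the same aioredis client and the `search:` key prefix used above:

```python
async def invalidate_search_cache(cache: QueryResultCache) -> int:
    """Delete all cached search results, e.g. after re-indexing a repository."""
    deleted = 0
    # scan_iter walks keys incrementally, avoiding a blocking KEYS call.
    async for key in cache.redis.scan_iter(match="search:*"):
        await cache.redis.delete(key)
        deleted += 1
    return deleted
```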
### 3. Connection Pool Warmup
```python
import asyncio
import logging

logger = logging.getLogger(__name__)

class PoolWarmupStrategy:
    """Pre-warm connection pools on startup."""

    @staticmethod
    async def warmup_pool(pool_manager: ConnectionPoolManager):
        """Pre-establish minimum connections."""
        tasks = []

        # Create min_size connections
        for _ in range(pool_manager.config.min_size):
            tasks.append(pool_manager._create_connection())

        # Execute in parallel
        connections = await asyncio.gather(*tasks)

        # Return connections to pool
        for conn in connections:
            await pool_manager._release_connection(conn)

        logger.info(f"Warmed up {len(connections)} connections")

# Application startup
async def startup():
    # Initialize pool
    pool_manager = ConnectionPoolManager(config)

    # Warm up connections
    await PoolWarmupStrategy.warmup_pool(pool_manager)

    # Pool is now pre-warmed and ready
    return pool_manager
```
## Query Optimization
### 1. Query Analysis Tools
```sql
-- Enable query performance insights
-- (changing shared_preload_libraries requires a server restart)
ALTER SYSTEM SET shared_preload_libraries = 'pg_stat_statements';
ALTER SYSTEM SET pg_stat_statements.track = 'all';
ALTER SYSTEM SET pg_stat_statements.max = 10000;

-- After the restart, create the extension in the target database
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Analyze slow queries
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    stddev_exec_time,
    rows,
    100.0 * shared_blks_hit / nullif(shared_blks_hit + shared_blks_read, 0) AS hit_percent
FROM pg_stat_statements
WHERE mean_exec_time > 100  -- Queries slower than 100ms
ORDER BY mean_exec_time DESC
LIMIT 20;
```
### 2. Common Query Optimizations
```python
# BEFORE: Inefficient N+1 query pattern (one query per parent item)
async def get_work_items_with_children():
    items = await db.fetch("SELECT * FROM work_items WHERE parent_id IS NULL")
    for item in items:
        item['children'] = await db.fetch(
            "SELECT * FROM work_items WHERE parent_id = $1",
            item['id']
        )
    return items

# AFTER: Single recursive CTE fetches the whole tree in one round trip
async def get_work_items_with_children_optimized():
    query = """
        WITH RECURSIVE item_tree AS (
            SELECT *, 0 AS level
            FROM work_items
            WHERE parent_id IS NULL

            UNION ALL

            SELECT w.*, t.level + 1
            FROM work_items w
            JOIN item_tree t ON w.parent_id = t.id
        )
        SELECT * FROM item_tree
        ORDER BY level, created_at;
    """
    return await db.fetch(query)
```
### 3. Batch Processing Optimization
```python
import asyncio
from typing import List

import numpy as np

class BatchProcessor:
    """Optimize bulk operations with batching."""

    @staticmethod
    async def batch_insert_chunks(chunks: List[dict], batch_size: int = 1000):
        """Insert chunks in optimized batches."""
        for i in range(0, len(chunks), batch_size):
            batch = chunks[i:i + batch_size]

            # Build a multi-value insert ($1..$5 per row)
            values = []
            params = []
            for j, chunk in enumerate(batch):
                offset = j * 5
                values.append(
                    f"(${offset+1}, ${offset+2}, ${offset+3}, ${offset+4}, ${offset+5})"
                )
                params.extend([
                    chunk['file_id'],
                    chunk['content'],
                    chunk['embedding'],
                    chunk['start_line'],
                    chunk['end_line']
                ])

            query = f"""
                INSERT INTO code_chunks
                    (file_id, content, embedding, start_line, end_line)
                VALUES {','.join(values)}
                ON CONFLICT DO NOTHING
            """
            await db.execute(query, *params)

    @staticmethod
    async def parallel_embedding_generation(
        texts: List[str],
        concurrency: int = 10
    ) -> List[np.ndarray]:
        """Generate embeddings with controlled concurrency."""
        semaphore = asyncio.Semaphore(concurrency)

        async def generate_with_limit(text: str) -> np.ndarray:
            async with semaphore:
                return await generate_embedding(text)

        return await asyncio.gather(*[
            generate_with_limit(text) for text in texts
        ])
```
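
For very large ingests, asyncpg's binary COPY path is typically faster than multi-value INSERTs. A sketch, assuming an asyncpg connection and that the embedding column is encodable (e.g., via a registered pgvector codec):

```python
import asyncpg

async def copy_chunks(conn: asyncpg.Connection, chunks: list[dict]) -> None:
    """Bulk-load chunks via COPY instead of multi-value INSERT."""
    records = [
        (c["file_id"], c["content"], c["embedding"], c["start_line"], c["end_line"])
        for c in chunks
    ]
    await conn.copy_records_to_table(
        "code_chunks",
        records=records,
        columns=["file_id", "content", "embedding", "start_line", "end_line"],
    )
```

Note that COPY has no `ON CONFLICT` handling, so this path suits fresh loads or staging tables rather than idempotent re-ingestion.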
## Production Tuning Checklist
### Pre-Deployment
- [ ] **Database Configuration**
```sql
-- PostgreSQL tuning for MCP workload
-- Apply with SELECT pg_reload_conf(); shared_buffers and max_connections require a restart.
-- (A verification sketch follows this checklist.)
ALTER SYSTEM SET max_connections = 300;
ALTER SYSTEM SET shared_buffers = '2GB';
ALTER SYSTEM SET effective_cache_size = '6GB';
ALTER SYSTEM SET maintenance_work_mem = '512MB';
ALTER SYSTEM SET work_mem = '16MB';
ALTER SYSTEM SET random_page_cost = 1.1;          -- For SSD
ALTER SYSTEM SET effective_io_concurrency = 200;  -- For SSD
ALTER SYSTEM SET wal_buffers = '16MB';
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
```
- [ ] **Connection Pool Settings**
```yaml
# Environment variables
POOL_MIN_SIZE: 20
POOL_MAX_SIZE: 100
POOL_TIMEOUT: 30
POOL_COMMAND_TIMEOUT: 5
POOL_MAX_INACTIVE_CONNECTION_LIFETIME: 300
```
- [ ] **Index Creation**
```sql
-- Essential indexes
CREATE INDEX CONCURRENTLY idx_chunks_file_id ON code_chunks(file_id);
CREATE INDEX CONCURRENTLY idx_chunks_embedding ON code_chunks
    USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);
CREATE INDEX CONCURRENTLY idx_entities_type_name ON entities(entity_type, name);
CREATE INDEX CONCURRENTLY idx_work_items_path ON work_items USING gin(materialized_path);
```
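
After applying the tuning block, a quick way to confirm the settings actually took effect (a hypothetical check script; the DSN is illustrative and the expected values mirror the SQL above):

```python
import asyncio
import asyncpg

EXPECTED = {
    "max_connections": "300",
    "shared_buffers": "2GB",
    "work_mem": "16MB",
    "random_page_cost": "1.1",
}

async def verify_settings(dsn: str) -> None:
    conn = await asyncpg.connect(dsn)
    try:
        for name, expected in EXPECTED.items():
            actual = await conn.fetchval("SELECT current_setting($1)", name)
            status = "OK  " if actual == expected else "DIFF"
            print(f"{status} {name}: expected {expected}, actual {actual}")
    finally:
        await conn.close()

asyncio.run(verify_settings("postgresql://localhost/codebase_mcp"))
```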
### Post-Deployment Monitoring
- [ ] **Performance Baseline**
```bash
# Capture initial metrics
curl http://localhost:8000/metrics > baseline_metrics.txt
bash scripts/run_benchmarks.sh > baseline_performance.json
```
- [ ] **Query Performance**
```sql
-- Monitor slow queries
SELECT * FROM pg_stat_statements
WHERE mean_exec_time > 100
ORDER BY mean_exec_time DESC;
```
- [ ] **Connection Pool Health**
```bash
# Monitor pool utilization
watch -n 5 'curl -s http://localhost:8000/health/status | jq .connection_pool'
```
- [ ] **Resource Usage**
```bash
# Monitor memory and CPU
htop -p "$(pgrep -d, -f codebase-mcp)"
```
### Optimization Schedule
| Frequency | Task | Command/Action |
|-----------|------|----------------|
| **Hourly** | Check pool utilization | `curl /health/status` |
| **Daily** | Analyze slow queries | `pg_stat_statements` review |
| **Weekly** | Update table statistics | `ANALYZE` all tables |
| **Monthly** | Reindex for bloat | `REINDEX CONCURRENTLY` |
| **Quarterly** | Review and adjust pool sizes | Load analysis |
## Advanced Tuning Techniques
### 1. Prepared Statement Optimization
```python
import time

class PreparedStatementCache:
    """Cache prepared statements for repeated queries."""

    def __init__(self, connection, max_statements: int = 100):
        self.connection = connection  # An established asyncpg connection
        self.statements = {}
        self.max_statements = max_statements

    async def execute(self, name: str, query: str, *args):
        """Execute query with prepared statement."""
        if name not in self.statements:
            if len(self.statements) >= self.max_statements:
                # Evict the least recently used statement; dropping the
                # reference lets asyncpg deallocate it server-side.
                oldest = min(self.statements, key=lambda k: self.statements[k]['used'])
                del self.statements[oldest]

            # Prepare new statement
            stmt = await self.connection.prepare(query)
            self.statements[name] = {
                'stmt': stmt,
                'used': time.time()
            }

        # Execute prepared statement
        stmt = self.statements[name]['stmt']
        self.statements[name]['used'] = time.time()
        return await stmt.fetch(*args)
```
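
Usage, assuming an established asyncpg connection and running inside an async function:

```python
stmt_cache = PreparedStatementCache(connection, max_statements=100)
rows = await stmt_cache.execute(
    "entity_by_type",
    "SELECT * FROM entities WHERE entity_type = $1 AND deleted_at IS NULL",
    "service",
)
```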
### 2. Adaptive Query Optimization
```python
import time
from collections import defaultdict

class AdaptiveQueryOptimizer:
    """Dynamically adjust query strategies based on performance."""

    def __init__(self):
        self.query_stats = defaultdict(lambda: {'times': [], 'strategy': 'default'})

    async def execute_adaptive(self, query_id: str, strategies: dict):
        """Execute query with adaptive strategy selection."""
        stats = self.query_stats[query_id]

        # Select strategy based on past performance
        if len(stats['times']) > 10:
            avg_time = sum(stats['times'][-10:]) / 10
            if avg_time > 500:    # Slow query (ms)
                stats['strategy'] = 'optimized'
            elif avg_time < 50:   # Fast query (ms)
                stats['strategy'] = 'simple'

        # Execute with selected strategy
        strategy = strategies[stats['strategy']]
        start = time.time()
        result = await strategy()
        elapsed = (time.time() - start) * 1000

        # Update statistics
        stats['times'].append(elapsed)
        if len(stats['times']) > 100:
            stats['times'] = stats['times'][-50:]  # Keep last 50 samples
        return result
```
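
A usage sketch, assuming the `db` handle from the earlier examples; the strategies dict must cover the `'default'`, `'simple'`, and `'optimized'` keys the optimizer selects among:

```python
optimizer = AdaptiveQueryOptimizer()

async def simple_strategy():
    return await db.fetch("SELECT * FROM work_items WHERE parent_id IS NULL")

async def optimized_strategy():
    # Narrower projection as the "optimized" variant (illustrative)
    return await db.fetch(
        "SELECT id, title, status FROM work_items WHERE parent_id IS NULL"
    )

items = await optimizer.execute_adaptive(
    "root_work_items",
    {"default": simple_strategy, "simple": simple_strategy, "optimized": optimized_strategy},
)
```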
## Performance Monitoring Dashboard
```yaml
# Grafana dashboard for performance monitoring
dashboard:
  title: "MCP Performance Tuning"
  panels:
    - title: "Query Performance"
      queries:
        - "histogram_quantile(0.95, rate(query_duration_seconds_bucket[5m]))"
        - "rate(slow_queries_total[5m])"
    - title: "Connection Pool"
      queries:
        - "pool_connections{state='active'} / pool_connections{state='total'}"
        - "pool_wait_time_seconds"
    - title: "Cache Hit Rates"
      queries:
        - "cache_hits / (cache_hits + cache_misses)"
        - "embedding_cache_size"
    - title: "Index Usage"
      queries:
        - "index_scans / (index_scans + seq_scans)"
        - "index_bloat_percent"
```
## References
- [Performance Validation Report](../performance/validation-report.md)
- [Connection Pool Manager](../../src/connection_pool/manager.py)
- [Database Schema](../../src/database/schema.sql)
- [Constitutional Principles](../../.specify/memory/constitution.md)
- [PostgreSQL Tuning Guide](https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server)