Skip to main content
Glama

Codebase MCP Server

by Ravenight13
spec.md21.1 kB
# Feature Specification: Production-Grade Connection Management **Feature Branch**: `009-v2-connection-mgmt` **Created**: 2025-10-13 **Status**: Draft **Input**: User description: "Phase 04: Connection Management - Implement connection pooling, lifecycle management, and health monitoring for MCP server connections" ## Original User Description "Phase 04: Connection Management - Implement connection pooling, lifecycle management, and health monitoring for MCP server connections" The connection management system must handle: - Connection pool lifecycle (initialization, health checks, graceful shutdown) - Automatic reconnection with exponential backoff - Connection validation and health monitoring - Pool statistics and metrics for observability - Resource cleanup and leak prevention - Configuration validation and defaults This is a foundational infrastructure component that Phase-05 (Repository Indexing) and Phase-06 (Semantic Search) will directly depend on. ## User Scenarios & Testing *(mandatory)* ### Primary Workflow **Normal Operation Sequence**: 1. **System Administrator** configures DATABASE_URL environment variable with PostgreSQL connection string 2. **Developer** starts MCP server via command line 3. **System** initializes connection pool with min_size connections within 2 seconds 4. **System** validates all connections via lightweight health query 5. **System** transitions to "healthy" state, ready to accept MCP requests 6. **MCP Client** sends database query request (e.g., search_code, index_repository) 7. **System** acquires connection from pool (or creates new connection up to max_size) 8. **System** executes query on borrowed connection 9. **System** releases connection back to pool for reuse 10. **System Administrator** monitors pool statistics via health check endpoint 11. **System Administrator** triggers graceful shutdown when maintenance needed 12. **System** waits for active queries to complete (up to 30s timeout), then terminates --- ### User Story 1 - Server Startup with Valid Configuration (Priority: P1) As a developer, when I start the MCP server with a valid database configuration, I expect the connection pool to initialize successfully and be ready to handle requests within 2 seconds. **Why this priority**: This is the most critical functionality - without successful pool initialization, no database operations can occur. This is the foundation for all other connection management features. **Independent Test**: Can be fully tested by starting the MCP server with valid DATABASE_URL and observing initialization logs. Delivers immediate value by ensuring the server can connect to the database. **Acceptance Scenarios**: 1. **Given** DATABASE_URL=postgresql+asyncpg://localhost/codebase_mcp is set, **When** server starts with `python -m codebase_mcp.server`, **Then** server logs show "Connection pool initialized: min_size=2, max_size=10" within 2 seconds 2. **Given** connection pool is initialized, **When** health check endpoint is queried, **Then** response shows status "healthy" with connections {"total": 2, "idle": 2, "active": 0} 3. **Given** pool is initialized, **When** first database query is executed, **Then** query completes in under 100ms with no cold start delay 4. **Given** pool initialization is attempted, **When** all connections pass validation queries (SELECT 1), **Then** server becomes ready for MCP requests --- ### User Story 2 - Graceful Handling of Database Outage (Priority: P1) As a system administrator, when the database server becomes unavailable during operation, I expect the MCP server to attempt automatic reconnection without crashing, and resume operations when the database recovers. **Why this priority**: Resilience to database outages is critical for production systems. Without automatic reconnection, every temporary network issue would require manual server restart. **Independent Test**: Can be fully tested by stopping PostgreSQL during operation and observing automatic recovery behavior. Delivers value by maintaining service availability during transient failures. **Acceptance Scenarios**: 1. **Given** MCP server is running with active pool, **When** query executes successfully (search_code), **Then** results are returned within expected time 2. **Given** server is operational, **When** PostgreSQL is stopped, **Then** subsequent queries return DATABASE_ERROR without server crash 3. **Given** database is unavailable, **When** server attempts reconnection, **Then** logs show exponential backoff attempts (1s, 2s, 4s, 8s, 16s) 4. **Given** reconnection attempts are in progress, **When** PostgreSQL restarts, **Then** pool automatically recovers and logs "Connection pool recovered: 2/10 connections available" 5. **Given** pool has recovered, **When** queries are retried, **Then** operations succeed and return expected results --- ### User Story 3 - Pool Statistics and Observability (Priority: P2) As a system administrator monitoring production systems, I need to query connection pool statistics to understand database load and identify performance bottlenecks. **Why this priority**: Essential for production operations and capacity planning, but system can function without detailed statistics. Enables proactive monitoring before issues occur. **Independent Test**: Can be fully tested by querying pool statistics API during various load conditions. Delivers value by providing visibility into system performance. **Acceptance Scenarios**: 1. **Given** server just started, **When** statistics are queried, **Then** response shows total_connections=2, idle=2, active=0, total_acquisitions=0 2. **Given** 5 concurrent queries are executing, **When** statistics are queried during execution, **Then** response shows total_connections=5, active=5, idle=0, waiting_requests=0 3. **Given** queries have completed, **When** statistics are queried again, **Then** response shows total_acquisitions=5, total_releases=5, avg_acquisition_time_ms is recorded 4. **Given** pool has been idle for 60 seconds, **When** statistics are queried, **Then** pool has shrunk back to min_size=2 connections 5. **Given** pool is under load, **When** health check endpoint is called, **Then** real-time metrics are returned without stale data --- ### User Story 4 - Connection Leak Detection and Prevention (Priority: P3) As a developer, if my code accidentally holds database connections without releasing them, I expect the connection pool to detect this and log warnings to help me identify the leak. **Why this priority**: Important for debugging and preventing resource exhaustion, but not critical for basic functionality. System can operate without leak detection, though it helps prevent issues. **Independent Test**: Can be fully tested by simulating a connection held beyond timeout and verifying warning logs appear. Delivers value by helping developers identify problematic code patterns. **Acceptance Scenarios**: 1. **Given** pool configured with max_size=5 and leak_detection_timeout=30s, **When** 3 queries complete normally, **Then** pool shows idle=5, active=0 2. **Given** leak_detection_timeout=30s, **When** connection is held for 40 seconds without being released, **Then** server logs "WARNING: Potential connection leak detected. Connection held for 30s" with stack trace 3. **Given** leak warning was logged, **When** connection is eventually returned to pool, **Then** pool returns to normal state (idle=5, active=0) 4. **Given** long-running legitimate operation, **When** operation exceeds timeout, **Then** no false positives occur (timeout can be configured per-operation) --- ### Edge Cases - What happens when database server restarts during active queries? → Active queries fail with ConnectionResetError, pool detects failure, closes all connections, triggers automatic reconnection, returns to healthy state - How does system handle connection exhaustion under high load? → Requests exceeding max_size queue in FIFO order, waiting_requests counter increments, timeout error raised after 30s with pool statistics - What happens when min_size cannot be achieved during initialization? → Server startup fails after 3 retry attempts with detailed error message (fail-fast approach) - How does system handle partial database connectivity? → Pool operates in degraded mode with reduced capacity, health status shows "degraded", continues attempting to create additional connections - What happens when configuration conflicts exist (min_size > max_size)? → Configuration validation fails before pool creation, ValueError raised with actionable suggestions, server fails to start - How does system handle long-running queries (>30 seconds)? → Connection remains active during query, leak detection warning logged (if enabled), query completes successfully, connection returned to pool - What happens during graceful shutdown with active queries? → Pool status changes to shutting_down, new requests rejected, waits up to 30s for active connections, force-closes remaining connections with warnings - How does system handle connections that never return to pool? → Connection eventually recycled after max_connection_lifetime (1 hour) ### Alternative Paths **IF** database connectivity lost during operation: 1. Active queries fail with connection errors 2. System enters reconnection mode with exponential backoff (1s, 2s, 4s, 8s, 16s) 3. New requests return DATABASE_ERROR until recovery 4. **WHEN** database recovers → pool automatically reconnects 5. System resumes normal operation without restart **IF** connection acquisition exceeds max_size: 1. Requests queue in FIFO order 2. waiting_requests counter increments 3. **WHEN** connection available → first queued request acquires it 4. **WHEN** timeout (30s) exceeded → request fails with PoolTimeoutError **IF** connection held longer than leak_detection_timeout: 1. System logs warning with connection details and stack trace 2. Connection remains active (no forced termination) 3. **WHEN** connection eventually released → normal operation resumes 4. Leak detection helps identify problematic code patterns **IF** graceful shutdown initiated with active queries: 1. Pool status transitions to shutting_down 2. New connection requests rejected 3. System waits up to 30s for active connections to complete 4. **WHEN** timeout exceeded → force-close remaining connections with warnings 5. All resources released, shutdown completes --- ## Requirements *(mandatory)* ### User Types & Permissions **Note**: Connection management is infrastructure-level functionality without user-facing authentication. All users mentioned in scenarios are operators of the MCP server process, not end-users of the system. - **System Administrator**: Operator with OS-level permissions to start/stop MCP server process, access server logs, and configure environment variables. Can monitor pool statistics and trigger graceful shutdown. - **Developer**: Operator running MCP server in development environment with access to configuration files, debug output, and local database. Can modify pool configuration and observe connection behavior. - **MCP Client**: External process or application communicating via MCP protocol. Treated as authenticated by virtue of server accepting the connection. No direct access to connection pool internals. --- ### Functional Requirements - **FR-001**: System MUST initialize a connection pool on server startup with configurable minimum (min_size, default 2) and maximum (max_size, default 10) connection limits, completing within 2 seconds or logging detailed error *Traces to: User Story 1 (Server Startup with Valid Configuration)* - **FR-002**: System MUST validate all connections before use by executing lightweight validation query (SELECT 1), completing in under 5ms, and creating new connection if validation fails *Traces to: User Story 1 (Connection validation in acceptance scenarios)* - **FR-003**: System MUST attempt automatic reconnection when database connectivity is lost, using exponential backoff intervals (1s, 2s, 4s, 8s, 16s) with maximum 5 initial attempts, continuing every 16s thereafter *Traces to: User Story 2 (Graceful Handling of Database Outage)* - **FR-004**: System MUST expose real-time connection pool statistics including total connections, idle, active, waiting requests, acquisition times, and historical peak metrics queryable via internal API without database query *Traces to: User Story 3 (Pool Statistics and Observability)* - **FR-005**: System MUST implement graceful connection pool shutdown that waits up to 30 seconds for active connections to finish queries, then force-closes remaining connections with warning logs *Traces to: User Story 2 (Graceful shutdown with active queries in edge cases)* - **FR-006**: System MUST detect when connections are held longer than configurable timeout (default 30s) and log warnings with duration, stack trace of acquisition, and connection ID to help identify leaks *Traces to: User Story 4 (Connection Leak Detection and Prevention)* - **FR-007**: System MUST validate all connection pool configuration parameters before pool initialization and fail fast with detailed error messages including actionable suggestions for invalid input *Traces to: User Story 1 (Configuration validation during initialization)* - **FR-008**: System MUST integrate with server's health check endpoint to report database connectivity status (healthy/degraded/unhealthy) based on connection availability, completing in under 10ms *Traces to: User Story 3 (Health check integration for monitoring)* - **FR-009**: System MUST periodically recycle connections based on query count (max_queries, default 50,000), lifetime (max_connection_lifetime, default 3600s), and idle time (max_idle_time, default 60s) *Traces to: User Story 3 (Connection recycling for resource management)* - **FR-010**: System MUST ensure all pool operations are safe for concurrent access from multiple async tasks using async locks or atomic operations, with FIFO connection acquisition queuing *Traces to: User Story 2 (Concurrent access under load scenarios)* ### Key Entities *(include if feature involves data)* - **ConnectionPoolManager**: Top-level manager responsible for connection pool lifecycle (initialization, health monitoring, graceful shutdown), exposes pool statistics, coordinates reconnection attempts, tracks lifecycle states (initializing, healthy, degraded, unhealthy, recovering, shutting_down, terminated) - **ConnectionPool**: Connection pool abstraction that manages database connection lifecycle, acquiring and releasing connections on demand, validating connections before use, and tracking connection metrics - **PoolStatistics**: Immutable snapshot of connection pool state containing real-time metrics (total_connections, idle_connections, active_connections, waiting_requests, total_acquisitions, total_releases, avg_acquisition_time_ms) and historical peaks (peak_active_connections, peak_wait_time_ms) for monitoring and capacity planning - **PoolConfig**: Configuration structure containing pool parameters (min_size, max_size, max_queries, max_idle_time, timeout, command_timeout, max_connection_lifetime, leak_detection_timeout, enable_leak_detection) with validation rules and environment variable support - **HealthStatus**: Health check response object containing status (healthy/degraded/unhealthy), timestamp, database connectivity information, pool statistics, latency metrics, and last error details ### Assumptions - **Single database instance**: System connects to one PostgreSQL database per MCP server instance. Multi-database scenarios require multiple server instances or separate connection pools. - **PostgreSQL-compatible protocol**: Connection pool is designed for PostgreSQL wire protocol. Other databases (MySQL, SQLite, MongoDB) are not supported. - **Unix-like signal handling**: Graceful shutdown relies on SIGTERM/SIGINT signals. Windows service stop commands may require adaptation. - **Async runtime available**: Connection pool assumes async/await execution model. Synchronous blocking execution contexts are not supported. - **Database schema pre-initialized**: Connection pool assumes database and schema already exist. Schema migrations and database creation are handled separately. - **Network reliability**: Exponential backoff assumes transient network failures. Permanent network partitions may require operator intervention. - **Single server instance**: Connection pool statistics are per-server-instance. Multi-instance deployments require aggregation at load balancer or monitoring layer. ## Success Criteria *(mandatory)* ### Measurable Outcomes - **SC-001**: Pool initialization completes in under 2 seconds (p95 latency) from server startup command - **SC-002**: Connection acquisition overhead is under 10ms (p95 latency) including validation query execution - **SC-003**: Health check endpoint responds in under 10ms (p99 latency) without blocking operations - **SC-004**: Automatic reconnection succeeds within 30 seconds (p95) after database recovery from outage - **SC-005**: Graceful shutdown completes within 30 seconds (p99) with all resources properly released - **SC-006**: No connection leaks detected over 24-hour continuous operation (verified with memory profiler) - **SC-007**: Pool statistics show real-time updates with under 1ms data staleness during operation - **SC-008**: Server handles 100 concurrent database requests without deadlock or timeout errors - **SC-009**: No server crashes or MCP protocol violations occur during database outages or recovery - **SC-010**: All connection lifecycle events (acquire, release, recycle, error) are logged with clear, actionable messages - **SC-011**: Connection pool consumes under 100MB memory for max_size=10 pool under normal operation - **SC-012**: Pool initialization adds under 200ms to total server startup time - **SC-013**: Health check endpoint sustains throughput above 1000 requests/second without degradation ## Review & Acceptance Checklist ### Content Quality - [x] No implementation details (languages, frameworks, databases) - [x] Focused on user value and business needs - [x] Written for non-technical stakeholders - [x] All mandatory sections complete - [x] Terminology consistent throughout ### Requirement Completeness - [x] No [NEEDS CLARIFICATION] markers remain - [x] All requirements testable and unambiguous - [x] User types and permissions explicitly defined - [x] Edge cases and error scenarios included - [x] Success criteria measurable and specific - [x] Each requirement traces to user story - [x] Non-functional requirements captured (performance, scale, security) ### Scope Clarity - [x] Feature boundaries clearly defined - [x] Out-of-scope items explicitly listed - [x] Dependencies on other features identified - [x] Integration points specified - [x] Assumptions documented ### Constitutional Compliance - [x] Aligns with code quality standards from constitution.md - [x] Follows testing requirements (TDD-friendly with clear acceptance criteria) - [x] Respects security/compliance rules - [x] Adheres to architectural constraints (FastMCP, Python 3.11+, PostgreSQL) ## Non-Goals (Explicitly Out of Scope) ### Cloud-Based Connection Pooling This feature focuses on local PostgreSQL connection management. Cloud-specific connection poolers (AWS RDS Proxy, Azure Connection Pooler, Google Cloud SQL Proxy) are out of scope. ### Distributed Transaction Coordination Connection pool manages single-database connections. Multi-database transactions, two-phase commit, or distributed transaction coordinators (XA transactions) are not included. ### Read Replica Load Balancing Connection pool assumes single primary database endpoint. Automatic routing of read queries to read replicas or write queries to primary is deferred to future phases. ### Connection Multiplexing/Session Pooling Feature implements connection-level pooling where each client request uses a dedicated connection. Session-level multiplexing (PgBouncer-style transaction pooling) is out of scope. ### Advanced Monitoring Dashboards Feature exposes pool statistics via API and health check endpoint. Grafana dashboards, Prometheus exporters, and visualization tools are deferred to future observability features. ### Connection Encryption Management Feature assumes DATABASE_URL includes appropriate SSL/TLS settings. Certificate rotation, encryption negotiation, and SSL configuration management are handled by PostgreSQL client library, not by the pool. ### Geographic Load Balancing Connection pool connects to single database endpoint. Multi-region database deployments, geographic failover, and latency-based routing are not included. ## Clarifications *This section will be populated by the `/speckit.clarify` command to resolve any [NEEDS CLARIFICATION] markers and record design decisions.* --- *No clarification sessions recorded yet.*

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Ravenight13/codebase-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server