Codebase MCP Server

codebase-mcp
specs
009-v2-connection-mgmt

spec.md•20.6 KiB

# Feature Specification: Production-Grade Connection Management **Feature Branch**: `009-v2-connection-mgmt` **Created**: 2025-10-13 **Status**: Draft **Input**: User description: "Phase 04: Connection Management - Implement connection pooling, lifecycle management, and health monitoring for MCP server connections" ## Original User Description "Phase 04: Connection Management - Implement connection pooling, lifecycle management, and health monitoring for MCP server connections" The connection management system must handle: - Connection pool lifecycle (initialization, health checks, graceful shutdown) - Automatic reconnection with exponential backoff - Connection validation and health monitoring - Pool statistics and metrics for observability - Resource cleanup and leak prevention - Configuration validation and defaults This is a foundational infrastructure component that Phase-05 (Repository Indexing) and Phase-06 (Semantic Search) will directly depend on. ## User Scenarios & Testing *(mandatory)* ### Primary Workflow **Normal Operation Sequence**: 1. **System Administrator** configures DATABASE_URL environment variable with PostgreSQL connection string 2. **Developer** starts MCP server via command line 3. **System** initializes connection pool with min_size connections within 2 seconds 4. **System** validates all connections via lightweight health query 5. **System** transitions to "healthy" state, ready to accept MCP requests 6. **MCP Client** sends database query request (e.g., search_code, index_repository) 7. **System** acquires connection from pool (or creates new connection up to max_size) 8. **System** executes query on borrowed connection 9. **System** releases connection back to pool for reuse 10. **System Administrator** monitors pool statistics via health check endpoint 11. **System Administrator** triggers graceful shutdown when maintenance needed 12. **System** waits for active queries to complete (up to 30s timeout), then terminates --- ### User Story 1 - Server Startup with Valid Configuration (Priority: P1) As a developer, when I start the MCP server with a valid database configuration, I expect the connection pool to initialize successfully and be ready to handle requests within 2 seconds. **Why this priority**: This is the most critical functionality - without successful pool initialization, no database operations can occur. This is the foundation for all other connection management features. **Independent Test**: Can be fully tested by starting the MCP server with valid DATABASE_URL and observing initialization logs. Delivers immediate value by ensuring the server can connect to the database. **Acceptance Scenarios**: 1. **Given** DATABASE_URL=postgresql+asyncpg://localhost/codebase_mcp is set, **When** server starts with `python -m codebase_mcp.server`, **Then** server logs show "Connection pool initialized: min_size=2, max_size=10" within 2 seconds 2. **Given** connection pool is initialized, **When** health check endpoint is queried, **Then** response shows status "healthy" with connections {"total": 2, "idle": 2, "active": 0} 3. **Given** pool is initialized, **When** first database query is executed, **Then** query completes in under 100ms with no cold start delay 4. **Given** pool initialization is attempted, **When** all connections pass validation queries (SELECT 1), **Then** server becomes ready for MCP requests --- ### User Story 2 - Graceful Handling of Database Outage (Priority: P1) As a system administrator, when the database server becomes unavailable during operation, I expect the MCP server to attempt automatic reconnection without crashing, and resume operations when the database recovers. **Why this priority**: Resilience to database outages is critical for production systems. Without automatic reconnection, every temporary network issue would require manual server restart. **Independent Test**: Can be fully tested by stopping PostgreSQL during operation and observing automatic recovery behavior. Delivers value by maintaining service availability during transient failures. **Acceptance Scenarios**: 1. **Given** MCP server is running with active pool, **When** query executes successfully (search_code), **Then** results are returned within expected time 2. **Given** server is operational, **When** PostgreSQL is stopped, **Then** subsequent queries return DATABASE_ERROR without server crash 3. **Given** database is unavailable, **When** server attempts reconnection, **Then** logs show exponential backoff attempts (1s, 2s, 4s, 8s, 16s) 4. **Given** reconnection attempts are in progress, **When** PostgreSQL restarts, **Then** pool automatically recovers and logs "Connection pool recovered: 2/10 connections available" 5. **Given** pool has recovered, **When** queries are retried, **Then** operations succeed and return expected results --- ### User Story 3 - Pool Statistics and Observability (Priority: P2) As a system administrator monitoring production systems, I need to query connection pool statistics to understand database load and identify performance bottlenecks. **Why this priority**: Essential for production operations and capacity planning, but system can function without detailed statistics. Enables proactive monitoring before issues occur. **Independent Test**: Can be fully tested by querying pool statistics API during various load conditions. Delivers value by providing visibility into system performance. **Acceptance Scenarios**: 1. **Given** server just started, **When** statistics are queried, **Then** response shows total_connections=2, idle=2, active=0, total_acquisitions=0 2. **Given** 5 concurrent queries are executing, **When** statistics are queried during execution, **Then** response shows total_connections=5, active=5, idle=0, waiting_requests=0 3. **Given** queries have completed, **When** statistics are queried again, **Then** response shows total_acquisitions=5, total_releases=5, avg_acquisition_time_ms is recorded 4. **Given** pool has been idle for 60 seconds, **When** statistics are queried, **Then** pool has shrunk back to min_size=2 connections 5. **Given** pool is under load, **When** health check endpoint is called, **Then** real-time metrics are returned without stale data --- ### User Story 4 - Connection Leak Detection and Prevention (Priority: P3) As a developer, if my code accidentally holds database connections without releasing them, I expect the connection pool to detect this and log warnings to help me identify the leak. **Why this priority**: Important for debugging and preventing resource exhaustion, but not critical for basic functionality. System can operate without leak detection, though it helps prevent issues. **Independent Test**: Can be fully tested by simulating a connection held beyond timeout and verifying warning logs appear. Delivers value by helping developers identify problematic code patterns. **Acceptance Scenarios**: 1. **Given** pool configured with max_size=5 and leak_detection_timeout=30s, **When** 3 queries complete normally, **Then** pool shows idle=5, active=0 2. **Given** leak_detection_timeout=30s, **When** connection is held for 40 seconds without being released, **Then** server logs "WARNING: Potential connection leak detected. Connection held for 30s" with stack trace 3. **Given** leak warning was logged, **When** connection is eventually returned to pool, **Then** pool returns to normal state (idle=5, active=0) 4. **Given** long-running legitimate operation, **When** operation exceeds timeout, **Then** no false positives occur (timeout can be configured per-operation) --- ### Edge Cases - What happens when database server restarts during active queries? → Active queries fail with ConnectionResetError, pool detects failure, closes all connections, triggers automatic reconnection, returns to healthy state - How does system handle connection exhaustion under high load? → Requests exceeding max_size queue in FIFO order, waiting_requests counter increments, timeout error raised after 30s with pool statistics - What happens when min_size cannot be achieved during initialization? → Server startup fails after 3 retry attempts with detailed error message (fail-fast approach) - How does system handle partial database connectivity? → Pool operates in degraded mode with reduced capacity, health status shows "degraded", continues attempting to create additional connections - What happens when configuration conflicts exist (min_size > max_size)? → Configuration validation fails before pool creation, ValueError raised with actionable suggestions, server fails to start - How does system handle long-running queries (>30 seconds)? → Connection remains active during query, leak detection warning logged (if enabled), query completes successfully, connection returned to pool - What happens during graceful shutdown with active queries? → Pool status changes to shutting_down, new requests rejected, waits up to 30s for active connections, force-closes remaining connections with warnings - How does system handle connections that never return to pool? → Connection eventually recycled after max_connection_lifetime (1 hour) ### Alternative Paths **IF** database connectivity lost during operation: 1. Active queries fail with connection errors 2. System enters reconnection mode with exponential backoff (1s, 2s, 4s, 8s, 16s) 3. New requests return DATABASE_ERROR until recovery 4. **WHEN** database recovers → pool automatically reconnects 5. System resumes normal operation without restart **IF** connection acquisition exceeds max_size: 1. Requests queue in FIFO order 2. waiting_requests counter increments 3. **WHEN** connection available → first queued request acquires it 4. **WHEN** timeout (30s) exceeded → request fails with PoolTimeoutError **IF** connection held longer than leak_detection_timeout: 1. System logs warning with connection details and stack trace 2. Connection remains active (no forced termination) 3. **WHEN** connection eventually released → normal operation resumes 4. Leak detection helps identify problematic code patterns **IF** graceful shutdown initiated with active queries: 1. Pool status transitions to shutting_down 2. New connection requests rejected 3. System waits up to 30s for active connections to complete 4. **WHEN** timeout exceeded → force-close remaining connections with warnings 5. All resources released, shutdown completes --- ## Requirements *(mandatory)* ### User Types & Permissions **Note**: Connection management is infrastructure-level functionality without user-facing authentication. All users mentioned in scenarios are operators of the MCP server process, not end-users of the system. - **System Administrator**: Operator with OS-level permissions to start/stop MCP server process, access server logs, and configure environment variables. Can monitor pool statistics and trigger graceful shutdown. - **Developer**: Operator running MCP server in development environment with access to configuration files, debug output, and local database. Can modify pool configuration and observe connection behavior. - **MCP Client**: External process or application communicating via MCP protocol. Treated as authenticated by virtue of server accepting the connection. No direct access to connection pool internals. --- ### Functional Requirements - **FR-001**: System MUST initialize a connection pool on server startup with configurable minimum (min_size, default 2) and maximum (max_size, default 10) connection limits, completing within 2 seconds or logging detailed error *Traces to: User Story 1 (Server Startup with Valid Configuration)* - **FR-002**: System MUST validate all connections before use by executing lightweight validation query (SELECT 1), completing in under 5ms, and creating new connection if validation fails *Traces to: User Story 1 (Connection validation in acceptance scenarios)* - **FR-003**: System MUST attempt automatic reconnection when database connectivity is lost, using exponential backoff intervals (1s, 2s, 4s, 8s, 16s) with maximum 5 initial attempts, continuing every 16s thereafter *Traces to: User Story 2 (Graceful Handling of Database Outage)* - **FR-004**: System MUST expose real-time connection pool statistics including total connections, idle, active, waiting requests, acquisition times, and historical peak metrics queryable via internal API without database query *Traces to: User Story 3 (Pool Statistics and Observability)* - **FR-005**: System MUST implement graceful connection pool shutdown that waits up to 30 seconds for active connections to finish queries, then force-closes remaining connections with warning logs *Traces to: User Story 2 (Graceful shutdown with active queries in edge cases)* - **FR-006**: System MUST detect when connections are held longer than configurable timeout (default 30s) and log warnings with duration, stack trace of acquisition, and connection ID to help identify leaks *Traces to: User Story 4 (Connection Leak Detection and Prevention)* - **FR-007**: System MUST validate all connection pool configuration parameters before pool initialization and fail fast with detailed error messages including actionable suggestions for invalid input *Traces to: User Story 1 (Configuration validation during initialization)* - **FR-008**: System MUST integrate with server's health check endpoint to report database connectivity status (healthy/degraded/unhealthy) based on connection availability, completing in under 10ms *Traces to: User Story 3 (Health check integration for monitoring)* - **FR-009**: System MUST periodically recycle connections based on query count (max_queries, default 50,000), lifetime (max_connection_lifetime, default 3600s), and idle time (max_idle_time, default 60s) *Traces to: User Story 3 (Connection recycling for resource management)* - **FR-010**: System MUST ensure all pool operations are safe for concurrent access from multiple async tasks using async locks or atomic operations, with FIFO connection acquisition queuing *Traces to: User Story 2 (Concurrent access under load scenarios)* ### Key Entities *(include if feature involves data)* - **ConnectionPoolManager**: Top-level manager responsible for connection pool lifecycle (initialization, health monitoring, graceful shutdown), exposes pool statistics, coordinates reconnection attempts, tracks lifecycle states (initializing, healthy, degraded, unhealthy, recovering, shutting_down, terminated) - **ConnectionPool**: Connection pool abstraction that manages database connection lifecycle, acquiring and releasing connections on demand, validating connections before use, and tracking connection metrics - **PoolStatistics**: Immutable snapshot of connection pool state containing real-time metrics (total_connections, idle_connections, active_connections, waiting_requests, total_acquisitions, total_releases, avg_acquisition_time_ms) and historical peaks (peak_active_connections, peak_wait_time_ms) for monitoring and capacity planning - **PoolConfig**: Configuration structure containing pool parameters (min_size, max_size, max_queries, max_idle_time, timeout, command_timeout, max_connection_lifetime, leak_detection_timeout, enable_leak_detection) with validation rules and environment variable support - **HealthStatus**: Health check response object containing status (healthy/degraded/unhealthy), timestamp, database connectivity information, pool statistics, latency metrics, and last error details ### Assumptions - **Single database instance**: System connects to one PostgreSQL database per MCP server instance. Multi-database scenarios require multiple server instances or separate connection pools. - **PostgreSQL-compatible protocol**: Connection pool is designed for PostgreSQL wire protocol. Other databases (MySQL, SQLite, MongoDB) are not supported. - **Unix-like signal handling**: Graceful shutdown relies on SIGTERM/SIGINT signals. Windows service stop commands may require adaptation. - **Async runtime available**: Connection pool assumes async/await execution model. Synchronous blocking execution contexts are not supported. - **Database schema pre-initialized**: Connection pool assumes database and schema already exist. Schema migrations and database creation are handled separately. - **Network reliability**: Exponential backoff assumes transient network failures. Permanent network partitions may require operator intervention. - **Single server instance**: Connection pool statistics are per-server-instance. Multi-instance deployments require aggregation at load balancer or monitoring layer. ## Success Criteria *(mandatory)* ### Measurable Outcomes - **SC-001**: Pool initialization completes in under 2 seconds (p95 latency) from server startup command - **SC-002**: Connection acquisition overhead is under 10ms (p95 latency) including validation query execution - **SC-003**: Health check endpoint responds in under 10ms (p99 latency) without blocking operations - **SC-004**: Automatic reconnection succeeds within 30 seconds (p95) after database recovery from outage - **SC-005**: Graceful shutdown completes within 30 seconds (p99) with all resources properly released - **SC-006**: No connection leaks detected over 24-hour continuous operation (verified with memory profiler) - **SC-007**: Pool statistics show real-time updates with under 1ms data staleness during operation - **SC-008**: Server handles 100 concurrent database requests without deadlock or timeout errors - **SC-009**: No server crashes or MCP protocol violations occur during database outages or recovery - **SC-010**: All connection lifecycle events (acquire, release, recycle, error) are logged with clear, actionable messages - **SC-011**: Connection pool consumes under 100MB memory for max_size=10 pool under normal operation - **SC-012**: Pool initialization adds under 200ms to total server startup time - **SC-013**: Health check endpoint sustains throughput above 1000 requests/second without degradation ## Review & Acceptance Checklist ### Content Quality - [x] No implementation details (languages, frameworks, databases) - [x] Focused on user value and business needs - [x] Written for non-technical stakeholders - [x] All mandatory sections complete - [x] Terminology consistent throughout ### Requirement Completeness - [x] No [NEEDS CLARIFICATION] markers remain - [x] All requirements testable and unambiguous - [x] User types and permissions explicitly defined - [x] Edge cases and error scenarios included - [x] Success criteria measurable and specific - [x] Each requirement traces to user story - [x] Non-functional requirements captured (performance, scale, security) ### Scope Clarity - [x] Feature boundaries clearly defined - [x] Out-of-scope items explicitly listed - [x] Dependencies on other features identified - [x] Integration points specified - [x] Assumptions documented ### Constitutional Compliance - [x] Aligns with code quality standards from constitution.md - [x] Follows testing requirements (TDD-friendly with clear acceptance criteria) - [x] Respects security/compliance rules - [x] Adheres to architectural constraints (FastMCP, Python 3.11+, PostgreSQL) ## Non-Goals (Explicitly Out of Scope) ### Cloud-Based Connection Pooling This feature focuses on local PostgreSQL connection management. Cloud-specific connection poolers (AWS RDS Proxy, Azure Connection Pooler, Google Cloud SQL Proxy) are out of scope. ### Distributed Transaction Coordination Connection pool manages single-database connections. Multi-database transactions, two-phase commit, or distributed transaction coordinators (XA transactions) are not included. ### Read Replica Load Balancing Connection pool assumes single primary database endpoint. Automatic routing of read queries to read replicas or write queries to primary is deferred to future phases. ### Connection Multiplexing/Session Pooling Feature implements connection-level pooling where each client request uses a dedicated connection. Session-level multiplexing (PgBouncer-style transaction pooling) is out of scope. ### Advanced Monitoring Dashboards Feature exposes pool statistics via API and health check endpoint. Grafana dashboards, Prometheus exporters, and visualization tools are deferred to future observability features. ### Connection Encryption Management Feature assumes DATABASE_URL includes appropriate SSL/TLS settings. Certificate rotation, encryption negotiation, and SSL configuration management are handled by PostgreSQL client library, not by the pool. ### Geographic Load Balancing Connection pool connects to single database endpoint. Multi-region database deployments, geographic failover, and latency-based routing are not included. ## Clarifications *This section will be populated by the `/speckit.clarify` command to resolve any [NEEDS CLARIFICATION] markers and record design decisions.* --- *No clarification sessions recorded yet.*

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Ravenight13/codebase-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

spec.md•20.6 KiB