# AI Agent Guide: Generating spec.md for Phase 06 - Performance Validation & Testing
## Your Mission
Transform this phase-specific validation requirement into a clear, complete feature specification that follows SpecKit principles while capturing the critical validation work needed to ensure the split MCP servers meet all performance targets.
## Pre-Flight Checks (MANDATORY)
Before you begin writing the spec, verify these conditions:
1. **Constitution Familiarity**: Review `.specify/memory/constitution.md` to understand project principles
2. **Template Structure**: Load `.specify/templates/spec-template.md` for the canonical structure
3. **Phase Context**: Read `docs/mcp-split-plan/phases/phase-06-performance/analysis.md` to understand validation requirements
4. **Dependency Awareness**: Confirm Phases 01-05 are complete (workflow-mcp functional, both servers integrated)
5. **Original Split Context**: Review `specs/008-mcp-server-split/spec.md` for the parent feature requirements
If any check fails, STOP and request clarification before proceeding.
## Phase-Specific Context
### What This Phase Delivers
Phase 06 is the **validation and hardening phase** that ensures both `workflow-mcp` and `codebase-mcp` meet performance, reliability, and production-readiness requirements after the split. This phase focuses on:
1. **Performance Validation**: Verify both servers meet original performance targets (indexing <60s, search <500ms p95)
2. **Integration Testing**: Comprehensive client-side integration tests with both servers running simultaneously
3. **Load & Stress Testing**: Validate behavior under high concurrent load (50+ clients)
4. **Observability Enhancement**: Metrics, logging, and debugging capabilities for both servers
5. **Production Readiness**: Connection pooling, error recovery, edge case handling, documentation
### Split Architecture Recap
This phase validates the architecture created in Phases 01-05:
```
┌────────────────────────────────────────────────┐
│            MCP Client (Claude Code)            │
│        Uses BOTH servers simultaneously        │
└────────┬─────────────────────────────┬─────────┘
         │                             │
┌────────▼─────────┐          ┌────────▼─────────┐
│   workflow-mcp   │          │   codebase-mcp   │
│    Port: 8010    │          │    Port: 8020    │
│  SSE Transport   │          │  SSE Transport   │
└────────┬─────────┘          └────────┬─────────┘
         │                             │
┌────────▼─────────┐          ┌────────▼─────────┐
│   Registry DB    │          │ Indexed Code DB  │
│ workflow_registry│          │   codebase_mcp   │
│ + Per-project DB │          │ (single database)│
└──────────────────┘          └──────────────────┘
```
### Critical Validation Areas
**Performance Targets** (from the constitution; a p95 computation sketch follows at the end of this section):
- Indexing: <60 seconds for 10,000 files
- Search: <500ms p95 latency
- Project switching: <50ms
- Entity queries: <100ms
**Integration Scenarios**:
- Cross-server workflows (search code → create work item referencing found entity)
- Concurrent operations (index repository while querying work items)
- Multi-project isolation (workflow project switching doesn't affect codebase indexing)
**Production Readiness**:
- Connection pool behavior under load
- Error recovery (DB disconnect, port conflicts, resource exhaustion)
- Graceful degradation (one server fails, other continues)
- Client reconnection handling
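The latency targets above are all expressed as p95 values (95th percentile response time). As a point of reference only, a minimal sketch of how a benchmark harness might compute p95 from raw samples (the `p95` helper is illustrative, not an existing utility):
```python
import statistics

def p95(samples_ms: list[float]) -> float:
    """Return the 95th percentile of a list of latency samples (milliseconds)."""
    # quantiles(n=100) yields 99 cut points; index 94 is the 95th percentile.
    return statistics.quantiles(samples_ms, n=100)[94]

# Example: 20 simulated search latencies in milliseconds.
search_latencies = [120, 135, 150, 142, 160, 155, 148, 170, 165, 158,
                    172, 180, 190, 175, 168, 162, 185, 200, 450, 210]
assert p95(search_latencies) < 500, "search p95 must stay under 500 ms"
```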
## Anti-Patterns to Avoid
### ❌ Implementation Leakage
**DON'T**: "Add pytest fixtures in `tests/integration/test_split_performance.py`"
**DO**: "FR-003: Provide performance benchmarking that validates indexing completes in <60s for 10,000 files"
### ❌ Copy-Paste from analysis.md
**DON'T**: "Phase 06 focuses on validation (see analysis.md for details)"
**DO**: Rewrite in spec language focusing on user needs and success criteria
### ❌ Missing Alternative Paths
**DON'T**: User Scenario 1 with only happy path
**DO**: Include "What if performance targets aren't met?" and "What if servers fail under load?" alternatives
### ❌ Vague Functional Requirements
**DON'T**: "FR-001: Performance should be good"
**DO**: "FR-001: Validate codebase-mcp indexing performance meets <60s for 10,000 files with <5% variance across 5 runs"
### ❌ Ignoring Constitutional Principles
**DON'T**: Skip validation of Principle IV (Performance Guarantees)
**DO**: FR-001 through FR-010 explicitly map to constitutional performance targets
## The spec.md You Will Generate
Below is the COMPLETE structure with guidance for each section. Replace `[PLACEHOLDERS]` with phase-specific content.
---
# Feature Specification: [FEATURE_NAME]
**Feature ID**: `[FEATURE_NUM]`
**Status**: Draft
**Created**: [YYYY-MM-DD]
**Last Updated**: [YYYY-MM-DD]
**Parent Feature**: `specs/008-mcp-server-split/`
**Phase**: 06 - Performance Validation & Testing
---
## Feature Metadata
| Field | Value |
|-------|-------|
| **Feature Name** | [FEATURE_NAME] |
| **Feature ID** | [FEATURE_NUM] |
| **Complexity** | [Low/Medium/High] |
| **Priority** | [Critical/High/Medium/Low] |
| **Estimated Effort** | [Hours/Days] |
| **Phase** | 06 - Performance Validation & Testing |
| **Dependencies** | Phase 05 (Both servers integrated and functional) |
| **Stakeholders** | Development team, End users (Claude Code, Cursor IDE users) |
**Complexity Guidance**:
- **Low**: Validation of existing functionality (e.g., running existing performance tests)
- **Medium**: New integration test suites, load testing infrastructure
- **High**: Significant observability enhancements, complex error recovery scenarios
---
## Original User Description
> "Phase 06 focuses on validating that the split MCP servers (`workflow-mcp` and `codebase-mcp`) meet all performance targets, handle concurrent load, recover from errors gracefully, and are production-ready. This phase ensures the architecture delivers on its promises of <60s indexing, <500ms search latency, and robust operation under real-world conditions."
### User Intent (Inferred)
**What the user really needs**:
1. **Confidence**: Proof that the split didn't degrade performance or introduce new failure modes
2. **Production Readiness**: Both servers can handle real-world load (50+ concurrent clients, large repositories)
3. **Observability**: Ability to debug issues when they occur (metrics, logs, health checks)
4. **Reliability**: Graceful error recovery, no cascading failures between servers
**Why this matters**:
- **Risk Mitigation**: Catch performance regressions before production deployment
- **User Experience**: Ensure Claude Code and Cursor IDE users see fast, reliable responses
- **Maintainability**: Observability tools reduce time-to-resolution for production incidents
---
## User Scenarios & Testing
### Scenario 1: Performance Engineer Validates Split Architecture
**Primary Path**:
1. Engineer runs performance benchmark suite against both servers
2. Codebase-mcp indexes a 10,000-file repository in <60s (baseline: 45s pre-split)
3. Workflow-mcp project switching completes in <50ms (baseline: 30ms pre-split)
4. Search queries return in <500ms p95 (baseline: 250ms p95 pre-split)
5. Engineer generates performance report comparing pre-split vs post-split metrics
6. ✅ All targets met, split approved for production
**Alternative Path 1: Performance Regression Detected**:
- Step 3 fails: Project switching now takes 150ms (3x slower)
- System identifies bottleneck: Database connection pool exhaustion
- Engineer adjusts pool settings, re-runs benchmark
- Performance restored to <50ms
- 🔧 Configuration tuning required before production
**Alternative Path 2: Load Test Failure**:
- Step 4 fails under 50 concurrent clients: Search latency spikes to 2000ms p95
- System logs show connection pool starvation
- Engineer implements connection queuing with backpressure
- Re-test shows graceful degradation: 600ms p95 (acceptable under extreme load)
- ✅ Acceptable with documentation of limits
**Success Criteria**:
- Performance targets met within 10% variance of pre-split baselines
- Regressions identified with clear root cause analysis
- Tuning recommendations documented for production deployment
---
### Scenario 2: DevOps Engineer Tests Error Recovery
**Primary Path**:
1. Engineer simulates PostgreSQL disconnect during active indexing
2. Codebase-mcp detects connection loss, logs error, pauses indexing
3. DB reconnects after 5 seconds (simulated network partition)
4. Codebase-mcp resumes indexing from last checkpoint
5. Workflow-mcp continues operating (unaffected by codebase-mcp DB issue)
6. ✅ Graceful recovery, no data loss, services isolated
**Alternative Path 1: Cascading Failure**:
- Step 5 fails: Workflow-mcp also reports errors (unexpected)
- Investigation reveals shared connection pool misconfiguration
- Engineer fixes isolation between server DB connections
- Re-test confirms servers operate independently
- 🔧 Architecture flaw fixed
**Alternative Path 2: Partial Recovery**:
- Step 4 fails: Indexing resumes but duplicates chunks (idempotency bug)
- System detects duplicate embeddings in database
- Engineer adds deduplication logic to recovery path
- Re-test confirms clean recovery without duplicates
- 🔧 Recovery logic hardened
**Success Criteria**:
- Both servers recover from transient DB failures within 10 seconds
- No cascading failures between workflow-mcp and codebase-mcp
- Data integrity maintained (no duplicates, no data loss)
---
### Scenario 3: End User (Claude Code) Performs Cross-Server Workflow
**Primary Path**:
1. User runs `/search_code` to find authentication logic
2. Claude Code sends request to codebase-mcp, receives 10 code chunks
3. User identifies entity (AuthService class) and runs workflow command
4. Claude Code sends `create_work_item` to workflow-mcp with entity reference
5. Work item created with embedded code context from codebase-mcp results
6. User queries work items, sees inline code snippets from original search
7. ✅ Seamless cross-server integration, natural user experience
**Alternative Path 1: Codebase-MCP Unavailable**:
- Step 2 fails: codebase-mcp returns 503 Service Unavailable
- Claude Code shows error: "Code search unavailable, workflow commands still functional"
- User proceeds with workflow commands (create project, work items)
- Codebase-mcp restarts, user retries search successfully
- ✅ Graceful degradation, clear error messaging
**Alternative Path 2: High-Latency Search**:
- Step 2 completes in 1500ms (exceeds 500ms p95 target)
- Claude Code shows progress indicator: "Searching large codebase..."
- Results eventually return, user proceeds with workflow
- System logs performance anomaly for investigation
- 🔧 Acceptable UX, flagged for optimization
**Alternative Path 3: Invalid Cross-Server Reference**:
- Step 4 fails: Work item references non-existent entity_id from codebase-mcp
- Workflow-mcp validates reference, returns error: "Entity not found in codebase index"
- Claude Code prompts user to re-run search or use different entity
- User corrects entity reference, work item created successfully
- ✅ Data integrity preserved, helpful error messages
**Success Criteria**:
- Cross-server workflows complete end-to-end without manual configuration
- Server failures don't block unrelated operations (isolation)
- Error messages guide users to resolution
- Performance meets targets under typical concurrent load (10 clients)
---
## Functional Requirements
### FR-001: Codebase-MCP Performance Validation
**Description**: Validate that codebase-mcp meets indexing and search performance targets after the split.
**Acceptance Criteria**:
- Indexing 10,000 files completes in <60 seconds (p95) across 5 benchmark runs
- Search queries return in <500ms (p95) under load of 10 concurrent clients
- Performance variance <10% compared to pre-split baseline measurements
- Benchmark suite generates report with latency histograms and p50/p95/p99 percentiles
**Constitutional Mapping**: Principle IV (Performance Guarantees - indexing <60s, search <500ms)
**Edge Cases**:
- Repository with 50,000 files (test scalability beyond target)
- Search queries against freshly indexed repository (cold cache)
- Concurrent indexing + search operations (resource contention)
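For orientation only (fixture names and client calls are hypothetical; concrete test design belongs in `/plan`), FR-001 could ultimately be exercised by a benchmark along these lines:
```python
import statistics
import time

import pytest

INDEXING_TARGET_S = 60.0
SEARCH_P95_TARGET_MS = 500.0

@pytest.mark.benchmark
def test_indexing_meets_target(codebase_client, large_repo_path):
    # codebase_client and large_repo_path are hypothetical fixtures
    # provided by the (not yet written) benchmark harness.
    start = time.perf_counter()
    codebase_client.index_repository(path=large_repo_path)  # ~10,000 files
    assert time.perf_counter() - start < INDEXING_TARGET_S

@pytest.mark.benchmark
def test_search_p95_meets_target(codebase_client):
    latencies_ms = []
    for _ in range(100):
        start = time.perf_counter()
        codebase_client.search_code(query="authentication middleware")
        latencies_ms.append((time.perf_counter() - start) * 1000)
    p95 = statistics.quantiles(latencies_ms, n=100)[94]
    assert p95 < SEARCH_P95_TARGET_MS
```
The same measure-then-assert pattern extends to FR-002's project-switch and entity-query targets.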
---
### FR-002: Workflow-MCP Performance Validation
**Description**: Validate that workflow-mcp meets project switching and entity query performance targets.
**Acceptance Criteria**:
- Project switching completes in <50ms (p95) across 20 test switches
- Entity queries return in <100ms (p95) with 1000 entities in database
- Work item hierarchy queries complete in <10ms, unaffected by entity table size
- Performance report compares pre-split vs post-split metrics
**Constitutional Mapping**: Principle IV (Performance Guarantees - project switch <50ms, entity query <100ms)
**Edge Cases**:
- Project with 10,000 work items (stress test hierarchy queries)
- Entity query with complex JSONB filters (test GIN index performance)
- Rapid project switching (10 switches in 1 second)
---
### FR-003: Integration Test Suite for Dual-Server Workflows
**Description**: Provide comprehensive integration tests that validate both servers working together in realistic workflows.
**Acceptance Criteria**:
- Test suite covers 10+ cross-server scenarios (search code → create work item, index repo → query deployments, etc.)
- Tests run both servers in isolated environments (Docker or subprocess)
- Assertions validate end-to-end behavior (not just individual server responses)
- Tests detect common failure modes (port conflicts, DB connection issues, timeout errors)
**Constitutional Mapping**: Principle VII (Test-Driven Development - protocol compliance validation)
**Edge Cases**:
- Codebase-mcp unavailable during workflow operations (test isolation)
- Workflow-mcp unavailable during code search (test isolation)
- Both servers restart simultaneously (test client reconnection)
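As a sketch of what the eventual harness might look like (entry-point commands, ports, and HTTP tool endpoints are assumptions to be confirmed during `/plan`), a dual-server integration test could start both servers and drive one cross-server scenario:
```python
import subprocess
import time

import httpx
import pytest

@pytest.fixture(scope="session")
def both_servers():
    # Hypothetical entry points; actual commands come from Phases 01-05.
    workflow = subprocess.Popen(["workflow-mcp", "--port", "8010"])
    codebase = subprocess.Popen(["codebase-mcp", "--port", "8020"])
    _wait_for_health("http://localhost:8010/health")
    _wait_for_health("http://localhost:8020/health")
    yield
    workflow.terminate()
    codebase.terminate()

def _wait_for_health(url: str, timeout: float = 30.0) -> None:
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            if httpx.get(url, timeout=1.0).status_code == 200:
                return
        except httpx.HTTPError:
            pass
        time.sleep(0.5)
    raise RuntimeError(f"server at {url} never became healthy")

def test_search_then_create_work_item(both_servers):
    # Cross-server scenario: find an entity, then reference it in a work item.
    chunks = httpx.post("http://localhost:8020/tools/search_code",
                        json={"query": "AuthService"}).json()
    resp = httpx.post("http://localhost:8010/tools/create_work_item",
                      json={"title": "Harden AuthService",
                            "entity_ref": chunks[0]["entity_id"]})
    assert resp.status_code == 200
```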
---
### FR-004: Load and Stress Testing
**Description**: Validate server behavior under high concurrent load and resource exhaustion scenarios.
**Acceptance Criteria**:
- Both servers handle 50 concurrent clients without crashing
- Search latency remains <1000ms (p95) under 50-client load (acceptable degradation)
- Connection pools configured to prevent resource exhaustion
- Load test report documents breaking points (max clients before failure)
**Constitutional Mapping**: Principle V (Production Quality - comprehensive error handling)
**Edge Cases**:
- 100 concurrent indexing requests (test queue/backpressure mechanisms)
- Rapid connect/disconnect cycles (test connection pool churn)
- Memory-intensive operations (large repository indexing + heavy queries)
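A hedged sketch of the 50-client load driver this FR implies; the endpoint URL, payload, and request counts are placeholders:
```python
import asyncio
import statistics
import time

import httpx

CLIENTS = 50
REQUESTS_PER_CLIENT = 20
SEARCH_URL = "http://localhost:8020/tools/search_code"  # assumed endpoint

async def one_client(client: httpx.AsyncClient,
                     latencies: list[float], errors: list[str]) -> None:
    for _ in range(REQUESTS_PER_CLIENT):
        start = time.perf_counter()
        try:
            resp = await client.post(SEARCH_URL, json={"query": "retry logic"}, timeout=10.0)
            resp.raise_for_status()
            latencies.append((time.perf_counter() - start) * 1000)
        except httpx.HTTPError as exc:
            errors.append(type(exc).__name__)

async def main() -> None:
    latencies: list[float] = []
    errors: list[str] = []
    async with httpx.AsyncClient() as client:
        await asyncio.gather(*(one_client(client, latencies, errors) for _ in range(CLIENTS)))
    p95 = statistics.quantiles(latencies, n=100)[94]
    print(f"p95={p95:.0f}ms errors={len(errors)}/{CLIENTS * REQUESTS_PER_CLIENT}")
    assert p95 < 1000, "degradation beyond the documented 1000 ms p95 limit"

if __name__ == "__main__":
    asyncio.run(main())
```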
---
### FR-005: Error Recovery and Resilience Testing
**Description**: Validate graceful error recovery for transient failures (DB disconnect, network partition, resource exhaustion).
**Acceptance Criteria**:
- Both servers detect and log DB connection failures within 5 seconds
- Servers retry failed operations with exponential backoff (max 3 retries)
- Operations in progress pause/resume without data loss
- Servers recover automatically when DB reconnects
**Constitutional Mapping**: Principle V (Production Quality - comprehensive error handling)
**Edge Cases**:
- DB disconnect during critical section (mid-transaction)
- Connection pool exhausted (all connections in use)
- Disk full during indexing (codebase-mcp specific)
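The retry behaviour above could be realised with a small backoff wrapper; the sketch below assumes a generic transient-error type and illustrative delays:
```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def with_retries(operation, *, max_retries: int = 3, base_delay: float = 0.5):
    """Run an async operation, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return await operation()
        except ConnectionError as exc:  # stand-in for the real transient-error type
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt)  # 0.5s, 1s, 2s
            logger.warning("transient failure (%s); retry %d/%d in %.1fs",
                           exc, attempt + 1, max_retries, delay)
            await asyncio.sleep(delay)
```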
---
### FR-006: Observability Enhancements
**Description**: Provide metrics, logging, and health checks for debugging and monitoring in production.
**Acceptance Criteria**:
- Health check endpoints (`/health`) return detailed status (DB connection, pool stats, uptime)
- Metrics endpoint (`/metrics`) exposes Prometheus-compatible metrics (request count, latency histograms, error rates)
- Structured logging (JSON format) includes request IDs, operation names, latency
- Logs include performance warnings (queries >1s, connection pool >80% utilization)
**Constitutional Mapping**: Principle V (Production Quality - comprehensive error handling), Principle III (Protocol Compliance - no stdout pollution)
**Edge Cases**:
- Metrics endpoint under high load (must not impact main operations)
- Log volume during error storm (rate limiting to prevent disk exhaustion)
- Health check when DB is down (should report degraded, not crash)
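To make the structured-logging requirement concrete, a minimal sketch of a JSON formatter that carries request IDs, operation names, and latency while writing to stderr (keeping stdout clean per Principle III); logger and field names are illustrative:
```python
import json
import logging
import sys
import time
import uuid

class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log line (to stderr, never stdout)."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": time.time(),
            "level": record.levelname,
            "msg": record.getMessage(),
            "request_id": getattr(record, "request_id", None),
            "operation": getattr(record, "operation", None),
            "latency_ms": getattr(record, "latency_ms", None),
        }
        return json.dumps(payload)

handler = logging.StreamHandler(sys.stderr)  # stdout stays reserved for the MCP protocol
handler.setFormatter(JsonFormatter())
logging.getLogger("codebase_mcp").addHandler(handler)

# Example of a performance warning with request context attached via `extra`.
logging.getLogger("codebase_mcp").warning(
    "slow query",
    extra={"request_id": str(uuid.uuid4()), "operation": "search_code", "latency_ms": 1250},
)
```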
---
### FR-007: Connection Pool Optimization
**Description**: Optimize PostgreSQL connection pool settings for both servers to prevent resource exhaustion under load.
**Acceptance Criteria**:
- Workflow-mcp pool: min_size=2, max_size=10 (supports registry + active project DBs)
- Codebase-mcp pool: min_size=5, max_size=20 (handles concurrent search queries)
- Pool metrics exposed via `/health` endpoint (size, free connections, wait time)
- Documentation describes tuning guidelines for production deployment
**Constitutional Mapping**: Principle V (Production Quality - resource management)
**Edge Cases**:
- Pool exhaustion under 50 concurrent clients (test queue behavior)
- Long-running queries (indexing) holding connections (test timeout settings)
- Connection leak detection (overflow beyond the configured maximum triggers alerts)
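The pool sizes above map directly onto asyncpg's `create_pool` parameters; a minimal sketch, with illustrative DSNs, of the pools and the stats a `/health` endpoint could expose:
```python
import asyncpg

async def create_pools():
    # Sizes mirror the acceptance criteria above; DSNs are illustrative.
    workflow_pool = await asyncpg.create_pool(
        dsn="postgresql://localhost/workflow_registry", min_size=2, max_size=10,
    )
    codebase_pool = await asyncpg.create_pool(
        dsn="postgresql://localhost/codebase_mcp", min_size=5, max_size=20,
    )
    return workflow_pool, codebase_pool

def pool_health(pool: asyncpg.Pool) -> dict:
    """Shape of the pool stats a /health endpoint could expose."""
    return {
        "size": pool.get_size(),       # current number of connections
        "idle": pool.get_idle_size(),  # connections not checked out
        "max_size": pool.get_max_size(),
    }
```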
---
### FR-008: Client Reconnection Handling
**Description**: Validate that MCP clients (Claude Code, Cursor IDE) can reconnect to servers after transient failures.
**Acceptance Criteria**:
- Servers support SSE reconnection with `Last-Event-ID` header
- In-progress operations resume after reconnect (no duplicate work)
- Client libraries (Python SDK) handle reconnection automatically
- Integration tests simulate disconnect/reconnect scenarios
**Constitutional Mapping**: Principle III (Protocol Compliance - SSE transport)
**Edge Cases**:
- Reconnect during long-running operation (indexing 10,000 files)
- Multiple rapid reconnects (test exponential backoff)
- Server restart while client connected (clean shutdown with 503 response)
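For reference, SSE resumption with the `Last-Event-ID` header and capped exponential backoff might look like the following client-side sketch (dispatch of `data:` lines is elided):
```python
import asyncio

import httpx

async def sse_with_reconnect(url: str, max_attempts: int = 5) -> None:
    """Consume an SSE stream, resuming from the last seen event id on reconnect."""
    last_event_id: str | None = None
    for attempt in range(max_attempts):
        headers = {"Accept": "text/event-stream"}
        if last_event_id:
            headers["Last-Event-ID"] = last_event_id  # standard SSE resume header
        try:
            async with httpx.AsyncClient(timeout=None) as client:
                async with client.stream("GET", url, headers=headers) as response:
                    async for line in response.aiter_lines():
                        if line.startswith("id:"):
                            last_event_id = line.removeprefix("id:").strip()
                        # ... dispatch "data:" lines to the MCP client here ...
        except httpx.HTTPError:
            await asyncio.sleep(min(2 ** attempt, 30))  # 1s, 2s, 4s ... capped at 30s
```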
---
### FR-009: Performance Regression Detection
**Description**: Establish baseline performance metrics and automated tests to detect future regressions.
**Acceptance Criteria**:
- Benchmark suite runs in CI/CD pipeline on every commit
- CI fails if performance degrades >20% from baseline (configurable threshold)
- Performance report archived with each build (historical trend analysis)
- Documentation describes how to update baselines after intentional changes
**Constitutional Mapping**: Principle IV (Performance Guarantees - proactive regression prevention)
**Edge Cases**:
- Hardware variance in CI runners (normalize metrics to baseline hardware)
- Cold start vs warm cache (run benchmarks multiple times, report average)
- Database state dependency (reset DB to known state before each benchmark)
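A minimal sketch of the CI regression gate, assuming baseline and current p95 values are stored as JSON files whose paths are illustrative:
```python
import json
import sys
from pathlib import Path

THRESHOLD = 0.20  # fail CI if p95 regresses more than 20% (configurable)

def check_regression(baseline_file: Path, current_file: Path) -> int:
    baseline = json.loads(baseline_file.read_text())
    current = json.loads(current_file.read_text())
    failures = []
    for operation, base_p95 in baseline.items():
        cur_p95 = current.get(operation)
        if cur_p95 is None:
            continue  # new operation with no baseline yet
        if cur_p95 > base_p95 * (1 + THRESHOLD):
            failures.append(f"{operation}: {base_p95:.0f}ms -> {cur_p95:.0f}ms")
    if failures:
        print("Performance regression detected:\n" + "\n".join(failures), file=sys.stderr)
        return 1
    return 0

if __name__ == "__main__":
    sys.exit(check_regression(Path("benchmarks/baseline.json"), Path("benchmarks/current.json")))
```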
---
### FR-010: Production Readiness Documentation
**Description**: Provide comprehensive documentation for deploying, monitoring, and troubleshooting both servers in production.
**Acceptance Criteria**:
- Deployment guide covers Docker, systemd, and manual installation
- Monitoring guide describes key metrics to watch (latency, error rate, connection pool)
- Troubleshooting guide covers common issues (port conflicts, DB connection errors, performance degradation)
- Runbook includes emergency procedures (server restart, DB failover, log analysis)
**Constitutional Mapping**: Principle V (Production Quality - operational excellence)
**Edge Cases**:
- Multi-instance deployment (load balancing, shared DB)
- Blue-green deployment (zero-downtime upgrades)
- Disaster recovery (backup/restore procedures)
---
## Success Criteria
### Quantitative Metrics
1. **Performance Targets Met**:
- Codebase-mcp indexing: <60s for 10,000 files (p95)
- Codebase-mcp search: <500ms (p95) under 10-client load
- Workflow-mcp project switch: <50ms (p95)
- Workflow-mcp entity query: <100ms (p95)
2. **Load Capacity**:
- Both servers handle 50 concurrent clients without crashing
- Graceful degradation under load (latency <1000ms p95 at 50 clients)
3. **Reliability**:
- 99.9% uptime during 24-hour stress test
- Automatic recovery from transient DB failures within 10 seconds
- Zero data loss or corruption during error scenarios
4. **Observability**:
- 100% of operations logged with latency and outcome
- Health checks respond in <50ms
- Metrics endpoint captures all key performance indicators
### Qualitative Criteria
- Integration tests cover realistic cross-server workflows
- Error messages guide users/operators to resolution
- Documentation enables new team members to deploy and monitor servers
- Performance regression tests run automatically in CI/CD
### Constitutional Compliance
- ✅ Principle IV: Performance targets validated with <10% variance
- ✅ Principle V: Error recovery and observability validated
- ✅ Principle VII: Comprehensive integration test suite
- ✅ Principle III: Protocol compliance (SSE, no stdout pollution) verified
---
## Key Entities
### Performance Benchmark Result
**Description**: Record of a single performance benchmark run for regression detection.
**Attributes**:
- `benchmark_id` (UUID): Unique identifier
- `server` (enum): `codebase-mcp` | `workflow-mcp`
- `operation` (string): `index_repository` | `search_code` | `switch_project` | `query_entities`
- `run_timestamp` (datetime): When benchmark executed
- `latency_p50` (float): 50th percentile latency in milliseconds
- `latency_p95` (float): 95th percentile latency in milliseconds
- `latency_p99` (float): 99th percentile latency in milliseconds
- `sample_size` (int): Number of operations measured
- `test_parameters` (JSON): Configuration (file count, client count, etc.)
- `passed` (boolean): Whether benchmark met target thresholds
**Relationships**:
- Belongs to test suite run (batch of benchmarks)
- References baseline benchmark for comparison
**Validation Rules**:
- `latency_p95` must be >= `latency_p50`
- `latency_p99` must be >= `latency_p95`
- `sample_size` must be >= 10 for statistical validity
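A sketch of this entity as a Pydantic v2 model, with the validation rules expressed as field constraints and a cross-field validator (field types mirror the attribute list above):
```python
from datetime import datetime
from typing import Any, Literal
from uuid import UUID

from pydantic import BaseModel, Field, model_validator

class BenchmarkResult(BaseModel):
    """Sketch of the Performance Benchmark Result entity (Pydantic v2 assumed)."""
    benchmark_id: UUID
    server: Literal["codebase-mcp", "workflow-mcp"]
    operation: Literal["index_repository", "search_code", "switch_project", "query_entities"]
    run_timestamp: datetime
    latency_p50: float
    latency_p95: float
    latency_p99: float
    sample_size: int = Field(ge=10)  # minimum for statistical validity
    test_parameters: dict[str, Any] = Field(default_factory=dict)
    passed: bool

    @model_validator(mode="after")
    def percentiles_are_ordered(self) -> "BenchmarkResult":
        if not (self.latency_p50 <= self.latency_p95 <= self.latency_p99):
            raise ValueError("expected latency_p50 <= latency_p95 <= latency_p99")
        return self
```
The Integration Test Case and Load Test Result entities below follow the same pattern, with their own cross-field rules (e.g. `successful_requests + failed_requests == total_requests`) enforced in analogous validators.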
---
### Integration Test Case
**Description**: Definition of a cross-server workflow test scenario.
**Attributes**:
- `test_id` (UUID): Unique identifier
- `test_name` (string): Descriptive name (e.g., "Search code → Create work item")
- `description` (string): Detailed scenario explanation
- `servers_required` (list): `['codebase-mcp', 'workflow-mcp']`
- `setup_steps` (JSON): Pre-test configuration (start servers, load test data)
- `test_steps` (JSON): Sequence of API calls with expected responses
- `assertions` (JSON): Validation checks (status codes, response schemas, latency)
- `teardown_steps` (JSON): Cleanup actions
- `last_run_status` (enum): `passed` | `failed` | `skipped`
- `last_run_timestamp` (datetime): Most recent execution
**Relationships**:
- Part of integration test suite
- May depend on other tests (setup order)
**Validation Rules**:
- `servers_required` must not be empty
- `test_steps` must have at least 1 step
- `assertions` must validate both success and error cases
---
### Load Test Result
**Description**: Record of a load/stress test execution.
**Attributes**:
- `load_test_id` (UUID): Unique identifier
- `server` (enum): `codebase-mcp` | `workflow-mcp`
- `concurrent_clients` (int): Number of simulated clients
- `test_duration_seconds` (int): How long test ran
- `total_requests` (int): Total operations attempted
- `successful_requests` (int): Requests that completed successfully
- `failed_requests` (int): Requests that errored or timed out
- `avg_latency_ms` (float): Mean latency across all requests
- `p95_latency_ms` (float): 95th percentile latency
- `max_latency_ms` (float): Worst-case latency observed
- `errors` (JSON): Breakdown of error types and counts
- `resource_usage` (JSON): CPU, memory, connection pool stats
**Relationships**:
- References load test configuration
- May trigger performance regression alerts
**Validation Rules**:
- `successful_requests` + `failed_requests` must equal `total_requests`
- `p95_latency_ms` must be <= `max_latency_ms`
- `concurrent_clients` must be > 0
---
## Edge Cases & Error Handling
### Edge Case 1: Port Conflict on Startup
**Scenario**: Codebase-mcp attempts to start on port 8020, but port is already in use.
**Expected Behavior**:
- Server logs error: "Port 8020 already in use, cannot start"
- Server exits with non-zero status code
- Health check script detects startup failure
**Error Message**: `ERROR: Address already in use (port 8020). Stop conflicting process or configure alternative port via CODEBASE_MCP_PORT environment variable.`
**Recovery**: User stops conflicting process or changes port configuration.
---
### Edge Case 2: Database Schema Version Mismatch
**Scenario**: Workflow-mcp starts with Alembic migration version 003, but DB is at version 002.
**Expected Behavior**:
- Server detects schema version mismatch during startup health check
- Server logs error with current version (002) and expected version (003)
- Server exits gracefully, prompting user to run migrations
**Error Message**: `ERROR: Database schema version mismatch. Current: 002, Expected: 003. Run 'alembic upgrade head' before starting server.`
**Recovery**: User runs Alembic migration, restarts server.
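A hedged sketch of the startup check that would produce this behaviour, using Alembic's script directory and migration context (the config path and database URL are illustrative):
```python
from alembic.config import Config
from alembic.runtime.migration import MigrationContext
from alembic.script import ScriptDirectory
from sqlalchemy import create_engine

def check_schema_version(database_url: str, alembic_ini: str = "alembic.ini") -> None:
    """Compare the database's current revision to the repository's head before serving."""
    expected = ScriptDirectory.from_config(Config(alembic_ini)).get_current_head()
    engine = create_engine(database_url)
    with engine.connect() as conn:
        current = MigrationContext.configure(conn).get_current_revision()
    if current != expected:
        raise SystemExit(
            f"ERROR: Database schema version mismatch. Current: {current}, "
            f"Expected: {expected}. Run 'alembic upgrade head' before starting server."
        )
```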
---
### Edge Case 3: Connection Pool Exhaustion Under Load
**Scenario**: 50 concurrent clients query workflow-mcp, exhausting connection pool (max_size=10).
**Expected Behavior**:
- Requests 11-50 wait in queue (asyncpg pool behavior)
- If wait time exceeds timeout (30s), requests fail with 503 Service Unavailable
- Server logs warning: "Connection pool saturation (10/10 connections in use)"
- Health check endpoint reports degraded status
**Error Message** (client-side): `503 Service Unavailable: Connection pool exhausted. Retry after 5 seconds.`
**Recovery**: Server processes queued requests as connections free up. User retries failed requests.
---
### Edge Case 4: Large Repository Indexing Timeout
**Scenario**: Codebase-mcp attempts to index 50,000-file repository, exceeding 60s target.
**Expected Behavior**:
- Indexing continues beyond 60s (no hard timeout)
- Server logs performance warning: "Indexing exceeded target (60s): completed in 120s"
- Metrics endpoint records latency (120s) for regression detection
- Client receives success response once indexing completes
**Error Message** (log): `WARN: Indexing performance degraded. Target: 60s, Actual: 120s. Consider batch size tuning or hardware upgrade.`
**Recovery**: Acceptable for repositories exceeding design target. Document performance characteristics in runbook.
---
### Edge Case 5: Stale Cross-Server Reference
**Scenario**: Workflow-mcp work item references `entity_id=abc123` from codebase-mcp, but entity was deleted/re-indexed.
**Expected Behavior**:
- Workflow-mcp stores reference as opaque string (no foreign key constraint)
- When user queries work item with entity context, codebase-mcp returns empty result for `abc123`
- Client displays work item with placeholder: "Referenced entity not found (may have been re-indexed)"
**Error Message** (client-side): `WARN: Entity abc123 referenced in work item but not found in current codebase index. Entity may have been deleted or re-indexed.`
**Recovery**: User can update work item with new entity reference or ignore stale reference.
---
### Edge Case 6: Both Servers Restart Simultaneously
**Scenario**: Docker Compose restarts both codebase-mcp and workflow-mcp at the same time (e.g., after configuration change).
**Expected Behavior**:
- Clients (Claude Code) detect SSE connection loss for both servers
- Clients retry connections with exponential backoff (1s, 2s, 4s, max 30s)
- Servers start up, pass health checks, begin accepting connections
- Clients reconnect, resume operations seamlessly
**Error Message** (client-side): `Connection lost to workflow-mcp and codebase-mcp. Retrying... (attempt 2/5)`
**Recovery**: Automatic client reconnection once servers are healthy.
---
## Review & Acceptance Checklist
Use this checklist to verify the specification is complete before moving to `/plan`:
### Completeness
- [ ] All 10 Functional Requirements (FR-001 to FR-010) are clearly defined with acceptance criteria
- [ ] All 3 User Scenarios include primary path + 2 alternative paths each
- [ ] Edge Cases section covers 6+ realistic failure scenarios with recovery steps
- [ ] Key Entities section defines 3 critical data structures (Benchmark Result, Test Case, Load Test Result)
### Clarity
- [ ] User scenarios describe observable user actions (not implementation steps)
- [ ] Functional requirements use active voice ("Validate that...", "Provide...")
- [ ] Success criteria are measurable (quantitative targets: <60s, <500ms, 99.9% uptime)
- [ ] Technical jargon is explained (e.g., "p95 latency = 95th percentile response time")
### Constitutional Alignment
- [ ] Each FR maps to at least one constitutional principle (Principles III, IV, V, VII)
- [ ] Performance targets (FR-001, FR-002) align with Principle IV (Performance Guarantees)
- [ ] Error recovery (FR-005) aligns with Principle V (Production Quality)
- [ ] Integration testing (FR-003) aligns with Principle VII (Test-Driven Development)
### Completeness vs. Brevity
- [ ] Spec avoids implementation details (no pytest fixture names, no file paths)
- [ ] Spec focuses on "what" and "why", not "how"
- [ ] Spec is scannable (clear headings, tables, bullet points)
- [ ] Spec is complete enough for `/plan` to generate design without ambiguity
### Phase-Specific Validation
- [ ] Spec acknowledges dependencies on Phases 01-05 (both servers must be functional)
- [ ] Spec references parent feature `specs/008-mcp-server-split/spec.md` for context
- [ ] Spec addresses validation requirements from `docs/mcp-split-plan/phases/phase-06-performance/analysis.md`
---
## Clarifications
*This section will be populated during the `/clarify` phase if any ambiguities are identified.*
### Identified Ambiguities
1. **[NEEDS CLARIFICATION]**: What is the acceptable performance degradation under 50-client load? (Current: <1000ms p95, but original target is <500ms)
2. **[NEEDS CLARIFICATION]**: Should load tests run continuously (24-hour soak test) or in bounded intervals (1-hour stress test)?
3. **[NEEDS CLARIFICATION]**: How should the benchmark suite handle CI/CD hardware variance? (Normalize to baseline runner specs or allow variance?)
### Resolved Clarifications
*Populate this after running `/clarify` to document Q&A*
---
## Non-Goals
Explicitly out of scope for this phase:
1. **Production Deployment**: This phase validates readiness but does not deploy servers to production environments
2. **Multi-Instance Load Balancing**: Testing focuses on single-instance performance (load balancing is future work)
3. **Advanced Monitoring (Grafana, Prometheus)**: Basic metrics exposed, but full monitoring stack setup is post-Phase 06
4. **Client Library Updates**: MCP Python SDK integration is tested as-is (no SDK modifications in this phase)
5. **Performance Optimization Beyond Targets**: If targets are met, no further optimization work (diminishing returns)
6. **Cross-Platform Testing**: Testing on macOS/Linux (primary development platforms), Windows support deferred
---
## Additional Context
### Phase Dependencies
- **Phase 01-05 Complete**: Both `workflow-mcp` and `codebase-mcp` must be functional and integrated
- **Original Specification**: Reference `specs/008-mcp-server-split/spec.md` for parent feature requirements
- **Baseline Metrics**: Pre-split performance data from monolith server (for regression comparison)
### Related Work
- **Migration Testing**: Phase 02 validated data migration; this phase validates performance post-migration
- **Integration Layer**: Phase 05 client integration informs the cross-server test scenarios in FR-003
### Future Enhancements (Post-Phase 06)
1. **Advanced Monitoring**: Integrate Prometheus + Grafana for real-time dashboards
2. **Multi-Region Deployment**: Test latency and failover for geographically distributed servers
3. **Chaos Engineering**: Introduce random failures (network partitions, process kills) to validate resilience
4. **Automated Performance Tuning**: ML-based connection pool sizing based on workload patterns
### References
- Constitutional Principles: `.specify/memory/constitution.md`
- Phase 06 Analysis: `docs/mcp-split-plan/phases/phase-06-performance/analysis.md`
- Original Split Spec: `specs/008-mcp-server-split/spec.md`
- Performance Testing Guide: `docs/operations/PERFORMANCE_TESTING.md` (to be created in Phase 06)
---
## Final Directive to AI Agent
You have now generated a **complete, SpecKit-compliant feature specification** for Phase 06 - Performance Validation & Testing.
**Next Steps**:
1. **Save this spec** as `specs/[FEATURE_NUM]-phase-06-performance-validation/spec.md`
2. **Run `/clarify`** to resolve the 3 identified ambiguities in the Clarifications section
3. **After clarifications**, run `/plan` to generate the implementation plan (research.md, data-model.md, contracts/, quickstart.md)
**Key Reminders**:
- This spec describes **what** to validate and **why** it matters (no implementation details)
- User scenarios show **observable behaviors** from user/operator perspective
- Functional requirements map to **constitutional principles** (Principles III, IV, V, VII)
- Success criteria are **measurable** (latency targets, uptime percentages, test coverage)
**Quality Gates Before `/plan`**:
- [ ] All 10 FRs have clear acceptance criteria
- [ ] All 3 user scenarios have 2+ alternative paths
- [ ] Edge cases cover realistic failure modes with recovery steps
- [ ] Success criteria include quantitative targets (<60s, <500ms, 99.9%)
If any section is unclear or incomplete, **flag it in the Clarifications section** with `[NEEDS CLARIFICATION: specific question]` before proceeding to `/plan`.
---
**END OF ENHANCED PROMPT**