Skip to main content
Glama

Codebase MCP Server

by Ravenight13
COMPARE_BASELINES_README.md8.16 kB
# Baseline Comparison Script - Hybrid Regression Detection ## Overview `compare_baselines.py` implements hybrid regression detection for performance validation as specified in feature 011 research.md lines 268-305. The script compares two baseline JSON files (pre-split and post-split) to detect performance regressions using a dual-threshold approach. ## Hybrid Regression Logic The script flags a performance regression **ONLY** if **BOTH** conditions are met: 1. **Degradation Check**: Current metric exceeds baseline by >10% 2. **Constitutional Check**: Current metric exceeds constitutional target This approach allows minor degradation within constitutional targets while preventing significant regressions that violate performance guarantees. ## Constitutional Performance Targets From `.specify/memory/constitution.md` and `spec.md`: | Operation Type | Constitutional Target (p95) | |----------------|----------------------------| | `index` | <60 seconds (60,000ms) | | `search` | <500 milliseconds | | `project_switch` | <50 milliseconds | | `entity_query` | <100 milliseconds | ## Usage ### Basic Comparison ```bash python scripts/compare_baselines.py pre-split.json post-split.json ``` ### Verbose Mode (Show Detailed Metrics) ```bash python scripts/compare_baselines.py pre-split.json post-split.json --verbose ``` ### Save JSON Report ```bash python scripts/compare_baselines.py pre-split.json post-split.json --output report.json ``` ### Combined Options ```bash python scripts/compare_baselines.py pre-split.json post-split.json --verbose --output report.json ``` ## Exit Codes - **0**: All metrics pass (no regressions detected) - **1**: One or more metrics failed regression checks ## Output Interpretation ### PASS (✓) ``` ✓ PASS: Within baseline (8.5% change) and constitutional target (430.00ms < 500.00ms) ``` Both conditions satisfied: - Metric is within 10% of baseline - Metric is below constitutional target **Action**: None required. Performance is acceptable. --- ### WARNING (⚠) - Scenario 1: Baseline Exceeded, Target Met ``` ⚠ WARNING: Exceeds baseline by 14.6% but within constitutional target (55000.00ms < 60000.00ms). Acceptable degradation. ``` Conditions: - Metric exceeds baseline by >10% - Metric is still below constitutional target **Action**: Review performance, but deployment not blocked. This is acceptable degradation per FR-018. --- ### WARNING (⚠) - Scenario 2: Target Exceeded, Baseline Met ``` ⚠ WARNING: Exceeds constitutional target (520.00ms > 500.00ms) but within baseline variance (8.2% change). Baseline may need adjustment. ``` Conditions: - Metric is within 10% of baseline - Metric exceeds constitutional target **Action**: Review baseline accuracy. Target violation without significant degradation suggests baseline was already non-compliant. --- ### FAIL (✗) - Regression Detected ``` ✗ FAIL: REGRESSION DETECTED. Exceeds baseline by 41.7% AND exceeds constitutional target (68000.00ms > 60000.00ms). This violates performance guarantees. ``` Both conditions met: - Metric exceeds baseline by >10% - Metric exceeds constitutional target **Action**: BLOCK deployment. Investigate and resolve performance regression before proceeding. ## Baseline JSON Format Baselines use `PerformanceBenchmarkResult` model from `src/models/performance.py`: ```json { "version": "1.0", "timestamp": "2025-10-13T10:00:00Z", "benchmarks": [ { "benchmark_id": "550e8400-e29b-41d4-a716-446655440000", "server_id": "codebase-mcp", "operation_type": "index", "timestamp": "2025-10-13T10:00:00Z", "latency_p50_ms": 45000.00, "latency_p95_ms": 48000.00, "latency_p99_ms": 52000.00, "latency_mean_ms": 46000.00, "latency_min_ms": 42000.00, "latency_max_ms": 55000.00, "sample_size": 5, "test_parameters": { "file_count": 10000, "repository": "test-repo" }, "pass_status": "pass", "target_threshold_ms": 60000.0 } ] } ``` ### Required Fields - `version`: Baseline format version (string) - `timestamp`: When baseline was created (ISO 8601 string) - `benchmarks`: Array of benchmark results (array) ### Benchmark Result Fields - `benchmark_id`: Unique identifier (UUID string) - `server_id`: Either "codebase-mcp" or "workflow-mcp" - `operation_type`: One of "index", "search", "project_switch", "entity_query" - `timestamp`: When benchmark was executed (ISO 8601 string) - `latency_p50_ms`: 50th percentile latency (Decimal) - `latency_p95_ms`: 95th percentile latency (Decimal) - **PRIMARY COMPARISON METRIC** - `latency_p99_ms`: 99th percentile latency (Decimal) - `latency_mean_ms`: Mean latency (Decimal) - `latency_min_ms`: Minimum latency (Decimal) - `latency_max_ms`: Maximum latency (Decimal) - `sample_size`: Number of iterations (integer) - `test_parameters`: Test-specific parameters (object) - `pass_status`: "pass", "fail", or "warning" - `target_threshold_ms`: Constitutional target for this operation (Decimal, optional) ## Example Scenarios ### Scenario 1: All Metrics Pass **Pre-split**: Indexing p95 = 48,000ms **Post-split**: Indexing p95 = 50,000ms **Target**: 60,000ms **Result**: PASS (4.2% degradation, within target) ### Scenario 2: Acceptable Degradation **Pre-split**: Search p95 = 380ms **Post-split**: Search p95 = 430ms **Target**: 500ms **Result**: WARNING (13.2% degradation, but within target) **Exit Code**: 0 (Success) ### Scenario 3: Regression Detected **Pre-split**: Indexing p95 = 48,000ms **Post-split**: Indexing p95 = 68,000ms **Target**: 60,000ms **Result**: FAIL (41.7% degradation AND exceeds target) **Exit Code**: 1 (Failure) ## Integration with CI/CD ### Example GitHub Actions Workflow ```yaml - name: Compare Performance Baselines run: | python scripts/compare_baselines.py \ docs/performance/baseline-pre-split.json \ docs/performance/baseline-post-split.json \ --output performance-report.json - name: Upload Performance Report if: always() uses: actions/upload-artifact@v3 with: name: performance-report path: performance-report.json - name: Fail on Regression if: failure() run: | echo "Performance regression detected!" cat performance-report.json exit 1 ``` ## Type Safety The script is fully type-safe and passes `mypy --strict` validation: ```bash mypy --strict scripts/compare_baselines.py # Success: no issues found in 1 source file ``` ## Constitutional Compliance This script implements: - **Principle IV**: Performance Guarantees (constitutional targets enforcement) - **Principle VIII**: Pydantic-Based Type Safety (mypy --strict compliance) - **Principle V**: Production Quality (comprehensive error handling) - **FR-018**: Hybrid approach for regression detection (spec.md) ## Troubleshooting ### Error: File not found ``` Error: Baseline file not found: pre-split.json ``` **Solution**: Verify file paths are correct and files exist. ### Error: Invalid JSON ``` Error: Invalid JSON: Expecting value: line 1 column 1 (char 0) ``` **Solution**: Validate JSON syntax. Use `jq` to check: ```bash jq . baseline.json ``` ### Error: Invalid baseline format ``` Error: Invalid baseline format: 1 validation error for BaselineFile ``` **Solution**: Verify baseline matches expected schema. Check: - All required fields present - Field types match specification - `operation_type` is one of valid values ## Related Files - `/Users/cliffclarke/Claude_Code/codebase-mcp/src/models/performance.py` - Performance model definitions - `/Users/cliffclarke/Claude_Code/codebase-mcp/specs/011-performance-validation-multi/research.md` - Hybrid regression logic research (lines 268-305) - `/Users/cliffclarke/Claude_Code/codebase-mcp/specs/011-performance-validation-multi/spec.md` - Feature specification (FR-018) ## Testing Run the script with sample baselines to verify functionality: ```bash # Create test baselines python scripts/generate_test_baselines.py # Run comparison python scripts/compare_baselines.py \ /tmp/test_pre_split_baseline.json \ /tmp/test_post_split_baseline.json \ --verbose ```

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Ravenight13/codebase-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server