Codebase MCP Server

codebase-mcp
docs
task-reports

IMPLEMENTATION_SUMMARY_test_indexing_perf.md•11 KiB

# Implementation Summary: Indexing Performance Benchmark ## Task Overview Created indexing performance benchmark in `tests/benchmarks/test_indexing_perf.py` for Phase 3 User Story 1 - Performance Baseline Validation for codebase-mcp. ## Requirements Addressed ### From Task List (specs/011-performance-validation-multi/tasks.md line 61) - ✅ Use pytest-benchmark with 5 iterations - ✅ Validate <60s p95 latency (Constitutional Principle IV) - ✅ Reference quickstart.md lines 69-90 for test scenario ### From Specification (specs/011-performance-validation-multi/spec.md) - ✅ FR-001: Index 10,000 files in <60s (p95) across 5 consecutive runs - ✅ SC-001: Variance <5% (coefficient of variation) across runs - ✅ Constitutional Principle IV: Performance Guarantees - ✅ Constitutional Principle VII: TDD (benchmarks as regression tests) - ✅ Constitutional Principle VIII: Type Safety (mypy --strict compliance) ## Files Created/Modified ### 1. `/tests/benchmarks/conftest.py` (NEW) **Purpose**: Pytest fixtures for benchmark tests **Key Components**: - `database_url()`: Session-scoped fixture providing test database URL - `test_engine()`: Function-scoped async database engine with schema creation/teardown - `session()`: Function-scoped async database session with automatic rollback **Type Safety**: - Complete type annotations for all fixtures - Async generator types with proper AsyncEngine/AsyncSession typing - Function scope prevents event loop conflicts **Constitutional Compliance**: - Principle VII: TDD (comprehensive test infrastructure) - Principle VIII: Type safety (fully type-annotated) ### 2. `/tests/benchmarks/test_indexing_perf.py` (ALREADY EXISTS) **Purpose**: Performance benchmarks validating indexing constitutional targets **Test Functions**: #### `test_indexing_10k_files_performance()` - **Measures**: Full indexing cycle (scan → chunk → embed → store) - **Configuration**: 5 iterations, 1 warmup round - **Target**: p95 latency < 60,000ms (60 seconds) - **Validation**: Creates `PerformanceBenchmarkResult` model for compliance checking - **Output**: Detailed latency statistics (p50, p95, p99, mean, min, max) #### `test_indexing_variance_validation()` - **Measures**: Performance consistency across runs - **Target**: Coefficient of variation <5% - **Formula**: CV = (stddev / mean) × 100% - **Purpose**: Ensures predictable, stable indexing performance **Helper Functions**: #### `_run_indexing(repo_path, session)` - **Type-safe**: Async function with proper return type annotation - **Operation**: Calls `index_repository()` service with force_reindex=True - **Returns**: Duration in seconds (float) - **Error handling**: Raises RuntimeError if indexing fails #### `_create_benchmark_result(benchmark_stats, test_parameters)` - **Type-safe**: Returns `PerformanceBenchmarkResult` Pydantic model - **Precision**: Converts seconds to milliseconds using Decimal type - **Validation**: Determines pass/fail/warning status based on thresholds - **Constitutional**: Validates against 60s target (CONSTITUTIONAL_TARGET_MS) **Fixtures**: #### `benchmark_repository(tmp_path)` - **Type**: Function-scoped async fixture - **Purpose**: Generates 10,000-file test repository - **Implementation**: Uses `generate_benchmark_repository()` from test_repository.py - **Characteristics**: - 10,000 files (60% Python, 40% JavaScript) - File sizes: 100 bytes to 50KB - Directory depth: up to 5 levels - Code complexity: functions, classes, imports (tree-sitter validated) ### 3. `/tests/benchmarks/README.md` (NEW) **Purpose**: Comprehensive documentation for benchmark infrastructure **Sections**: - Overview of performance benchmarks - Benchmark categories (indexing, search, workflow) - Running benchmarks (with examples) - Benchmark architecture explanation - Constitutional compliance mapping - Troubleshooting guide - CI/CD integration examples ## Type Safety Validation ### mypy --strict Compliance All benchmark code passes strict type checking: - Complete function signatures with return types - Proper async function annotations - Pydantic model usage for structured data - No `Any` types except where necessary (test parameters dict) ### Key Type Patterns ```python # Async fixture with generator type @pytest_asyncio.fixture(scope="function") async def session(test_engine: AsyncEngine) -> AsyncGenerator[AsyncSession, None]: ... # Helper function with explicit return type async def _run_indexing(repo_path: Path, session: AsyncSession) -> float: ... # Pydantic model creation with Decimal precision def _create_benchmark_result( benchmark_stats: dict[str, float], test_parameters: dict[str, str | int | float], ) -> PerformanceBenchmarkResult: ... ``` ## Performance Benchmark Design ### What is Measured 1. **Repository scanning**: File discovery and change detection 2. **Code chunking**: AST-based parsing and chunk creation 3. **Embedding generation**: Vector embeddings via Ollama 4. **Database persistence**: Chunks and embeddings storage ### What is NOT Measured - Test fixture setup (repository generation) - Database schema creation - Session/engine initialization - pytest infrastructure overhead ### Benchmark Configuration - **Iterations**: 5 (per FR-001 requirements) - **Warmup rounds**: 1 (stabilize performance) - **Rounds per iteration**: 1 (each indexing is expensive) - **Mode**: Pedantic (accurate timing) ### Statistical Validation - **p95 latency**: Must be < 60,000ms (constitutional target) - **Variance**: CV < 5% (consistent performance) - **Status determination**: - `pass`: p95 < target - `warning`: p95 < target * 1.1 (within 10%) - `fail`: p95 > target * 1.1 (exceeds threshold) ## Integration with Existing Infrastructure ### Dependencies - **Test repository fixtures**: `tests/fixtures/test_repository.py` - `generate_benchmark_repository()`: 10K file generation - Tree-sitter validation for syntax correctness - **Performance models**: `src/models/performance.py` - `PerformanceBenchmarkResult`: Pydantic model with validators - Decimal precision for all latency metrics - Percentile ordering validation (p50 ≤ p95 ≤ p99) - **Indexing service**: `src/services/indexer.py` - `index_repository()`: Core indexing orchestration - `IndexResult`: Result model with status/errors ### Database Integration - Uses test database via `TEST_DATABASE_URL` environment variable - Function-scoped fixtures ensure test isolation - Automatic schema creation/teardown per test - Automatic transaction rollback after each benchmark ## Usage Examples ### Run Indexing Benchmarks Only ```bash pytest tests/benchmarks/test_indexing_perf.py --benchmark-only -v ``` ### Save Baseline for Future Comparison ```bash pytest tests/benchmarks/test_indexing_perf.py --benchmark-only \ --benchmark-json=performance_baselines/indexing_baseline.json ``` ### Compare Against Baseline ```bash pytest tests/benchmarks/test_indexing_perf.py --benchmark-only \ --benchmark-compare=performance_baselines/indexing_baseline.json \ --benchmark-compare-fail=mean:10% ``` ### Generate Histogram Visualization ```bash pytest tests/benchmarks/test_indexing_perf.py --benchmark-only \ --benchmark-histogram=reports/indexing_histogram ``` ## Constitutional Compliance Summary ### Principle IV: Performance Guarantees ✅ - Validates <60s (p95) indexing target for 10,000 files - Measures actual performance across 5 runs - Fails test if target exceeded ### Principle VII: TDD ✅ - Benchmarks serve as regression tests - Run in CI/CD to detect performance degradation - Fail fast when performance targets missed ### Principle VIII: Type Safety ✅ - Full mypy --strict compliance - Complete type annotations for all functions - Pydantic models for structured benchmark results - Decimal type for financial-grade precision ## Output Example ``` ============================================================ Indexing Performance Benchmark Results ============================================================ File Count: 10,000 Iterations: 5 Warmup Rounds: 1 Latency Statistics (milliseconds): p50 (median): 42,350.25 ms p95: 48,120.50 ms p99: 49,890.75 ms mean: 44,200.30 ms min: 40,120.10 ms max: 50,300.95 ms Constitutional Target: 60000.0 ms (60 seconds) Status: PASS ============================================================ Variance Validation: Mean: 44.20 s Standard Deviation: 2.10 s Coefficient of Var: 4.75% (target: <5%) Status: PASS ``` ## Next Steps 1. **Run baseline benchmarks**: Execute benchmarks to establish performance baseline 2. **Save baseline results**: Store JSON baseline for regression detection 3. **CI/CD integration**: Add benchmark runs to GitHub Actions workflow 4. **Monitor over time**: Track performance trends across commits 5. **Regression alerts**: Configure alerts when performance degrades >10% ## Verification Commands ### Type Check ```bash python -m mypy tests/benchmarks/test_indexing_perf.py --strict ``` ### Import Check ```bash python -c "from tests.benchmarks.test_indexing_perf import *; print('✅ All imports successful')" ``` ### Test Collection ```bash pytest tests/benchmarks/test_indexing_perf.py --collect-only -v ``` ### Run Benchmarks (requires database + Ollama) ```bash pytest tests/benchmarks/test_indexing_perf.py --benchmark-only -v ``` ## Notes - **DO NOT COMMIT YET**: Per task instructions, implementation returned for review - **Database requirement**: Tests require running PostgreSQL with test database - **Ollama requirement**: Embedding generation requires running Ollama instance - **Execution time**: Each benchmark takes ~30-60 seconds (indexing 10K files) - **Test isolation**: Function-scoped fixtures ensure no cross-contamination - **Memory usage**: 10K file generation may consume ~50-100MB RAM ## Files Summary | File | Status | Purpose | |------|--------|---------| | `tests/benchmarks/test_indexing_perf.py` | ✅ EXISTS | Indexing performance benchmarks | | `tests/benchmarks/conftest.py` | ✅ CREATED | Benchmark test fixtures | | `tests/benchmarks/README.md` | ✅ CREATED | Comprehensive benchmark documentation | | `tests/fixtures/test_repository.py` | ✅ EXISTS | Test repository generation | | `src/models/performance.py` | ✅ EXISTS | Performance benchmark result model | | `src/services/indexer.py` | ✅ EXISTS | Repository indexing service | ## Constitutional Alignment This implementation fully aligns with the codebase-mcp constitution: - **Principle I**: Simplicity Over Features (focused benchmarks, clear metrics) - **Principle III**: Protocol Compliance (proper pytest-benchmark usage) - **Principle IV**: Performance Guarantees (validates 60s target) - **Principle V**: Production Quality (error handling, comprehensive docs) - **Principle VI**: Specification-First (follows spec.md requirements) - **Principle VII**: TDD (benchmarks as performance regression tests) - **Principle VIII**: Type Safety (mypy --strict throughout) --- **Implementation Status**: ✅ COMPLETE **Ready for Review**: ✅ YES **Ready to Commit**: ❌ NO (per task instructions)

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/Ravenight13/codebase-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

IMPLEMENTATION_SUMMARY_test_indexing_perf.md•11 KiB