# Test Suite Optimization Roadmap
**Status**: Implementation Plan
**Created**: 2025-11-20
**Current State**: 222 tests in the default run (274 total), 34s runtime, 67% coverage
**Target State**: Optimized suite, 10-12s runtime, 80% coverage
---
## Executive Summary
Following comprehensive analysis by 10 parallel agents, this roadmap systematically addresses all identified issues in the test suite. The plan balances quick wins (50% speedup via parallelization) with strategic improvements (removing 57 redundant tests, improving coverage from 67% to 80%).
**Key Metrics**:
- Current runtime: 34 seconds (serial)
- Projected runtime after Phase 1: 15-17 seconds (50% reduction via parallelization)
- Projected runtime after the remaining phases: 10-12 seconds (65-71% reduction total)
- Test coverage improvement: 67% → 80% (+13 percentage points)
- Test count reduction: 274 total → 217 total (removing 57 redundant)
---
## Implementation Phases
### Phase 1: Parallel Execution Infrastructure ✅ COMPLETE
**Status**: ✅ Implemented 2025-11-20
**Effort**: 30 minutes
**Impact**: 50% speedup (34s → 15-17s)
#### Changes Made:
1. **`pytest.ini` modification** (lines 20-26; `-n` requires the `pytest-xdist` plugin):
```ini
addopts =
    -v
    --tb=short
    --strict-markers
    --color=yes
    -m "not expensive and not integration and not accuracy and not marker"
    # NEW: Enable parallel execution with auto worker detection
    -n auto
```
2. **`tests/conftest.py` worker isolation** (lines 271-306; sketched below):
- Added `worker_id` and `tmp_path_factory` parameters to `cleanup_sessions` fixture
- Each xdist worker gets isolated sessions directory to prevent race conditions
- Master process uses normal `data/sessions` directory
- Workers use `/tmp/pytest-*/gw{N}/sessions` directories
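A minimal sketch of that shape (details assumed; the real fixture in `tests/conftest.py`, lines 271-306, may differ):
```python
# Sketch only: assumed shape of the worker-isolation logic.
import shutil
from pathlib import Path

import pytest

@pytest.fixture(autouse=True)
def cleanup_sessions(worker_id, tmp_path_factory):
    if worker_id == "master":
        # Non-distributed run (or the xdist controller): normal directory.
        sessions_dir = Path("data/sessions")
    else:
        # Each xdist worker (gw0, gw1, ...) gets an isolated directory
        # under /tmp/pytest-*/ so workers never race on shared files.
        sessions_dir = tmp_path_factory.getbasetemp() / "sessions"
    sessions_dir.mkdir(parents=True, exist_ok=True)
    yield sessions_dir
    if worker_id != "master":
        shutil.rmtree(sessions_dir, ignore_errors=True)
```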
#### Verification:
```bash
time python -m pytest -q --tb=no # Should complete in 15-17s with 16 workers
```
#### Success Criteria:
- ✅ Tests run in parallel (16 workers on 16-core machine)
- ✅ No race conditions or failures from shared state
- ✅ 50% reduction in runtime (34s → 15-17s)
---
### Phase 2: Fixture Optimization (Week 1-2)
**Effort**: 4-6 hours
**Impact**: Additional 20-25% speedup, cost reduction
#### 2.1: Convert Expensive Fixtures to Session Scope
**Files**: `tests/conftest.py` (lines 135-186)
**Current State**:
```python
@pytest_asyncio.fixture(scope="function")  # ❌ Runs per test
async def botany_farm_dates(botany_farm_markdown):
    extractor = DateExtractor()
    results = await extractor.extract(...)  # API call
    return results
```
**Target State**:
```python
@pytest_asyncio.fixture(scope="session")  # ✅ Runs once
async def botany_farm_dates(botany_farm_markdown):
    extractor = DateExtractor()
    results = await extractor.extract(...)  # API call ONCE
    return results
```
**Changes Required**:
1. Change scope of 3 fixtures from `function` to `session` (see the event-loop caveat after this list):
- `botany_farm_dates` (line 135)
- `botany_farm_tenure` (line 153)
- `botany_farm_project_ids` (line 171)
2. Verify tests don't mutate fixture data (read-only usage)
3. Add cache clearing after session:
```python
@pytest.fixture(scope="session", autouse=True)
def cleanup_expensive_fixtures():
    yield  # Run all tests
    # Clear any cached data
```
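One caveat for step 1: pytest-asyncio binds async fixtures to an event loop that is function-scoped by default, so a session-scoped async fixture can fail with "attached to a different loop" errors. A sketch of the fix, assuming a recent pytest-asyncio (0.24+; older releases required overriding the `event_loop` fixture instead):
```python
import pytest_asyncio

# Assumption: pytest-asyncio >= 0.24, where loop_scope pins the fixture's
# event loop to the session so the awaited result can be reused safely.
@pytest_asyncio.fixture(scope="session", loop_scope="session")
async def botany_farm_dates(botany_farm_markdown):
    extractor = DateExtractor()
    return await extractor.extract(...)  # API call once per session
```
Step 2 still matters here: a mutated session-scoped result leaks state between tests, so treat it as read-only or hand out copies.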
**Estimated Savings**:
- 3 LLM API calls eliminated (previously called per test)
- $0.02-0.05 cost savings per test run
- 2-3 seconds faster
#### 2.2: Optimize cleanup_sessions Fixture
**File**: `tests/conftest.py` (line 271)
**Current State**:
- `autouse=True` means the fixture runs for ALL 274 tests
- Setup and teardown execute around every test (~548 runs per suite)
- File I/O overhead on each run
**Target State**:
- Use `autouse=False` - only runs when needed
- Add explicit fixture request to tests that actually need cleanup
**Changes Required**:
1. Remove `autouse=True` from fixture decorator (line 271)
2. Add fixture to tests that create sessions:
```python
def test_create_session(cleanup_sessions):  # Explicit request
    ...
```
3. Identify which tests actually need cleanup (estimated 40-50 tests)
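For files where most tests create sessions, a module-level opt-in avoids editing every signature (a sketch using pytest's `usefixtures` marker):
```python
import pytest

# Applies cleanup_sessions to every test in this module without changing
# individual test signatures.
pytestmark = pytest.mark.usefixtures("cleanup_sessions")
```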
**Estimated Savings**:
- Reduce cleanup runs from ~548 to ~100 (over 80% reduction)
- 1-2 seconds faster
**Success Criteria**:
- Tests that don't create sessions run faster
- No test failures from missing cleanup
- `pytest -v` output shows reduced fixture usage
---
### Phase 3: Test Redundancy Elimination (Week 2-3)
**Effort**: 6-8 hours
**Impact**: Cleaner suite, 5-10% faster, easier maintenance
#### 3.1: Consolidate test_upload_tools.py
**File**: `tests/test_upload_tools.py` (1,084 lines, 57 tests)
**Analysis**:
- 21% of the entire test suite sits in one file
- Many tests validate same functionality with minor variations
- Pattern: "test same thing with different input format"
**Refactoring Strategy**:
1. **Identify core scenarios** (10-15 tests):
- Valid upload (PDF, shapefile, geojson)
- Invalid file handling
- Deduplication
- Error cases
2. **Parameterize variations**:
```python
@pytest.mark.parametrize("file_format,expected", [
    ("pdf", "success"),
    ("shapefile", "success"),
    ("geojson", "success"),
    ("invalid", "error"),
])
def test_file_upload(file_format, expected):
    ...  # Single parameterized test handles all formats
```
3. **Remove redundant tests**:
- Tests that validate same behavior with trivial differences
- Tests that overlap with integration tests
- Tests with weak/duplicate assertions
**Estimated Reduction**:
- Remove 20-25 tests from this file
- Reduce from 1,084 lines to ~600 lines
- 3-5 seconds faster
#### 3.2: Remove Duplicate Validation Tests
**Files**:
- `tests/test_validation.py` (19 tests)
- `tests/test_phase4_validation.py` (overlap)
**Strategy**:
1. Map test coverage overlap (see the sketch after this list)
2. Keep higher-quality tests (better assertions, clearer intent)
3. Remove redundant validation checks
4. Consolidate into single validation test file
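To map the overlap (step 1), one option is coverage.py's dynamic contexts: record a run with pytest-cov's `--cov-context=test` flag, then inspect which tests hit each line. A sketch against coverage.py's data API:
```python
import coverage

cov = coverage.Coverage()
cov.load()  # reads .coverage from a pytest --cov --cov-context=test run
data = cov.get_data()
for path in data.measured_files():
    if "validation" not in path:
        continue
    for lineno, contexts in data.contexts_by_lineno(path).items():
        if len(contexts) > 1:  # line exercised by more than one test
            print(f"{path}:{lineno} -> {sorted(contexts)}")
```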
**Estimated Reduction**:
- Remove 10-15 redundant tests
- 1-2 seconds faster
#### 3.3: Consolidate LLM Testing
**Files**:
- `tests/test_llm_extraction_integration.py`
- `tests/test_llm_json_validation.py`
**Strategy**:
1. Identify mock-only tests (can be faster unit tests)
2. Identify integration tests (need real workflow)
3. Remove tests that just validate mock behavior (example below)
4. Keep tests that validate actual extraction logic
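As an illustration of step 3, a hypothetical test of this shape only confirms that the mock returns what it was configured to return, and is a removal candidate:
```python
from unittest.mock import AsyncMock

import pytest

@pytest.mark.asyncio
async def test_extractor_returns_dates():  # ❌ exercises the mock, not our code
    llm = AsyncMock(return_value={"dates": ["2025-01-01"]})
    result = await llm()
    assert result == {"dates": ["2025-01-01"]}  # restates the mock's config
```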
**Estimated Reduction**:
- Remove 5-10 tests
- Convert some to unit tests (faster)
**Success Criteria (Phase 3)**:
- Test count reduced from 274 to ~220
- No reduction in actual coverage
- Clearer test organization
- 5-10% faster runtime
---
### Phase 4: Coverage Improvement (Week 3-4)
**Effort**: 8-12 hours
**Impact**: 67% → 80% coverage
#### 4.1: Critical Missing Coverage
**Priority 1: Unified Analysis (0% coverage)**
**File**: `src/registry_review_mcp/prompts/unified_analysis.py`
**Tests to Add**:
1. `test_unified_analysis_schema_validation()` - Ensure output matches expected structure
2. `test_unified_analysis_with_mock_llm()` - Mock LLM responses
3. `test_unified_analysis_error_handling()` - Invalid input handling
**New test code**: ~50-75 lines (a skeleton for test 1 follows)
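```python
# Hypothetical: unified_analysis_result and EXPECTED_KEYS stand in for the
# real fixture and output schema of prompts/unified_analysis.py.
EXPECTED_KEYS = {"summary", "findings", "confidence"}

def test_unified_analysis_schema_validation(unified_analysis_result):
    assert EXPECTED_KEYS <= set(unified_analysis_result)
    assert isinstance(unified_analysis_result["findings"], list)
```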
**Priority 2: Validation Framework (48% coverage)**
**File**: `src/registry_review_mcp/models/validation.py`
**Gaps**:
- Lines 45-68: `ValidationResult` class methods
- Lines 89-112: `ValidationError` handling
- Lines 134-156: Edge case validation
**Tests to Add**:
1. `test_validation_result_aggregation()` - Test result combining
2. `test_validation_error_recovery()` - Error handling paths
3. `test_validation_edge_cases()` - Boundary conditions
**New test code**: ~75-100 lines
**Priority 3: Cost Tracking (34% coverage)**
**File**: `src/registry_review_mcp/utils/cost_tracking.py`
**Gaps**:
- Lines 78-95: Token calculation methods
- Lines 112-134: Cost aggregation
- Lines 156-178: Report generation
**Tests to Add**:
1. `test_token_calculation_accuracy()` - Verify token counting
2. `test_cost_aggregation_multi_session()` - Multiple sessions
3. `test_cost_report_generation()` - Report formatting
**New test code**: ~60-80 lines
#### 4.2: Coverage Tracking
**Add coverage configuration** to `setup.cfg` or `tox.ini` (coverage.py does not read `pytest.ini`):
```ini
[coverage:run]
source = src/registry_review_mcp
# Track branch coverage too
branch = True

[coverage:report]
precision = 2
show_missing = True
# Fail if coverage drops below 75%
fail_under = 75
```
**Add coverage enforcement**:
```bash
pytest --cov=src/registry_review_mcp --cov-report=html --cov-fail-under=75
```
**Success Criteria**:
- Overall coverage: 67% → 80%
- No critical modules below 50%
- Branch coverage tracking enabled
- CI/CD integration for coverage checks
---
### Phase 5: Anti-Pattern Fixes (Week 4-5)
**Effort**: 3-4 hours
**Impact**: Better error reporting, maintainability
#### 5.1: Replace Bare Except Blocks
**Pattern Found** (7 occurrences):
```python
try:
    risky_operation()
except:  # ❌ Catches everything, including KeyboardInterrupt
    pass
```
**Fix**:
```python
try:
    risky_operation()
except (SpecificError1, SpecificError2) as e:  # ✅ Explicit
    logger.warning(f"Expected error: {e}")
```
**Files to Update**:
1. `tests/test_document_processing.py` (2 occurrences)
2. `tests/test_integration_full_workflow.py` (3 occurrences)
3. `tests/test_upload_tools.py` (2 occurrences)
#### 5.2: Strengthen Weak Assertions
**Pattern Found** (12 occurrences):
```python
assert result is not None # ❌ Weak - just checks existence
```
**Fix**:
```python
assert len(result) == 5 # ✅ Specific expectation
assert result["status"] == "completed"
assert all(item.confidence > 0.5 for item in result)
```
**Strategy**:
1. Grep for `assert .* is not None` patterns
2. Replace with specific value checks
3. Add domain-specific validations
#### 5.3: Fix Unnecessary Async Tests
**Pattern Found** (13 tests):
```python
@pytest.mark.asyncio
async def test_sync_function():  # ❌ Not actually async
    result = sync_function()
    assert result
```
**Fix**:
```python
def test_sync_function():  # ✅ Remove async
    result = sync_function()
    assert result
```
**Benefit**: Slight performance improvement (no async overhead)
**Success Criteria**:
- Zero bare `except:` blocks
- All assertions check specific values
- Async only where needed
---
### Phase 6: Test Organization (Week 5)
**Effort**: 2-3 hours
**Impact**: Better discoverability, maintainability
#### 6.1: Split Large Test Files
**Current Structure**:
```
tests/
    test_upload_tools.py               (1,084 lines)  # Too large
    test_integration_full_workflow.py  (450 lines)
    test_document_processing.py        (380 lines)
```
**Target Structure**:
```
tests/
    upload/
        test_upload_pdf.py
        test_upload_gis.py
        test_upload_validation.py
        test_upload_deduplication.py
    integration/
        test_end_to_end_workflow.py
        test_multi_document_workflow.py
    document/
        test_discovery.py
        test_classification.py
        test_metadata_extraction.py
```
#### 6.2: Add Test Categories
**Mark tests by purpose**:
```python
@pytest.mark.smoke # <1s, critical path
@pytest.mark.unit # Fast, isolated
@pytest.mark.integration # Full workflow
@pytest.mark.expensive # Real API calls
```
Because `--strict-markers` is set in `addopts`, register each new marker under `markers =` in `pytest.ini` before first use.
**Usage**:
```bash
pytest -m smoke # Quick sanity check (30 tests, <5s)
pytest -m unit # Fast unit tests (120 tests, <10s)
pytest -m "not expensive" # Default run
```
#### 6.3: Document Test Patterns
**Create**: `tests/README.md`
Content:
- How to run different test suites
- Adding new tests (which file, which markers)
- Fixture usage guide
- Common patterns and anti-patterns
**Success Criteria**:
- No test file >500 lines
- Clear category structure
- Easy to find related tests
- Documented patterns
---
### Phase 7: Performance Fine-Tuning (Week 6)
**Effort**: 4-6 hours
**Impact**: Final 10-20% speedup
#### 7.1: Async Optimization
**Fix Blocking I/O in Async Tests** (11 occurrences):
**Pattern**:
```python
async def test_something():
    with open("file.txt") as f:  # ❌ Blocking I/O in async
        content = f.read()
```
**Fix**:
```python
import aiofiles

async def test_something():
    async with aiofiles.open("file.txt") as f:  # ✅ Non-blocking
        content = await f.read()
```
**Files to Update**:
- `tests/test_document_processing.py` (5 occurrences)
- `tests/test_evidence_extraction.py` (4 occurrences)
- `tests/test_upload_tools.py` (2 occurrences)
#### 7.2: Optimize Test Data Loading
**Current Pattern** (repeated loads):
```python
def test_1():
    data = json.load(open("large_file.json"))  # Load 1

def test_2():
    data = json.load(open("large_file.json"))  # Load 2 (duplicate)
```
**Optimized Pattern** (session fixture):
```python
@pytest.fixture(scope="session")
def large_test_data():
    with open("large_file.json") as f:
        return json.load(f)  # Load once per session

def test_1(large_test_data):
    ...  # Use cached data

def test_2(large_test_data):
    ...  # Use cached data
```
#### 7.3: Database Query Optimization
**If using real database in tests**:
- Use in-memory SQLite for unit tests (sketch below)
- Batch database setup/teardown
- Minimize transaction overhead
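A minimal sketch of the in-memory option, assuming the stdlib `sqlite3` API is compatible with the code under test:
```python
import sqlite3

import pytest

@pytest.fixture()
def db_conn():
    conn = sqlite3.connect(":memory:")  # lives in RAM; no file I/O
    yield conn
    conn.close()  # the database vanishes with the connection
```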
**Success Criteria**:
- No blocking I/O in async contexts
- Minimal redundant file I/O
- Final runtime: 10-12 seconds
---
## Alternative Strategy: Tiered Testing (Optional)
If pure optimization doesn't meet <10s target, implement tiered approach:
### Tier 1: Smoke Tests (<5 seconds)
**Purpose**: Instant feedback on critical paths
**Count**: 20-30 tests
**Coverage**: Core workflows only
**Usage**: Pre-commit hook, quick validation
```bash
pytest -m smoke --tb=line -q
```
### Tier 2: Unit Tests (<15 seconds)
**Purpose**: Fast, comprehensive validation
**Count**: 120-150 tests
**Coverage**: Individual components, mocked dependencies
**Usage**: Default developer workflow
```bash
pytest -m unit
```
### Tier 3: Integration Tests (<60 seconds)
**Purpose**: Full system validation
**Count**: 40-50 tests
**Coverage**: End-to-end workflows
**Usage**: Pre-push, CI/CD
```bash
pytest -m integration
```
### Tier 4: Expensive Tests (as needed)
**Purpose**: Accuracy validation with real APIs
**Count**: 30-40 tests
**Coverage**: Ground truth validation
**Usage**: Weekly, before releases
```bash
pytest -m expensive
```
---
## Implementation Timeline
| Week | Phase | Effort | Cumulative Time |
|------|-------|--------|-----------------|
| 1 | Phase 1: Parallel Execution ✅ | 0.5h | ✅ Complete |
| 1-2 | Phase 2: Fixture Optimization | 4-6h | 4.5-6.5h |
| 2-3 | Phase 3: Redundancy Elimination | 6-8h | 10.5-14.5h |
| 3-4 | Phase 4: Coverage Improvement | 8-12h | 18.5-26.5h |
| 4-5 | Phase 5: Anti-Pattern Fixes | 3-4h | 21.5-30.5h |
| 5 | Phase 6: Test Organization | 2-3h | 23.5-33.5h |
| 6 | Phase 7: Performance Tuning | 4-6h | 27.5-39.5h |
**Total Estimated Effort**: 27.5-39.5 hours (3.5-5 days)
---
## Success Metrics
### Performance Targets
- ✅ **Phase 1**: 34s → 15-17s (50% reduction) - **COMPLETE**
- **Phase 2**: 15-17s → 12-14s (20% additional reduction)
- **Phase 3**: 12-14s → 11-13s (8% additional reduction)
- **Final Target**: 10-12s total runtime (65-71% reduction from the 34s baseline)
### Quality Targets
- **Coverage**: 67% → 80% (+13 percentage points)
- **Test Count**: 274 → ~217 (57 redundant tests removed)
- **Maintainability**: No file >500 lines, clear organization
- **CI/CD Ready**: Fast enough for pre-commit hooks (<5s smoke tests)
### Cost Targets
- **LLM API Costs**: Reduce by 60-70% via fixture scoping
- **Developer Time**: 34s → 12s = 22s saved per test run
  - 50 test runs/day → 18 minutes saved/day
  - 5 days/week → 90 minutes saved/week
---
## Rollback Plan
If any phase causes test failures:
1. **Immediate Rollback**:
```bash
git revert HEAD # Undo last commit
pytest -q # Verify tests pass
```
2. **Investigate**:
- Check git diff to see what changed
- Run failing tests in isolation
- Check for race conditions (parallel execution)
3. **Incremental Fix**:
- Fix issue in separate branch
- Validate with full suite
- Merge when stable
---
## Monitoring & Validation
### After Each Phase
Run validation suite:
```bash
# Full test run with timing
time pytest -v --tb=short
# Coverage check
pytest --cov=src/registry_review_mcp --cov-report=term-missing
# Verify no regressions
pytest -m "expensive or integration or accuracy" --tb=short
```
### Success Indicators
- ✅ All tests pass
- ✅ Runtime decreases or stays the same
- ✅ Coverage increases or stays the same
- ✅ No new warnings
- ✅ Parallel execution works correctly
### Failure Indicators
- ❌ New test failures
- ❌ Increased runtime
- ❌ Decreased coverage
- ❌ Race conditions in parallel mode
---
## Next Actions
### Immediate (This Week)
1. ✅ **Phase 1 Complete**: Parallel execution enabled
2. Verify parallel execution stability with full suite
3. Measure baseline metrics post-parallelization
4. Begin Phase 2: Fixture scope optimization
### Short-term (Next 2 Weeks)
1. Complete Phase 2 (fixture optimization)
2. Complete Phase 3 (redundancy elimination)
3. Start Phase 4 (coverage improvement)
### Medium-term (Next 4-6 Weeks)
1. Complete all phases 4-7
2. Document test patterns
3. Set up coverage enforcement in CI/CD
4. Validate against all success criteria
---
## Related Documents
- [UX Analysis](./UX_ANALYSIS.md) - User experience improvements
- [Reduction Architecture](./REDUCTION_ARCHITECTURE_DIAGRAM.md) - System simplification
- [Product Advisory Transcript](./transcripts/2025-11-19-product-advisory.md) - Strategic context
---
**Document Owner**: Test Infrastructure Team
**Last Updated**: 2025-11-20
**Next Review**: After Phase 2 completion