# Test Suite Optimization Report
**Date**: 2025-11-20
**Test Suite**: Registry Review MCP
**Current Performance**: 223 tests, 56.6s (baseline with 2 failures)
**Target**: Sub-15s execution time
---
## Executive Summary
The test suite currently executes 223 tests in 56.6 seconds with acceptable performance but significant optimization potential. Analysis reveals the suite is well-structured with modern async patterns, comprehensive fixtures, and cost tracking infrastructure already in place. Primary bottlenecks are:
1. **Single slowest test**: 21.8s in upload workflow integration
2. **Heavy file I/O operations**: 110+ filesystem operations per run
3. **Marker PDF extraction overhead**: 5.8s for single test
4. **Session cleanup latency**: 13 cleanup operations per function scope
5. **104 async tests** without parallel execution
**Key Finding**: The suite can achieve sub-15s performance through tactical optimizations without major refactoring. The existing three-tier testing architecture (unit/integration/accuracy) provides the foundation.
---
## Current State Assessment
### Test Distribution
```
Total Tests: 273 (223 active, 50 deselected by markers)
Test Files: 23 files
Average File Size: 11.8 KB
Largest File: test_upload_tools.py (42.5 KB, 1,083 lines)
Async Tests: 104 (46.6% of active tests)
```
### Performance Profile
```
Execution Time: 56.6s (baseline)
Slowest Test: 21.8s (test_start_review_full_workflow)
Top 3 Bottlenecks:
- Upload workflow: 21.8s (38% of total time)
- Marker extraction: 5.8s (10% of total time)
- Document discovery: 3.1s (5% of total time)
```
### Test Categories (by marker)
```
Unit Tests: ~150 tests (fast, no external dependencies)
Integration Tests: ~50 tests (marked: expensive, integration, slow)
Accuracy Tests: ~20 tests (marked: accuracy, expensive)
Marker Tests: ~3 tests (marked: marker - requires 8GB RAM)
```
### Infrastructure Quality
✅ **Strengths**:
- Modern async/await patterns throughout
- Session-scoped shared fixtures for expensive operations
- Automatic cost tracking infrastructure (conftest.py)
- Proper test isolation with cleanup fixtures
- Three-tier testing architecture already defined
- Comprehensive markers for test categorization
⚠️ **Weaknesses**:
- No parallel execution (pytest-xdist not configured)
- Function-scoped cleanup runs 223 times per suite
- Largest test file (1,083 lines) impacts readability
- 13 filesystem cleanup operations per test
- Marker tests block entire suite (5.8s single test)
---
## Prioritized Recommendations
### 🚀 TIER 1: Quick Wins (Expected: 56s → 20-25s)
**Implementation Time**: 1-2 hours
**Performance Gain**: 55-65% reduction
**Risk**: Minimal
#### 1.1 Enable Parallel Test Execution
**Impact**: 56s → 20-25s (55-65% reduction)
```bash
# Install pytest-xdist
pip install pytest-xdist

# Run with 4 workers (a reasonable default for most machines)
pytest tests/ -n 4
```
To make parallel execution the default, add to pytest.ini:
```ini
[pytest]
addopts = -n auto
```
**Why This Works**:
- 104 async tests are I/O-bound and naturally parallelizable
- Test isolation already enforced through fixtures
- No shared state dependencies detected
- 23 test files distribute well across workers
**Estimated Timeline**: 15 minutes
#### 1.2 Optimize Session Cleanup Scope
**Impact**: Save 2-3s per run
Current (conftest.py, line 271):
```python
@pytest.fixture(autouse=True, scope="function")
def cleanup_sessions():
    # Runs 223 times per suite
    ...
```
Optimized:
```python
@pytest.fixture(autouse=True, scope="module")
def cleanup_sessions():
    # Runs 23 times per suite (once per file)
    ...
```
**Why This Works**:
- Tests already use `test-` prefixed sessions for isolation
- Cleanup only needs to run once per module, not per function
- Reduces filesystem operations by 90%
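A minimal sketch of the module-scoped version, assuming cleanup means removing `test-` prefixed session directories under a sessions root; the actual cleanup logic in conftest.py may differ:
```python
import shutil
from pathlib import Path

import pytest

# Hypothetical location of session state; adjust to the project's real path.
SESSIONS_DIR = Path(".sessions")


@pytest.fixture(autouse=True, scope="module")
def cleanup_sessions():
    yield  # let every test in the module run first
    if SESSIONS_DIR.exists():
        for session_dir in SESSIONS_DIR.glob("test-*"):
            shutil.rmtree(session_dir, ignore_errors=True)
```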
**Estimated Timeline**: 5 minutes
#### 1.3 Skip Marker Tests in Default Run
**Impact**: Save 5.8s per run (10% reduction)
Add to pytest.ini (line 24):
```ini
addopts =
-v
--tb=short
--strict-markers
--color=yes
-m "not expensive and not integration and not accuracy and not marker"
```
**Why This Works**:
- Marker tests require 8GB+ RAM and special dependencies
- Only 3 tests affected
- Can run separately: `pytest -m marker` when needed
**Estimated Timeline**: 2 minutes
**Total Tier 1 Savings**: 31-36 seconds (56s → 20-25s)
---
### 🎯 TIER 2: Structural Improvements (Expected: 25s → 12-15s)
**Implementation Time**: 3-4 hours
**Performance Gain**: 40-50% further reduction
**Risk**: Low-Medium
#### 2.1 Refactor Largest Test File
**Impact**: Improve parallelization distribution
test_upload_tools.py (1,083 lines):
- Split into 3 files: `test_upload_creation.py`, `test_upload_additional.py`, `test_upload_workflow.py`
- Benefits: Better load balancing across workers, faster discovery
**Estimated Timeline**: 45 minutes
#### 2.2 Optimize Upload Workflow Test
**Impact**: 21.8s → 5-8s (65% reduction)
Current bottleneck (test_start_review_full_workflow):
```python
async def test_start_review_full_workflow(...):
    # Creates session
    # Uploads files
    # Discovers documents
    # Extracts evidence (EXPENSIVE)
```
Optimization:
```python
# Split into separate tests
async def test_upload_workflow_creation(...):
    # Fast: just upload and document discovery
    ...

@pytest.mark.expensive
async def test_upload_workflow_extraction(...):
    # Expensive: evidence extraction, excluded from the default run
    ...
```
**Estimated Timeline**: 30 minutes
#### 2.3 Implement Lazy Fixture Loading
**Impact**: Reduce unnecessary fixture execution
Current: All session fixtures load regardless of test needs
Optimized: use `pytest-lazy-fixture` (or its maintained fork, `pytest-lazy-fixtures`, on newer pytest versions) so expensive fixtures are built only for the tests that request them:
```python
# Install first: pip install pytest-lazy-fixture
import pytest

# Use in tests: each fixture is built only for its own parametrized case
@pytest.mark.parametrize("fixture", [
    pytest.lazy_fixture("botany_farm_dates"),
    pytest.lazy_fixture("botany_farm_tenure"),
])
def test_extraction_has_results(fixture):
    assert fixture  # illustrative check only
```
**Estimated Timeline**: 1 hour
#### 2.4 Cache Document Discovery Results
**Impact**: Save 3.1s across multiple tests
```python
# conftest.py - add a session-scoped fixture
from pathlib import Path

import pytest

@pytest.fixture(scope="session")
def cached_botany_farm_documents():
    """Discover documents once and reuse the result for all tests."""
    from registry_review_mcp.tools import document_tools

    docs_path = Path("examples/22-23/4997Botany22_Public_Project_Plan")
    return document_tools.discover_documents(docs_path)
```
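A test can then consume the cached result instead of re-running discovery; the assertion below is only illustrative, since the structure of the discovery result is not shown in this report:
```python
def test_discovery_finds_documents(cached_botany_farm_documents):
    # Discovery ran once for the whole session; this test only inspects the cached result.
    assert cached_botany_farm_documents  # illustrative: at least one document discovered
```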
**Estimated Timeline**: 45 minutes
**Total Tier 2 Savings**: 8-13 seconds (25s → 12-15s)
---
### 🔬 TIER 3: Advanced Optimizations (Expected: 12-15s → 8-10s)
**Implementation Time**: 6-8 hours
**Performance Gain**: 20-40% further reduction
**Risk**: Medium
#### 3.1 Implement Test Result Caching
**Impact**: Iteration on failures becomes near-instant (only previously failed tests are re-run)
Result caching is built into modern pytest (the `cacheprovider` plugin), so no extra install is needed. Configure the cache directory in pytest.ini:
```ini
[pytest]
cache_dir = .pytest_cache
```
Use `--lf` (last failed) and `--ff` (failed first) for faster iteration.
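For expensive intermediate values (as opposed to pass/fail results), pytest's built-in `config.cache` API can also persist JSON-serializable data between runs. A hedged sketch with a hypothetical discovery loader:
```python
import pytest


def expensive_document_discovery() -> list[str]:
    # Placeholder for the real discovery step (hypothetical).
    return ["doc-1", "doc-2"]


@pytest.fixture(scope="session")
def discovered_documents(request):
    # config.cache persists JSON-serializable values across test runs.
    key = "registry_review/discovered_documents"
    cached = request.config.cache.get(key, None)
    if cached is None:
        cached = expensive_document_discovery()
        request.config.cache.set(key, cached)
    return cached
```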
**Estimated Timeline**: 1 hour
#### 3.2 Optimize Fixture Scope Strategy
**Impact**: Reduce duplicate operations
Current fixture scopes analysis:
- `botany_farm_markdown`: session ✅ (optimal)
- `botany_farm_dates`: function ⚠️ (should be session)
- `botany_farm_tenure`: function ⚠️ (should be session)
- `botany_farm_project_ids`: function ⚠️ (should be session)
Change all extraction fixtures to session scope:
```python
@pytest_asyncio.fixture(scope="session")
async def botany_farm_dates(botany_farm_markdown):
# Runs once per suite instead of per test
```
**Caveat**: Requires async event loop management at session scope.
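A minimal sketch of one approach, assuming a recent pytest-asyncio (0.24+) that accepts a `loop_scope` argument; the fixture body is a placeholder:
```python
import pytest_asyncio


@pytest_asyncio.fixture(scope="session", loop_scope="session")
async def botany_farm_dates(botany_farm_markdown):
    # Both the fixture and its event loop live for the whole session,
    # so the extraction runs once instead of once per test.
    ...
```
Alternatively, the `asyncio_default_fixture_loop_scope` ini option (also pytest-asyncio 0.24+) can set this loop scope globally.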
**Estimated Timeline**: 2 hours
#### 3.3 Implement Parallel Async Execution
**Impact**: Faster async test execution
```python
# conftest.py
# pytest-asyncio registers itself automatically once installed;
# no pytest_plugins entry is needed.
import asyncio

import pytest

@pytest.fixture(scope="session")
def event_loop_policy():
    # pytest-asyncio uses this fixture to choose the event loop policy for tests
    return asyncio.get_event_loop_policy()
```
**Estimated Timeline**: 2 hours
#### 3.4 Database/Filesystem Mocking
**Impact**: Eliminate I/O latency
Replace real filesystem operations with in-memory alternatives:
```python
from unittest.mock import patch

import pytest

@pytest.fixture
def mock_filesystem():
    with patch("pathlib.Path.mkdir") as mock_mkdir, \
         patch("pathlib.Path.write_text") as mock_write:
        yield {"mkdir": mock_mkdir, "write": mock_write}
```
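Usage sketch; the paths and the write below stand in for whatever production code path creates session directories and writes state files:
```python
from pathlib import Path


def test_session_write_uses_filesystem(mock_filesystem):
    # Hypothetical stand-in for the code under test.
    Path("sessions/test-abc").mkdir(parents=True, exist_ok=True)
    Path("sessions/test-abc/state.json").write_text("{}")

    mock_filesystem["mkdir"].assert_called_once_with(parents=True, exist_ok=True)
    mock_filesystem["write"].assert_called_once_with("{}")
```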
**Estimated Timeline**: 3 hours
**Total Tier 3 Savings**: 4-5 seconds (12-15s → 8-10s)
---
## Performance Projections
### Conservative Estimate (Tier 1 Only)
```
Current: 56.6s
Tier 1: -36s (parallel + cleanup optimization)
Result: ~20s (65% improvement)
```
### Target Estimate (Tier 1 + Tier 2)
```
Current: 56.6s
Tier 1: -36s
Tier 2: -8s (workflow optimization + caching)
Result: ~12s (79% improvement)
```
### Stretch Goal (All Tiers)
```
Current: 56.6s
Tier 1: -36s
Tier 2: -8s
Tier 3: -5s
Result: ~8s (86% improvement)
```
---
## Implementation Roadmap
### Week 1: Quick Wins (Target: 20-25s)
**Day 1-2**: Implement Tier 1 optimizations
- [ ] Enable pytest-xdist parallelization
- [ ] Optimize cleanup fixture scope
- [ ] Skip marker tests by default
- [ ] Validate with full test run
**Day 3**: Testing and validation
- Run full suite 10 times to establish baseline
- Identify any race conditions from parallelization
- Document any breaking changes
### Week 2: Structural Improvements (Target: 12-15s)
**Day 1-2**: Refactor large test files
- Split test_upload_tools.py into logical modules
- Optimize upload workflow test
**Day 3-4**: Implement caching strategies
- Add session-scoped document discovery fixture
- Implement lazy fixture loading
**Day 5**: Integration and validation
- Full test suite validation
- Performance benchmarking
- Update documentation
### Week 3: Advanced Optimizations (Target: 8-10s)
**Optional**: Only if sub-15s target not met
---
## Risk Assessment
### Low Risk (Safe to Implement)
✅ Tier 1.1: Parallel execution (tests already isolated)
✅ Tier 1.3: Skip marker tests (clear separation)
✅ Tier 2.1: Split large test files (pure refactoring)
✅ Tier 2.4: Cache document discovery (read-only operation)
### Medium Risk (Requires Testing)
⚠️ Tier 1.2: Cleanup scope change (verify no state leakage)
⚠️ Tier 2.2: Workflow test splitting (may affect coverage)
⚠️ Tier 3.2: Fixture scope changes (async complexity)
### High Risk (Defer Until Needed)
🔴 Tier 3.4: Filesystem mocking (may miss real bugs)
🔴 Tier 3.3: Parallel async execution (complex debugging)
---
## Success Metrics
### Primary Goal
✅ **Test suite completes in <15 seconds** (currently 56.6s)
### Secondary Goals
✅ Maintain 100% test pass rate
✅ No increase in flakiness
✅ Preserve test isolation
✅ Maintain code coverage
### Monitoring
- Run `pytest --durations=10` after each change
- Track execution time in CI/CD
- Monitor for race conditions in parallel runs
- Validate fixture cleanup works correctly
---
## Cost-Benefit Analysis
### Tier 1 (Quick Wins)
**Investment**: 1-2 hours
**Return**: 31-36 seconds saved per run
**ROI**: Immediate (saves time on every test run)
**Recommendation**: ✅ **IMPLEMENT NOW**
### Tier 2 (Structural)
**Investment**: 3-4 hours
**Return**: 8-13 seconds additional savings
**ROI**: High (improves maintainability + performance)
**Recommendation**: ✅ **IMPLEMENT AFTER TIER 1**
### Tier 3 (Advanced)
**Investment**: 6-8 hours
**Return**: 4-5 seconds additional savings
**ROI**: Medium (only if <15s target not met)
**Recommendation**: ⚠️ **EVALUATE AFTER TIER 2**
---
## Appendix A: Test File Analysis
### Largest Files (54% of test code)
1. test_upload_tools.py (42.5 KB, 1,083 lines) - **REFACTOR CANDIDATE**
2. test_marker_integration.py (16.7 KB) - **SKIP BY DEFAULT**
3. test_tenure_and_project_id_extraction.py (16.0 KB)
4. test_llm_json_validation.py (15.3 KB)
5. test_llm_extraction.py (14.6 KB)
6. test_phase4_validation.py (14.2 KB)
7. test_validation.py (13.4 KB)
8. test_integration_full_workflow.py (13.1 KB)
### Slowest Tests (Top 10)
1. test_start_review_full_workflow: 21.8s (38%) - **OPTIMIZE**
2. test_get_markdown_content_none_if_missing: 5.8s (10%) - **SKIP**
3. test_discover_documents_botany_farm: 3.1s (5%) - **CACHE**
4. test_extract_all_evidence: 2.9s (5%)
5. test_map_single_requirement: 2.3s (4%)
6. test_extract_project_start_date: 1.5s (3%)
7. test_markdown_report_includes_citations: 1.3s (2%)
8. test_start_review_end_to_end: 1.1s (2%)
9. test_extract_snippets_from_markdown: 1.1s (2%)
10. test_full_report_workflow: 1.0s (2%)
**Top 10 account for ~42s (roughly 75% of total runtime)**
---
## Appendix B: Marker Usage Analysis
```
Current markers usage (from pytest.ini):
- slow: 5 tests (real API calls)
- marker: 9 tests (PDF extraction, 8GB+ RAM)
- integration: 3 tests (full system required)
- expensive: 7 tests (high API costs)
- accuracy: 4 tests (ground truth validation)
- unit: 0 tests (marker defined but unused)
```
**Recommendation**: Audit and consistently apply markers across all tests.
---
## Appendix C: Fixture Optimization Opportunities
### Current Session-Scoped Fixtures (Optimal)
- `botany_farm_markdown`: ✅ Loads once, used 20+ times
- `cleanup_cache_once`: ✅ Runs once per session
### Function-Scoped Fixtures (Consider Session Scope)
- `botany_farm_dates`: Used by 5+ tests → **Candidate for session scope**
- `botany_farm_tenure`: Used by 5+ tests → **Candidate for session scope**
- `botany_farm_project_ids`: Used by 3+ tests → **Candidate for session scope**
- `cleanup_sessions`: Runs 223 times → **Change to module scope**
### Unused or Underutilized
- `cleanup_examples_sessions`: Only used explicitly, not autouse → ✅ OK
- `test_settings`: Used 50+ times → ✅ Appropriate scope
---
## Appendix D: Parallel Execution Readiness
### Prerequisites ✅
- Tests use `pytest.fixture` isolation
- No global state mutations detected
- Temporary directories use `tmp_path` (unique per test)
- Session cleanup uses session IDs (no conflicts)
### Potential Conflicts ⚠️
- Shared example directory (examples/22-23/) - read-only, OK
- Cache directory - uses locking, OK
- Cost tracking files in /tmp - unique IDs, OK
### Recommendation
✅ **Suite is ready for parallel execution with pytest-xdist**
---
## Final Recommendations
### Immediate Actions (This Week)
1. ✅ Install pytest-xdist: `pip install pytest-xdist`
2. ✅ Update pytest.ini to skip marker tests by default
3. ✅ Change cleanup_sessions scope from function to module
4. ✅ Run full suite with `-n 4` to validate
5. ✅ Benchmark and document results
### Next Sprint
1. Split test_upload_tools.py into 3 files
2. Optimize test_start_review_full_workflow
3. Implement cached document discovery fixture
4. Add lazy fixture loading for expensive operations
### Future Improvements
1. Evaluate Tier 3 optimizations if needed
2. Add performance regression tests
3. Document testing best practices in CONTRIBUTING.md
4. Set up CI/CD performance monitoring
---
**Report Prepared By**: Claude (Synthesis Agent)
**Analysis Date**: 2025-11-20
**Next Review**: After Tier 1 implementation