# Test Suite Cost Optimization Guide
## Current Situation
Based on analysis of the test suite (99 tests total):
### Test Breakdown
- **78 tests** make **real API calls** (expensive, slow)
- **21 tests** use infrastructure/mocks (fast, free)
- **4 tests** are marked `@pytest.mark.slow` (should be ~78)
### Estimated Costs Per Full Run
Without optimization:
- **~40-60 API calls** (some cached)
- **~50,000-100,000 tokens**
- **~$0.50-$2.00 per full run** (depending on caching)
- **~2-3 minutes** test duration
With the fixes we just implemented (chunking, deduplication, validation):
- Cache hit rate improved significantly
- Cost per run reduced by ~30-40%
## Cost Tracking
### Automatic Tracking (NEW!)
The `tests/conftest.py` fixture now automatically tracks all test costs:
```bash
# Run tests - costs are automatically tracked
pytest tests/ -v
# At the end, you'll see:
# ================================================================================
# API COST SUMMARY
# ================================================================================
# Total API Calls: 45
# - Real API calls: 35
# - Cached calls: 10
# Total Tokens: 78,543
# - Input: 65,234
# - Output: 13,309
# Total Cost: $0.7234
# Cache Hit Rate: 22.2%
# Test Duration: 161.4s
# ================================================================================
#
# Detailed cost report saved to: test_costs_report.json
```
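The real fixture lives in `tests/conftest.py`; below is only a minimal sketch of the general shape, assuming usage records are pushed into a shared accumulator by the instrumented API client. Every field name except `total_cost_usd` (which the report is known to contain) is illustrative.
```python
# Sketch only -- see tests/conftest.py for the actual implementation.
# Assumes the instrumented API client appends one record per call to
# COST_RECORDS; every key except "total_cost_usd" is illustrative.
import json
import time

import pytest

COST_RECORDS: list[dict] = []  # e.g. {"cost_usd": 0.02, "cached": False}


@pytest.fixture(scope="session", autouse=True)
def track_api_costs():
    """Print a cost summary and write test_costs_report.json after the run."""
    start = time.time()
    yield
    total_cost = sum(r.get("cost_usd", 0.0) for r in COST_RECORDS)
    cached = sum(1 for r in COST_RECORDS if r.get("cached"))
    report = {
        "total_cost_usd": round(total_cost, 4),
        "total_api_calls": len(COST_RECORDS),
        "cached_calls": cached,
        "duration_seconds": round(time.time() - start, 1),
    }
    print("\n" + "=" * 80)
    print("API COST SUMMARY")
    print(f"Total Cost: ${report['total_cost_usd']:.4f}")
    print("=" * 80)
    with open("test_costs_report.json", "w") as f:
        json.dump(report, f, indent=2)
```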
### Manual Analysis
```bash
# Analyze costs from previous run
python scripts/analyze_test_costs.py
```
### Historical Tracking
Cost reports are saved to `test_costs_report.json` after each run. To track over time:
```bash
# Save reports with timestamps
mv test_costs_report.json test_costs_$(date +%Y%m%d_%H%M%S).json
# Compare costs over time
ls test_costs_*.json | while read f; do
  echo "$f: $(jq -r '.total_cost_usd' "$f")"
done
```
## Optimization Strategies
### 1. Mock Expensive Tests (Highest Impact)
- **Current**: Most tests make real API calls
- **Target**: Mock 90% of tests, reserve real API calls for critical integration tests
- **Savings**: ~$1.50 per run → ~$0.15 per run (90% reduction)
- **Time**: 161s → ~20s (87% faster)
#### Example: Convert Real API Test to Mock
**Before** (`test_llm_extraction_integration.py`):
```python
async def test_extract_simple_project_start_date(self):
"""Test extraction of a simple project start date."""
markdown = "The project started on January 1, 2022."
extractor = DateExtractor() # Makes real API call
results = await extractor.extract(markdown, [], "test.pdf")
assert len(results) > 0
```
**After** (mocked):
```python
from unittest.mock import AsyncMock, Mock

async def test_extract_simple_project_start_date(self):
"""Test extraction of a simple project start date."""
markdown = "The project started on January 1, 2022."
# Mock the API response
mock_response = Mock()
mock_response.content = [Mock(text='''```json
[{
"value": "2022-01-01",
"field_type": "project_start_date",
"source": "test.pdf",
"confidence": 0.95,
"reasoning": "Explicitly stated",
"raw_text": "started on January 1, 2022"
}]
```''')]
mock_response.usage = Mock()
mock_response.usage.model_dump = Mock(return_value={
'input_tokens': 100, 'output_tokens': 50
})
mock_client = AsyncMock()
mock_client.messages.create.return_value = mock_response
extractor = DateExtractor(mock_client)
results = await extractor.extract(markdown, [], "test.pdf")
assert len(results) > 0
# Cost: $0.00 instead of ~$0.02
```
### 2. Use Test Fixtures with Cached Responses
Create reusable mock responses:
```python
# conftest.py
import pytest
from unittest.mock import AsyncMock, Mock
@pytest.fixture
def mock_date_extractor_response():
"""Reusable mock response for date extraction."""
mock_response = Mock()
mock_response.content = [Mock(text='''```json
[{
"value": "2022-01-01",
"field_type": "project_start_date",
"source": "test.pdf",
"confidence": 0.95,
"reasoning": "Test data",
"raw_text": "test"
}]
```''')]
mock_response.usage = Mock()
mock_response.usage.model_dump = Mock(return_value={
'input_tokens': 100, 'output_tokens': 50
})
return mock_response
# Use in tests
async def test_something(mock_date_extractor_response):
mock_client = AsyncMock()
mock_client.messages.create.return_value = mock_date_extractor_response
# ... test code
```
### 3. Mark Slow Tests Properly
```python
# Mark ALL real API tests as slow
@pytest.mark.slow
async def test_extract_with_real_api(self):
"""Integration test with real API."""
# ... expensive test
```
Then during development:
```bash
# Skip slow tests (99 tests → 21 tests, ~$2 → ~$0)
pytest tests/ -m "not slow"
# Run only slow tests before committing
pytest tests/ -m "slow"
```
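If the `slow` marker is not already registered (for example in `pytest.ini`), pytest emits `PytestUnknownMarkWarning` for every marked test. One way to register it is from `conftest.py`:
```python
# conftest.py
def pytest_configure(config):
    # Register the custom marker so -m "slow" / -m "not slow" run without warnings.
    config.addinivalue_line("markers", "slow: tests that make real API calls")
```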
### 4. Use Pytest-Xdist for Parallel Execution
```bash
# Install
pip install pytest-xdist
# Run tests in parallel (4 workers)
pytest tests/ -n 4
# Time: 161s → ~50s (3x faster with 4 cores)
```
**Note**: Parallel execution increases API concurrency, which may hit rate limits. Use with mocked tests.
### 5. VCR.py for Recording Real API Responses
Record real API responses once, replay for free:
```bash
pip install pytest-vcr
```
```python
@pytest.mark.vcr() # Records first run, replays after
async def test_with_recording(self):
extractor = DateExtractor()
results = await extractor.extract(markdown, [], "test.pdf")
# First run: $0.02, makes real call
# Subsequent runs: $0.00, uses recording
```
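One thing to watch: cassettes record request headers, so API keys can end up in version control. pytest-vcr lets you override the `vcr_config` fixture to scrub them; a sketch (adjust the header names to whatever your API client actually sends):
```python
import pytest


@pytest.fixture(scope="module")
def vcr_config():
    # Replace sensitive headers before cassettes are written to disk.
    return {"filter_headers": [("authorization", "DUMMY"), ("x-api-key", "DUMMY")]}
```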
### 6. Optimize Test Data
Current tests use realistic data (good for accuracy, expensive):
```python
# Expensive: Full Botany Farm document (20K chars, multiple images)
with open("examples/22-23/.../4997Botany22_Public_Project_Plan.md") as f:
markdown = f.read()
```
Optimize for unit tests:
```python
# Cheap: Minimal test data
markdown = "Project Start Date: 2022-01-01" # 30 chars vs 20K
images = [] # 0 images vs 5-20
```
**Keep expensive tests for**:
- Integration tests
- Accuracy validation
- Before releases
### 7. Shared Test Database
Create once, reuse across tests:
```python
# conftest.py
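# Note: an async, session-scoped fixture needs pytest-asyncio; in strict
# mode decorate it with @pytest_asyncio.fixture(scope="session") instead.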
@pytest.fixture(scope="session")
async def cached_botany_farm_extraction():
"""Extract once, share across all tests."""
# Expensive one-time extraction
date_extractor = DateExtractor()
tenure_extractor = LandTenureExtractor()
dates = await date_extractor.extract(BOTANY_FARM_MARKDOWN, [], "Botany")
tenure = await tenure_extractor.extract(BOTANY_FARM_MARKDOWN, [], "Botany")
return {'dates': dates, 'tenure': tenure}
# Cost: $0.10 once instead of $0.10 × 20 tests = $2.00
# Use in tests
def test_date_validation(cached_botany_farm_extraction):
dates = cached_botany_farm_extraction['dates']
# ... validation logic (no API call)
```
## Recommended Test Structure
```
tests/
├── unit/                      # Fast, mocked, no API calls
│   ├── test_chunking.py
│   ├── test_validation.py
│   └── test_helpers.py
├── integration/               # Slow, real API, marked @pytest.mark.slow
│   ├── test_date_extraction_integration.py
│   ├── test_tenure_extraction_integration.py
│   └── test_botany_farm_accuracy.py
└── conftest.py                # Shared fixtures and cost tracking
```
Run strategy:
```bash
# Development (fast, cheap)
pytest tests/unit/ -v # ~5s, $0.00
# Pre-commit (comprehensive, expensive)
pytest tests/ -v # ~161s, ~$0.70
# CI/CD (with caching)
pytest tests/ -v --cache-clear # Fresh validation
```
## Target Metrics
| Metric | Current | Target | How |
|--------|---------|--------|-----|
| **Cost per run** | ~$0.70 | ~$0.15 | Mock 80% of tests |
| **Test duration** | 161s | 20s | Mocks + parallel |
| **Cache hit rate** | 22% | 80% | Session fixtures |
| **API calls** | 45 | 9 | Integration tests only |
## Quick Wins (Do These First)
1. ✅ **Add conftest.py** - Automatic cost tracking (DONE)
2. ✅ **Mark slow tests** - Skip with `-m "not slow"` during dev
3. **Mock top 10 expensive tests** - 80% cost reduction
4. **Create shared fixtures** - Eliminate duplicate API calls
5. **Document in README** - Team knows optimization strategies
## Implementation Plan
### Phase 1: Visibility (DONE)
- [x] Add automatic cost tracking
- [x] Create analysis script
- [x] Document optimization strategies
### Phase 2: Quick Wins (1-2 hours)
- [ ] Mark all 78 API tests with `@pytest.mark.slow`
- [ ] Create mock fixtures for common responses
- [ ] Update README with testing best practices
### Phase 3: Structural (4-6 hours)
- [ ] Reorganize tests/ into unit/ and integration/
- [ ] Mock 60+ tests that don't need real API
- [ ] Add pytest-vcr for recorded API tests
- [ ] Create session-scoped extraction fixtures
### Phase 4: Advanced (optional)
- [ ] Add pytest-xdist for parallel execution
- [ ] Implement tiered testing (unit → integration → e2e)
- [ ] Add cost budgets (fail if test run exceeds $X)
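For the cost-budget idea, a minimal sketch of a CI gate: a hypothetical `scripts/check_cost_budget.py` run after pytest, relying only on the `total_cost_usd` field the report is known to contain.
```python
# Hypothetical scripts/check_cost_budget.py -- run after pytest in CI.
import json
import sys

BUDGET_USD = 1.00  # assumed per-run budget; tune to taste

with open("test_costs_report.json") as f:
    report = json.load(f)

cost = report.get("total_cost_usd", 0.0)
if cost > BUDGET_USD:
    sys.exit(f"Test run cost ${cost:.4f} exceeded budget ${BUDGET_USD:.2f}")
print(f"Test run cost ${cost:.4f} is within budget ${BUDGET_USD:.2f}")
```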
## Example: Before vs After
### Before Optimization
```bash
$ pytest tests/ -v
# 99 tests, 161s, $0.70
# Most API calls are redundant
# Slow feedback loop
```
### After Optimization
```bash
# Development
$ pytest tests/unit/ -v
# 78 tests, 8s, $0.00 (all mocked)
# Pre-commit
$ pytest tests/ -m "not slow" -v
# 78 tests, 8s, $0.00
# Full validation
$ pytest tests/ -v
# 99 tests, 25s, $0.15 (21 integration tests, rest mocked)
```
**Result**: ~79% cost reduction, 84% faster, same coverage