Registry Review MCP Server

Overview Schema Related Servers Score Discussions

2025-11-20-TESTING_GUIDE.md•8.68 kB

# Testing Guide for Registry Review MCP This guide explains how to run tests efficiently to avoid expensive API calls and long runtimes. ## Quick Reference | Command | Tests Run | Runtime | Cost | Use Case | |---------|-----------|---------|------|----------| | `pytest` | 220 (fast only) | ~6s | $0.00 | ✅ Default - use this! | | `pytest -m ""` | 274 (ALL tests) | ~200s+ | $0.05+ | ⚠️ Expensive - avoid! | | `pytest -m expensive` | 32 (LLM tests) | ~2min | $0.05 | Weekly only | | `pytest -m marker -n 0` | 4 (PDF tests) | ~2min | $0.00 | Nightly only | | `pytest -m expensive --sample=0.25` | 8 (sampled) | ~30s | $0.01 | CI nightly | ## Understanding Test Markers The test suite uses markers to categorize expensive tests: - **No marker** (default): Fast tests (~220 tests, 6s, $0.00) - **`expensive`**: LLM API tests (32 tests, 2min, $0.05) - **`marker`**: PDF conversion tests (4 tests, 2min, 8GB RAM) - **`accuracy`**: Ground truth validation (4 tests, included in expensive) - **`integration`**: Full system tests ## Default Behavior (Recommended) ```bash # Just run pytest with no arguments pytest ``` **What happens:** - Runs 220 fast tests only - Takes ~6 seconds - Costs $0.00 - Skips expensive LLM and marker tests **This is controlled by `pytest.ini` line 25:** ```ini -m "not expensive and not integration and not accuracy and not marker" ``` ## Common Mistakes ### ❌ DON'T: Override marker filter with `-m ""` ```bash # This runs ALL 274 tests including expensive ones! pytest -m "" # Takes 200+ seconds # Costs $0.05+ per run # Loads 8GB marker models # Makes expensive LLM API calls ``` **If you see 274 tests collected, you're running expensive tests!** ### ❌ DON'T: Run expensive tests locally without sampling ```bash # This runs all 32 LLM tests pytest -m expensive # Takes ~2 minutes # Costs $0.05 # Should only run weekly in CI ``` ### ✅ DO: Use the default command ```bash # Fast, free, efficient pytest # Should see: "220/274 tests collected (54 deselected)" # Takes ~6 seconds # Costs $0.00 ``` ## When to Run Expensive Tests ### Local Development (You) **Fast tests (always):** ```bash pytest ``` **Specific test file:** ```bash pytest tests/test_document_processing.py -v ``` **Markdown helper tests (fast, no API):** ```bash pytest tests/test_marker_real.py::TestMarkdownHelpers -v -m "" ``` **Single expensive test (when debugging LLM issues):** ```bash # Only run one test, not the whole suite pytest tests/test_botany_farm_accuracy.py::TestBotanyFarmAccuracy::test_date_extraction_accuracy -v -m "" ``` ### CI/CD **Pull Request (every commit):** ```bash pytest # Fast tests only ``` **Nightly (daily):** ```bash # Sample 25% of expensive tests (8/32) pytest -m expensive --sample=0.25 # Run marker integration tests pytest -m marker -n 0 ``` **Weekly (before release):** ```bash # Run all expensive tests pytest -m expensive # Run accuracy validation pytest -m accuracy -n 0 -m "" ``` ## Understanding Test Output ### Good (Fast Tests) ``` collected 220/274 tests (54 deselected) in 0.09s ... 215 passed, 5 failed in 6.27s ``` - **220 tests collected** = Fast tests only ✅ - **~6 seconds** = Normal runtime ✅ - **$0.00 cost** ✅ ### Warning (All Tests) ``` collected 274 tests in 0.09s ... passed in 205.78s ``` - **274 tests collected** = ALL tests including expensive ⚠️ - **200+ seconds** = Expensive tests ran ⚠️ - **$0.05+ cost** = LLM API calls made ⚠️ **If you see this, you ran expensive tests by mistake!** ## Sampling for Cost Control The test suite includes a sampling plugin for running expensive tests efficiently. ### Automatic Sampling in CI ```bash # In CI environment (CI=1) CI=1 pytest -m expensive # Automatically samples 25% (8/32 tests) # Output shows: # 🎲 Sampling Strategy: # • Total expensive tests: 32 # • Sample rate: 25% # • Running: 8 tests # • Skipping: 24 tests ``` ### Manual Sampling ```bash # Run 25% of expensive tests pytest -m expensive --sample=0.25 # Run 50% of expensive tests pytest -m expensive --sample=0.50 # Run all expensive tests pytest -m expensive --sample=1.0 # (or just: pytest -m expensive) ``` ### Daily Rotation Sampling uses a daily seed (YYYYMMDD) so: - Same tests run throughout the day (reproducible) - Different tests run each day (good coverage over time) - Over 4 days, all expensive tests are covered ## Test Architecture ### Fast Tests (Tier 1) - 220 tests - Unit tests with mocks - Integration tests with cached data - No real API calls - No heavy model loading - **Runtime**: ~6s - **Cost**: $0.00 - **When**: Every commit ### Markdown Helpers (Tier 2) - 5 tests - Fast parsing utilities - No PDF loading - No marker models - Included in fast tests by default - **Runtime**: < 1s - **Cost**: $0.00 - **When**: Every commit ### Sampled Expensive (Tier 3) - 8 of 32 tests - Random 25% sample of LLM tests - Daily rotation - Cache-friendly - **Runtime**: ~30s - **Cost**: ~$0.01 - **When**: Nightly CI ### Full Expensive (Tier 4) - 32 tests - All LLM API tests - Date extraction, tenure, project IDs - High cache hit rate - **Runtime**: ~2min - **Cost**: ~$0.05 - **When**: Weekly CI, pre-release ### Accuracy Validation (Tier 5) - 4 tests - Ground truth validation - Uses real documents - LLM extraction enabled - **Runtime**: ~40s - **Cost**: ~$0.02 - **When**: Weekly, after major changes ### Marker Integration (Tier 6) - 4 tests - Real PDF to markdown conversion - Loads 8GB marker models - Requires significant RAM - **Runtime**: 1-2min (sequential) - **Cost**: $0.00 (CPU/RAM only) - **When**: Nightly CI, pre-release ## Debugging Test Failures ### If fast tests fail ```bash # Run with verbose output pytest -v # Run specific test pytest tests/test_document_processing.py::test_discover_documents -v # Show full tracebacks pytest --tb=long ``` ### If expensive tests are needed ```bash # Run single expensive test for debugging pytest tests/test_botany_farm_accuracy.py::test_date_extraction_accuracy -v -m "" # Check API costs after cat test_costs_report.json ``` ### If marker tests are needed ```bash # Run marker tests sequentially (they load 8GB models) pytest -m marker -n 0 -v # Run just markdown helpers (fast, no models) pytest tests/test_marker_real.py::TestMarkdownHelpers -v -m "" ``` ## Cost Tracking The test suite automatically tracks API costs. ### View Cost Report After running expensive tests: ```bash cat test_costs_report.json ``` Example output: ```json { "timestamp": "2025-11-20T15:00:00", "duration_seconds": 104.5, "total_cost_usd": 0.047, "total_api_calls": 12, "cached_calls": 8, "total_input_tokens": 45000, "total_output_tokens": 2500 } ``` ### Cost Summary (Printed After Tests) ``` ================================================================================ API COST SUMMARY ================================================================================ Total API Calls: 12 - Real API calls: 4 - Cached calls: 8 Total Tokens: 47,500 - Input: 45,000 - Output: 2,500 Total Cost: $0.0470 Cache Hit Rate: 66.7% Test Duration: 104.5s ================================================================================ ``` ## Environment Variables ### API Configuration ```bash # Required for expensive tests export ANTHROPIC_API_KEY="your-key" export LLM_EXTRACTION_ENABLED=true # Run expensive tests pytest -m expensive ``` ### CI Configuration ```bash # Enable auto-sampling export CI=1 # This will sample 25% automatically pytest -m expensive ``` ## Pre-existing Test Failures The following 5 test failures are known and pre-existing (not related to recent changes): 1. `test_initialize_workflow.py::test_new_session_has_all_workflow_stages` - Missing 'complete' workflow stage 2. `test_upload_tools.py::test_detect_existing_session_basic` - KeyError: 'existing_session_detected' 3-5. Evidence extraction/report tests - "Requirement mapping not complete" errors - Workflow stage dependencies not met These should be addressed separately and are not blockers for normal development. ## Summary: What Command Should I Run? **For local development:** ```bash pytest # Fast tests only, 6s, $0.00 ``` **If you accidentally ran expensive tests:** - You'll see "274 tests collected" - Takes 200+ seconds - Costs $0.05+ - **Solution**: Just run `pytest` (no extra flags!) **For CI/CD:** - PR checks: `pytest` (fast only) - Nightly: `pytest -m expensive --sample=0.25` (sampled) - Weekly: `pytest -m expensive` (full suite) **Remember:** The default `pytest` command is optimized for speed and cost. Only override markers when you specifically need to run expensive tests.

Loading blob content...

Latest Blog Posts

Don't Use Large Strings as Cache Keys
By punkpeye on January 11, 2026.
markdown
node-js
cache
What are Claude Skills?
By punkpeye on January 10, 2026.
mcp
skills
How to Test MCP Streamable HTTP Endpoints Using cURL
By punkpeye on January 2, 2026.
tutorial
bash

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/gaiaaiagent/regen-registry-review-mcp'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

2025-11-20-TESTING_GUIDE.md•8.68 kB