# ExaSearch Research Integration Testing Strategy

This document outlines the comprehensive testing strategy for validating the ExaSearch integration with the MaverickMCP research agent architecture.

## Overview

The testing strategy covers all aspects of the research system with the ExaSearch provider:

- **DeepResearchAgent** orchestration with ExaSearch integration
- **Specialized Subagents** (Fundamental, Technical, Sentiment, Competitive)
- **Parallel Research Orchestration** and task distribution
- **Timeout Handling** and circuit breaker patterns
- **MCP Tool Integration** via research router endpoints
- **Performance Benchmarking** across research depths and configurations

## Test Architecture

### Test Categories

1. **Unit Tests** (`pytest -m unit`)
   - Individual component testing in isolation
   - Mock external dependencies
   - Fast execution (< 30 seconds total)
   - No external API calls

2. **Integration Tests** (`pytest -m integration`)
   - End-to-end workflow testing
   - Real ExaSearch API integration
   - Multi-component interaction validation
   - Requires the `EXA_API_KEY` environment variable

3. **Performance Tests** (`pytest -m slow`)
   - Benchmark different research depths
   - Parallel vs. sequential execution comparison
   - Memory usage and timeout resilience
   - Longer execution times (2-5 minutes)

4. **Benchmark Suite** (`scripts/benchmark_exa_research.py`)
   - Comprehensive performance analysis
   - Cross-configuration comparison
   - Detailed metrics and reporting
   - Production-ready performance validation

## Test Files and Structure

```
tests/
├── test_exa_research_integration.py   # Main comprehensive test suite
└── conftest.py                        # Shared fixtures and configuration

scripts/
├── run_exa_tests.py                   # Test runner utility
└── benchmark_exa_research.py          # Performance benchmark suite

docs/
└── exa_research_testing_strategy.md   # This document
```

## Key Test Components

### 1. ExaSearchProvider Tests

**Coverage:**

- Provider initialization with/without API key
- Adaptive timeout calculation for different query complexities
- Failure recording and health status management
- Successful search execution with realistic mock responses
- Timeout handling and error recovery
- Circuit breaker integration

**Key Test Methods:**

```python
test_exa_provider_initialization()
test_timeout_calculation()
test_failure_recording_and_health_status()
test_exa_search_success()
test_exa_search_timeout()
test_exa_search_unhealthy_provider()
```

### 2. DeepResearchAgent Tests

**Coverage:**

- Agent initialization with the ExaSearch provider
- Research execution at different depths (basic, standard, comprehensive, exhaustive)
- Timeout budget allocation and management
- Error handling when no providers are available
- Complete research workflow from query to results

**Key Test Methods:**

```python
test_agent_initialization_with_exa()
test_research_comprehensive_success()
test_research_comprehensive_no_providers()
test_research_depth_levels()
```

### 3. Specialized Subagent Tests

**Coverage:**

- All four subagent types: Fundamental, Technical, Sentiment, Competitive
- Query generation for each specialization
- Results processing and analysis
- Focus area validation
- Cross-subagent consistency

**Key Test Methods:**

```python
test_fundamental_research_agent()
test_technical_research_agent()
test_sentiment_research_agent()
test_competitive_research_agent()
```
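As an illustration of the subagent test pattern, here is a minimal sketch of a query-generation unit test. The import path, the `FundamentalResearchAgent` constructor, and its `generate_queries()` method are assumptions for this example only; the actual interfaces are those exercised by `test_fundamental_research_agent()` in `tests/test_exa_research_integration.py`.

```python
# Sketch only: the import path, constructor, and generate_queries() are
# hypothetical — see tests/test_exa_research_integration.py for the real
# interfaces exercised by test_fundamental_research_agent().
from unittest.mock import AsyncMock

import pytest

from maverick_mcp.agents import FundamentalResearchAgent  # assumed path


@pytest.mark.unit
@pytest.mark.asyncio
async def test_fundamental_agent_generates_finance_queries():
    """The fundamental subagent should stay on its focus areas."""
    agent = FundamentalResearchAgent(llm=AsyncMock())  # no real LLM calls

    queries = await agent.generate_queries("AAPL stock financial analysis")

    assert queries, "expected at least one generated query"
    # Queries should target fundamental focus areas such as earnings,
    # valuation, and financial health.
    assert any(
        term in query.lower()
        for query in queries
        for term in ("earnings", "valuation", "financial")
    )
```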
### 4. Parallel Research Orchestration Tests

**Coverage:**

- ParallelResearchOrchestrator initialization and configuration
- Task preparation and prioritization
- Successful parallel execution with multiple tasks
- Failure handling and partial-success scenarios
- Circuit breaker integration
- Performance efficiency measurement

**Key Test Methods:**

```python
test_orchestrator_initialization()
test_parallel_execution_success()
test_parallel_execution_with_failures()
test_circuit_breaker_integration()
```

### 5. Task Distribution Engine Tests

**Coverage:**

- Topic relevance analysis for different task types
- Intelligent task distribution based on query content
- Priority assignment based on relevance scores
- Fallback mechanisms when no relevant tasks are found

**Key Test Methods:**

```python
test_topic_relevance_analysis()
test_task_distribution_basic()
test_task_distribution_fallback()
test_task_priority_assignment()
```

### 6. Timeout and Circuit Breaker Tests

**Coverage:**

- Timeout budget allocation across research phases
- Provider health monitoring and recovery
- Research behavior during provider failures
- Graceful degradation strategies

**Key Test Methods:**

```python
test_timeout_budget_allocation()
test_provider_health_monitoring()
test_research_with_provider_failures()
```

### 7. Performance Benchmark Tests

**Coverage:**

- Cross-depth performance comparison (basic → exhaustive)
- Parallel vs. sequential execution efficiency
- Memory usage monitoring during parallel execution
- Scalability under load

**Key Test Methods:**

```python
test_research_depth_performance()
test_parallel_vs_sequential_performance()
test_memory_usage_monitoring()
```
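The intuition behind the parallel speedup these benchmarks measure can be shown with a self-contained sketch: for I/O-bound work such as search API calls, running subagents concurrently with `asyncio.gather` approaches an N-times speedup. The simulated latency below stands in for real ExaSearch calls and is not part of the actual suite.

```python
# Self-contained sketch of the parallel-vs-sequential pattern the
# performance tests measure; simulated latency stands in for real searches.
import asyncio
import time


async def fake_subagent_search(name: str, latency: float = 1.0) -> str:
    await asyncio.sleep(latency)  # stands in for an I/O-bound ExaSearch call
    return f"{name}: done"


async def main() -> None:
    subagents = ["fundamental", "technical", "sentiment", "competitive"]

    start = time.perf_counter()
    for name in subagents:  # sequential baseline: ~4 seconds
        await fake_subagent_search(name)
    sequential = time.perf_counter() - start

    start = time.perf_counter()  # concurrent: ~1 second
    await asyncio.gather(*(fake_subagent_search(name) for name in subagents))
    parallel = time.perf_counter() - start

    print(f"sequential={sequential:.1f}s  parallel={parallel:.1f}s  "
          f"speedup={sequential / parallel:.1f}x")


if __name__ == "__main__":
    asyncio.run(main())
```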
### 8. MCP Integration Tests

**Coverage:**

- MCP tool endpoint validation
- Research router integration
- Request/response model validation
- Error handling in MCP context

**Key Test Methods:**

```python
test_comprehensive_research_mcp_tool()
test_research_without_exa_key()
test_research_request_validation()
test_get_research_agent_optimization()
```

### 9. Content Analysis Tests

**Coverage:**

- AI-powered content analysis functionality
- Fallback mechanisms when LLM analysis fails
- Batch content processing
- Sentiment and insight extraction

**Key Test Methods:**

```python
test_content_analysis_success()
test_content_analysis_fallback()
test_batch_content_analysis()
```

### 10. Error Handling and Edge Cases

**Coverage:**

- Empty search results handling
- Malformed API responses
- Network timeout recovery
- Concurrent request limits
- Memory constraints

**Key Test Methods:**

```python
test_empty_search_results()
test_malformed_search_response()
test_network_timeout_recovery()
test_concurrent_request_limits()
```

## Test Data and Fixtures

### Mock Data Factories

The test suite includes comprehensive mock data factories:

- **`mock_llm`**: Realistic LLM responses for different research phases
- **`mock_exa_client`**: ExaSearch API client with query-specific responses
- **`sample_research_tasks`**: Representative research tasks for parallel execution
- **`mock_settings`**: Configuration with ExaSearch integration enabled

### Realistic Test Scenarios

Test scenarios cover real-world usage patterns:

```python
test_queries = [
    "AAPL stock financial analysis and investment outlook",
    "Tesla market sentiment and competitive position",
    "Microsoft earnings performance and growth prospects",
    "tech sector analysis and market trends",
    "artificial intelligence investment opportunities",
]

research_depths = ["basic", "standard", "comprehensive", "exhaustive"]

focus_areas = {
    "fundamentals": ["earnings", "valuation", "financial_health"],
    "technicals": ["chart_patterns", "technical_indicators", "price_action"],
    "sentiment": ["market_sentiment", "analyst_ratings", "news_sentiment"],
    "competitive": ["competitive_position", "market_share", "industry_trends"],
}
```
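For example, the `mock_exa_client` factory might be built along these lines. This is a sketch: the response fields (`title`, `url`, `text`, `published_date`) and the `search_and_contents` method name are assumptions modeled on the `exa_py` client, and the real fixture is defined in `tests/conftest.py`.

```python
# Sketch of a conftest.py-style fixture. The response shape and the
# search_and_contents method name are assumptions; the real fixture
# lives in tests/conftest.py.
from unittest.mock import MagicMock

import pytest


@pytest.fixture
def mock_exa_client():
    """ExaSearch client stub that returns a canned, realistic-looking result."""
    client = MagicMock()
    response = MagicMock()
    response.results = [
        MagicMock(
            title="AAPL Q4 Earnings Analysis",
            url="https://example.com/aapl-earnings",
            text="Apple reported quarterly earnings above expectations...",
            published_date="2024-11-01",
        )
    ]
    client.search_and_contents.return_value = response
    return client
```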
## Running Tests

### Quick Start

```bash
# Install dependencies
uv sync

# Set environment variable (for integration tests)
export EXA_API_KEY=your_exa_api_key

# Run unit tests (fast, no external dependencies)
python scripts/run_exa_tests.py --unit

# Run integration tests (requires EXA_API_KEY)
python scripts/run_exa_tests.py --integration

# Run all tests
python scripts/run_exa_tests.py --all

# Run quick test suite
python scripts/run_exa_tests.py --quick

# Run with coverage reporting
python scripts/run_exa_tests.py --coverage
```

### Direct pytest Commands

```bash
# Unit tests only
pytest tests/test_exa_research_integration.py -m unit -v

# Integration tests (requires API key)
pytest tests/test_exa_research_integration.py -m integration -v

# Performance tests
pytest tests/test_exa_research_integration.py -m slow -v

# All tests
pytest tests/test_exa_research_integration.py -v
```

### Performance Benchmarks

```bash
# Comprehensive benchmarks
python scripts/benchmark_exa_research.py

# Quick benchmarks (reduced test matrix)
python scripts/benchmark_exa_research.py --quick

# Specific depth testing
python scripts/benchmark_exa_research.py --depth basic --focus fundamentals

# Parallel execution analysis only
python scripts/benchmark_exa_research.py --depth standard --parallel --no-timeout
```

## Test Environment Setup

### Prerequisites

1. **Python 3.12+**: Core runtime requirement
2. **uv or pip**: Package management
3. **ExaSearch API Key**: For integration tests

   ```bash
   export EXA_API_KEY=your_api_key_here
   ```

### Optional Dependencies

- **Redis**: For caching layer tests
- **PostgreSQL**: For database integration tests
- **psutil**: For memory usage monitoring in performance tests

### Environment Validation

```bash
# Validate environment setup
python scripts/run_exa_tests.py --validate
```

## Expected Test Results

### Performance Benchmarks

**Research Depth Performance Expectations:**

- **Basic**: < 15 seconds execution time
- **Standard**: 15-30 seconds execution time
- **Comprehensive**: 30-45 seconds execution time
- **Exhaustive**: 45-60 seconds execution time

**Parallel Execution Efficiency:**

- **Speedup**: 2-4x faster than sequential for 3+ subagents
- **Memory Usage**: < 100 MB additional during parallel execution
- **Error Rate**: < 5% for timeout-related failures

### Success Criteria

**Unit Tests:**

- ✅ 100% pass rate expected
- ⚡ Complete in < 30 seconds
- 🔄 No external dependencies

**Integration Tests:**

- ✅ 95%+ pass rate (allowing for API variability)
- ⏱️ Complete in < 5 minutes
- 🔑 Requires a valid EXA_API_KEY

**Performance Tests:**

- ✅ 90%+ pass rate (allowing for performance variability)
- ⏱️ Complete in < 10 minutes
- 📊 Generate detailed performance metrics

## Debugging and Troubleshooting

### Common Issues

1. **Missing EXA_API_KEY**

   ```
   Error: Research functionality unavailable - Exa search provider not configured
   Solution: Set the EXA_API_KEY environment variable
   ```

2. **Import Errors**

   ```
   ImportError: No module named 'exa_py'
   Solution: Install dependencies with `uv sync` or `pip install -e .`
   ```

3. **Timeout Failures**

   ```
   Error: Research operation timed out
   Solution: Check the network connection or reduce the research scope
   ```

4. **Memory Issues**

   ```
   Error: Memory usage exceeded limits
   Solution: Reduce parallel agents or test data size
   ```

### Debug Mode

Enable detailed logging for debugging:

```bash
export PYTHONPATH=/path/to/maverick-mcp
export LOG_LEVEL=DEBUG
python scripts/run_exa_tests.py --unit --verbose
```

### Test Output Analysis

**Successful Test Run Example:**

```
🧪 Running ExaSearch Unit Tests
============================
test_exa_provider_initialization PASSED [ 5%]
test_timeout_calculation PASSED [ 10%]
test_failure_recording_and_health_status PASSED [ 15%]
...
✅ All tests completed successfully!
```

**Benchmark Report Example:**

```
📊 BENCHMARK SUMMARY REPORT
============================
📋 Total Tests: 25
✅ Successful: 23
❌ Failed: 2
⏱️ Total Time: 127.3s

📈 Performance Metrics:
   Avg Execution Time: 18.45s
   Min/Max Time: 8.21s / 45.67s
   Avg Confidence Score: 0.78
   Avg Sources Analyzed: 8.2
```

## Continuous Integration

### CI/CD Integration

The test suite is designed for CI/CD integration:

```yaml
# Example GitHub Actions workflow
- name: Run ExaSearch Tests
  env:
    EXA_API_KEY: ${{ secrets.EXA_API_KEY }}
  run: |
    python scripts/run_exa_tests.py --unit
    python scripts/run_exa_tests.py --integration
    python scripts/benchmark_exa_research.py --quick
```

### Test Markers for CI

Use pytest markers for selective testing:

```bash
# Fast tests only (for PR validation)
pytest -m "not slow and not external"

# Full test suite (for main branch)
pytest -m "not external" --maxfail=5

# External API tests (nightly/weekly)
pytest -m external
```
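Note that custom markers such as `unit`, `integration`, `slow`, and `external` must be registered with pytest to avoid `PytestUnknownMarkWarning`. If the project does not already declare them in its pytest configuration, a minimal conftest.py-based sketch would be:

```python
# conftest.py — register the suite's custom markers so pytest recognizes
# them. A sketch: the project may declare these in pyproject.toml instead.
def pytest_configure(config):
    for name, description in (
        ("unit", "fast, isolated tests with no external dependencies"),
        ("integration", "end-to-end tests against the real ExaSearch API"),
        ("slow", "performance and benchmark tests"),
        ("external", "tests that call external services"),
    ):
        config.addinivalue_line("markers", f"{name}: {description}")
```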
## Maintenance and Updates

### Adding New Tests

1. **Extend existing test classes** for related functionality
2. **Follow naming conventions**: `test_[component]_[scenario]`
3. **Use appropriate markers**: `@pytest.mark.unit`, `@pytest.mark.integration`
4. **Mock external dependencies** in unit tests
5. **Include error scenarios** and edge cases

### Updating Test Data

1. **Mock responses** should reflect real ExaSearch API responses
2. **Test queries** should cover different complexity levels
3. **Performance baselines** should be updated as the system improves
4. **Error scenarios** should match actual failure modes

### Performance Regression Detection

1. **Baseline metrics** stored in benchmark results
2. **Automated comparison** against previous runs
3. **Alert thresholds** for performance degradation
4. **Regular benchmark execution** in CI/CD

## Conclusion

This testing strategy ensures the ExaSearch integration is validated across all dimensions:

- ✅ **Functional Correctness**: All components work as designed
- ⚡ **Performance Characteristics**: The system meets timing requirements
- 🛡️ **Error Resilience**: Graceful handling of failures and edge cases
- 🔗 **Integration Quality**: Seamless operation across component boundaries
- 📊 **Monitoring Capability**: Detailed metrics and reporting for ongoing maintenance

The test suite provides confidence in the ExaSearch integration's reliability and performance for production deployment.
