# ExaSearch Research Integration Testing Strategy
This document outlines the comprehensive testing strategy for validating the ExaSearch integration with the MaverickMCP research agent architecture.
## Overview
The testing strategy covers all aspects of the research system with the ExaSearch provider:
- **DeepResearchAgent** orchestration with ExaSearch integration
- **Specialized Subagents** (Fundamental, Technical, Sentiment, Competitive)
- **Parallel Research Orchestration** and task distribution
- **Timeout Handling** and circuit breaker patterns
- **MCP Tool Integration** via research router endpoints
- **Performance Benchmarking** across research depths and configurations
## Test Architecture
### Test Categories
1. **Unit Tests** (`pytest -m unit`)
- Individual component testing in isolation
- Mock external dependencies
- Fast execution (< 30 seconds total)
- No external API calls
2. **Integration Tests** (`pytest -m integration`)
- End-to-end workflow testing
- Real ExaSearch API integration
- Multi-component interaction validation
- Requires `EXA_API_KEY` environment variable
3. **Performance Tests** (`pytest -m slow`)
- Benchmark different research depths
- Parallel vs sequential execution comparison
- Memory usage and timeout resilience
- Longer execution times (2-5 minutes)
4. **Benchmark Suite** (`scripts/benchmark_exa_research.py`)
- Comprehensive performance analysis
- Cross-configuration comparison
- Detailed metrics and reporting
- Production-ready performance validation
## Test Files and Structure
```
tests/
├── test_exa_research_integration.py # Main comprehensive test suite
└── conftest.py # Shared fixtures and configuration
scripts/
├── run_exa_tests.py # Test runner utility
└── benchmark_exa_research.py # Performance benchmark suite
docs/
└── exa_research_testing_strategy.md # This document
```
## Key Test Components
### 1. ExaSearchProvider Tests
**Coverage:**
- Provider initialization with and without an API key
- Adaptive timeout calculation for different query complexities
- Failure recording and health status management
- Successful search execution with realistic mock responses
- Timeout handling and error recovery
- Circuit breaker integration
**Key Test Methods:**
```python
test_exa_provider_initialization()
test_timeout_calculation()
test_failure_recording_and_health_status()
test_exa_search_success()
test_exa_search_timeout()
test_exa_search_unhealthy_provider()
```
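**Illustrative Sketch:**
A minimal sketch of the adaptive-timeout unit-test pattern. The `adaptive_timeout` helper below is a stand-in: the real `ExaSearchProvider` timeout logic is not reproduced here, and the numbers are illustrative only.
```python
import pytest


def adaptive_timeout(query: str, base: float = 10.0, per_term: float = 1.5) -> float:
    """Stand-in heuristic: longer queries get proportionally longer timeouts."""
    return base + per_term * len(query.split())


@pytest.mark.unit
@pytest.mark.parametrize(
    "query,expected_min",
    [
        ("AAPL", 10.0),                                   # simple ticker lookup
        ("AAPL stock financial analysis outlook", 15.0),  # multi-term query
    ],
)
def test_timeout_calculation(query, expected_min):
    timeout = adaptive_timeout(query)
    assert timeout >= expected_min
    assert timeout <= 60.0  # stays under the per-request ceiling
```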
### 2. DeepResearchAgent Tests
**Coverage:**
- Agent initialization with ExaSearch provider
- Research execution with different depths (basic, standard, comprehensive, exhaustive)
- Timeout budget allocation and management
- Error handling when no providers are available
- Complete research workflow from query to results
**Key Test Methods:**
```python
test_agent_initialization_with_exa()
test_research_comprehensive_success()
test_research_comprehensive_no_providers()
test_research_depth_levels()
```
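**Illustrative Sketch:**
A minimal sketch of how the depth-level test can be parametrized. The budget values are placeholders taken from the performance expectations later in this document, not the agent's actual configuration.
```python
import pytest

# Placeholder depth budgets in seconds; the real values live in DeepResearchAgent.
DEPTH_BUDGETS = {"basic": 15, "standard": 30, "comprehensive": 45, "exhaustive": 60}


@pytest.mark.unit
def test_research_depth_levels():
    depths = ["basic", "standard", "comprehensive", "exhaustive"]
    budgets = [DEPTH_BUDGETS[d] for d in depths]
    # Each deeper level should receive at least as much time as the previous one.
    assert budgets == sorted(budgets)
```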
### 3. Specialized Subagent Tests
**Coverage:**
- All 4 subagent types: Fundamental, Technical, Sentiment, Competitive
- Query generation for each specialization
- Results processing and analysis
- Focus area validation
- Cross-subagent consistency
**Key Test Methods:**
```python
test_fundamental_research_agent()
test_technical_research_agent()
test_sentiment_research_agent()
test_competitive_research_agent()
```
### 4. Parallel Research Orchestration Tests
**Coverage:**
- ParallelResearchOrchestrator initialization and configuration
- Task preparation and prioritization
- Successful parallel execution with multiple tasks
- Failure handling and partial success scenarios
- Circuit breaker integration
- Performance efficiency measurement
**Key Test Methods:**
```python
test_orchestrator_initialization()
test_parallel_execution_success()
test_parallel_execution_with_failures()
test_circuit_breaker_integration()
```
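**Illustrative Sketch:**
A minimal sketch of the parallel-execution-with-failures pattern, using a stand-in orchestrator rather than the real `ParallelResearchOrchestrator`: subagent coroutines run under a concurrency limit, and a single failure must not sink the batch.
```python
import asyncio

import pytest


async def run_parallel(tasks, max_concurrent: int = 4):
    """Stand-in orchestrator: bounded concurrency, per-task error isolation."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(task):
        async with semaphore:
            try:
                return await task()
            except Exception as exc:  # a failed subagent must not fail the batch
                return exc

    return await asyncio.gather(*(guarded(t) for t in tasks))


@pytest.mark.unit
def test_parallel_execution_with_failures():
    async def ok():
        return {"status": "ok"}

    async def boom():
        raise ValueError("provider timeout")

    results = asyncio.run(run_parallel([ok, boom, ok]))
    successes = [r for r in results if isinstance(r, dict)]
    failures = [r for r in results if isinstance(r, Exception)]
    assert len(successes) == 2 and len(failures) == 1
```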
### 5. Task Distribution Engine Tests
**Coverage:**
- Topic relevance analysis for different task types
- Intelligent task distribution based on query content
- Priority assignment based on relevance scores
- Fallback mechanisms when no relevant tasks are found
**Key Test Methods:**
```python
test_topic_relevance_analysis()
test_task_distribution_basic()
test_task_distribution_fallback()
test_task_priority_assignment()
```
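**Illustrative Sketch:**
A minimal sketch of relevance-based distribution tests. The keyword map and scoring function are stand-ins; the real `TaskDistributionEngine` presumably uses a richer relevance model.
```python
import pytest

# Illustrative keyword map only; not the engine's actual vocabulary.
TASK_KEYWORDS = {
    "fundamental": {"earnings", "valuation", "revenue", "financial"},
    "technical": {"chart", "indicators", "price", "breakout"},
    "sentiment": {"sentiment", "analyst", "news"},
    "competitive": {"competitive", "competitors", "industry"},
}


def relevance_scores(query: str) -> dict[str, float]:
    words = set(query.lower().split())
    return {
        task: len(words & keywords) / len(keywords)
        for task, keywords in TASK_KEYWORDS.items()
    }


@pytest.mark.unit
def test_task_distribution_basic():
    scores = relevance_scores("AAPL earnings revenue and valuation outlook")
    # Fundamental research should rank highest for an earnings/valuation query.
    assert max(scores, key=scores.get) == "fundamental"


@pytest.mark.unit
def test_task_distribution_fallback():
    scores = relevance_scores("general question with no finance terms")
    # Nothing matches, so every score is zero and the engine should fall back.
    assert all(score == 0.0 for score in scores.values())
```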
### 6. Timeout and Circuit Breaker Tests
**Coverage:**
- Timeout budget allocation across research phases
- Provider health monitoring and recovery
- Research behavior during provider failures
- Graceful degradation strategies
**Key Test Methods:**
```python
test_timeout_budget_allocation()
test_provider_health_monitoring()
test_research_with_provider_failures()
```
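**Illustrative Sketch:**
A minimal sketch of a timeout-budget test. The allocator below is a stand-in showing the property being checked: per-phase allocations exist for every phase and sum back to the total budget.
```python
import pytest


def allocate_budget(total_seconds: float, weights: dict[str, float]) -> dict[str, float]:
    """Stand-in allocator: split a total timeout across phases by weight."""
    weight_sum = sum(weights.values())
    return {phase: total_seconds * w / weight_sum for phase, w in weights.items()}


@pytest.mark.unit
def test_timeout_budget_allocation():
    phases = {"search": 3.0, "analysis": 5.0, "synthesis": 2.0}
    budget = allocate_budget(30.0, phases)
    assert set(budget) == set(phases)
    assert sum(budget.values()) == pytest.approx(30.0)
    assert budget["analysis"] > budget["synthesis"]
```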
### 7. Performance Benchmark Tests
**Coverage:**
- Cross-depth performance comparison (basic → exhaustive)
- Parallel vs sequential execution efficiency
- Memory usage monitoring during parallel execution
- Scalability under load
**Key Test Methods:**
```python
test_research_depth_performance()
test_parallel_vs_sequential_performance()
test_memory_usage_monitoring()
```
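**Illustrative Sketch:**
A minimal sketch of the parallel-vs-sequential comparison. Real subagent work is replaced with a short `asyncio.sleep`, so the test measures scheduling behavior only; the 2x threshold mirrors the speedup expectations listed under "Expected Test Results".
```python
import asyncio
import time

import pytest


async def fake_subagent(delay: float = 0.05) -> str:
    await asyncio.sleep(delay)  # stands in for one subagent's research work
    return "done"


@pytest.mark.slow
def test_parallel_vs_sequential_performance():
    async def sequential(n):
        return [await fake_subagent() for _ in range(n)]

    async def parallel(n):
        return await asyncio.gather(*(fake_subagent() for _ in range(n)))

    start = time.perf_counter()
    asyncio.run(sequential(4))
    sequential_time = time.perf_counter() - start

    start = time.perf_counter()
    asyncio.run(parallel(4))
    parallel_time = time.perf_counter() - start

    assert parallel_time < sequential_time / 2
```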
### 8. MCP Integration Tests
**Coverage:**
- MCP tool endpoint validation
- Research router integration
- Request/response model validation
- Error handling in MCP context
**Key Test Methods:**
```python
test_comprehensive_research_mcp_tool()
test_research_without_exa_key()
test_research_request_validation()
test_get_research_agent_optimization()
```
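**Illustrative Sketch:**
A minimal sketch of request-model validation, assuming a Pydantic model roughly shaped like the one below; the actual research router's request model may differ in fields and constraints.
```python
import pytest
from pydantic import BaseModel, Field, ValidationError


class ResearchRequest(BaseModel):
    # Hypothetical shape used only for this sketch.
    query: str = Field(min_length=3)
    research_depth: str = "standard"
    focus_areas: list[str] = Field(default_factory=list)


@pytest.mark.unit
def test_research_request_validation():
    ok = ResearchRequest(query="AAPL financial analysis", research_depth="basic")
    assert ok.research_depth == "basic"

    with pytest.raises(ValidationError):
        ResearchRequest(query="")  # rejected before the MCP tool ever runs
```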
### 9. Content Analysis Tests
**Coverage:**
- AI-powered content analysis functionality
- Fallback mechanisms when LLM analysis fails
- Batch content processing
- Sentiment and insight extraction
**Key Test Methods:**
```python
test_content_analysis_success()
test_content_analysis_fallback()
test_batch_content_analysis()
```
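**Illustrative Sketch:**
A minimal sketch of the fallback test. The analyzer is a stand-in: it tries an injected LLM callable first and drops to a crude keyword heuristic when the call raises, which is the behavior the fallback tests are meant to pin down.
```python
import pytest


def analyze_content(text: str, llm=None) -> dict:
    """Stand-in analyzer: LLM first, keyword heuristic on failure."""
    if llm is not None:
        try:
            return llm(text)
        except Exception:
            pass  # fall through to the keyword heuristic
    positive = sum(text.lower().count(w) for w in ("growth", "beat", "strong"))
    negative = sum(text.lower().count(w) for w in ("miss", "decline", "weak"))
    return {
        "sentiment": "positive" if positive >= negative else "negative",
        "method": "keyword_fallback",
    }


@pytest.mark.unit
def test_content_analysis_fallback():
    def failing_llm(_text):
        raise TimeoutError("LLM unavailable")

    result = analyze_content("Strong growth, earnings beat expectations", llm=failing_llm)
    assert result["method"] == "keyword_fallback"
    assert result["sentiment"] == "positive"
```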
### 10. Error Handling and Edge Cases
**Coverage:**
- Empty search results handling
- Malformed API responses
- Network timeout recovery
- Concurrent request limits
- Memory constraints
**Key Test Methods:**
```python
test_empty_search_results()
test_malformed_search_response()
test_network_timeout_recovery()
test_concurrent_request_limits()
```
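**Illustrative Sketch:**
A minimal sketch of defensive handling for empty or malformed provider payloads, using a stand-in normalizer; the point is that bad input degrades to an empty result list rather than an exception.
```python
import pytest


def normalize_results(payload) -> list[dict]:
    """Stand-in normalizer: tolerate empty or malformed provider payloads."""
    if not isinstance(payload, dict):
        return []
    results = payload.get("results") or []
    return [r for r in results if isinstance(r, dict) and r.get("url")]


@pytest.mark.unit
def test_empty_search_results():
    assert normalize_results({"results": []}) == []


@pytest.mark.unit
def test_malformed_search_response():
    assert normalize_results("not json") == []
    assert normalize_results(
        {"results": [{"title": "no url"}, {"url": "https://example.com/a"}]}
    ) == [{"url": "https://example.com/a"}]
```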
## Test Data and Fixtures
### Mock Data Factories
The test suite includes comprehensive mock data factories:
- **`mock_llm`**: Realistic LLM responses for different research phases
- **`mock_exa_client`**: ExaSearch API client with query-specific responses
- **`sample_research_tasks`**: Representative research tasks for parallel execution
- **`mock_settings`**: Configuration with ExaSearch integration enabled
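**Illustrative Sketch:**
A minimal sketch of the `mock_exa_client` pattern: an `AsyncMock` whose `search` method returns a canned, ExaSearch-shaped payload. The actual fixture in `conftest.py` is richer (query-specific responses), and the payload fields here are illustrative.
```python
from unittest.mock import AsyncMock

import pytest


@pytest.fixture
def mock_exa_client():
    client = AsyncMock()
    client.search.return_value = {
        "results": [
            {
                "title": "AAPL Q3 earnings beat expectations",
                "url": "https://example.com/aapl-earnings",
                "text": "Apple reported strong quarterly revenue growth...",
                "score": 0.92,
            }
        ]
    }
    return client
```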
### Realistic Test Scenarios
Test scenarios cover real-world usage patterns:
```python
test_queries = [
"AAPL stock financial analysis and investment outlook",
"Tesla market sentiment and competitive position",
"Microsoft earnings performance and growth prospects",
"tech sector analysis and market trends",
"artificial intelligence investment opportunities",
]
research_depths = ["basic", "standard", "comprehensive", "exhaustive"]
focus_areas = {
"fundamentals": ["earnings", "valuation", "financial_health"],
"technicals": ["chart_patterns", "technical_indicators", "price_action"],
"sentiment": ["market_sentiment", "analyst_ratings", "news_sentiment"],
"competitive": ["competitive_position", "market_share", "industry_trends"],
}
```
## Running Tests
### Quick Start
```bash
# Install dependencies
uv sync
# Set environment variable (for integration tests)
export EXA_API_KEY=your_exa_api_key
# Run unit tests (fast, no external dependencies)
python scripts/run_exa_tests.py --unit
# Run integration tests (requires EXA_API_KEY)
python scripts/run_exa_tests.py --integration
# Run all tests
python scripts/run_exa_tests.py --all
# Run quick test suite
python scripts/run_exa_tests.py --quick
# Run with coverage reporting
python scripts/run_exa_tests.py --coverage
```
### Direct pytest Commands
```bash
# Unit tests only
pytest tests/test_exa_research_integration.py -m unit -v
# Integration tests (requires API key)
pytest tests/test_exa_research_integration.py -m integration -v
# Performance tests
pytest tests/test_exa_research_integration.py -m slow -v
# All tests
pytest tests/test_exa_research_integration.py -v
```
### Performance Benchmarks
```bash
# Comprehensive benchmarks
python scripts/benchmark_exa_research.py
# Quick benchmarks (reduced test matrix)
python scripts/benchmark_exa_research.py --quick
# Specific depth testing
python scripts/benchmark_exa_research.py --depth basic --focus fundamentals
# Parallel execution analysis only
python scripts/benchmark_exa_research.py --depth standard --parallel --no-timeout
```
## Test Environment Setup
### Prerequisites
1. **Python 3.12+**: Core runtime requirement
2. **uv or pip**: Package management
3. **ExaSearch API Key**: For integration tests
```bash
export EXA_API_KEY=your_api_key_here
```
### Optional Dependencies
- **Redis**: For caching layer tests (optional)
- **PostgreSQL**: For database integration tests (optional)
- **psutil**: For memory usage monitoring in performance tests
### Environment Validation
```bash
# Validate environment setup
python scripts/run_exa_tests.py --validate
```
## Expected Test Results
### Performance Benchmarks
**Research Depth Performance Expectations:**
- **Basic**: < 15 seconds execution time
- **Standard**: 15-30 seconds execution time
- **Comprehensive**: 30-45 seconds execution time
- **Exhaustive**: 45-60 seconds execution time
**Parallel Execution Efficiency:**
- **Speedup**: 2-4x faster than sequential for 3+ subagents
- **Memory Usage**: < 100MB additional during parallel execution
- **Error Rate**: < 5% for timeout-related failures
### Success Criteria
**Unit Tests:**
- ✅ 100% pass rate expected
- ⚡ Complete in < 30 seconds
- 🔄 No external dependencies
**Integration Tests:**
- ✅ 95%+ pass rate (allowing for API variability)
- ⏱️ Complete in < 5 minutes
- 🔑 Requires valid EXA_API_KEY
**Performance Tests:**
- ✅ 90%+ pass rate (allowing for performance variability)
- ⏱️ Complete in < 10 minutes
- 📊 Generate detailed performance metrics
## Debugging and Troubleshooting
### Common Issues
1. **Missing EXA_API_KEY**
```
Error: Research functionality unavailable - Exa search provider not configured
Solution: Set EXA_API_KEY environment variable
```
2. **Import Errors**
```
ImportError: No module named 'exa_py'
Solution: Install dependencies with `uv sync` or `pip install -e .`
```
3. **Timeout Failures**
```
Error: Research operation timed out
Solution: Check network connection or reduce research scope
```
4. **Memory Issues**
```
Error: Memory usage exceeded limits
Solution: Reduce parallel agents or test data size
```
### Debug Mode
Enable detailed logging for debugging:
```bash
export PYTHONPATH=/path/to/maverick-mcp
export LOG_LEVEL=DEBUG
python scripts/run_exa_tests.py --unit --verbose
```
### Test Output Analysis
**Successful Test Run Example:**
```
🧪 Running ExaSearch Unit Tests
============================
test_exa_provider_initialization PASSED [ 5%]
test_timeout_calculation PASSED [ 10%]
test_failure_recording_and_health_status PASSED [ 15%]
...
✅ All tests completed successfully!
```
**Benchmark Report Example:**
```
📊 BENCHMARK SUMMARY REPORT
============================
📋 Total Tests: 25
✅ Successful: 23
❌ Failed: 2
⏱️ Total Time: 127.3s
📈 Performance Metrics:
Avg Execution Time: 18.45s
Min/Max Time: 8.21s / 45.67s
Avg Confidence Score: 0.78
Avg Sources Analyzed: 8.2
```
## Continuous Integration
### CI/CD Integration
The test suite is designed for CI/CD integration:
```yaml
# Example GitHub Actions workflow
- name: Run ExaSearch Tests
env:
EXA_API_KEY: ${{ secrets.EXA_API_KEY }}
run: |
python scripts/run_exa_tests.py --unit
python scripts/run_exa_tests.py --integration
python scripts/benchmark_exa_research.py --quick
```
### Test Markers for CI
Use pytest markers for selective testing:
```bash
# Fast tests only (for PR validation)
pytest -m "not slow and not external"
# Full test suite (for main branch)
pytest -m "not external" --maxfail=5
# External API tests (nightly/weekly)
pytest -m external
```
## Maintenance and Updates
### Adding New Tests
1. **Extend existing test classes** for related functionality
2. **Follow naming conventions**: `test_[component]_[scenario]`
3. **Use appropriate markers**: `@pytest.mark.unit`, `@pytest.mark.integration`
4. **Mock external dependencies** in unit tests
5. **Include error scenarios** and edge cases
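**Example Skeleton:**
A minimal skeleton that follows the conventions above (naming, marker, mocked dependency, explicit error-path assertion). The scenario and mock are placeholders; a real test would inject the mocked client into the component under test.
```python
import asyncio
from unittest.mock import AsyncMock

import pytest


@pytest.mark.unit
def test_exa_provider_rate_limit_handling():
    client = AsyncMock()
    client.search.side_effect = RuntimeError("rate limited")

    with pytest.raises(RuntimeError, match="rate limited"):
        asyncio.run(client.search("AAPL outlook"))
```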
### Updating Test Data
1. **Mock responses** should reflect real ExaSearch API responses
2. **Test queries** should cover different complexity levels
3. **Performance baselines** should be updated as system improves
4. **Error scenarios** should match actual failure modes
### Performance Regression Detection
1. **Baseline metrics** stored in benchmark results
2. **Automated comparison** against previous runs
3. **Alert thresholds** for performance degradation
4. **Regular benchmark execution** in CI/CD
## Conclusion
This comprehensive testing strategy ensures the ExaSearch integration is thoroughly validated across all dimensions:
- ✅ **Functional Correctness**: All components work as designed
- ⚡ **Performance Characteristics**: System meets timing requirements
- 🛡️ **Error Resilience**: Graceful handling of failures and edge cases
- 🔗 **Integration Quality**: Seamless operation across component boundaries
- 📊 **Monitoring Capability**: Detailed metrics and reporting for ongoing maintenance
The test suite provides confidence in the ExaSearch integration's reliability and performance for production deployment.