# Speed Testing Framework for MaverickMCP Research Agents

This document describes the comprehensive speed testing framework developed to validate and monitor the speed optimizations implemented in the MaverickMCP research system.

## Overview

The speed testing framework validates the following optimization claims:

- **2-3x speed improvements** over baseline performance
- **Sub-30s completion times** for emergency scenarios
- **Resolution of timeout issues** (previously 138s and 129s failures)
- **Intelligent model selection** for time-critical scenarios
- **Adaptive optimization** based on query complexity and time constraints

## Framework Components

### 1. Speed Optimization Validation Tests (`tests/test_speed_optimization_validation.py`)

Comprehensive pytest-based test suite that validates:

#### Core Components Tested

- **Adaptive Model Selection**: Verifies the fastest models are chosen for emergency scenarios
- **Progressive Token Budgeting**: Tests time-aware token allocation
- **Parallel LLM Processing**: Validates batch processing optimizations
- **Confidence Tracking**: Tests early termination logic
- **Content Filtering**: Validates intelligent source prioritization

#### Query Complexity Levels

- **Simple**: Basic queries (target: <15s completion)
- **Moderate**: Standard analysis queries (target: <25s completion)
- **Complex**: Comprehensive research queries (target: <35s completion)
- **Emergency**: Time-critical queries (target: <30s completion)

#### Expected Model Selections

```python
EXPECTED_MODEL_SELECTIONS = {
    QueryComplexity.EMERGENCY: ["google/gemini-2.5-flash", "openai/gpt-4o-mini"],
    QueryComplexity.SIMPLE: ["google/gemini-2.5-flash", "openai/gpt-4o-mini"],
    QueryComplexity.MODERATE: ["openai/gpt-4o-mini", "google/gemini-2.5-flash"],
    QueryComplexity.COMPLEX: ["anthropic/claude-sonnet-4", "google/gemini-2.5-pro"],
}
```

#### Model Speed Benchmarks

```python
MODEL_SPEED_BENCHMARKS = {
    "google/gemini-2.5-flash": 199,   # tokens/second - FASTEST
    "openai/gpt-4o-mini": 126,        # tokens/second - FAST
    "anthropic/claude-haiku": 89,     # tokens/second - MODERATE
    "anthropic/claude-sonnet-4": 45,  # tokens/second - COMPREHENSIVE
    "google/gemini-2.5-pro": 25,      # tokens/second - DEEP
}
```
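These throughput numbers are what make time-budgeted selection possible: dividing the expected output tokens by a model's tokens/second gives a rough completion estimate. The sketch below illustrates the idea only; the function name, safety margin, and fallback policy are hypothetical assumptions, not the framework's actual `select_model_for_time_budget` implementation (exercised in the testing scenarios further down):

```python
# Illustrative sketch only -- names and the 0.8 safety margin are assumptions.
# Reuses the MODEL_SPEED_BENCHMARKS dict defined above.
def pick_model_for_budget(
    expected_output_tokens: int,
    time_remaining_seconds: float,
    safety_margin: float = 0.8,  # keep 20% of the budget as overhead headroom
) -> str:
    """Choose the deepest model whose estimated generation time fits the budget."""
    usable_seconds = time_remaining_seconds * safety_margin
    feasible = {
        model: tps
        for model, tps in MODEL_SPEED_BENCHMARKS.items()
        if expected_output_tokens / tps <= usable_seconds
    }
    if not feasible:
        # Nothing fits: degrade gracefully to the fastest model available.
        return max(MODEL_SPEED_BENCHMARKS, key=MODEL_SPEED_BENCHMARKS.get)
    # Slower models are the deeper ones, so prefer the slowest feasible model.
    return min(feasible, key=feasible.get)

print(pick_model_for_budget(600, 5.0))   # emergency -> "google/gemini-2.5-flash"
print(pick_model_for_budget(600, 30.0))  # relaxed   -> "google/gemini-2.5-pro"
```

Preferring the slowest feasible model mirrors the documented behavior: emergency budgets fall through to Gemini 2.5 Flash, while generous budgets allow Gemini 2.5 Pro.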
### 2. Speed Benchmarking Script (`scripts/speed_benchmark.py`)

Command-line tool for running various speed benchmarks:

#### Benchmark Modes

```bash
# Quick validation for CI pipeline
python scripts/speed_benchmark.py --mode quick

# Comprehensive benchmark suite
python scripts/speed_benchmark.py --mode full

# Emergency mode focused testing
python scripts/speed_benchmark.py --mode emergency

# Before/after performance comparison
python scripts/speed_benchmark.py --mode comparison

# Custom query testing
python scripts/speed_benchmark.py --query "Apple Inc analysis"
```

#### Output Formats

- **JSON**: Structured benchmark data for analysis
- **Markdown**: Human-readable reports with recommendations

### 3. Quick Speed Demo (`scripts/quick_speed_demo.py`)

Standalone demonstration script that shows:

- Adaptive model selection in action
- Progressive token budgeting scaling
- Complexity-based optimizations
- Speed improvement claims validation
- Timeout resolution demonstration

## Integration with Development Workflow

### Makefile Integration

```bash
# Speed testing commands
make test-speed            # Run all speed optimization tests
make test-speed-quick      # Quick CI validation
make test-speed-emergency  # Emergency mode tests
make test-speed-comparison # Before/after comparison

# Benchmarking commands
make benchmark-speed       # Comprehensive speed benchmark
```

### Continuous Integration

The framework supports CI integration through:

- **Quick validation mode**: Completes in <2 minutes for CI pipelines
- **Exit codes**: Non-zero exit for failed performance thresholds
- **Structured output**: Machine-readable results for automation

## Performance Thresholds

### Speed Thresholds

```python
SPEED_THRESHOLDS = {
    "simple_query_max_time": 15.0,      # Simple queries: <15s
    "moderate_query_max_time": 25.0,    # Moderate queries: <25s
    "complex_query_max_time": 35.0,     # Complex queries: <35s
    "emergency_mode_max_time": 30.0,    # Emergency mode: <30s
    "minimum_speedup_factor": 2.0,      # Minimum 2x speedup
    "target_speedup_factor": 3.0,       # Target 3x speedup
    "timeout_failure_threshold": 0.05,  # Max 5% timeout failures
}
```

### Model Selection Validation

- **Emergency scenarios**: Must select models with 126+ tokens/second
- **Time budgets <30s**: Automatically use the fastest available models
- **Complex analysis**: Can use slower, higher-quality models when time allows

## Testing Scenarios

### 1. Emergency Mode Performance

Tests that urgent queries complete within strict time budgets:

```python
# Test emergency completion under 30s
result = await validator.test_emergency_mode_performance(
    "Quick Apple sentiment - bullish or bearish right now?"
)
assert result["execution_time"] < 30.0
assert result["within_budget"] is True
```

### 2. Adaptive Model Selection

Validates appropriate model selection based on time constraints:

```python
# An emergency scenario should select the fastest models
config = selector.select_model_for_time_budget(
    task_type=TaskType.QUICK_ANSWER,
    time_remaining_seconds=10.0,
    complexity_score=0.3,
    content_size_tokens=200,
)
assert config.model_id in ["google/gemini-2.5-flash", "openai/gpt-4o-mini"]
```

### 3. Baseline vs Optimized Comparison

Compares performance improvements over baseline:

```python
# Test 2-3x speedup achievement
result = await validator.test_baseline_vs_optimized_performance(
    "Apple Inc comprehensive analysis", QueryComplexity.MODERATE
)
assert result["speedup_factor"] >= 2.0  # Minimum 2x improvement
```
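The `speedup_factor` asserted above is simply the ratio of baseline to optimized wall-clock time. A hedged sketch of how such a measurement could be taken, where `run_baseline` and `run_optimized` are hypothetical stand-ins for the real research entry points rather than the validator's actual code:

```python
import time

# Sketch only: measures a paired baseline/optimized run of the same query.
async def measure_speedup(query: str, run_baseline, run_optimized) -> float:
    start = time.perf_counter()
    await run_baseline(query)
    baseline_seconds = time.perf_counter() - start

    start = time.perf_counter()
    await run_optimized(query)
    optimized_seconds = time.perf_counter() - start

    # speedup_factor = baseline time / optimized time; 2.0 means twice as fast
    return baseline_seconds / optimized_seconds

# Usage inside an async test, gated against the configured minimum:
#   speedup = await measure_speedup(query, baseline_agent.run, optimized_agent.run)
#   assert speedup >= SPEED_THRESHOLDS["minimum_speedup_factor"]
```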
### 4. Timeout Resolution

Validates that previous timeout issues are resolved:

```python
# Test scenarios that previously failed with 138s/129s timeouts
test_cases = ["Apple analysis", "Tesla outlook", "Microsoft assessment"]
for query in test_cases:
    result = await test_emergency_performance(query)
    assert result["execution_time"] < 30.0  # No more long timeouts
```

## Real-World Query Examples

### Simple Queries (Target: <15s)

- "Apple Inc current stock price and basic sentiment"
- "Tesla recent news and market overview"
- "Microsoft quarterly earnings summary"

### Moderate Queries (Target: <25s)

- "Apple Inc comprehensive financial analysis and competitive position"
- "Tesla Inc market outlook considering EV competition and regulatory changes"
- "Microsoft Corp cloud business growth prospects and AI strategy"

### Complex Queries (Target: <35s)

- "Apple Inc deep fundamental analysis including supply chain risks, product lifecycle assessment, regulatory challenges across global markets, competitive positioning, and 5-year growth trajectory"

### Emergency Queries (Target: <30s)

- "Quick Apple sentiment - bullish or bearish right now?"
- "Tesla stock - buy, hold, or sell this week?"
- "Microsoft earnings - beat or miss expectations?"

## Optimization Features Validated

### 1. Adaptive Model Selection

- **Emergency Mode**: Selects Gemini 2.5 Flash (199 tok/s) or GPT-4o Mini (126 tok/s)
- **Balanced Mode**: Cost-effective fast models for standard queries
- **Comprehensive Mode**: High-quality models when time allows

### 2. Progressive Token Budgeting

- **Emergency Budget**: Minimal tokens, tight timeouts
- **Standard Budget**: Balanced token allocation
- **Time-Aware Scaling**: Budgets scale with available time
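One way to picture the time-aware scaling in this list is as an interpolation between an emergency floor and a standard ceiling as the remaining time shrinks. A minimal sketch under assumed budget values and a linear scaling policy; the real framework defines its own budgets and may scale differently:

```python
# Assumed illustrative budgets -- not the framework's actual defaults.
EMERGENCY_BUDGET = 800   # minimal tokens, tight timeouts
STANDARD_BUDGET = 4000   # balanced allocation for full analyses

def progressive_token_budget(time_remaining_seconds: float,
                             standard_time_seconds: float = 30.0) -> int:
    """Scale the output-token budget with the time actually left."""
    # Fraction of a "standard" time window still available, clamped to [0, 1]
    fraction = max(0.0, min(1.0, time_remaining_seconds / standard_time_seconds))
    return int(EMERGENCY_BUDGET + fraction * (STANDARD_BUDGET - EMERGENCY_BUDGET))

print(progressive_token_budget(5.0))   # ~1333 tokens under heavy time pressure
print(progressive_token_budget(30.0))  # 4000 tokens with a full time budget
```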
### 3. Intelligent Content Filtering

- **Relevance Scoring**: Prioritizes high-quality, relevant sources
- **Preprocessing**: Reduces content size for faster processing
- **Domain Credibility**: Weights sources by reliability

### 4. Early Termination

- **Confidence Tracking**: Stops when the target confidence is reached
- **Diminishing Returns**: Terminates when no improvement is detected
- **Time Pressure**: Adapts termination thresholds for time constraints
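The first two termination signals combine naturally: stop once confidence crosses a target, or once an additional research round stops moving it. A minimal sketch with hypothetical names and thresholds (under time pressure the target would be lowered, per the third bullet); this is not the framework's internal API:

```python
# Illustrative sketch of confidence-based early termination.
def should_stop(confidence_history: list[float],
                target_confidence: float = 0.85,
                min_gain: float = 0.01) -> bool:
    """Stop when confident enough, or when another round would add little."""
    if not confidence_history:
        return False
    if confidence_history[-1] >= target_confidence:
        return True  # target confidence reached
    if len(confidence_history) >= 2:
        # Diminishing returns: the last research round barely moved the needle
        return confidence_history[-1] - confidence_history[-2] < min_gain
    return False

assert should_stop([0.40, 0.70, 0.88]) is True  # target reached
assert should_stop([0.60, 0.605]) is True       # diminishing returns
assert should_stop([0.40, 0.70]) is False       # still improving, keep going
```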

## Monitoring and Reporting

### Performance Metrics Tracked

- **Execution Time**: Total time from request to completion
- **Model Selection**: Which models were chosen and why
- **Token Usage**: Input/output tokens consumed
- **Timeout Compliance**: Percentage of queries completing within budget
- **Speedup Factors**: Performance improvement over baseline
- **Success Rates**: Percentage of successful completions

### Report Generation

The framework generates comprehensive reports including:

- **Performance Summary**: Key metrics and thresholds
- **Model Selection Analysis**: Usage patterns and optimization effectiveness
- **Timeout Analysis**: Compliance rates and failure patterns
- **Speedup Analysis**: Improvement measurements
- **Recommendations**: Suggested optimizations based on results

## Usage Examples

### Running Quick Validation

```bash
# Quick CI validation
make test-speed-quick

# View results
cat benchmark_results/speed_benchmark_quick_*.md
```

### Custom Query Testing

```bash
# Test a specific query
python scripts/speed_benchmark.py --query "Apple Inc urgent analysis needed"

# View detailed results
python scripts/quick_speed_demo.py
```

### Full Performance Analysis

```bash
# Run comprehensive benchmarks
make benchmark-speed

# Generate performance report
python scripts/speed_benchmark.py --mode full --output-dir ./reports
```

## Troubleshooting

### Common Issues

1. **Import errors**: Ensure all dependencies are installed with `uv sync`
2. **Model selection issues**: Check the OpenRouter provider configuration
3. **Timeouts still occurring**: Verify that emergency mode is enabled
4. **Performance regression**: Run comparison benchmarks to identify the cause

### Debug Commands

```bash
# Test core components
python scripts/quick_speed_demo.py

# Run a specific test category
pytest tests/test_speed_optimization_validation.py::TestSpeedOptimizations -v

# Benchmark with verbose output
python scripts/speed_benchmark.py --mode quick --verbose
```

## Future Enhancements

### Planned Improvements

1. **Real-Time Monitoring**: Continuous performance tracking in production
2. **A/B Testing**: Compare different optimization strategies
3. **Machine Learning**: Adaptive optimization based on query patterns
4. **Cost Optimization**: Balance speed with API costs
5. **Multi-Modal Support**: Extend optimizations to image/audio analysis

### Extension Points

- **Custom Complexity Calculators**: Domain-specific complexity scoring
- **Alternative Model Providers**: Support for additional LLM providers
- **Advanced Caching**: Semantic caching for similar queries
- **Performance Prediction**: ML-based execution time estimation

## Conclusion

The speed testing framework provides comprehensive validation that the MaverickMCP research system achieves its performance optimization goals:

- ✅ **2-3x speed improvements**: Validated across all query complexities
- ✅ **Sub-30s emergency mode**: Urgent queries complete within strict time budgets
- ✅ **Timeout resolution**: No more 138s/129s failures
- ✅ **Intelligent optimization**: Adaptive performance based on constraints
- ✅ **Continuous validation**: Automated testing prevents performance regressions

The framework ensures that speed optimizations remain effective as the system evolves, and it provides early detection of any performance degradation.