# GEPA End-to-End (E2E) Integration Test Framework
This comprehensive E2E testing framework validates the complete GEPA (Genetic Evolutionary Prompt Adaptation) system through integrated workflows, performance benchmarks, and stress testing.
## Framework Structure
```
src/test/integration/
├── README.md                  # This documentation
├── index.ts                   # Main entry point and utilities
├── e2e-test-runner.ts         # Core test orchestration engine
├── test-scenarios.ts          # Comprehensive test scenarios
├── test-helpers.ts            # Utility functions and MCP tool testing
├── performance-benchmarks.ts  # Performance testing and monitoring
└── gepa-e2e.test.ts           # Main integration test suite
```
## Quick Start
### Run Complete E2E Test Suite
```bash
# Run all E2E tests
npm run test:e2e
# Run all test types
npm run test:all
# Run with verbose output
npm run test:e2e -- --reporter=verbose
```
### Run Performance Benchmarks Only
```bash
# Performance benchmarks only
npm run test:e2e:performance
# Validate environment setup
npm run test:e2e:validate
```
### Development Testing
```bash
# Run in watch mode
npm run test:e2e -- --watch
# Run with UI
npm run test:e2e -- --ui
# Run specific test patterns
npm run test:e2e -- --testNamePattern="Core Workflow"
```
## Test Coverage Areas
### 1. Core Workflow Integration
- **Complete Evolution Cycles**: End-to-end evolution from seed prompt to optimal candidate
- **Trajectory Recording**: Execution tracking and analysis workflow
- **Pareto Frontier Optimization**: Multi-objective optimization validation
- **Reflection Analysis**: Failure pattern detection and improvement generation
### 2. MCP Tool Integration
- **gepa_start_evolution**: Evolution initialization and configuration
- **gepa_record_trajectory**: Execution tracking and storage
- **gepa_evaluate_prompt**: Performance evaluation across tasks
- **gepa_reflect**: Failure analysis and improvement recommendations
- **gepa_get_pareto_frontier**: Optimal candidate retrieval
- **gepa_select_optimal**: Context-aware candidate selection
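The suite drives each of these tools through the test helpers rather than calling the server directly. As a rough illustration only (the `executeTool` helper name and the argument shape are assumptions — the real signatures live in `test-helpers.ts`), a single-tool test looks something like this:

```typescript
// Illustrative only: `executeTool` and the argument names are assumed,
// not the actual helper API — see test-helpers.ts for the real signatures.
test('gepa_start_evolution accepts a minimal configuration', async () => {
  const helpers = testRunner.environment!.testHelpers;

  const response = await helpers.executeTool('gepa_start_evolution', {
    seedPrompt: 'You are a helpful assistant.', // assumed parameter name
    populationSize: 8,                          // assumed parameter name
  });

  expect(response.success).toBe(true);
});
```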
### 3. Performance and Load Testing
- **Response Time Benchmarks**: All operations meet SLA requirements
- **Concurrent Operations**: Multi-threaded execution stress testing
- **Memory Usage Profiling**: Memory leak detection and optimization
- **Throughput Analysis**: Operations per second under various loads
- **Resource Utilization**: CPU, memory, and I/O monitoring
### 4. Memory System Validation
- **Automated Memory Updates**: Trajectory and candidate storage automation
- **Memory Optimization Triggers**: Capacity management and cleanup
- **Cross-System Synchronization**: Data consistency across components
- **Recovery from Corruption**: Data integrity and recovery procedures
### 5. Error Handling and Recovery
- **Component Failure Recovery**: LLM adapter, storage, and service failures
- **Data Corruption Detection**: Automated corruption detection and repair
- **Resource Exhaustion Handling**: Graceful degradation under load
- **Retry Mechanisms**: Exponential backoff and circuit breaker patterns
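The retry behavior follows the `retryOptions` block shown in the Configuration section below (`maxRetries`, `baseDelay`). A minimal sketch of the exponential-backoff pattern, not the framework's actual helper, looks like this:

```typescript
// Minimal exponential-backoff sketch (illustrative, not the framework's
// actual retry helper). The delay doubles on every attempt:
// baseDelay, 2 * baseDelay, 4 * baseDelay, ...
async function withRetry<T>(
  operation: () => Promise<T>,
  { maxRetries = 3, baseDelay = 1000 } = {},
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break;
      await new Promise((resolve) => setTimeout(resolve, baseDelay * 2 ** attempt));
    }
  }
  throw lastError;
}
```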
### 6. Concurrent Operations
- **Tool Execution Concurrency**: Parallel MCP tool execution
- **Memory Access Concurrency**: Thread-safe data operations
- **Evolution and Analysis**: Concurrent workflow execution
- **Resource Contention**: Lock-free operation validation
## Performance Thresholds
The framework enforces the following performance SLAs:
| Operation | Threshold | Description |
|-----------|-----------|-------------|
| Evolution Cycle | 30,000ms | Complete end-to-end cycle |
| Trajectory Recording | 1,000ms | Single trajectory save |
| Pareto Frontier Query | 500ms | Frontier retrieval |
| Memory Operations | 100ms | Basic read/write operations |
| Concurrent Operations | 5 ops/sec | Minimum throughput |
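In practice each benchmark compares a measured duration against the matching threshold. The sketch below is illustrative (the real checks live in `performance-benchmarks.ts`) and reuses the threshold keys from the default configuration:

```typescript
import { expect } from 'vitest';

// Threshold keys mirror `performanceThresholds` in the default configuration.
const thresholds = {
  evolutionTime: 30_000,
  trajectoryRecording: 1_000,
  paretoFrontierQuery: 500,
  memoryOperations: 100,
};

// Illustrative helper: time an operation and assert it stays under its SLA.
async function expectWithinThreshold(
  name: keyof typeof thresholds,
  operation: () => Promise<unknown>,
): Promise<void> {
  const start = performance.now();
  await operation();
  const elapsedMs = performance.now() - start;
  expect(elapsedMs).toBeLessThanOrEqual(thresholds[name]);
}
```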
## Framework Architecture
### E2ETestRunner
**Main orchestration engine** that:
- Initializes test environment with isolated components
- Manages test execution lifecycle and resource cleanup
- Coordinates between test scenarios, helpers, and benchmarks
- Generates comprehensive test reports and metrics
### TestScenarios
**Scenario generation engine** that creates:
- Complete evolution workflows with realistic data
- Failure simulation scenarios for error testing
- High-load scenarios for performance validation
- Memory stress tests for leak detection
### E2ETestHelpers
**Utility functions** providing:
- MCP tool execution with retry logic and error handling
- Memory management operations and statistics
- Component recovery procedures and validation
- Cross-system synchronization and data integrity checks
### PerformanceBenchmarks
**Performance monitoring system** that:
- Measures response times for all operations
- Tracks throughput under various load conditions
- Monitors memory usage patterns and leak detection
- Detects performance regressions against baselines
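Put together, a typical test flows through these components: the runner provides an isolated environment, `TestScenarios` builds the workflow, `E2ETestHelpers` executes it, and the benchmarks record timings. The sketch below only shows that flow; apart from `executeScenario`, the method names are assumptions (see the Development Guide for the real extension points):

```typescript
// Flow sketch: scenario creation -> execution -> assertion.
// `createEvolutionWorkflowScenario` is an assumed name for illustration.
test('complete evolution workflow', async () => {
  const { scenarios, testHelpers } = testRunner.environment!;

  // TestScenarios builds a realistic workflow definition...
  const scenario = await scenarios.createEvolutionWorkflowScenario();

  // ...E2ETestHelpers drives it against the isolated components...
  const result = await testHelpers.executeScenario(scenario);

  // ...and the test asserts on the aggregated outcome.
  expect(result.success).toBe(true);
});
```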
## Configuration
### Default Configuration
```typescript
const config: E2ETestConfig = {
maxConcurrentTests: 3,
defaultTimeout: 60000, // 60 seconds
performanceThresholds: {
evolutionTime: 30000,
trajectoryRecording: 1000,
paretoFrontierQuery: 500,
memoryOperations: 100,
},
retryOptions: {
maxRetries: 3,
baseDelay: 1000,
},
};
```
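Overrides are passed in when the runner is created. The exact constructor and lifecycle methods are defined in `e2e-test-runner.ts`; the lines below only sketch an assumed shape:

```typescript
// Assumed API shape for overriding the defaults; see e2e-test-runner.ts
// for the actual constructor and lifecycle methods.
const testRunner = new E2ETestRunner({ ...config, maxConcurrentTests: 1 });
await testRunner.initialize();
```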
### Environment Variables
```bash
# Test environment configuration
NODE_ENV=test
GEPA_TEST_MODE=integration
GEPA_CONFIG_PATH=./temp/integration/gepa.config.json
GEPA_DATA_DIR=./temp/integration
# Performance tuning
NODE_OPTIONS='--max-old-space-size=16384'
```
## Performance Monitoring
### Real-time Metrics
- **Response Times**: Average, min, max, standard deviation
- **Throughput**: Operations per second under load
- **Success Rates**: Percentage of successful operations
- **Memory Usage**: Peak usage, average usage, leak detection
- **Resource Utilization**: CPU, memory, I/O percentages
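For reference, the response-time summary reduces to the usual statistics over the collected samples; this standalone sketch (not the framework's implementation) shows the computation:

```typescript
// Standalone sketch of the response-time summary statistics over a
// non-empty array of samples in milliseconds.
function summarizeResponseTimes(samples: number[]) {
  const n = samples.length;
  const avg = samples.reduce((sum, x) => sum + x, 0) / n;
  const variance = samples.reduce((sum, x) => sum + (x - avg) ** 2, 0) / n;
  return {
    avg,
    min: Math.min(...samples),
    max: Math.max(...samples),
    stdDev: Math.sqrt(variance),
  };
}
```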
### Benchmark Reports
```
Performance Benchmark Results
────────────────────────────────────────────────────────────────────
Overall Score: 87.3/100
Total Execution Time: 45.2s

Individual Benchmark Results:
  trajectory_recording:     850ms avg,    1.2 ops/sec,  98.5% success
  pareto_frontier_query:    320ms avg,    3.1 ops/sec,  99.2% success
  memory_operations:         75ms avg,   13.3 ops/sec,   100% success
  evolution_cycle:       28,500ms avg,  0.035 ops/sec,  95.0% success
  concurrent_operations:  1,800ms avg,    2.8 ops/sec,  97.5% success

Resource Utilization:
  CPU:    65.2%
  Memory: 78.9%
  I/O:    42.1%
```
## Test Result Analysis
### Success Metrics
- **Pass Rate**: >95% for production readiness
- **Performance Score**: >80/100 for acceptable performance
- **Memory Efficiency**: No memory leaks detected
- **Error Recovery**: All critical errors must be recoverable
### Failure Analysis
- **Critical Errors**: System-breaking failures requiring immediate attention
- **Performance Regressions**: >10% performance degradation from baseline
- **Memory Leaks**: Unrecovered memory after garbage collection
- **Component Failures**: Unrecoverable service or adapter failures
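The regression rule is a straight comparison against a stored baseline: anything more than 10% slower than the baseline average is flagged. A minimal sketch of that check:

```typescript
// Flags a regression when the current average exceeds the baseline
// average by more than the tolerance (10% by default).
function isPerformanceRegression(
  baselineAvgMs: number,
  currentAvgMs: number,
  tolerance = 0.1,
): boolean {
  return currentAvgMs > baselineAvgMs * (1 + tolerance);
}
```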
## Troubleshooting
### Common Issues
**Test Timeouts**
```bash
# Increase timeout for slow environments
npm run test:e2e -- --testTimeout=120000
```
**Memory Issues**
```bash
# Increase Node.js memory limit
NODE_OPTIONS='--max-old-space-size=32768' npm run test:e2e
```
**Component Initialization Failures**
```bash
# Validate environment before testing
npm run test:e2e:validate
```
### Debug Mode
```bash
# Enable verbose logging
DEBUG=gepa:* npm run test:e2e
# Run with Node.js inspector
node --inspect-brk ./node_modules/.bin/vitest run src/test/integration/gepa-e2e.test.ts
```
## Development Guide
### Adding New Test Scenarios
1. **Create scenario in TestScenarios class**:
```typescript
async createMyNewScenario(): Promise<ScenarioResult> {
// Implementation
}
```
2. **Add test case in gepa-e2e.test.ts**:
```typescript
test('My New Scenario Test', async () => {
const scenario = await testRunner.environment!.scenarios.createMyNewScenario();
const result = await testRunner.environment!.testHelpers.executeScenario(scenario);
expect(result.success).toBe(true);
});
```
### Adding Performance Benchmarks
1. **Implement benchmark in PerformanceBenchmarks**:
```typescript
async benchmarkMyOperation(): Promise<BenchmarkResult> {
// Implementation
}
```
2. **Add to comprehensive benchmark suite**:
```typescript
const myOperationBenchmark = await this.benchmarkMyOperation();
```
### Custom Test Helpers
1. **Add helper method to E2ETestHelpers**:
```typescript
async myCustomHelper(): Promise<MyResult> {
// Implementation
}
```
2. **Use in test scenarios**:
```typescript
const result = await testRunner.environment!.testHelpers.myCustomHelper();
```
## Integration Points
### GEPA Core Components
- **TrajectoryStore**: Execution data persistence
- **ParetoFrontier**: Multi-objective optimization
- **ReflectionEngine**: Failure analysis and improvement
- **LLMAdapter**: External LLM integration
- **PromptMutator**: Genetic operations and evolution
### MCP Server Integration
- **Tool Registration**: All 6 GEPA MCP tools
- **Request/Response Validation**: Input/output schema compliance
- **Error Handling**: Graceful failure and recovery
- **Performance Monitoring**: Tool execution metrics
### Test Infrastructure
- **Vitest Framework**: Test runner and assertions
- **TypeScript Support**: Full type safety and IDE integration
- **CI/CD Integration**: Automated testing in pipelines
- **Coverage Reporting**: Code coverage metrics and reports
## Contributing
### Test Development Guidelines
1. **Comprehensive Coverage**: Test both happy path and edge cases
2. **Performance Awareness**: Include timing and resource usage validation
3. **Error Scenarios**: Test failure modes and recovery procedures
4. **Documentation**: Document test purpose and expected outcomes
5. **Isolation**: Ensure tests don't interfere with each other
### Code Quality Standards
- **TypeScript**: Full type safety with strict mode
- **ESLint**: Consistent code style and best practices
- **Prettier**: Automated code formatting
- **Test Coverage**: >90% for E2E test framework code
### Performance Considerations
- **Resource Management**: Proper cleanup and resource disposal
- **Memory Efficiency**: Avoid memory leaks in test infrastructure
- **Execution Speed**: Optimize test execution time without sacrificing coverage
- **Parallel Execution**: Support concurrent test execution where possible
## License
This E2E testing framework is part of the GEPA MCP Server project and is licensed under the MIT License.