# GEPA End-to-End (E2E) Integration Test Framework
This comprehensive E2E testing framework validates the complete GEPA (Genetic Evolutionary Prompt Adaptation) system through integrated workflows, performance benchmarks, and stress testing.
## Framework Structure
```
src/test/integration/
├── README.md                  # This documentation
├── index.ts                   # Main entry point and utilities
├── e2e-test-runner.ts         # Core test orchestration engine
├── test-scenarios.ts          # Comprehensive test scenarios
├── test-helpers.ts            # Utility functions and MCP tool testing
├── performance-benchmarks.ts  # Performance testing and monitoring
└── gepa-e2e.test.ts           # Main integration test suite
```
## Quick Start
### Run Complete E2E Test Suite
```bash
# Run all E2E tests
npm run test:e2e
# Run all test types
npm run test:all
# Run with verbose output
npm run test:e2e -- --reporter=verbose
```
### Run Performance Benchmarks Only
```bash
# Performance benchmarks only
npm run test:e2e:performance
# Validate environment setup
npm run test:e2e:validate
```
### Development Testing
```bash
# Run in watch mode
npm run test:e2e -- --watch
# Run with UI
npm run test:e2e -- --ui
# Run specific test patterns
npm run test:e2e -- --testNamePattern="Core Workflow"
```
## Test Coverage Areas
### 1. Core Workflow Integration
- **Complete Evolution Cycles**: End-to-end evolution from seed prompt to optimal candidate
- **Trajectory Recording**: Execution tracking and analysis workflow
- **Pareto Frontier Optimization**: Multi-objective optimization validation
- **Reflection Analysis**: Failure pattern detection and improvement generation
### 2. MCP Tool Integration
- **gepa_start_evolution**: Evolution initialization and configuration
- **gepa_record_trajectory**: Execution tracking and storage
- **gepa_evaluate_prompt**: Performance evaluation across tasks
- **gepa_reflect**: Failure analysis and improvement recommendations
- **gepa_get_pareto_frontier**: Optimal candidate retrieval
- **gepa_select_optimal**: Context-aware candidate selection
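The suite drives each of these tools through the test helpers rather than calling the server directly. As a rough illustration only (the `executeTool` helper name and the argument shape are assumptions — the real signatures live in `test-helpers.ts`), a single-tool test looks something like this:

```typescript
// Illustrative only: `executeTool` and the argument names are assumed,
// not the actual helper API — see test-helpers.ts for the real signatures.
test('gepa_start_evolution accepts a minimal configuration', async () => {
  const helpers = testRunner.environment!.testHelpers;

  const response = await helpers.executeTool('gepa_start_evolution', {
    seedPrompt: 'You are a helpful assistant.', // assumed parameter name
    populationSize: 8,                          // assumed parameter name
  });

  expect(response.success).toBe(true);
});
```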
### 3. Performance and Load Testing
- **Response Time Benchmarks**: All operations meet SLA requirements
- **Concurrent Operations**: Multi-threaded execution stress testing
- **Memory Usage Profiling**: Memory leak detection and optimization
- **Throughput Analysis**: Operations per second under various loads
- **Resource Utilization**: CPU, memory, and I/O monitoring
### 4. Memory System Validation
- **Automated Memory Updates**: Trajectory and candidate storage automation
- **Memory Optimization Triggers**: Capacity management and cleanup
- **Cross-System Synchronization**: Data consistency across components
- **Recovery from Corruption**: Data integrity and recovery procedures
### 5. Error Handling and Recovery
- **Component Failure Recovery**: LLM adapter, storage, and service failures
- **Data Corruption Detection**: Automated corruption detection and repair
- **Resource Exhaustion Handling**: Graceful degradation under load
- **Retry Mechanisms**: Exponential backoff and circuit breaker patterns
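The retry behavior follows the `retryOptions` block shown in the Configuration section below (`maxRetries`, `baseDelay`). A minimal sketch of the exponential-backoff pattern, not the framework's actual helper, looks like this:

```typescript
// Minimal exponential-backoff sketch (illustrative, not the framework's
// actual retry helper). The delay doubles on every attempt:
// baseDelay, 2 * baseDelay, 4 * baseDelay, ...
async function withRetry<T>(
  operation: () => Promise<T>,
  { maxRetries = 3, baseDelay = 1000 } = {},
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt === maxRetries) break;
      await new Promise((resolve) => setTimeout(resolve, baseDelay * 2 ** attempt));
    }
  }
  throw lastError;
}
```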
### 6. Concurrent Operations
- **Tool Execution Concurrency**: Parallel MCP tool execution
- **Memory Access Concurrency**: Thread-safe data operations
- **Evolution and Analysis**: Concurrent workflow execution
- **Resource Contention**: Lock-free operation validation
## Performance Thresholds
The framework enforces the following performance SLAs:
| Operation | Threshold | Description |
|-----------|-----------|-------------|
| Evolution Cycle | 30,000ms | Complete end-to-end cycle |
| Trajectory Recording | 1,000ms | Single trajectory save |
| Pareto Frontier Query | 500ms | Frontier retrieval |
| Memory Operations | 100ms | Basic read/write operations |
| Concurrent Operations | 5 ops/sec | Minimum throughput |
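In practice each benchmark compares a measured duration against the matching threshold. The sketch below is illustrative (the real checks live in `performance-benchmarks.ts`) and reuses the threshold keys from the default configuration:

```typescript
import { expect } from 'vitest';

// Threshold keys mirror `performanceThresholds` in the default configuration.
const thresholds = {
  evolutionTime: 30_000,
  trajectoryRecording: 1_000,
  paretoFrontierQuery: 500,
  memoryOperations: 100,
};

// Illustrative helper: time an operation and assert it stays under its SLA.
async function expectWithinThreshold(
  name: keyof typeof thresholds,
  operation: () => Promise<unknown>,
): Promise<void> {
  const start = performance.now();
  await operation();
  const elapsedMs = performance.now() - start;
  expect(elapsedMs).toBeLessThanOrEqual(thresholds[name]);
}
```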
## Framework Architecture
### E2ETestRunner
**Main orchestration engine** that:
- Initializes test environment with isolated components
- Manages test execution lifecycle and resource cleanup
- Coordinates between test scenarios, helpers, and benchmarks
- Generates comprehensive test reports and metrics
### TestScenarios
**Scenario generation engine** that creates:
- Complete evolution workflows with realistic data
- Failure simulation scenarios for error testing
- High-load scenarios for performance validation
- Memory stress tests for leak detection
### E2ETestHelpers
**Utility functions** providing:
- MCP tool execution with retry logic and error handling
- Memory management operations and statistics
- Component recovery procedures and validation
- Cross-system synchronization and data integrity checks
### PerformanceBenchmarks
**Performance monitoring system** that:
- Measures response times for all operations
- Tracks throughput under various load conditions
- Monitors memory usage patterns and leak detection
- Detects performance regressions against baselines
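Put together, a typical test flows through these components: the runner provides an isolated environment, `TestScenarios` builds the workflow, `E2ETestHelpers` executes it, and the benchmarks record timings. The sketch below only shows that flow; apart from `executeScenario`, the method names are assumptions (see the Development Guide for the real extension points):

```typescript
// Flow sketch: scenario creation -> execution -> assertion.
// `createEvolutionWorkflowScenario` is an assumed name for illustration.
test('complete evolution workflow', async () => {
  const { scenarios, testHelpers } = testRunner.environment!;

  // TestScenarios builds a realistic workflow definition...
  const scenario = await scenarios.createEvolutionWorkflowScenario();

  // ...E2ETestHelpers drives it against the isolated components...
  const result = await testHelpers.executeScenario(scenario);

  // ...and the test asserts on the aggregated outcome.
  expect(result.success).toBe(true);
});
```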
## Configuration
### Default Configuration
```typescript
const config: E2ETestConfig = {
maxConcurrentTests: 3,
defaultTimeout: 60000, // 60 seconds
performanceThresholds: {
evolutionTime: 30000,
trajectoryRecording: 1000,
paretoFrontierQuery: 500,
memoryOperations: 100,
},
retryOptions: {
maxRetries: 3,
baseDelay: 1000,
},
};
```
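Overrides are passed in when the runner is created. The exact constructor and lifecycle methods are defined in `e2e-test-runner.ts`; the lines below only sketch an assumed shape:

```typescript
// Assumed API shape for overriding the defaults; see e2e-test-runner.ts
// for the actual constructor and lifecycle methods.
const testRunner = new E2ETestRunner({ ...config, maxConcurrentTests: 1 });
await testRunner.initialize();
```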
### Environment Variables
```bash
# Test environment configuration
NODE_ENV=test
GEPA_TEST_MODE=integration
GEPA_CONFIG_PATH=./temp/integration/gepa.config.json
GEPA_DATA_DIR=./temp/integration
# Performance tuning
NODE_OPTIONS='--max-old-space-size=16384'
```
## Performance Monitoring
### Real-time Metrics
- **Response Times**: Average, min, max, standard deviation
- **Throughput**: Operations per second under load
- **Success Rates**: Percentage of successful operations
- **Memory Usage**: Peak usage, average usage, leak detection
- **Resource Utilization**: CPU, memory, I/O percentages
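For reference, the response-time summary reduces to the usual statistics over the collected samples; this standalone sketch (not the framework's implementation) shows the computation:

```typescript
// Standalone sketch of the response-time summary statistics over a
// non-empty array of samples in milliseconds.
function summarizeResponseTimes(samples: number[]) {
  const n = samples.length;
  const avg = samples.reduce((sum, x) => sum + x, 0) / n;
  const variance = samples.reduce((sum, x) => sum + (x - avg) ** 2, 0) / n;
  return {
    avg,
    min: Math.min(...samples),
    max: Math.max(...samples),
    stdDev: Math.sqrt(variance),
  };
}
```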
### Benchmark Reports
```
Performance Benchmark Results
────────────────────────────────────────────────────────────────────
Overall Score: 87.3/100
Total Execution Time: 45.2s

Individual Benchmark Results:
  trajectory_recording:     850ms avg,    1.2 ops/sec,  98.5% success
  pareto_frontier_query:    320ms avg,    3.1 ops/sec,  99.2% success
  memory_operations:         75ms avg,   13.3 ops/sec,   100% success
  evolution_cycle:       28,500ms avg,  0.035 ops/sec,  95.0% success
  concurrent_operations:  1,800ms avg,    2.8 ops/sec,  97.5% success

Resource Utilization:
  CPU:    65.2%
  Memory: 78.9%
  I/O:    42.1%
```
## Test Result Analysis
### Success Metrics
- **Pass Rate**: >95% for production readiness
- **Performance Score**: >80/100 for acceptable performance
- **Memory Efficiency**: No memory leaks detected
- **Error Recovery**: All critical errors must be recoverable
### Failure Analysis
- **Critical Errors**: System-breaking failures requiring immediate attention
- **Performance Regressions**: >10% performance degradation from baseline
- **Memory Leaks**: Unrecovered memory after garbage collection
- **Component Failures**: Unrecoverable service or adapter failures
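The regression rule is a straight comparison against a stored baseline: anything more than 10% slower than the baseline average is flagged. A minimal sketch of that check:

```typescript
// Flags a regression when the current average exceeds the baseline
// average by more than the tolerance (10% by default).
function isPerformanceRegression(
  baselineAvgMs: number,
  currentAvgMs: number,
  tolerance = 0.1,
): boolean {
  return currentAvgMs > baselineAvgMs * (1 + tolerance);
}
```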
## Troubleshooting
### Common Issues
**Test Timeouts**
```bash
# Increase timeout for slow environments
npm run test:e2e -- --testTimeout=120000
```
**Memory Issues**
```bash
# Increase Node.js memory limit
NODE_OPTIONS='--max-old-space-size=32768' npm run test:e2e
```
**Component Initialization Failures**
```bash
# Validate environment before testing
npm run test:e2e:validate
```
### Debug Mode
```bash
# Enable verbose logging
DEBUG=gepa:* npm run test:e2e
# Run with Node.js inspector
node --inspect-brk ./node_modules/.bin/vitest run src/test/integration/gepa-e2e.test.ts
```
## Development Guide
### Adding New Test Scenarios
1. **Create scenario in TestScenarios class**:
```typescript
async createMyNewScenario(): Promise<ScenarioResult> {
// Implementation
}
```
2. **Add test case in gepa-e2e.test.ts**:
```typescript
test('My New Scenario Test', async () => {
const scenario = await testRunner.environment!.scenarios.createMyNewScenario();
const result = await testRunner.environment!.testHelpers.executeScenario(scenario);
expect(result.success).toBe(true);
});
```
### Adding Performance Benchmarks
1. **Implement benchmark in PerformanceBenchmarks**:
```typescript
async benchmarkMyOperation(): Promise<BenchmarkResult> {
// Implementation
}
```
2. **Add to comprehensive benchmark suite**:
```typescript
const myOperationBenchmark = await this.benchmarkMyOperation();
```
### Custom Test Helpers
1. **Add helper method to E2ETestHelpers**:
```typescript
async myCustomHelper(): Promise<MyResult> {
// Implementation
}
```
2. **Use in test scenarios**:
```typescript
const result = await testRunner.environment!.testHelpers.myCustomHelper();
```
## Integration Points
### GEPA Core Components
- **TrajectoryStore**: Execution data persistence
- **ParetoFrontier**: Multi-objective optimization
- **ReflectionEngine**: Failure analysis and improvement
- **LLMAdapter**: External LLM integration
- **PromptMutator**: Genetic operations and evolution
### MCP Server Integration
- **Tool Registration**: All 6 GEPA MCP tools
- **Request/Response Validation**: Input/output schema compliance
- **Error Handling**: Graceful failure and recovery
- **Performance Monitoring**: Tool execution metrics
### Test Infrastructure
- **Vitest Framework**: Test runner and assertions
- **TypeScript Support**: Full type safety and IDE integration
- **CI/CD Integration**: Automated testing in pipelines
- **Coverage Reporting**: Code coverage metrics and reports
## Contributing
### Test Development Guidelines
1. **Comprehensive Coverage**: Test both happy path and edge cases
2. **Performance Awareness**: Include timing and resource usage validation
3. **Error Scenarios**: Test failure modes and recovery procedures
4. **Documentation**: Document test purpose and expected outcomes
5. **Isolation**: Ensure tests don't interfere with each other
### Code Quality Standards
- **TypeScript**: Full type safety with strict mode
- **ESLint**: Consistent code style and best practices
- **Prettier**: Automated code formatting
- **Test Coverage**: >90% for E2E test framework code
### Performance Considerations
- **Resource Management**: Proper cleanup and resource disposal
- **Memory Efficiency**: Avoid memory leaks in test infrastructure
- **Execution Speed**: Optimize test execution time without sacrificing coverage
- **Parallel Execution**: Support concurrent test execution where possible
## License
This E2E testing framework is part of the GEPA MCP Server project and is licensed under the MIT License.