Prompt Auto-Optimizer MCP

by sloth-wq
DEVELOPER_GUIDE.md (21.7 kB)
# GEPA Developer Guide

*Genetic Evolutionary Prompt Adaptation - Complete Developer Reference*

## Table of Contents

1. [Architecture Overview](#architecture-overview)
2. [System Components](#system-components)
3. [Data Flow & Interactions](#data-flow--interactions)
4. [Development Setup](#development-setup)
5. [Testing Strategies](#testing-strategies)
6. [Performance Profiling](#performance-profiling)
7. [Extension Points](#extension-points)
8. [Advanced Topics](#advanced-topics)
9. [Troubleshooting Guide](#troubleshooting-guide)
10. [Best Practices](#best-practices)

## Architecture Overview

GEPA implements a genetic evolutionary algorithm for prompt optimization that combines multi-objective optimization, reflection-based learning, and performance-driven adaptation.

### Core Design Principles

- **Multi-Objective Optimization**: Balances multiple fitness dimensions using Pareto frontier analysis
- **Genetic Evolution**: Uses mutation, crossover, and selection strategies for prompt improvement
- **Reflection-Based Learning**: Analyzes execution trajectories to identify failure patterns
- **Performance-First**: Built with comprehensive monitoring, caching, and optimization
- **Resilience**: Implements circuit breakers, graceful degradation, and fault tolerance

### System Architecture

```
┌────────────────────────────────────────────────────────────────────┐
│                          GEPA MCP Server                           │
├────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  │
│  │ Evolution Engine │  │ Pareto Frontier  │  │ Reflection       │  │
│  │ - Orchestrates   │  │ - Multi-obj opt  │  │ Engine           │  │
│  │   generations    │  │ - Dominance      │  │ - Trajectory     │  │
│  │ - Population     │  │   checking       │  │   analysis       │  │
│  │   management     │  │ - Sampling       │  │ - Pattern        │  │
│  │ - Convergence    │  │   strategies     │  │   detection      │  │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘  │
├────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  │
│  │ Prompt Mutator   │  │ Performance      │  │ LLM Adapter      │  │
│  │ - Genetic ops    │  │ Tracker          │  │ - Provider       │  │
│  │ - Validation     │  │ - Metrics        │  │   abstraction    │  │
│  │ - Diversity      │  │ - Analytics      │  │ - Resilience     │  │
│  │   management     │  │ - Reporting      │  │ - Caching        │  │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘  │
├────────────────────────────────────────────────────────────────────┤
│  ┌──────────────────┐  ┌──────────────────┐  ┌──────────────────┐  │
│  │ Trajectory       │  │ Memory Leak      │  │ Resilience       │  │
│  │ Store            │  │ Detection        │  │ Framework        │  │
│  │ - Persistence    │  │ - Monitoring     │  │ - Circuit        │  │
│  │ - Querying       │  │ - Analytics      │  │   breakers       │  │
│  │ - Optimization   │  │ - Cleanup        │  │ - Retry logic    │  │
│  └──────────────────┘  └──────────────────┘  └──────────────────┘  │
└────────────────────────────────────────────────────────────────────┘
```

## System Components

### 1. Evolution Engine (`src/core/prompt-evolution.ts`)

**Purpose**: Orchestrates the complete genetic evolutionary process

**Key Responsibilities**:

- Population initialization and management
- Generation cycle execution
- Convergence detection and early stopping
- Integration with all other components
- Performance optimization through parallel processing

**Configuration Options**:

```typescript
interface EvolutionConfig {
  taskDescription: string;
  seedPrompt?: string;
  targetModules?: string[];
  maxGenerations: number; // Default: 10
  populationSize: number; // Default: 20
  mutationRate: number; // Default: 0.4
}
```

**Key Methods**:

- `startEvolution()`: Initiates the complete evolution process
- `runGeneration()`: Executes a single generation cycle
- `generateCandidates()`: Creates new mutations via parallel processing
- `evaluatePopulation()`: Assesses candidate fitness with concurrency control

### 2. Pareto Frontier (`src/core/pareto-frontier.ts`)

**Purpose**: Multi-objective optimization using Pareto dominance

**Key Features**:

- Efficient dominance checking with spatial indexing
- Multiple sampling strategies (uniform, UCB, epsilon-greedy)
- Hypervolume calculation for convergence analysis
- Memory leak prevention and performance optimization

**Objective Configuration**:

```typescript
interface ParetoObjective {
  name: string;
  weight: number;
  direction: 'maximize' | 'minimize';
  extractor: (candidate: PromptCandidate) => number;
}
```

**Sampling Strategies**:

- **Uniform**: Random selection from the frontier
- **UCB**: Upper confidence bound for exploration/exploitation balance
- **Epsilon-Greedy**: Picks a random candidate with probability ε and the greedy best candidate otherwise

### 3. Reflection Engine (`src/core/reflection-engine.ts`)

**Purpose**: Analyzes execution failures to guide prompt improvements

**Core Capabilities**:

- Trajectory analysis using LLM-based pattern recognition
- Batch processing for common pattern identification
- Caching for performance optimization
- Confidence-based filtering

**Analysis Pipeline**:

1. **Trajectory Validation**: Ensures data integrity
2. **LLM Analysis**: Identifies failure patterns and root causes
3. **Suggestion Generation**: Creates specific improvement recommendations
4. **Confidence Assessment**: Filters low-confidence suggestions

### 4. Prompt Mutator (`src/services/prompt-mutator.ts`)

**Purpose**: Implements genetic operations for prompt evolution

**Mutation Strategies**:

- **Reflective**: Based on trajectory analysis insights
- **Crossover**: Combines successful elements from multiple prompts
- **Adaptive**: Task-context specific optimizations
- **Random**: Maintains genetic diversity

**Validation Pipeline**:

```typescript
// Comprehensive validation chain (pseudocode; steps run in this order)
validatePromptStructure()
  → validatePromptLength()
  → validateContentSafety()
  → validateEssentialComponents()
  → validateMutationDivergence()
```

### 5. Performance Tracker (`src/services/performance-tracker.ts`)

**Purpose**: Comprehensive system monitoring and analytics

**Metrics Categories**:

- **Execution Metrics**: Duration, success rates, error patterns
- **Memory Metrics**: Heap usage, leak detection, GC analysis
- **Token Usage**: Cost tracking and optimization insights
- **Evolution Metrics**: Generation statistics, convergence analysis

**Key Features**:

- Real-time monitoring with event emission
- Statistical analysis (percentiles, trends, anomalies)
- Memory leak detection with predictive analytics
- Configurable reporting (JSON, human-readable, CSV)

## Data Flow & Interactions

### Evolution Cycle Flow

```mermaid
graph TD
    A[Start Evolution] --> B[Initialize Population]
    B --> C[Generate Candidates]
    C --> D[Evaluate Population]
    D --> E[Update Pareto Frontier]
    E --> F[Select Survivors]
    F --> G[Check Convergence]
    G -->|No| H[Analyze Trajectories]
    H --> I[Generate Reflective Mutations]
    I --> C
    G -->|Yes| J[Return Best Candidate]
    C --> K[Prompt Mutator]
    K --> L[Reflection Engine]
    L --> M[Performance Tracker]
    D --> N[LLM Adapter]
    E --> O[Trajectory Store]
```

### Component Interactions

1. **Evolution Engine** ↔ **Pareto Frontier**
   - Adds evaluated candidates to the frontier
   - Retrieves non-dominated survivors
   - Checks convergence metrics
2. **Evolution Engine** ↔ **Reflection Engine**
   - Requests trajectory analysis for failed executions
   - Receives improvement suggestions for prompt mutations
3. **Prompt Mutator** ↔ **LLM Adapter**
   - Generates mutations using LLM capabilities
   - Validates generated content
4. **Performance Tracker** ↔ **All Components**
   - Collects metrics from all operations
   - Provides performance insights and alerts

## Development Setup

### Prerequisites

```bash
# Node.js 18+ required
node --version  # Should be v18.0.0 or higher
npm --version   # Should be v8.0.0 or higher

# TypeScript for development
npm install -g typescript
```

### Environment Setup

1. **Clone and Install Dependencies**:

   ```bash
   git clone <repository-url>
   cd gepa-mcp-server
   npm install
   ```

2. **Environment Configuration**:

   ```bash
   # Copy environment template
   cp .env.example .env

   # Configure LLM provider settings
   export OPENAI_API_KEY="your-api-key"
   export ANTHROPIC_API_KEY="your-claude-key"
   ```

3. **Build System**:

   ```bash
   # Development build with watching
   npm run dev

   # Production build
   npm run build

   # Type checking
   npm run typecheck
   ```

### Local Development Workflow

1. **Start Development Server**:

   ```bash
   # Start with hot reloading
   npm run dev

   # Start with debugging
   npm run dev:debug
   ```

2. **Run Test Suite**:

   ```bash
   # Unit tests
   npm test

   # Integration tests
   npm run test:integration

   # Memory optimization tests
   npm run test:memory

   # Full test suite with coverage
   npm run test:coverage
   ```

3. **Performance Profiling**:

   ```bash
   # Memory profiling
   npm run profile:memory

   # CPU profiling
   npm run profile:cpu

   # Benchmark suite
   npm run benchmark
   ```

## Testing Strategies

### Test Architecture

GEPA uses a layered testing strategy:

#### 1. Unit Tests

- **Location**: `src/**/*.test.ts`
- **Focus**: Individual component behavior
- **Coverage Target**: >90%

```typescript
// Example unit test structure
describe('EvolutionEngine', () => {
  let engine: EvolutionEngine;
  let mockDependencies: EvolutionEngineDependencies;

  beforeEach(() => {
    mockDependencies = createMockDependencies();
    engine = new EvolutionEngine(mockDependencies);
  });

  describe('startEvolution', () => {
    it('should complete evolution cycle', async () => {
      const result = await engine.startEvolution(testParams);
      expect(result.convergenceAchieved).toBe(true);
      expect(result.generations).toBeGreaterThan(0);
    });
  });
});
```

#### 2. Integration Tests

- **Location**: `src/test/integration/`
- **Focus**: Component interactions and workflows
- **Scenarios**: End-to-end evolution cycles

```typescript
// Integration test example
describe('GEPA Integration', () => {
  it('should optimize chat application prompts', async () => {
    const gepaServer = new GepaServer(config);
    const result = await gepaServer.optimizePrompt({
      taskDescription: 'Improve customer service chat responses',
      seedPrompt: 'You are a helpful assistant.',
      maxGenerations: 5
    });
    expect(result.bestPrompt.averageScore).toBeGreaterThan(0.8);
  });
});
```

#### 3. Memory Tests

- **Location**: `src/test/stress/`
- **Focus**: Memory leak detection and performance validation
- **Tools**: Memory profiling, heap snapshots

#### 4. Chaos Tests

- **Location**: `src/test/chaos/`
- **Focus**: Resilience under failure conditions
- **Scenarios**: Network failures, resource exhaustion, component crashes

### Test Data Management

**Test Fixtures**: `src/test/fixtures/sample-data.ts`

```typescript
export const samplePromptCandidate: PromptCandidate = {
  id: 'test-candidate-1',
  content: 'You are a helpful assistant specialized in...',
  generation: 1,
  taskPerformance: new Map([['test-task', 0.85]]),
  averageScore: 0.85,
  rolloutCount: 5,
  createdAt: new Date(),
  lastEvaluated: new Date(),
  mutationType: 'initial'
};
```

## Performance Profiling

### Memory Profiling

1. **Heap Snapshot Analysis**:

   ```bash
   # Generate heap snapshot
   npm run profile:heap

   # Analyze with Chrome DevTools
   # 1. Open Chrome DevTools
   # 2. Memory tab → Load Profile
   # 3. Select generated .heapsnapshot file
   ```

2. **Memory Leak Detection**:

   ```typescript
   // Built-in leak detection
   import { MemoryLeakIntegration } from './core/memory-leak-detector';

   MemoryLeakIntegration.initialize();
   // Automatic monitoring starts
   ```

3. **Memory Optimization**:

   ```bash
   # Run memory optimization test suite
   npm run test:memory-optimization

   # Generate memory usage report
   npm run report:memory
   ```

### Performance Benchmarking

```bash
# Run complete benchmark suite
npm run benchmark

# Specific component benchmarks
npm run benchmark:evolution
npm run benchmark:pareto
npm run benchmark:reflection
```

### Real-time Monitoring

```typescript
import { PerformanceTracker } from './services/performance-tracker';

const tracker = new PerformanceTracker({
  enableRealTimeMonitoring: true,
  memoryTrackingEnabled: true
});

// Subscribe to metrics
tracker.subscribeToMetrics((metric) => {
  console.log(`${metric.name}: ${metric.duration}ms`);
});
```

## Extension Points

### 1. Custom Mutation Strategies

```typescript
// Extend PromptMutator with custom strategy
class CustomPromptMutator extends PromptMutator {
  async generateSemanticMutations(
    prompt: PromptCandidate,
    context: SemanticContext
  ): Promise<PromptCandidate[]> {
    const mutations: PromptCandidate[] = [];
    // Custom semantic analysis and mutation logic populates `mutations`
    return mutations;
  }
}
```

### 2. Alternative LLM Providers

```typescript
// Implement LLMAdapter interface
class CustomLLMAdapter implements LLMAdapter {
  async callLLM(prompt: string): Promise<LLMResponse> {
    // Custom provider implementation
  }

  async evaluatePrompt(
    prompt: string,
    context: TaskContext
  ): Promise<EvaluationResult> {
    // Custom evaluation logic
  }
}
```

### 3. Custom Objectives

```typescript
// Define custom Pareto objectives
const customObjectives: ParetoObjective[] = [
  {
    name: 'creativity',
    weight: 0.3,
    direction: 'maximize',
    extractor: (candidate) => calculateCreativityScore(candidate)
  },
  {
    name: 'safety',
    weight: 0.4,
    direction: 'maximize',
    extractor: (candidate) => assessSafetyScore(candidate)
  }
];
```

### 4. Custom Storage Backends

```typescript
// Implement TrajectoryStore interface
class DatabaseTrajectoryStore implements TrajectoryStore {
  async save(trajectory: ExecutionTrajectory): Promise<void> {
    // Database persistence logic
  }

  async query(filter: TrajectoryFilter): Promise<ExecutionTrajectory[]> {
    // Database query logic
  }
}
```

## Advanced Topics

### Memory Management

1. **Leak Prevention**:

   ```typescript
   // Automatic cleanup using WeakMap
   class ComponentManager {
     private components = new WeakMap<object, Component>();

     register(key: object, component: Component): void {
       this.components.set(key, component);
       // The entry becomes collectable once the key is no longer reachable
     }
   }
   ```

2. **Resource Pooling**:

   ```typescript
   // Object pooling for frequent allocations
   class CandidatePool {
     private pool: PromptCandidate[] = [];

     acquire(): PromptCandidate {
       return this.pool.pop() || this.createNew();
     }

     release(candidate: PromptCandidate): void {
       this.reset(candidate);
       this.pool.push(candidate);
     }
   }
   ```

### Performance Optimization

1. **Caching Strategies**:

   ```typescript
   // Multi-level caching with TTL
   class AnalysisCache {
     private l1Cache = new Map<string, any>(); // Hot data
     private l2Cache = new Map<string, any>(); // Warm data
     private ttlMap = new Map<string, number>(); // TTL tracking

     async get(key: string): Promise<any> {
       // L1 → L2 → Storage hierarchy
     }
   }
   ```

2. **Parallel Processing**:

   ```typescript
   // Optimized batch processing
   async function processInBatches<T, R>(
     items: T[],
     processor: (item: T) => Promise<R>,
     batchSize: number = 10
   ): Promise<R[]> {
     const results: R[] = [];
     for (let i = 0; i < items.length; i += batchSize) {
       const batch = items.slice(i, i + batchSize);
       const batchResults = await Promise.all(
         batch.map(processor)
       );
       results.push(...batchResults);
     }
     return results;
   }
   ```

### Scaling Considerations

1. **Distributed Processing**:

   ```typescript
   // Workload distribution across nodes
   interface DistributedEvolution {
     distributeGeneration(
       population: PromptCandidate[],
       nodes: NodeInfo[]
     ): Promise<DistributionPlan>;

     collectResults(
       distributionPlan: DistributionPlan
     ): Promise<EvaluationResult[]>;
   }
   ```

2. **Resource Management**:

   ```typescript
   // Dynamic resource allocation
   class ResourceManager {
     private cpuPool: number;
     private memoryPool: number;

     async allocateResources(
       operation: string,
       requirements: ResourceRequirements
     ): Promise<ResourceAllocation> {
       // Dynamic allocation based on current load
     }
   }
   ```

## Troubleshooting Guide

### Common Issues

1. **Memory Leaks**:

   ```bash
   # Symptoms: Increasing heap usage, eventual OOM
   # Diagnosis:
   npm run profile:memory
   npm run test:memory-leak
   # Solution: Check component cleanup, event listeners
   ```

2. **Performance Degradation**:

   ```bash
   # Symptoms: Slower evolution cycles, high CPU usage
   # Diagnosis:
   npm run benchmark
   npm run profile:cpu
   # Solution: Check batch sizes, parallel processing limits
   ```

3. **Convergence Issues**:

   ```typescript
   // Debug convergence problems
   const convergenceMetrics = paretoFrontier.getConvergenceMetrics();
   console.log('Diversity:', convergenceMetrics.diversity);
   console.log('Spacing:', convergenceMetrics.spacing);

   // Adjust parameters:
   // - Increase population size
   // - Adjust mutation rate
   // - Modify selection pressure
   ```

### Debug Configuration

```typescript
// Enable debug logging
const config: EvolutionConfig = {
  // ... other config
  debug: {
    enableVerboseLogging: true,
    logGenerationStats: true,
    trackMemoryUsage: true,
    saveIntermediateResults: true
  }
};
```

### Performance Monitoring

```typescript
// Real-time performance monitoring
const performanceTracker = new PerformanceTracker({
  enableRealTimeMonitoring: true,
  memoryTrackingEnabled: true
});

performanceTracker.subscribeToMetrics((metric) => {
  if (metric.category === 'evolution' && metric.duration > 10000) {
    console.warn(`Slow evolution detected: ${metric.duration}ms`);
  }
});
```

## Best Practices

### Code Organization

1. **Dependency Injection**:

   ```typescript
   // Use dependency injection for testability
   class EvolutionEngine {
     constructor(private dependencies: EvolutionEngineDependencies) {
       // Validate dependencies in constructor
       this.validateDependencies(dependencies);
     }
   }
   ```

2. **Error Handling**:

   ```typescript
   // Comprehensive error handling with context
   try {
     await this.processGeneration(generation);
   } catch (error) {
     const context = {
       generation: generation.id,
       population: generation.candidates.length,
       timestamp: Date.now()
     };
     this.logger.error('Generation processing failed', { error, context });
     throw new GenerationError('Failed to process generation', error, context);
   }
   ```

3. **Configuration Management**:

   ```typescript
   // Centralized configuration with validation
   interface GepaConfig {
     evolution: EvolutionConfig;
     performance: PerformanceConfig;
     logging: LoggingConfig;
   }

   class ConfigManager {
     static validate(config: GepaConfig): void {
       // Comprehensive validation
     }

     static getDefaults(): GepaConfig {
       // Safe defaults
     }
   }
   ```

### Performance Guidelines

1. **Batch Operations**: Process items in batches to avoid overwhelming the system
2. **Resource Limits**: Set appropriate limits for concurrent operations
3. **Memory Management**: Use weak references for temporary objects
4. **Caching**: Implement caching with TTL and size limits
5. **Monitoring**: Always include performance monitoring in production

### Testing Guidelines

1. **Test Isolation**: Each test should be independent and repeatable
2. **Mock External Dependencies**: Use mocks for LLM providers and external services
3. **Performance Tests**: Include performance benchmarks in CI/CD
4. **Memory Tests**: Run memory leak detection tests regularly
5. **Integration Tests**: Test complete workflows end-to-end

This developer guide covers GEPA's architecture, components, and development practices. For specific examples and advanced use cases, see the `docs/examples/` directory.
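
As a supplement to the Pareto Frontier component described in this guide, the dominance check over `ParetoObjective`-style definitions can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not GEPA's actual implementation: the `Objective` and `Candidate` types and the `dominates` and `paretoFront` helpers are hypothetical names, the `weight` field is left out, and a plain O(n²) scan stands in for the spatial indexing the guide mentions.

```typescript
// Hypothetical, simplified stand-ins for ParetoObjective and PromptCandidate.
type Direction = 'maximize' | 'minimize';

interface Objective<T> {
  name: string;
  direction: Direction;
  extractor: (candidate: T) => number;
}

interface Candidate {
  id: string;
  score: number;  // higher is better
  tokens: number; // lower is better
}

// `a` dominates `b` when it is at least as good on every objective
// and strictly better on at least one.
function dominates<T>(a: T, b: T, objectives: Objective<T>[]): boolean {
  let strictlyBetter = false;
  for (const objective of objectives) {
    // Flip the sign for minimized objectives so "larger is better" always holds.
    const sign = objective.direction === 'maximize' ? 1 : -1;
    const diff = sign * (objective.extractor(a) - objective.extractor(b));
    if (diff < 0) return false;
    if (diff > 0) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Keep only candidates that no other candidate dominates (quadratic scan).
function paretoFront<T>(candidates: T[], objectives: Objective<T>[]): T[] {
  return candidates.filter(
    (c) => !candidates.some((other) => other !== c && dominates(other, c, objectives))
  );
}

const objectives: Objective<Candidate>[] = [
  { name: 'score', direction: 'maximize', extractor: (c) => c.score },
  { name: 'tokens', direction: 'minimize', extractor: (c) => c.tokens }
];

const candidates: Candidate[] = [
  { id: 'a', score: 0.9, tokens: 300 },
  { id: 'b', score: 0.8, tokens: 200 },
  { id: 'c', score: 0.7, tokens: 250 }, // dominated by b
  { id: 'd', score: 0.9, tokens: 350 }  // dominated by a
];

const front = paretoFront(candidates, objectives);
console.log(front.map((c) => c.id)); // candidates a and b remain
```

Candidates `a` and `b` both survive because each beats the other on one objective, which is the trade-off a multi-objective frontier is meant to preserve rather than collapse into a single weighted score.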
