# GEPA Developer Guide
*Genetic Evolutionary Prompt Adaptation - Complete Developer Reference*
## Table of Contents
1. [Architecture Overview](#architecture-overview)
2. [System Components](#system-components)
3. [Data Flow & Interactions](#data-flow--interactions)
4. [Development Setup](#development-setup)
5. [Testing Strategies](#testing-strategies)
6. [Performance Profiling](#performance-profiling)
7. [Extension Points](#extension-points)
8. [Advanced Topics](#advanced-topics)
9. [Troubleshooting Guide](#troubleshooting-guide)
10. [Best Practices](#best-practices)
## Architecture Overview
GEPA optimizes prompts with a genetic evolutionary algorithm that combines multi-objective (Pareto) optimization, reflection-based learning, and performance-driven adaptation.
### Core Design Principles
- **Multi-Objective Optimization**: Balances multiple fitness dimensions using Pareto frontier analysis
- **Genetic Evolution**: Uses mutation, crossover, and selection strategies for prompt improvement
- **Reflection-Based Learning**: Analyzes execution trajectories to identify failure patterns
- **Performance-First**: Built with comprehensive monitoring, caching, and optimization
- **Resilience**: Implements circuit breakers, graceful degradation, and fault tolerance
### System Architecture
```
┌────────────────────────────────────────────────────────────────┐
│                        GEPA MCP Server                         │
├────────────────────────────────────────────────────────────────┤
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Evolution Engine │ │ Pareto Frontier  │ │ Reflection       │ │
│ │ - Orchestrates   │ │ - Multi-obj opt  │ │ Engine           │ │
│ │   generations    │ │ - Dominance      │ │ - Trajectory     │ │
│ │ - Population     │ │   checking       │ │   analysis       │ │
│ │   management     │ │ - Sampling       │ │ - Pattern        │ │
│ │ - Convergence    │ │   strategies     │ │   detection      │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
├────────────────────────────────────────────────────────────────┤
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Prompt Mutator   │ │ Performance      │ │ LLM Adapter      │ │
│ │ - Genetic ops    │ │ Tracker          │ │ - Provider       │ │
│ │ - Validation     │ │ - Metrics        │ │   abstraction    │ │
│ │ - Diversity      │ │ - Analytics      │ │ - Resilience     │ │
│ │   management     │ │ - Reporting      │ │ - Caching        │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
├────────────────────────────────────────────────────────────────┤
│ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Trajectory       │ │ Memory Leak      │ │ Resilience       │ │
│ │ Store            │ │ Detection        │ │ Framework        │ │
│ │ - Persistence    │ │ - Monitoring     │ │ - Circuit        │ │
│ │ - Querying       │ │ - Analytics      │ │   breakers       │ │
│ │ - Optimization   │ │ - Cleanup        │ │ - Retry logic    │ │
│ └──────────────────┘ └──────────────────┘ └──────────────────┘ │
└────────────────────────────────────────────────────────────────┘
```
## System Components
### 1. Evolution Engine (`src/core/prompt-evolution.ts`)
**Purpose**: Orchestrates the complete genetic evolutionary process
**Key Responsibilities**:
- Population initialization and management
- Generation cycle execution
- Convergence detection and early stopping
- Integration with all other components
- Performance optimization through parallel processing
**Configuration Options**:
```typescript
interface EvolutionConfig {
  taskDescription: string;
  seedPrompt?: string;
  targetModules?: string[];
  maxGenerations: number; // Default: 10
  populationSize: number; // Default: 20
  mutationRate: number; // Default: 0.4
}
```
**Key Methods**:
- `startEvolution()`: Initiates complete evolution process
- `runGeneration()`: Executes single generation cycle
- `generateCandidates()`: Creates new mutations via parallel processing
- `evaluatePopulation()`: Assesses candidate fitness with concurrency control
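Convergence detection and early stopping can be illustrated with a stagnation check on the best score per generation. This is a minimal sketch, not the engine's actual criterion: `hasConverged`, `patience`, and `minDelta` are hypothetical names introduced here for illustration.

```typescript
// Hypothetical helper: reports convergence when the best score has improved
// by less than `minDelta` over the last `patience` generations.
function hasConverged(
  bestScores: number[],
  patience: number = 3,
  minDelta: number = 0.001
): boolean {
  if (patience < 1) throw new RangeError('patience must be at least 1');
  // Not enough history yet to compare a recent window against earlier results.
  if (bestScores.length <= patience) return false;

  const recentBest = Math.max(...bestScores.slice(-patience));
  const earlierBest = Math.max(...bestScores.slice(0, -patience));
  return recentBest - earlierBest < minDelta;
}

// Best score per generation from an illustrative run.
const history = [0.61, 0.7, 0.74, 0.74, 0.74, 0.74];
console.log(hasConverged(history)); // true: no gain in the last 3 generations
```

A loop over `maxGenerations` could call a check like this after each generation and stop early when it returns `true`.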
### 2. Pareto Frontier (`src/core/pareto-frontier.ts`)
**Purpose**: Multi-objective optimization using Pareto dominance
**Key Features**:
- Efficient dominance checking with spatial indexing
- Multiple sampling strategies (uniform, UCB, epsilon-greedy)
- Hypervolume calculation for convergence analysis
- Memory leak prevention and performance optimization
**Objective Configuration**:
```typescript
interface ParetoObjective {
  name: string;
  weight: number;
  direction: 'maximize' | 'minimize';
  extractor: (candidate: PromptCandidate) => number;
}
```
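To make Pareto dominance concrete, the sketch below checks dominance across objectives that carry a `direction` and an `extractor`, and then computes a frontier with a naive O(n²) scan. It does not reproduce the spatial indexing mentioned above. `Candidate` and `Objective` are simplified stand-ins defined only for this example, not the project's real types.

```typescript
// Simplified stand-ins for PromptCandidate and ParetoObjective (assumptions).
interface Candidate {
  id: string;
  scores: Record<string, number>;
}

interface Objective {
  name: string;
  direction: 'maximize' | 'minimize';
  extractor: (candidate: Candidate) => number;
}

// `a` dominates `b` when it is at least as good on every objective
// and strictly better on at least one.
function dominates(a: Candidate, b: Candidate, objectives: Objective[]): boolean {
  let strictlyBetter = false;
  for (const objective of objectives) {
    const sign = objective.direction === 'maximize' ? 1 : -1;
    const diff = sign * (objective.extractor(a) - objective.extractor(b));
    if (diff < 0) return false; // worse on this objective
    if (diff > 0) strictlyBetter = true;
  }
  return strictlyBetter;
}

// Naive frontier: keep every candidate that no other candidate dominates.
function nonDominated(candidates: Candidate[], objectives: Objective[]): Candidate[] {
  return candidates.filter(
    (c) => !candidates.some((other) => other !== c && dominates(other, c, objectives))
  );
}

const objectives: Objective[] = [
  { name: 'accuracy', direction: 'maximize', extractor: (c) => c.scores.accuracy },
  { name: 'tokens', direction: 'minimize', extractor: (c) => c.scores.tokens },
];

const frontier = nonDominated(
  [
    { id: 'a', scores: { accuracy: 0.9, tokens: 300 } },
    { id: 'b', scores: { accuracy: 0.8, tokens: 120 } },
    { id: 'c', scores: { accuracy: 0.7, tokens: 200 } },
  ],
  objectives
);
console.log(frontier.map((c) => c.id)); // a and b remain; b dominates c
```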
**Sampling Strategies**:
- **Uniform**: Random selection from frontier
- **UCB**: Upper confidence bound for exploration/exploitation balance
- **Epsilon-Greedy**: With probability ε, selects a random frontier member; otherwise selects the current best
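The two non-uniform strategies can be sketched as follows. This is an illustration under stated assumptions rather than the frontier's real sampling code: `FrontierEntry`, `sampleEpsilonGreedy`, and `sampleUcb` are hypothetical names, and the UCB variant shown is the standard UCB1 formula.

```typescript
// Hypothetical per-candidate statistics kept by a sampler.
interface FrontierEntry {
  id: string;
  meanScore: number; // average observed score
  timesSampled: number; // how often this entry has been selected so far
}

// Epsilon-greedy: explore uniformly with probability epsilon,
// otherwise exploit the entry with the best mean score.
function sampleEpsilonGreedy(
  entries: FrontierEntry[],
  epsilon: number,
  rng: () => number = Math.random
): FrontierEntry {
  if (entries.length === 0) throw new Error('Frontier is empty');
  if (rng() < epsilon) {
    return entries[Math.floor(rng() * entries.length)];
  }
  return entries.reduce((best, e) => (e.meanScore > best.meanScore ? e : best));
}

// UCB1: mean score plus an exploration bonus that shrinks as an entry
// accumulates samples.
function sampleUcb(entries: FrontierEntry[], c: number = Math.SQRT2): FrontierEntry {
  if (entries.length === 0) throw new Error('Frontier is empty');
  const total = entries.reduce((sum, e) => sum + e.timesSampled, 0);
  const ucb = (e: FrontierEntry): number =>
    e.timesSampled === 0
      ? Number.POSITIVE_INFINITY // never-sampled entries are tried first
      : e.meanScore + c * Math.sqrt(Math.log(total) / e.timesSampled);
  return entries.reduce((best, e) => (ucb(e) > ucb(best) ? e : best));
}

const entries: FrontierEntry[] = [
  { id: 'p1', meanScore: 0.82, timesSampled: 40 },
  { id: 'p2', meanScore: 0.78, timesSampled: 3 },
];
const greedyPick = sampleEpsilonGreedy(entries, 0, () => 0.5); // epsilon 0: always greedy
const ucbPick = sampleUcb(entries); // favors the rarely sampled p2
console.log(greedyPick.id, ucbPick.id);
```

Passing `rng` as a parameter keeps the sampler deterministic in tests.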
### 3. Reflection Engine (`src/core/reflection-engine.ts`)
**Purpose**: Analyzes execution failures to guide prompt improvements
**Core Capabilities**:
- Trajectory analysis using LLM-based pattern recognition
- Batch processing for common pattern identification
- Caching for performance optimization
- Confidence-based filtering
**Analysis Pipeline**:
1. **Trajectory Validation**: Ensures data integrity
2. **LLM Analysis**: Identifies failure patterns and root causes
3. **Suggestion Generation**: Creates specific improvement recommendations
4. **Confidence Assessment**: Filters low-confidence suggestions
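The final pipeline step can be sketched as a simple threshold filter. The `Suggestion` shape, the 0 to 1 confidence scale, and the default threshold are assumptions made for this example, not values taken from the Reflection Engine.

```typescript
// Hypothetical suggestion shape; confidence is assumed to be in [0, 1].
interface Suggestion {
  text: string;
  confidence: number;
}

// Drops low-confidence suggestions and returns the rest, strongest first.
function filterSuggestions(
  suggestions: Suggestion[],
  minConfidence: number = 0.7,
  maxResults: number = 5
): Suggestion[] {
  return suggestions
    .filter((s) => s.confidence >= minConfidence) // filter() copies, so the input is not mutated
    .sort((a, b) => b.confidence - a.confidence)
    .slice(0, maxResults);
}

const kept = filterSuggestions([
  { text: 'State the output format explicitly', confidence: 0.91 },
  { text: 'Shorten the preamble', confidence: 0.42 },
  { text: 'Add an example for edge cases', confidence: 0.77 },
]);
console.log(kept.map((s) => s.text));
```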
### 4. Prompt Mutator (`src/services/prompt-mutator.ts`)
**Purpose**: Implements genetic operations for prompt evolution
**Mutation Strategies**:
- **Reflective**: Based on trajectory analysis insights
- **Crossover**: Combines successful elements from multiple prompts
- **Adaptive**: Task-context specific optimizations
- **Random**: Maintains genetic diversity
**Validation Pipeline**:
```typescript
// Validation chain, applied in this order
// (the arrows denote sequence; they are not TypeScript syntax):
//   validatePromptStructure()
//   → validatePromptLength()
//   → validateContentSafety()
//   → validateEssentialComponents()
//   → validateMutationDivergence()
```
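A chain like this can be modeled as an ordered list of validators that stops at the first failure. The sketch below assumes that behavior; the `Validator` type and the two example checks (`notEmpty`, `withinLength`) are hypothetical and much simpler than the validators named above.

```typescript
type ValidationResult = { valid: true } | { valid: false; reason: string };
type Validator = (prompt: string) => ValidationResult;

// Two illustrative checks (hypothetical, not the project's validators).
const notEmpty: Validator = (prompt) =>
  prompt.trim().length > 0 ? { valid: true } : { valid: false, reason: 'Prompt is empty' };

const withinLength: Validator = (prompt) =>
  prompt.length <= 4000
    ? { valid: true }
    : { valid: false, reason: 'Prompt exceeds 4000 characters' };

// Runs validators in order and returns the first failure, if any.
function runValidators(prompt: string, validators: Validator[]): ValidationResult {
  for (const validate of validators) {
    const result = validate(prompt);
    if (!result.valid) return result;
  }
  return { valid: true };
}

const chain: Validator[] = [notEmpty, withinLength];
console.log(runValidators('You are a helpful assistant.', chain)); // { valid: true }
console.log(runValidators('   ', chain)); // fails with reason 'Prompt is empty'
```

Ordering cheap structural checks before expensive ones keeps rejected mutations from consuming further work.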
### 5. Performance Tracker (`src/services/performance-tracker.ts`)
**Purpose**: Comprehensive system monitoring and analytics
**Metrics Categories**:
- **Execution Metrics**: Duration, success rates, error patterns
- **Memory Metrics**: Heap usage, leak detection, GC analysis
- **Token Usage**: Cost tracking and optimization insights
- **Evolution Metrics**: Generation statistics, convergence analysis
**Key Features**:
- Real-time monitoring with event emission
- Statistical analysis (percentiles, trends, anomalies)
- Memory leak detection with predictive analytics
- Configurable reporting (JSON, human-readable, CSV)
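As an example of the percentile statistics mentioned above, the following sketch computes nearest-rank percentiles over recorded durations. The `percentile` function is a generic illustration and is not part of the `PerformanceTracker` API.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error('No samples recorded');
  if (p < 0 || p > 100) throw new RangeError('p must be between 0 and 100');
  const sorted = [...values].sort((a, b) => a - b); // copy so the input keeps its order
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Illustrative operation durations in milliseconds.
const durations = [120, 95, 310, 101, 2050, 99, 130, 115, 98, 105];
console.log(percentile(durations, 50)); // 105
console.log(percentile(durations, 95)); // 2050
```

The gap between the median and the 95th percentile is what exposes the single slow outlier here; a mean alone would hide it.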
## Data Flow & Interactions
### Evolution Cycle Flow
```mermaid
graph TD
A[Start Evolution] --> B[Initialize Population]
B --> C[Generate Candidates]
C --> D[Evaluate Population]
D --> E[Update Pareto Frontier]
E --> F[Select Survivors]
F --> G[Check Convergence]
G -->|No| H[Analyze Trajectories]
H --> I[Generate Reflective Mutations]
I --> C
G -->|Yes| J[Return Best Candidate]
C --> K[Prompt Mutator]
K --> L[Reflection Engine]
L --> M[Performance Tracker]
D --> N[LLM Adapter]
E --> O[Trajectory Store]
```
### Component Interactions
1. **Evolution Engine** ↔ **Pareto Frontier**
- Adds evaluated candidates to frontier
- Retrieves non-dominated survivors
- Checks convergence metrics
2. **Evolution Engine** ↔ **Reflection Engine**
- Requests trajectory analysis for failed executions
- Receives improvement suggestions for prompt mutations
3. **Prompt Mutator** ↔ **LLM Adapter**
- Generates mutations using LLM capabilities
- Validates generated content
4. **Performance Tracker** ↔ **All Components**
- Collects metrics from all operations
- Provides performance insights and alerts
## Development Setup
### Prerequisites
```bash
# Node.js 18+ required
node --version # Should be v18.0.0 or higher
npm --version # Should be v8.0.0 or higher
# TypeScript for development
npm install -g typescript
```
### Environment Setup
1. **Clone and Install Dependencies**:
```bash
git clone <repository-url>
cd gepa-mcp-server
npm install
```
2. **Environment Configuration**:
```bash
# Copy environment template
cp .env.example .env
# Configure LLM provider settings
export OPENAI_API_KEY="your-api-key"
export ANTHROPIC_API_KEY="your-claude-key"
```
3. **Build System**:
```bash
# Development build with watching
npm run dev
# Production build
npm run build
# Type checking
npm run typecheck
```
### Local Development Workflow
1. **Start Development Server**:
```bash
# Start with hot reloading
npm run dev
# Start with debugging
npm run dev:debug
```
2. **Run Test Suite**:
```bash
# Unit tests
npm test
# Integration tests
npm run test:integration
# Memory optimization tests
npm run test:memory
# Full test suite with coverage
npm run test:coverage
```
3. **Performance Profiling**:
```bash
# Memory profiling
npm run profile:memory
# CPU profiling
npm run profile:cpu
# Benchmark suite
npm run benchmark
```
## Testing Strategies
### Test Architecture
GEPA's tests are organized into four layers:
#### 1. Unit Tests
- **Location**: `src/**/*.test.ts`
- **Focus**: Individual component behavior
- **Coverage Target**: >90%
```typescript
// Example unit test structure
describe('EvolutionEngine', () => {
  let engine: EvolutionEngine;
  let mockDependencies: EvolutionEngineDependencies;

  beforeEach(() => {
    mockDependencies = createMockDependencies();
    engine = new EvolutionEngine(mockDependencies);
  });

  describe('startEvolution', () => {
    it('should complete evolution cycle', async () => {
      const result = await engine.startEvolution(testParams);
      expect(result.convergenceAchieved).toBe(true);
      expect(result.generations).toBeGreaterThan(0);
    });
  });
});
```
#### 2. Integration Tests
- **Location**: `src/test/integration/`
- **Focus**: Component interactions and workflows
- **Scenarios**: End-to-end evolution cycles
```typescript
// Integration test example
describe('GEPA Integration', () => {
  it('should optimize chat application prompts', async () => {
    const gepaServer = new GepaServer(config);
    const result = await gepaServer.optimizePrompt({
      taskDescription: 'Improve customer service chat responses',
      seedPrompt: 'You are a helpful assistant.',
      maxGenerations: 5
    });
    expect(result.bestPrompt.averageScore).toBeGreaterThan(0.8);
  });
});
```
#### 3. Memory Tests
- **Location**: `src/test/stress/`
- **Focus**: Memory leak detection and performance validation
- **Tools**: Memory profiling, heap snapshots
#### 4. Chaos Tests
- **Location**: `src/test/chaos/`
- **Focus**: Resilience under failure conditions
- **Scenarios**: Network failures, resource exhaustion, component crashes
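A chaos scenario such as a transient network failure can be simulated by wrapping a dependency so that its first few calls fail, then checking that retry logic recovers. The sketch below is a self-contained illustration; `failFirst` and `withRetry` are hypothetical helpers, not functions from GEPA's resilience framework.

```typescript
// Wraps an async function so that its first `failures` calls reject.
function failFirst<T>(failures: number, fn: () => Promise<T>): () => Promise<T> {
  let calls = 0;
  return async () => {
    calls += 1;
    if (calls <= failures) throw new Error(`Injected failure #${calls}`);
    return fn();
  };
}

// Retries with exponential backoff and rethrows the last error when
// every attempt fails.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts: number = 3,
  baseDelayMs: number = 10
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      if (attempt < maxAttempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Two injected failures, then success on the third attempt.
const flakyCall = failFirst(2, async () => 'ok');
withRetry(flakyCall).then((value) => console.log(value));
```

Inside a test suite, the same pattern fits in an `it(...)` block that awaits `withRetry` and asserts on the resolved value or on the rejection.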
### Test Data Management
**Test Fixtures**: `src/test/fixtures/sample-data.ts`
```typescript
export const samplePromptCandidate: PromptCandidate = {
  id: 'test-candidate-1',
  content: 'You are a helpful assistant specialized in...',
  generation: 1,
  taskPerformance: new Map([['test-task', 0.85]]),
  averageScore: 0.85,
  rolloutCount: 5,
  createdAt: new Date(),
  lastEvaluated: new Date(),
  mutationType: 'initial'
};
```
## Performance Profiling
### Memory Profiling
1. **Heap Snapshot Analysis**:
```bash
# Generate heap snapshot
npm run profile:heap
# Analyze with Chrome DevTools
# 1. Open Chrome DevTools
# 2. Memory tab → Load Profile
# 3. Select generated .heapsnapshot file
```
2. **Memory Leak Detection**:
```typescript
// Built-in leak detection
import { MemoryLeakIntegration } from './core/memory-leak-detector';
MemoryLeakIntegration.initialize();
// Automatic monitoring starts
```
3. **Memory Optimization**:
```bash
# Run memory optimization test suite
npm run test:memory-optimization
# Generate memory usage report
npm run report:memory
```
### Performance Benchmarking
```bash
# Run complete benchmark suite
npm run benchmark
# Specific component benchmarks
npm run benchmark:evolution
npm run benchmark:pareto
npm run benchmark:reflection
```
### Real-time Monitoring
```typescript
import { PerformanceTracker } from './services/performance-tracker';
const tracker = new PerformanceTracker({
  enableRealTimeMonitoring: true,
  memoryTrackingEnabled: true
});

// Subscribe to metrics
tracker.subscribeToMetrics((metric) => {
  console.log(`${metric.name}: ${metric.duration}ms`);
});
```
## Extension Points
### 1. Custom Mutation Strategies
```typescript
// Extend PromptMutator with a custom strategy
class CustomPromptMutator extends PromptMutator {
  async generateSemanticMutations(
    prompt: PromptCandidate,
    context: SemanticContext
  ): Promise<PromptCandidate[]> {
    const mutations: PromptCandidate[] = [];
    // Custom semantic analysis and mutation logic populates `mutations`
    return mutations;
  }
}
```
### 2. Alternative LLM Providers
```typescript
// Implement the LLMAdapter interface
class CustomLLMAdapter implements LLMAdapter {
  async callLLM(prompt: string): Promise<LLMResponse> {
    // Custom provider implementation goes here
    throw new Error('callLLM is not implemented');
  }

  async evaluatePrompt(
    prompt: string,
    context: TaskContext
  ): Promise<EvaluationResult> {
    // Custom evaluation logic goes here
    throw new Error('evaluatePrompt is not implemented');
  }
}
```
### 3. Custom Objectives
```typescript
// Define custom Pareto objectives
const customObjectives: ParetoObjective[] = [
  {
    name: 'creativity',
    weight: 0.3,
    direction: 'maximize',
    extractor: (candidate) => calculateCreativityScore(candidate)
  },
  {
    name: 'safety',
    weight: 0.4,
    direction: 'maximize',
    extractor: (candidate) => assessSafetyScore(candidate)
  }
];
```
### 4. Custom Storage Backends
```typescript
// Implement the TrajectoryStore interface
class DatabaseTrajectoryStore implements TrajectoryStore {
  async save(trajectory: ExecutionTrajectory): Promise<void> {
    // Database persistence logic goes here
  }

  async query(filter: TrajectoryFilter): Promise<ExecutionTrajectory[]> {
    // Database query logic goes here
    throw new Error('query is not implemented');
  }
}
```
## Advanced Topics
### Memory Management
1. **Leak Prevention**:
```typescript
// Automatic cleanup: WeakMap holds its keys weakly
class ComponentManager {
  private components = new WeakMap<object, Component>();

  register(key: object, component: Component): void {
    // The entry becomes collectable once `key` is no longer reachable elsewhere
    this.components.set(key, component);
  }
}
```
2. **Resource Pooling**:
```typescript
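// Object pooling for frequent allocations
class CandidatePool {
  private pool: PromptCandidate[] = [];

  acquire(): PromptCandidate {
    // Reuse a pooled instance when one is available
    return this.pool.pop() ?? this.createNew();
  }

  release(candidate: PromptCandidate): void {
    // Clear per-use state before returning the instance to the pool
    this.reset(candidate);
    this.pool.push(candidate);
  }

  // createNew() and reset() are omitted from this excerpt
}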
```
### Performance Optimization
1. **Caching Strategies**:
```typescript
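// Multi-level caching with TTL
class AnalysisCache {
  private l1Cache = new Map<string, any>(); // Hot data
  private l2Cache = new Map<string, any>(); // Warm data
  private ttlMap = new Map<string, number>(); // TTL tracking (assumed here: expiry timestamp in ms)

  async get(key: string): Promise<any> {
    // Lookup order: L1 → L2 → storage (the storage fallback is omitted here)
    const expiresAt = this.ttlMap.get(key);
    if (expiresAt !== undefined && expiresAt <= Date.now()) return undefined;
    return this.l1Cache.get(key) ?? this.l2Cache.get(key);
  }
}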
```
2. **Parallel Processing**:
```typescript
// Process items in fixed-size batches; items within a batch run concurrently
async function processInBatches<T, R>(
  items: T[],
  processor: (item: T) => Promise<R>,
  batchSize: number = 10
): Promise<R[]> {
  // A batch size below 1 would never advance the loop
  if (batchSize < 1) throw new RangeError('batchSize must be at least 1');
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    const batchResults = await Promise.all(
      batch.map(processor)
    );
    results.push(...batchResults);
  }
  return results;
}
```
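Fixed-size batches wait for the slowest item in each batch before starting the next one. An alternative is a sliding-window limiter that starts a new task as soon as any running task finishes. The sketch below shows one way to write it; `mapWithConcurrency` is a hypothetical helper, not an existing GEPA utility.

```typescript
// Runs `worker` over `items` with at most `limit` calls in flight,
// preserving input order in the returned array.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>
): Promise<R[]> {
  if (limit < 1) throw new RangeError('limit must be at least 1');
  const results = new Array<R>(items.length);
  let next = 0;

  // Each lane pulls the next unclaimed index until none remain.
  // Claiming an index is synchronous, so two lanes never take the same item.
  const runLane = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  };

  const laneCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: laneCount }, () => runLane()));
  return results;
}

// Usage: double each number with at most two tasks running at once.
mapWithConcurrency([1, 2, 3, 4, 5], 2, async (n) => n * 2).then((doubled) =>
  console.log(doubled)
);
```

If any worker rejects, `Promise.all` rejects with that error while the other lanes keep draining items, so callers that need cancellation would have to add it themselves.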
### Scaling Considerations
1. **Distributed Processing**:
```typescript
// Workload distribution across nodes
interface DistributedEvolution {
  distributeGeneration(
    population: PromptCandidate[],
    nodes: NodeInfo[]
  ): Promise<DistributionPlan>;

  collectResults(
    distributionPlan: DistributionPlan
  ): Promise<EvaluationResult[]>;
}
```
2. **Resource Management**:
```typescript
// Dynamic resource allocation
class ResourceManager {
  private cpuPool: number;
  private memoryPool: number;

  async allocateResources(
    operation: string,
    requirements: ResourceRequirements
  ): Promise<ResourceAllocation> {
    // Dynamic allocation based on current load goes here
    throw new Error('allocateResources is not implemented');
  }
}
```
## Troubleshooting Guide
### Common Issues
1. **Memory Leaks**:
```bash
# Symptoms: Increasing heap usage, eventual OOM
# Diagnosis:
npm run profile:memory
npm run test:memory-leak
# Solution: Check component cleanup, event listeners
```
2. **Performance Degradation**:
```bash
# Symptoms: Slower evolution cycles, high CPU usage
# Diagnosis:
npm run benchmark
npm run profile:cpu
# Solution: Check batch sizes, parallel processing limits
```
3. **Convergence Issues**:
```typescript
// Debug convergence problems
const convergenceMetrics = paretoFrontier.getConvergenceMetrics();
console.log('Diversity:', convergenceMetrics.diversity);
console.log('Spacing:', convergenceMetrics.spacing);
// Adjust parameters:
// - Increase population size
// - Adjust mutation rate
// - Modify selection pressure
```
### Debug Configuration
```typescript
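// Enable debug logging
// (assumes EvolutionConfig accepts an optional `debug` block, which is not
// listed in the EvolutionConfig excerpt under System Components)
const config: EvolutionConfig = {
  // ... other config
  debug: {
    enableVerboseLogging: true,
    logGenerationStats: true,
    trackMemoryUsage: true,
    saveIntermediateResults: true
  }
};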
```
### Performance Monitoring
```typescript
// Real-time performance monitoring
const performanceTracker = new PerformanceTracker({
  enableRealTimeMonitoring: true,
  memoryTrackingEnabled: true
});

performanceTracker.subscribeToMetrics((metric) => {
  if (metric.category === 'evolution' && metric.duration > 10000) {
    console.warn(`Slow evolution detected: ${metric.duration}ms`);
  }
});
```
## Best Practices
### Code Organization
1. **Dependency Injection**:
```typescript
// Use dependency injection for testability
class EvolutionEngine {
  constructor(private dependencies: EvolutionEngineDependencies) {
    // Validate dependencies in constructor
    this.validateDependencies(dependencies);
  }
}
```
2. **Error Handling**:
```typescript
// Comprehensive error handling with context
try {
  await this.processGeneration(generation);
} catch (error) {
  const context = {
    generation: generation.id,
    population: generation.candidates.length,
    timestamp: Date.now()
  };
  this.logger.error('Generation processing failed', { error, context });
  throw new GenerationError('Failed to process generation', error, context);
}
```
3. **Configuration Management**:
```typescript
// Centralized configuration with validation
interface GepaConfig {
  evolution: EvolutionConfig;
  performance: PerformanceConfig;
  logging: LoggingConfig;
}

class ConfigManager {
  static validate(config: GepaConfig): void {
    // Comprehensive validation goes here
  }

  static getDefaults(): GepaConfig {
    // Safe defaults go here
    throw new Error('getDefaults is not implemented');
  }
}
```
### Performance Guidelines
1. **Batch Operations**: Process items in bounded batches so that concurrent calls and memory use stay within known limits
2. **Resource Limits**: Set appropriate limits for concurrent operations
3. **Memory Management**: Hold temporary, object-keyed data through weak references (`WeakMap`, `WeakRef`) so it does not block garbage collection
4. **Caching**: Implement intelligent caching with TTL and size limits
5. **Monitoring**: Always include performance monitoring in production
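The caching guideline (TTL plus a size limit) can be sketched with a single `Map`, relying on its insertion order to evict the least recently used entry. `TtlCache` is a hypothetical class written for this example, and the injectable `now` clock exists only to make expiry testable.

```typescript
// Bounded cache with per-entry expiry and least-recently-used eviction.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private readonly maxEntries: number,
    private readonly ttlMs: number,
    private readonly now: () => number = Date.now
  ) {}

  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it lazily on read
      return undefined;
    }
    // Re-insert so this key becomes the most recently used
    // (Map iterates in insertion order).
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) this.entries.delete(oldest);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage with a fake clock: capacity 2, entries live for 100 ms.
let fakeTime = 0;
const cache = new TtlCache<number>(2, 100, () => fakeTime);
cache.set('a', 1);
cache.set('b', 2);
cache.get('a'); // touch 'a' so 'b' becomes the eviction candidate
cache.set('c', 3); // evicts 'b'
```

Expired entries are removed only when read, so a cache with many unread keys would also need a periodic sweep.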
### Testing Guidelines
1. **Test Isolation**: Each test should be independent and repeatable
2. **Mock External Dependencies**: Use mocks for LLM providers and external services
3. **Performance Tests**: Include performance benchmarks in CI/CD
4. **Memory Tests**: Regular memory leak detection tests
5. **Integration Tests**: Test complete workflows end-to-end
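Mocking the LLM provider can be as simple as a fake adapter that returns canned responses and records the prompts it received. In the sketch below, `LLMResponse` and `LLMAdapter` are reduced local stand-ins (only `callLLM` is modeled), and `FakeLLMAdapter` is a hypothetical test double rather than a class shipped with GEPA.

```typescript
// Reduced stand-ins for the project's types (assumptions for this sketch).
interface LLMResponse {
  content: string;
}

interface LLMAdapter {
  callLLM(prompt: string): Promise<LLMResponse>;
}

// Test double: replays canned responses in order and records every prompt.
class FakeLLMAdapter implements LLMAdapter {
  readonly calls: string[] = [];

  constructor(private readonly cannedResponses: string[]) {}

  async callLLM(prompt: string): Promise<LLMResponse> {
    this.calls.push(prompt);
    // Repeat the last canned response once the list is exhausted.
    const index = Math.min(this.calls.length - 1, this.cannedResponses.length - 1);
    return { content: this.cannedResponses[index] ?? '' };
  }
}

// Usage: code under test depends only on the LLMAdapter interface.
async function summarize(adapter: LLMAdapter, text: string): Promise<string> {
  const response = await adapter.callLLM(`Summarize: ${text}`);
  return response.content;
}

const fake = new FakeLLMAdapter(['A short summary.']);
summarize(fake, 'A long document').then((summary) => {
  console.log(summary, fake.calls);
});
```

Because the fake performs no network calls, tests built on it stay fast, deterministic, and independent of provider availability.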
This guide covers GEPA's architecture, components, and development practices. For specific examples and advanced use cases, see the `docs/examples/` directory.