# GEPA User Guide
*Genetic Evolutionary Prompt Adaptation - Complete User Reference*
## Table of Contents
1. [Getting Started](#getting-started)
2. [Core Concepts](#core-concepts)
3. [Common Use Cases](#common-use-cases)
4. [Step-by-Step Workflows](#step-by-step-workflows)
5. [Best Practices](#best-practices)
6. [Interpreting Results](#interpreting-results)
7. [Performance Tips](#performance-tips)
8. [Troubleshooting](#troubleshooting)
9. [FAQ](#faq)
## Getting Started
### What is GEPA?
GEPA (Genetic Evolutionary Prompt Adaptation) is an AI-powered system that automatically improves prompts through genetic algorithms. Think of it as having an AI assistant that learns from your prompt's performance and continuously evolves better versions.
### Key Benefits
- **Automatic Optimization**: No manual prompt engineering required
- **Multi-Objective Balance**: Optimizes for performance, creativity, and reliability
- **Learning from Failures**: Analyzes what went wrong and suggests improvements
- **Measurable Results**: Provides clear metrics on improvement
### Prerequisites
- **Basic understanding** of what prompts are (instructions to AI models)
- **No programming experience** required for basic usage
- **Claude Code** or compatible MCP client for advanced features
## Core Concepts
### 1. Evolution Process
GEPA improves prompts the way natural selection improves organisms:
```
Initial Prompt → Generate Variations → Test Performance →
Keep Best → Create New Variations → Repeat
```
**Real Example**:
- Start: "Write a summary"
- After one evolution cycle: "Create a concise, informative summary that highlights key points"
- After another cycle: "Write a well-structured summary with clear topic sentences and supporting details"
### 2. Multi-Objective Optimization
GEPA doesn't optimize for a single metric; it balances multiple goals (see the sketch after this list):
- **Performance**: How well the prompt works
- **Creativity**: How diverse and innovative responses are
- **Consistency**: How reliably the prompt produces good results
- **Efficiency**: How quickly it generates responses
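One common way to balance such goals is a weighted sum of per-objective scores. The objective names and weights below are illustrative; this is a sketch, not GEPA's internal formula:
```python
def combined_fitness(scores: dict[str, float], weights: dict[str, float]) -> float:
    # Each score is in [0, 1]; the weights should sum to 1.
    return sum(weights[name] * scores[name] for name in weights)

scores = {"performance": 0.85, "creativity": 0.60,
          "consistency": 0.90, "efficiency": 0.75}
weights = {"performance": 0.4, "creativity": 0.2,
           "consistency": 0.3, "efficiency": 0.1}
print(round(combined_fitness(scores, weights), 3))  # 0.805
```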
### 3. Learning from Failures
When a prompt doesn't work well, GEPA runs a reflection loop (sketched after this list):
1. Analyzes what went wrong
2. Identifies patterns across multiple failures
3. Suggests specific improvements
4. Tests the improvements automatically
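Conceptually, the reflection step turns raw failures into ranked, actionable insights. The record shape below is hypothetical, not GEPA's actual schema:
```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class FailureInsight:
    pattern: str        # e.g. "output exceeded length limit"
    occurrences: int    # how often the pattern appeared
    suggestion: str     # targeted improvement to try next

def summarize_failures(failure_tags: list[str]) -> list[FailureInsight]:
    # Group raw failure tags into insights, most frequent first.
    counts = Counter(failure_tags)
    return [FailureInsight(tag, n, f"Add an instruction addressing: {tag}")
            for tag, n in counts.most_common()]

tags = ["too verbose", "missing examples", "too verbose"]
for insight in summarize_failures(tags):
    print(insight)
```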
### 4. Pareto Frontier
GEPA maintains a collection of the "best" prompts that represent different trade-offs (see the dominance sketch after these examples). For example:
- Prompt A: Very creative but sometimes inconsistent
- Prompt B: Very reliable but less creative
- Prompt C: Balanced between creativity and reliability
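The rule behind the frontier is Pareto dominance: a prompt stays on the frontier unless some other prompt is at least as good on every objective and strictly better on one. A minimal sketch, using made-up (consistency, creativity) scores for the three prompts above:
```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    # `a` dominates `b`: at least as good everywhere, strictly better somewhere.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_frontier(candidates: dict[str, tuple[float, ...]]) -> list[str]:
    return [name for name, scores in candidates.items()
            if not any(dominates(other, scores)
                       for other_name, other in candidates.items()
                       if other_name != name)]

# Hypothetical (consistency, creativity) scores.
candidates = {"A": (0.6, 0.9), "B": (0.9, 0.5), "C": (0.8, 0.8)}
print(pareto_frontier(candidates))  # ['A', 'B', 'C']: none dominates another
```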
## Common Use Cases
### 1. Content Creation
**Goal**: Improve prompts for blog writing, marketing copy, or creative content
**Example Evolution**:
```
Starting prompt: "Write a blog post about AI"
After evolution: "Create an engaging, informative blog post about AI that:
- Opens with a compelling hook
- Explains concepts in accessible language
- Includes real-world examples
- Concludes with actionable insights
- Maintains a conversational, professional tone"
```
### 2. Code Generation
**Goal**: Better prompts for generating clean, functional code
**Example Evolution**:
```
Starting prompt: "Create a login function"
After evolution: "Generate a secure login function that:
- Includes proper input validation
- Handles errors gracefully
- Follows security best practices
- Includes clear documentation
- Uses modern JavaScript/TypeScript syntax"
```
### 3. Data Analysis
**Goal**: Improved prompts for analyzing and interpreting data
**Example Evolution**:
```
Starting prompt: "Analyze this data"
After evolution: "Perform a comprehensive data analysis that:
- Identifies key trends and patterns
- Highlights significant findings
- Provides context for the numbers
- Suggests actionable insights
- Presents findings in clear, prioritized format"
```
### 4. Customer Service
**Goal**: Better chatbot responses for customer support
**Example Evolution**:
```
Starting prompt: "Help the customer"
After evolution: "Provide helpful customer support by:
- Actively listening to the customer's concern
- Asking clarifying questions when needed
- Offering specific, actionable solutions
- Maintaining empathy and professionalism
- Following up to ensure satisfaction"
```
## Step-by-Step Workflows
### Workflow 1: Basic Prompt Optimization
**Step 1: Define Your Task**
```
Task: "Improve customer email responses"
Starting prompt: "Respond to customer emails professionally"
```
**Step 2: Start Evolution**
Use GEPA's evolution tool to begin optimization (a hypothetical configuration sketch follows this list):
- Set population size (start with 10-20)
- Set number of generations (start with 5-10)
- Provide your starting prompt
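As a sketch, the configuration for such a run might look like the following. The parameter names here are hypothetical; check your GEPA tool's documentation for the exact fields it accepts:
```python
# Hypothetical parameters mirroring the bullets above; not GEPA's actual API.
evolution_config = {
    "task": "Improve customer email responses",
    "seed_prompt": "Respond to customer emails professionally",
    "population_size": 15,   # start with 10-20
    "generations": 8,        # start with 5-10
}
```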
**Step 3: Let GEPA Work**
GEPA will:
- Create variations of your prompt
- Test them against real scenarios
- Keep the best performers
- Generate new variations
**Step 4: Review Results**
After evolution completes, you'll get:
- The best-performing prompt
- Performance metrics
- Insights about what worked
**Step 5: Use and Monitor**
- Implement the optimized prompt
- Monitor its performance
- Run additional evolution cycles as needed
### Workflow 2: Advanced Multi-Objective Optimization
**Step 1: Define Multiple Goals**
```
Primary goal: High response quality
Secondary goal: Maintain creative variety
Constraint: Keep responses under 200 words
```
**Step 2: Configure Evolution**
- Set performance weight (e.g., 70%)
- Set diversity weight (e.g., 30%)
- Define evaluation criteria (a weighting-and-constraint sketch follows this list)
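One way to honor both the weights and the hard constraint from Step 1 is to fold a penalty into the fitness value. The penalty scheme below is a sketch, not GEPA's actual constraint handling:
```python
def constrained_fitness(quality: float, diversity: float,
                        word_count: int, limit: int = 200) -> float:
    base = 0.7 * quality + 0.3 * diversity   # the 70/30 weighting above
    if word_count > limit:
        # Scale the score down in proportion to the overshoot.
        base *= limit / word_count
    return base

print(round(constrained_fitness(quality=0.9, diversity=0.8, word_count=250), 3))  # 0.696
```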
**Step 3: Monitor Progress**
Track multiple metrics:
- Overall fitness score
- Performance vs. creativity balance
- Consistency across test cases
**Step 4: Select Optimal Candidate**
Choose the prompt that best balances your objectives:
- High performance when you need reliability
- Balanced when you need both creativity and performance
- High diversity when you need innovation
### Workflow 3: Failure Analysis and Improvement
**Step 1: Identify Problems**
When prompts aren't working:
- Collect examples of poor outputs
- Note patterns in failures
- Document specific issues
**Step 2: Use Reflection Analysis**
GEPA will analyze failures to identify:
- Common error patterns
- Root causes
- Specific improvement areas
**Step 3: Apply Insights**
GEPA suggests targeted improvements:
- More specific instructions
- Better context setting
- Clearer formatting requirements
**Step 4: Test Improvements**
Run evolution with reflection-based mutations:
- Focused on identified problem areas
- Validates improvements work
- Continues evolution with better starting point
## Best Practices
### 1. Starting Prompts
**Good Starting Prompts**:
- Clear and specific about the desired outcome
- Include context about the use case
- Mention any constraints or requirements
**Example**:
```
Good: "Write a professional email response to customer complaints that acknowledges their concern, provides a solution, and maintains a helpful tone"
Avoid: "Write an email"
```
### 2. Evolution Parameters
**Population Size**:
- Start small (10-20) for quick results
- Increase (30-50) for complex tasks
- Use larger populations (50+) for highly nuanced tasks
**Generations**:
- Start with 5-10 generations
- Monitor convergence and stop when scores stop improving
- Complex tasks may need 20+ generations
**Mutation Rate**:
- Default 15% works for most cases
- Increase for more exploration
- Decrease for fine-tuning (the sketch after this list shows how the rate is applied)
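In a genetic algorithm, the mutation rate is typically the probability that any given candidate is varied in a generation. A minimal sketch, with a stand-in mutation:
```python
import random

def maybe_mutate(prompt: str, rate: float = 0.15) -> str:
    # With probability `rate`, produce a variation; otherwise keep the prompt.
    if random.random() < rate:
        return prompt + " Include a concrete example."   # stand-in mutation
    return prompt

population = ["Write a summary"] * 10
mutated = [maybe_mutate(p) for p in population]
print(sum(p != "Write a summary" for p in mutated), "of 10 candidates mutated")
```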
### 3. Task Descriptions
**Be Specific About Your Goal**:
```
Good: "Optimize prompts for generating engaging social media posts that drive high engagement rates"
Avoid: "Make prompts better"
```
**Include Success Criteria**:
```
Good: "The prompt should generate responses that are informative, engaging, and under 280 characters"
Avoid: "The prompt should work well"
```
### 4. Evaluation Strategy
**Test on Real Scenarios**:
- Use actual data and use cases
- Include edge cases and challenging examples
- Test across different contexts
**Multiple Evaluation Rounds**:
- Run 5-10 evaluations per prompt
- Look for consistency across evaluations
- Consider both average and worst-case performance (as in the sketch below)
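Aggregating those rounds is straightforward. The sketch below reports the mean alongside the worst case, since a single bad rollout can matter more in production than a good average; the numbers are made up:
```python
from statistics import mean

def summarize(scores: list[float]) -> dict[str, float]:
    return {"average": round(mean(scores), 3),
            "worst": min(scores),
            "spread": round(max(scores) - min(scores), 3)}

rollouts = [0.82, 0.78, 0.85, 0.40, 0.80]   # one bad outlier
print(summarize(rollouts))  # the average looks fine; the worst case flags a problem
```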
## Interpreting Results
### 1. Performance Metrics
**Fitness Score (0.0-1.0)**:
- 0.8-1.0: Excellent performance
- 0.6-0.8: Good performance
- 0.4-0.6: Needs improvement
- Below 0.4: Significant issues
**Success Rate**:
- Percentage of successful evaluations
- Higher is better
- Consider both rate and consistency
**Average Score**:
- Mean performance across all evaluations
- Balanced with diversity considerations
- Look for upward trends across generations
### 2. Evolution Progress
**Convergence Indicators** (see the plateau sketch after this list):
- Fitness scores plateau
- Less variation in new candidates
- Reflection analysis finds fewer issues
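A simple way to detect a plateau is to compare the best scores from recent generations against earlier ones. The window and threshold below are illustrative, not GEPA's actual stopping rule:
```python
def has_converged(best_per_generation: list[float],
                  window: int = 3, epsilon: float = 0.01) -> bool:
    # Converged if the last `window` generations gained less than `epsilon`
    # over the best score seen before them.
    if len(best_per_generation) <= window:
        return False
    earlier_best = max(best_per_generation[:-window])
    return max(best_per_generation[-window:]) - earlier_best < epsilon

history = [0.55, 0.68, 0.745, 0.75, 0.752, 0.753]
print(has_converged(history))  # True: under 0.01 gained in the last 3 generations
```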
**Diversity Metrics**:
- Variety in prompt approaches
- Different strategies being explored
- Balance between exploration and exploitation
### 3. Comparative Analysis
**Before vs. After**:
- Direct comparison of original vs. evolved prompts
- Quantitative improvement metrics
- Qualitative assessment of output quality
**Pareto Frontier Analysis**:
- Multiple "best" options for different trade-offs
- Choose based on your specific priorities
- Consider context-specific needs
## Performance Tips
### 1. Optimization Strategies
**Start Simple**:
- Begin with basic evolution
- Add complexity gradually
- Focus on one objective initially
**Use Reflection Wisely**:
- Analyze failures regularly
- Apply insights to guide evolution
- Don't over-optimize for rare edge cases
**Monitor Resources**:
- Watch memory usage during large evolutions
- Use parallel evaluation judiciously
- Consider time constraints
### 2. Common Pitfalls to Avoid
**Over-Optimization**:
- Don't run too many generations without purpose
- Monitor for diminishing returns
- Stop when you achieve your goals
**Ignoring Context**:
- Test prompts in realistic conditions
- Consider deployment environment
- Account for user variability
**Single Metric Focus**:
- Balance multiple objectives
- Consider unintended consequences
- Maintain holistic evaluation
### 3. Scaling Considerations
**Large-Scale Evolution**:
- Use incremental improvement approaches
- Implement proper resource management
- Monitor system performance
**Production Deployment**:
- Validate evolved prompts thoroughly
- Implement gradual rollout strategies
- Monitor production performance
## Troubleshooting
### Common Issues and Solutions
#### 1. Evolution Not Improving
**Symptoms**:
- Fitness scores plateau quickly
- No significant improvement after multiple generations
- All candidates seem similar
**Solutions**:
- Increase mutation rate for more exploration
- Expand population size
- Check if task description is too restrictive
- Add more diverse evaluation scenarios
#### 2. Inconsistent Results
**Symptoms**:
- High variance in evaluation scores
- Good candidates sometimes perform poorly
- Unpredictable output quality
**Solutions**:
- Increase number of evaluation rollouts
- Add more specific constraints to prompts
- Use reflection analysis to identify failure patterns
- Consider environmental factors affecting performance
#### 3. Slow Performance
**Symptoms**:
- Evolution takes too long to complete
- High memory usage
- System becomes unresponsive
**Solutions**:
- Reduce population size
- Implement batch processing
- Use parallel evaluation efficiently
- Monitor and optimize resource usage
#### 4. Poor Quality Candidates
**Symptoms**:
- All evolved prompts perform worse than original
- Generated candidates don't make sense
- Evolution seems to be going in wrong direction
**Solutions**:
- Review task description for clarity
- Check evaluation criteria
- Use reflection analysis on failures
- Start with a better seed prompt
### Getting Help
**Built-in Diagnostics**:
- Use GEPA's health check tools
- Monitor system metrics
- Review evolution logs
**Performance Analysis**:
- Check memory usage patterns
- Analyze evaluation timing
- Review convergence metrics
**Community Support**:
- Check documentation for similar issues
- Review example use cases
- Consider consulting with technical support
## FAQ
### General Questions
**Q: How long does evolution typically take?**
A: It depends on task complexity, but typically:
- Simple tasks: 5-15 minutes
- Moderate tasks: 15-60 minutes
- Complex tasks: 1-3 hours
**Q: How do I know if my prompt is good enough?**
A: Look for:
- A fitness score above 0.8
- Consistent performance across evaluations
- Outputs that meet your specific quality criteria
**Q: Can I stop evolution early?**
A: Yes, evolution can be stopped at any time. You'll get the best candidate found so far.
### Technical Questions
**Q: What's the difference between generations and population size?**
A:
- Population size: How many prompt variations exist at once
- Generations: How many evolution cycles to run
- More of either generally means better results but takes longer (see the cost sketch below)
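The practical consequence is multiplicative cost: total evaluations scale with the product of the two, times the rollouts per candidate. A quick back-of-the-envelope with assumed numbers:
```python
population_size = 20          # prompt variations per generation
generations = 10              # evolution cycles
rollouts_per_candidate = 5    # evaluations per prompt

total_evaluations = population_size * generations * rollouts_per_candidate
print(total_evaluations)  # 1000; doubling either knob roughly doubles the cost
```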
**Q: How does GEPA handle different types of tasks?**
A: GEPA adapts its mutation strategies based on:
- Task description analysis
- Performance feedback
- Failure pattern recognition
**Q: Can I use my own evaluation criteria?**
A: Yes, GEPA allows custom evaluation criteria and objectives to match your specific needs.
### Advanced Usage
**Q: How do I optimize for multiple, competing objectives?**
A: Use Pareto optimization:
- Define weights for each objective
- Use frontier analysis to explore trade-offs
- Select candidates based on your priorities
**Q: Can I continue evolution from where I left off?**
A: Yes, GEPA maintains state and can resume evolution from any previous point.
**Q: How do I interpret reflection analysis results?**
A: Reflection analysis provides:
- Specific failure patterns
- Confidence scores for suggestions
- Prioritized improvement recommendations
---
This user guide provides comprehensive guidance for both beginners and advanced users. For technical details and development information, see the [Developer Guide](./DEVELOPER_GUIDE.md) and [API Documentation](./API.md).