# GEPA User Guide

*Genetic Evolutionary Prompt Adaptation - Complete User Reference*

## Table of Contents

1. [Getting Started](#getting-started)
2. [Core Concepts](#core-concepts)
3. [Common Use Cases](#common-use-cases)
4. [Step-by-Step Workflows](#step-by-step-workflows)
5. [Best Practices](#best-practices)
6. [Interpreting Results](#interpreting-results)
7. [Performance Tips](#performance-tips)
8. [Troubleshooting](#troubleshooting)
9. [FAQ](#faq)

## Getting Started

### What is GEPA?

GEPA (Genetic Evolutionary Prompt Adaptation) is an AI-powered system that automatically improves prompts through genetic algorithms. Think of it as an AI assistant that learns from your prompt's performance and continuously evolves better versions.

### Key Benefits

- **Automatic Optimization**: No manual prompt engineering required
- **Multi-Objective Balance**: Optimizes for performance, creativity, and reliability
- **Learning from Failures**: Analyzes what went wrong and suggests improvements
- **Measurable Results**: Provides clear metrics on improvement

### Prerequisites

- **Basic understanding** of what prompts are (instructions to AI models)
- **No programming experience** required for basic usage
- **Claude Code** or a compatible MCP client for advanced features

## Core Concepts

### 1. Evolution Process

GEPA improves prompts the way nature improves organisms:

```
Initial Prompt → Generate Variations → Test Performance → Keep Best → Create New Variations → Repeat
```

**Real Example**:

- Start: "Write a summary"
- Evolution: "Create a concise, informative summary that highlights key points"
- Evolution: "Write a well-structured summary with clear topic sentences and supporting details"

### 2. Multi-Objective Optimization

GEPA doesn't just optimize for one thing - it balances multiple goals:

- **Performance**: How well the prompt works
- **Creativity**: How diverse and innovative responses are
- **Consistency**: How reliably the prompt produces good results
- **Efficiency**: How quickly it generates responses

### 3. Learning from Failures

When a prompt doesn't work well, GEPA:

1. Analyzes what went wrong
2. Identifies patterns across multiple failures
3. Suggests specific improvements
4. Tests the improvements automatically

### 4. Pareto Frontier

GEPA maintains a collection of the "best" prompts that represent different trade-offs. For example:

- Prompt A: Very creative but sometimes inconsistent
- Prompt B: Very reliable but less creative
- Prompt C: Balanced between creativity and reliability
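To make the frontier idea concrete, here is a minimal sketch of Pareto selection over multi-objective scores. The `Candidate` shape and the objective names are illustrative assumptions for this guide, not GEPA's actual data model:

```typescript
// Sketch: keep only candidates that no other candidate dominates.
interface Candidate {
  prompt: string;
  scores: { performance: number; creativity: number; consistency: number };
}

// `a` dominates `b` if it is at least as good on every objective
// and strictly better on at least one.
function dominates(a: Candidate, b: Candidate): boolean {
  const keys = Object.keys(a.scores) as (keyof Candidate["scores"])[];
  const geqAll = keys.every((k) => a.scores[k] >= b.scores[k]);
  const gtSome = keys.some((k) => a.scores[k] > b.scores[k]);
  return geqAll && gtSome;
}

// The Pareto frontier is the set of non-dominated candidates.
function paretoFrontier(population: Candidate[]): Candidate[] {
  return population.filter(
    (c) => !population.some((other) => other !== c && dominates(other, c))
  );
}

const frontier = paretoFrontier([
  { prompt: "A", scores: { performance: 0.9, creativity: 0.5, consistency: 0.6 } },
  { prompt: "B", scores: { performance: 0.7, creativity: 0.8, consistency: 0.9 } },
  { prompt: "C", scores: { performance: 0.6, creativity: 0.6, consistency: 0.7 } },
]);
// "C" is dominated by "B" and drops out; "A" and "B" remain as trade-offs.
```

The point of keeping a frontier rather than a single winner is exactly the trade-off list above: you pick Prompt A or Prompt B depending on whether you need creativity or reliability at the moment.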
## Common Use Cases

### 1. Content Creation

**Goal**: Improve prompts for blog writing, marketing copy, or creative content

**Example Evolution**:

```
Starting prompt: "Write a blog post about AI"

After evolution: "Create an engaging, informative blog post about AI that:
- Opens with a compelling hook
- Explains concepts in accessible language
- Includes real-world examples
- Concludes with actionable insights
- Maintains a conversational, professional tone"
```

### 2. Code Generation

**Goal**: Better prompts for generating clean, functional code

**Example Evolution**:

```
Starting prompt: "Create a login function"

After evolution: "Generate a secure login function that:
- Includes proper input validation
- Handles errors gracefully
- Follows security best practices
- Includes clear documentation
- Uses modern JavaScript/TypeScript syntax"
```

### 3. Data Analysis

**Goal**: Improved prompts for analyzing and interpreting data

**Example Evolution**:

```
Starting prompt: "Analyze this data"

After evolution: "Perform a comprehensive data analysis that:
- Identifies key trends and patterns
- Highlights significant findings
- Provides context for the numbers
- Suggests actionable insights
- Presents findings in a clear, prioritized format"
```

### 4. Customer Service

**Goal**: Better chatbot responses for customer support

**Example Evolution**:

```
Starting prompt: "Help the customer"

After evolution: "Provide helpful customer support by:
- Actively listening to the customer's concern
- Asking clarifying questions when needed
- Offering specific, actionable solutions
- Maintaining empathy and professionalism
- Following up to ensure satisfaction"
```

## Step-by-Step Workflows

### Workflow 1: Basic Prompt Optimization

**Step 1: Define Your Task**

```
Task: "Improve customer email responses"
Starting prompt: "Respond to customer emails professionally"
```

**Step 2: Start Evolution**

Use GEPA's evolution tool to begin optimization:

- Set the population size (start with 10-20)
- Set the number of generations (start with 5-10)
- Provide your starting prompt

**Step 3: Let GEPA Work**

GEPA will:

- Create variations of your prompt
- Test them against real scenarios
- Keep the best performers
- Generate new variations

**Step 4: Review Results**

After evolution completes, you'll get:

- The best-performing prompt
- Performance metrics
- Insights about what worked

**Step 5: Use and Monitor**

- Implement the optimized prompt
- Monitor its performance
- Run additional evolution cycles as needed

### Workflow 2: Advanced Multi-Objective Optimization

**Step 1: Define Multiple Goals**

```
Primary goal: High response quality
Secondary goal: Maintain creative variety
Constraint: Keep responses under 200 words
```

**Step 2: Configure Evolution**

- Set the performance weight (e.g., 70%)
- Set the diversity weight (e.g., 30%)
- Define evaluation criteria

A sketch of this configuration appears after this workflow.

**Step 3: Monitor Progress**

Track multiple metrics:

- Overall fitness score
- Performance vs. creativity balance
- Consistency across test cases

**Step 4: Select the Optimal Candidate**

Choose the prompt that best balances your objectives:

- High performance when you need reliability
- Balanced when you need both creativity and performance
- High diversity when you need innovation
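To give Workflow 2 a concrete shape, here is a hedged sketch of what the weighted setup from Step 2 might look like in code. `EvolutionConfig`, `objectiveWeights`, and the field names are hypothetical stand-ins, not GEPA's actual interface; consult the [API Documentation](./API.md) for the real one:

```typescript
// Hypothetical configuration mirroring Workflow 2, Step 2.
interface EvolutionConfig {
  seedPrompt: string;
  populationSize: number; // variations alive at once
  generations: number;    // evolution cycles to run
  mutationRate: number;   // fraction of candidates mutated per cycle
  objectiveWeights: {     // assumed here to sum to 1.0
    performance: number;
    diversity: number;
  };
  constraints: string[];
}

const config: EvolutionConfig = {
  seedPrompt: "Respond to customer emails professionally",
  populationSize: 20,
  generations: 10,
  mutationRate: 0.15,
  objectiveWeights: { performance: 0.7, diversity: 0.3 },
  constraints: ["Keep responses under 200 words"],
};

// Weighted fitness collapses the objectives into one score when a single
// ranking is needed; the Pareto frontier preserves the trade-offs.
function weightedFitness(perf: number, div: number, cfg: EvolutionConfig): number {
  return (
    cfg.objectiveWeights.performance * perf +
    cfg.objectiveWeights.diversity * div
  );
}
```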
### Workflow 3: Failure Analysis and Improvement

**Step 1: Identify Problems**

When prompts aren't working:

- Collect examples of poor outputs
- Note patterns in failures
- Document specific issues

**Step 2: Use Reflection Analysis**

GEPA will analyze failures to identify:

- Common error patterns
- Root causes
- Specific improvement areas

**Step 3: Apply Insights**

GEPA suggests targeted improvements:

- More specific instructions
- Better context setting
- Clearer formatting requirements

**Step 4: Test Improvements**

Run evolution with reflection-based mutations:

- Focused on identified problem areas
- Validates that improvements work
- Continues evolution from a better starting point

## Best Practices

### 1. Starting Prompts

**Good Starting Prompts**:

- Clear and specific about the desired outcome
- Include context about the use case
- Mention any constraints or requirements

**Example**:

```
Good: "Write a professional email response to customer complaints that acknowledges their concern, provides a solution, and maintains a helpful tone"

Avoid: "Write an email"
```

### 2. Evolution Parameters

**Population Size**:

- Start small (10-20) for quick results
- Increase (30-50) for complex tasks
- Use larger populations (50+) for highly nuanced tasks

**Generations**:

- Start with 5-10 generations
- Monitor convergence - stop if there is no improvement
- Complex tasks may need 20+ generations

**Mutation Rate**:

- The default of 15% works for most cases
- Increase it for more exploration
- Decrease it for fine-tuning

### 3. Task Descriptions

**Be Specific About Your Goal**:

```
Good: "Optimize prompts for generating engaging social media posts that drive high engagement rates"

Avoid: "Make prompts better"
```

**Include Success Criteria**:

```
Good: "The prompt should generate responses that are informative, engaging, and under 280 characters"

Avoid: "The prompt should work well"
```

### 4. Evaluation Strategy

**Test on Real Scenarios**:

- Use actual data and use cases
- Include edge cases and challenging examples
- Test across different contexts

**Multiple Evaluation Rounds** (see the sketch after this section):

- Run 5-10 evaluations per prompt
- Look for consistency across evaluations
- Consider both average and worst-case performance
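The multiple-rounds practice above can be expressed as a short helper. This is a minimal sketch, assuming an `evaluatePrompt` scoring function that returns values in 0.0-1.0; it is a stand-in for whatever evaluation tool you actually use, not a GEPA API:

```typescript
// Score a prompt several times and report mean and worst-case results,
// so consistency is visible alongside average performance.
async function assessPrompt(
  prompt: string,
  evaluatePrompt: (p: string) => Promise<number>,
  rounds = 8
): Promise<{ mean: number; worst: number; scores: number[] }> {
  const scores: number[] = [];
  for (let i = 0; i < rounds; i++) {
    scores.push(await evaluatePrompt(prompt)); // one rollout per round
  }
  const mean = scores.reduce((a, b) => a + b, 0) / scores.length;
  const worst = Math.min(...scores);
  return { mean, worst, scores };
}

// A prompt with mean 0.85 but worst 0.40 is less trustworthy than one
// with mean 0.80 and worst 0.75 - judge consistency, not just averages.
```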
## Interpreting Results

### 1. Performance Metrics

**Fitness Score (0.0-1.0)**:

- 0.8-1.0: Excellent performance
- 0.6-0.8: Good performance
- 0.4-0.6: Needs improvement
- Below 0.4: Significant issues

**Success Rate**:

- Percentage of successful evaluations
- Higher is better
- Consider both rate and consistency

**Average Score**:

- Mean performance across all evaluations
- Balanced against diversity considerations
- Look for upward trends across generations

### 2. Evolution Progress

**Convergence Indicators**:

- Fitness scores plateau (a small check for this is sketched at the end of this guide)
- Less variation in new candidates
- Reflection analysis finds fewer issues

**Diversity Metrics**:

- Variety in prompt approaches
- Different strategies being explored
- Balance between exploration and exploitation

### 3. Comparative Analysis

**Before vs. After**:

- Direct comparison of original vs. evolved prompts
- Quantitative improvement metrics
- Qualitative assessment of output quality

**Pareto Frontier Analysis**:

- Multiple "best" options for different trade-offs
- Choose based on your specific priorities
- Consider context-specific needs

## Performance Tips

### 1. Optimization Strategies

**Start Simple**:

- Begin with basic evolution
- Add complexity gradually
- Focus on one objective initially

**Use Reflection Wisely**:

- Analyze failures regularly
- Apply insights to guide evolution
- Don't over-optimize for rare edge cases

**Monitor Resources**:

- Watch memory usage during large evolutions
- Use parallel evaluation judiciously
- Consider time constraints

### 2. Common Pitfalls to Avoid

**Over-Optimization**:

- Don't run too many generations without purpose
- Monitor for diminishing returns
- Stop when you achieve your goals

**Ignoring Context**:

- Test prompts in realistic conditions
- Consider the deployment environment
- Account for user variability

**Single-Metric Focus**:

- Balance multiple objectives
- Consider unintended consequences
- Maintain holistic evaluation

### 3. Scaling Considerations

**Large-Scale Evolution**:

- Use incremental improvement approaches
- Implement proper resource management
- Monitor system performance

**Production Deployment**:

- Validate evolved prompts thoroughly
- Implement gradual rollout strategies
- Monitor production performance

## Troubleshooting

### Common Issues and Solutions

#### 1. Evolution Not Improving

**Symptoms**:

- Fitness scores plateau quickly
- No significant improvement after multiple generations
- All candidates seem similar

**Solutions**:

- Increase the mutation rate for more exploration
- Expand the population size
- Check whether the task description is too restrictive
- Add more diverse evaluation scenarios

#### 2. Inconsistent Results

**Symptoms**:

- High variance in evaluation scores
- Good candidates sometimes perform poorly
- Unpredictable output quality

**Solutions**:

- Increase the number of evaluation rollouts
- Add more specific constraints to prompts
- Use reflection analysis to identify failure patterns
- Consider environmental factors affecting performance

#### 3. Slow Performance

**Symptoms**:

- Evolution takes too long to complete
- High memory usage
- The system becomes unresponsive

**Solutions**:

- Reduce the population size
- Implement batch processing
- Use parallel evaluation efficiently
- Monitor and optimize resource usage

#### 4. Poor-Quality Candidates

**Symptoms**:

- All evolved prompts perform worse than the original
- Generated candidates don't make sense
- Evolution seems to be going in the wrong direction

**Solutions**:

- Review the task description for clarity
- Check the evaluation criteria
- Use reflection analysis on failures
- Start with a better seed prompt

### Getting Help

**Built-in Diagnostics**:

- Use GEPA's health check tools
- Monitor system metrics
- Review evolution logs

**Performance Analysis**:

- Check memory usage patterns
- Analyze evaluation timing
- Review convergence metrics

**Community Support**:

- Check the documentation for similar issues
- Review example use cases
- Consider consulting technical support

## FAQ

### General Questions

**Q: How long does evolution typically take?**

A: It depends on complexity, but usually:

- Simple tasks: 5-15 minutes
- Moderate tasks: 15-60 minutes
- Complex tasks: 1-3 hours

**Q: How do I know if my prompt is good enough?**

A: Look for:

- A fitness score above 0.8
- Consistent performance across evaluations
- Meeting your specific quality criteria

**Q: Can I stop evolution early?**

A: Yes, evolution can be stopped at any time. You'll get the best candidate found so far.

### Technical Questions

**Q: What's the difference between generations and population size?**

A:

- Population size: how many prompt variations exist at once
- Generations: how many evolution cycles to run
- More of either generally means better results but takes longer

**Q: How does GEPA handle different types of tasks?**

A: GEPA adapts its mutation strategies based on:

- Task description analysis
- Performance feedback
- Failure pattern recognition

**Q: Can I use my own evaluation criteria?**

A: Yes, GEPA allows custom evaluation criteria and objectives to match your specific needs.

### Advanced Usage

**Q: How do I optimize for multiple, competing objectives?**

A: Use Pareto optimization:

- Define weights for each objective
- Use frontier analysis to explore trade-offs
- Select candidates based on your priorities

**Q: Can I continue evolution from where I left off?**

A: Yes, GEPA maintains state and can resume evolution from any previous point.

**Q: How do I interpret reflection analysis results?**

A: Reflection analysis provides:

- Specific failure patterns
- Confidence scores for suggestions
- Prioritized improvement recommendations

---

This user guide provides comprehensive guidance for both beginners and advanced users. For technical details and development information, see the [Developer Guide](./DEVELOPER_GUIDE.md) and [API Documentation](./API.md).
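As a closing illustration, the "fitness scores plateau" convergence indicator from [Interpreting Results](#interpreting-results) can be reduced to a small heuristic. This is a sketch under stated assumptions: the history array, window size, and tolerance below are illustrative, not part of GEPA's actual API:

```typescript
// Sketch of a plateau check: treat evolution as converged when the best
// fitness has improved by less than `tolerance` over the last `window`
// generations. Both parameters are illustrative defaults.
function hasConverged(
  bestFitnessPerGeneration: number[],
  window = 5,
  tolerance = 0.01
): boolean {
  if (bestFitnessPerGeneration.length < window + 1) return false;
  const recent = bestFitnessPerGeneration.slice(-(window + 1));
  const improvement = recent[recent.length - 1] - recent[0];
  return improvement < tolerance;
}

// Example: improvement over the last five generations is about 0.003,
// well under the 0.01 tolerance, so stopping early is reasonable.
hasConverged([0.41, 0.55, 0.63, 0.68, 0.700, 0.701, 0.701, 0.702, 0.702, 0.703]);
```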
