# Core GEPA Tools Reference
This document provides detailed specifications for the core GEPA MCP tools used for prompt evolution, trajectory recording, and reflection analysis.
## Table of Contents
- [gepa_start_evolution](#gepa_start_evolution)
- [gepa_record_trajectory](#gepa_record_trajectory)
- [gepa_evaluate_prompt](#gepa_evaluate_prompt)
- [gepa_reflect](#gepa_reflect)
---
## gepa_start_evolution
Initializes a genetic evolution process with configuration parameters and an optional seed prompt.
### Purpose
Starts a new prompt evolution experiment by setting up the initial population, defining evolution parameters, and creating the evolutionary framework for iterative improvement.
### Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `taskDescription` | `string` | ✅ | Clear description of the task to optimize prompts for |
| `seedPrompt` | `string` | ❌ | Initial prompt to start evolution from (optional) |
| `targetModules` | `string[]` | ❌ | Specific modules or components to target (optional) |
| `config` | `object` | ❌ | Evolution configuration parameters (optional) |
### Configuration Object
```typescript
interface EvolutionConfig {
  populationSize?: number;    // Default: 20, Range: 5-50
  generations?: number;       // Default: 10, Range: 1-100
  mutationRate?: number;      // Default: 0.15, Range: 0.0-1.0
  crossoverRate?: number;     // Default: 0.7, Range: 0.0-1.0
  elitismPercentage?: number; // Default: 0.1, Range: 0.0-0.5
}
```
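The defaults and ranges above can be enforced client-side before calling the tool. The sketch below is a minimal illustration of that validation; `resolveConfig` and `LIMITS` are hypothetical helper names, not part of the GEPA API.

```typescript
interface EvolutionConfig {
  populationSize?: number;
  generations?: number;
  mutationRate?: number;
  crossoverRate?: number;
  elitismPercentage?: number;
}

// Documented defaults and valid ranges for each parameter.
const LIMITS: Record<keyof EvolutionConfig, { def: number; min: number; max: number }> = {
  populationSize:    { def: 20,   min: 5,   max: 50 },
  generations:       { def: 10,   min: 1,   max: 100 },
  mutationRate:      { def: 0.15, min: 0.0, max: 1.0 },
  crossoverRate:     { def: 0.7,  min: 0.0, max: 1.0 },
  elitismPercentage: { def: 0.1,  min: 0.0, max: 0.5 },
};

// Fill in defaults and reject out-of-range values before the request is sent.
function resolveConfig(partial: EvolutionConfig): Required<EvolutionConfig> {
  const resolved = {} as Required<EvolutionConfig>;
  for (const key of Object.keys(LIMITS) as (keyof EvolutionConfig)[]) {
    const { def, min, max } = LIMITS[key];
    const value = partial[key] ?? def;
    if (value < min || value > max) {
      throw new RangeError(`${key} must be between ${min} and ${max}`);
    }
    resolved[key] = value;
  }
  return resolved;
}
```

Validating locally surfaces errors such as `Invalid mutation rate` before a round-trip to the server.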
### Request Example
```typescript
const response = await mcpClient.callTool('gepa_start_evolution', {
  taskDescription: 'Generate comprehensive API documentation from code comments',
  seedPrompt: 'Analyze the following code and generate detailed API documentation including parameters, return types, and usage examples:',
  targetModules: ['documentation', 'code_analysis'],
  config: {
    populationSize: 25,
    generations: 15,
    mutationRate: 0.12,
    crossoverRate: 0.75,
    elitismPercentage: 0.15
  }
});
```
### Response Example
```markdown
# Evolution Process Started
## Evolution Details
- **Evolution ID**: evolution_1733140800_abc123
- **Task**: Generate comprehensive API documentation from code comments
- **Target Modules**: documentation, code_analysis
- **Seed Prompt**: Provided
## Configuration
- **Population Size**: 25
- **Max Generations**: 15
- **Mutation Rate**: 0.12
## Initial Population
- **Total Candidates**: 25
- **Seed Candidates**: 1
- **Generated Candidates**: 24
Evolution process initialized successfully. Use `gepa_evaluate_prompt` to begin evaluating candidates.
```
### Error Cases
| Error | Cause | Solution |
|-------|-------|----------|
| `taskDescription is required` | Missing task description | Provide a clear, specific task description |
| `Invalid mutation rate` | Rate outside 0.0-1.0 range | Use values between 0.0 and 1.0 |
| `Population size too large` | Size exceeds system limits | Reduce population size to ≤50 |
---
## gepa_record_trajectory
Records an execution trajectory for prompt evaluation, capturing detailed performance metrics and execution steps.
### Purpose
Captures comprehensive execution data to enable reflection analysis, performance tracking, and evolutionary feedback. Essential for building the dataset needed for intelligent prompt improvement.
### Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `promptId` | `string` | ✅ | Unique identifier for the prompt candidate |
| `taskId` | `string` | ✅ | Identifier for the specific task instance |
| `executionSteps` | `ExecutionStep[]` | ✅ | Sequence of execution steps with details |
| `result` | `ExecutionResult` | ✅ | Final execution result and performance score |
| `metadata` | `object` | ❌ | Additional execution metadata (optional) |
### ExecutionStep Schema
```typescript
interface ExecutionStep {
  action: string;     // Action performed (e.g., "parse_input", "generate_code")
  input?: object;     // Input data for this step
  output?: object;    // Output produced by this step
  timestamp: string;  // ISO timestamp of step execution
  success: boolean;   // Whether step completed successfully
  reasoning?: string; // AI reasoning for this step
  toolName?: string;  // Tool used in this step
  error?: string;     // Error message if step failed
}
```
### ExecutionResult Schema
```typescript
interface ExecutionResult {
  success: boolean; // Overall execution success
  score: number;    // Performance score (0.0-1.0)
  output: object;   // Final output of execution
  error?: string;   // Error message if execution failed
}
```
### Metadata Schema
```typescript
interface TrajectoryMetadata {
  llmModel?: string;      // LLM model used (e.g., "claude-3-sonnet")
  executionTime?: number; // Total execution time in milliseconds
  tokenUsage?: number;    // Total tokens consumed
  retryCount?: number;    // Number of retries attempted
  environment?: string;   // Execution environment info
}
```
### Request Example
```typescript
const response = await mcpClient.callTool('gepa_record_trajectory', {
  promptId: 'evolution_1733140800_candidate_5',
  taskId: 'api_documentation_task_001',
  executionSteps: [
    {
      action: 'parse_code_structure',
      input: { codeText: '...' },
      output: { functions: [...], classes: [...] },
      timestamp: '2024-12-02T14:30:00.000Z',
      success: true,
      reasoning: 'Successfully identified 5 functions and 2 classes'
    },
    {
      action: 'generate_documentation',
      input: { parsedStructure: {...} },
      output: { documentation: '...' },
      timestamp: '2024-12-02T14:30:05.000Z',
      success: true,
      toolName: 'documentation_generator'
    },
    {
      action: 'validate_output',
      input: { documentation: '...' },
      output: { validationResult: { isValid: true, score: 0.87 } },
      timestamp: '2024-12-02T14:30:08.000Z',
      success: true
    }
  ],
  result: {
    success: true,
    score: 0.87,
    output: {
      documentationText: 'Generated API documentation...',
      coverageScore: 0.91,
      qualityMetrics: { clarity: 0.85, completeness: 0.89 }
    }
  },
  metadata: {
    llmModel: 'claude-3-sonnet',
    executionTime: 8500,
    tokenUsage: 1250,
    environment: 'production'
  }
});
```
### Response Example
```markdown
# Trajectory Recorded Successfully
## Trajectory Details
- **Trajectory ID**: trajectory_1733140850_def456
- **Prompt ID**: evolution_1733140800_candidate_5
- **Task ID**: api_documentation_task_001
- **Execution Steps**: 3
- **Success**: ✅
- **Performance Score**: 0.870
## Execution Summary
- **Total Steps**: 3
- **Successful Steps**: 3
- **Failed Steps**: 0
- **Execution Time**: 8500ms
- **Token Usage**: 1250
## Storage
- **File**: ./data/trajectories/trajectory_1733140850_def456.json
- **Success**: Yes
- **ID**: trajectory_1733140850_def456
✨ Candidate added to Pareto frontier for optimization.
```
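The step tallies reported in the execution summary above can be reproduced client-side before recording, which helps catch a malformed `executionSteps` array early. This is a minimal sketch; `summarizeSteps` is an illustrative helper, not part of the GEPA API.

```typescript
interface ExecutionStep {
  action: string;
  timestamp: string; // ISO timestamp
  success: boolean;
  error?: string;
}

// Tally step outcomes the same way the trajectory response summarizes them.
function summarizeSteps(steps: ExecutionStep[]) {
  const successful = steps.filter((s) => s.success).length;
  return {
    totalSteps: steps.length,
    successfulSteps: successful,
    failedSteps: steps.length - successful,
  };
}
```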
### Error Cases
| Error | Cause | Solution |
|-------|-------|----------|
| `promptId is required` | Missing prompt identifier | Provide valid prompt ID from evolution |
| `Invalid execution steps` | Malformed steps array | Ensure each step has required fields |
| `Score out of range` | Score not between 0.0-1.0 | Use normalized scores in valid range |
| `Storage failure` | File system error | Check disk space and permissions |
---
## gepa_evaluate_prompt
Evaluates a prompt candidate's performance across multiple tasks using configurable rollout counts and execution strategies.
### Purpose
Systematically tests prompt candidates across diverse scenarios to gather robust performance data for evolutionary selection and Pareto frontier updates.
### Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `promptId` | `string` | ✅ | Unique identifier for prompt to evaluate |
| `taskIds` | `string[]` | ✅ | List of task IDs to evaluate against |
| `rolloutCount` | `number` | ❌ | Number of evaluation rollouts per task (default: 5) |
| `parallel` | `boolean` | ❌ | Whether to run evaluations in parallel (default: true) |
### Request Example
```typescript
const response = await mcpClient.callTool('gepa_evaluate_prompt', {
  promptId: 'evolution_1733140800_candidate_12',
  taskIds: [
    'code_documentation_basic',
    'code_documentation_complex',
    'api_reference_generation',
    'example_code_creation'
  ],
  rolloutCount: 8,
  parallel: true
});
```
### Response Example
```markdown
# Prompt Evaluation Complete
## Evaluation Details
- **Evaluation ID**: eval_1733140900_ghi789
- **Prompt ID**: evolution_1733140800_candidate_12
- **Tasks Evaluated**: 4
- **Rollouts per Task**: 8
- **Total Evaluations**: 32
- **Parallel Execution**: Yes
## Performance Metrics
- **Success Rate**: 87.5%
- **Average Score**: 0.763
- **Combined Fitness**: 0.668
- **Average Execution Time**: 1850ms
## Task Breakdown
- **code_documentation_basic**: 100.0% success, 0.821 avg score
- **code_documentation_complex**: 87.5% success, 0.742 avg score
- **api_reference_generation**: 75.0% success, 0.698 avg score
- **example_code_creation**: 87.5% success, 0.791 avg score
✨ Candidate updated in Pareto frontier with performance metrics.
```
### Evaluation Metrics
The evaluation process tracks several key metrics:
| Metric | Description | Range |
|--------|-------------|-------|
| **Success Rate** | Percentage of successful executions | 0-100% |
| **Average Score** | Mean performance score across rollouts | 0.0-1.0 |
| **Combined Fitness** | Success rate × Average score | 0.0-1.0 |
| **Execution Time** | Mean time per evaluation | milliseconds |
| **Token Efficiency** | Output quality per token used | 0.0-1.0 |
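The combined-fitness formula in the table (success rate × average score) can be sketched as follows; `RolloutResult` and `combinedFitness` are illustrative names for this example, not part of the GEPA API.

```typescript
interface RolloutResult {
  success: boolean;
  score: number; // Performance score, 0.0-1.0
}

// Combined fitness = success rate × average score, per the metrics table.
function combinedFitness(rollouts: RolloutResult[]): number {
  if (rollouts.length === 0) return 0;
  const successRate =
    rollouts.filter((r) => r.success).length / rollouts.length;
  const avgScore =
    rollouts.reduce((sum, r) => sum + r.score, 0) / rollouts.length;
  return successRate * avgScore;
}
```

Because both factors lie in [0, 1], combined fitness penalizes candidates that score well on average but fail often, which is why it sits below the raw average score in the response example.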
### Error Cases
| Error | Cause | Solution |
|-------|-------|----------|
| `promptId not found` | Invalid prompt identifier | Use valid prompt ID from active evolution |
| `Empty taskIds array` | No tasks specified | Provide at least one task ID |
| `Rollout count too high` | Exceeds system limits | Use rolloutCount ≤ 20 |
| `Evaluation timeout` | Tasks taking too long | Reduce complexity or increase timeouts |
---
## gepa_reflect
Analyzes execution trajectories to identify failure patterns and generate actionable prompt improvement suggestions.
### Purpose
Performs intelligent failure analysis to understand why prompts fail and provides specific, actionable recommendations for improvement. Powers the reflection-driven evolution cycle.
### Parameters
| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `trajectoryIds` | `string[]` | ✅ | List of trajectory IDs to analyze for patterns |
| `targetPromptId` | `string` | ✅ | Prompt ID to generate improvements for |
| `analysisDepth` | `'shallow' \| 'deep'` | ❌ | Depth of analysis to perform (default: 'deep') |
| `focusAreas` | `string[]` | ❌ | Specific areas to focus analysis on (optional) |
### Analysis Depth Options
| Depth | Description | Use Case |
|-------|-------------|----------|
| `shallow` | Quick pattern identification | Fast iteration, initial analysis |
| `deep` | Comprehensive root cause analysis | Detailed improvement, final optimization |
### Focus Areas
Common focus areas for targeted analysis:
- `instruction_clarity` - Prompt instruction quality
- `example_quality` - Example effectiveness
- `constraint_handling` - Constraint specification
- `error_recovery` - Failure handling strategies
- `output_formatting` - Response structure
- `reasoning_depth` - Logical reasoning quality
### Request Example
```typescript
const response = await mcpClient.callTool('gepa_reflect', {
  trajectoryIds: [
    'trajectory_1733140850_abc123',
    'trajectory_1733140855_def456',
    'trajectory_1733140860_ghi789',
    'trajectory_1733140865_jkl012'
  ],
  targetPromptId: 'evolution_1733140800_candidate_15',
  analysisDepth: 'deep',
  focusAreas: ['instruction_clarity', 'example_quality', 'error_recovery']
});
```
### Response Example
```markdown
# Reflection Analysis Complete
## Analysis Details
- **Reflection ID**: reflection_1733141000_mno345
- **Target Prompt**: evolution_1733140800_candidate_15
- **Trajectories Analyzed**: 4/4
- **Analysis Depth**: deep
- **Focus Areas**: instruction_clarity, example_quality, error_recovery
## Failure Pattern Analysis
- **Patterns Detected**: 3
- **Recommendations**: 5
- **Confidence**: 89.2%
## Key Findings
1. **Instruction Ambiguity** (75.0% frequency)
- Severity: 75.0%
- Description: Instructions lack specific output format requirements
2. **Insufficient Examples** (50.0% frequency)
- Severity: 50.0%
- Description: Limited examples for complex edge cases
3. **Error Handling Gaps** (25.0% frequency)
- Severity: 25.0%
- Description: No guidance for handling malformed input
## Improvement Recommendations
1. **High Priority**: Add explicit output format specifications
- Issue: Instruction Ambiguity
- Frequency: 75.0%
2. **High Priority**: Include diverse edge case examples
- Issue: Insufficient Examples
- Frequency: 50.0%
3. **Medium Priority**: Add error handling instructions
- Issue: Error Handling Gaps
- Frequency: 25.0%
4. **Medium Priority**: Clarify constraint boundaries
- Issue: Instruction Ambiguity
- Frequency: 75.0%
5. **Low Priority**: Improve reasoning chain structure
- Issue: Insufficient Examples
- Frequency: 50.0%
## Summary
The analysis identified 3 distinct failure patterns across 4 trajectories. Focus on addressing high-priority issues first to maximize improvement impact.
```
### Reflection Output Components
| Component | Description | Use Case |
|-----------|-------------|----------|
| **Failure Patterns** | Common failure modes and frequencies | Understanding systematic issues |
| **Root Cause Analysis** | Deep dive into underlying problems | Targeted improvement focus |
| **Improvement Suggestions** | Specific, actionable prompt changes | Direct implementation guidance |
| **Confidence Scores** | Reliability of analysis results | Risk assessment for changes |
| **Priority Ranking** | Ordered list of improvement areas | Resource allocation decisions |
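The priority ranking component above can be applied client-side when deciding which recommendations to implement first. The sketch below orders recommendations by priority, then by failure frequency, matching the "address high-priority issues first" guidance; the `Recommendation` shape and `rankRecommendations` helper are assumptions for illustration, not part of the GEPA API.

```typescript
type Priority = 'High' | 'Medium' | 'Low';

interface Recommendation {
  priority: Priority;
  issue: string;
  frequency: number; // How often the underlying pattern occurred, 0.0-1.0
  suggestion: string;
}

const PRIORITY_RANK: Record<Priority, number> = { High: 0, Medium: 1, Low: 2 };

// Sort high-priority items first; break ties by descending frequency.
function rankRecommendations(recs: Recommendation[]): Recommendation[] {
  return [...recs].sort(
    (a, b) =>
      PRIORITY_RANK[a.priority] - PRIORITY_RANK[b.priority] ||
      b.frequency - a.frequency
  );
}
```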
### Error Cases
| Error | Cause | Solution |
|-------|-------|----------|
| `No valid trajectories` | All trajectory IDs invalid | Verify trajectory IDs exist and are accessible |
| `Insufficient data` | Too few trajectories for analysis | Provide at least 3 trajectories |
| `Analysis timeout` | Complex analysis taking too long | Use 'shallow' depth or reduce trajectory count |
| `Target prompt not found` | Invalid target prompt ID | Verify prompt ID exists in current evolution |
---
## Best Practices
### Evolution Initialization
- **Task Descriptions**: Be specific and measurable
- **Seed Prompts**: Use high-quality, representative examples
- **Population Size**: Start small (10-20) for faster iteration
- **Generations**: Monitor convergence, typically 10-30 generations
### Trajectory Recording
- **Step Granularity**: Capture meaningful decision points
- **Error Context**: Include rich error information for failures
- **Performance Metrics**: Use consistent scoring scales (0.0-1.0)
- **Metadata**: Include environment and execution context
### Evaluation Strategy
- **Task Diversity**: Use varied, representative task sets
- **Rollout Counts**: Balance thoroughness vs. speed (3-10 rollouts)
- **Parallel Processing**: Enable for faster evaluation cycles
- **Score Calibration**: Ensure consistent scoring across tasks
### Reflection Analysis
- **Trajectory Selection**: Include both successful and failed executions
- **Analysis Depth**: Use 'deep' for final optimization, 'shallow' for iteration
- **Focus Areas**: Target specific improvement areas when known
- **Implementation**: Apply suggestions systematically and test impact
## Integration Patterns
### Sequential Workflow
```typescript
// 1. Start evolution
const evolution = await startEvolution({...});
// 2. Evaluate candidates
const evaluation = await evaluatePrompt({...});
// 3. Record trajectories
await recordTrajectory({...});
// 4. Analyze failures
const reflection = await reflect({...});
// 5. Generate new candidates based on insights
```
### Parallel Evaluation
```typescript
// Evaluate multiple candidates simultaneously
const evaluations = await Promise.all([
  evaluatePrompt({ promptId: 'candidate_1', ... }),
  evaluatePrompt({ promptId: 'candidate_2', ... }),
  evaluatePrompt({ promptId: 'candidate_3', ... })
]);
```
### Continuous Improvement
```typescript
// Monitor performance and trigger reflection automatically
if (successRate < threshold) {
  const analysis = await reflect({
    trajectoryIds: recentFailures,
    targetPromptId: currentBest
  });
  // Apply improvements...
}
```