Prompt Auto-Optimizer MCP

MIT License


recovery-tools.md•23.3 kB

# Recovery & Disaster Management Tools Reference

This document provides detailed specifications for GEPA MCP tools focused on system resilience, disaster recovery, backup management, and component health monitoring.

## Table of Contents

- [gepa_create_backup](#gepa_create_backup)
- [gepa_restore_backup](#gepa_restore_backup)
- [gepa_list_backups](#gepa_list_backups)
- [gepa_recovery_status](#gepa_recovery_status)
- [gepa_recover_component](#gepa_recover_component)
- [gepa_integrity_check](#gepa_integrity_check)

---

## gepa_create_backup

Creates comprehensive system backups including evolution state, trajectories, Pareto frontier data, and component configurations.

### Purpose

Provides robust backup capabilities to protect against data loss, system corruption, and component failures. Essential for maintaining system resilience and enabling point-in-time recovery.

### Backup Types

| Type | Description | Contents | Use Case |
|------|-------------|----------|----------|
| **Full** | Complete system state | All components, trajectories, configurations | Regular snapshots |
| **Incremental** | Changes since last backup | Modified data only | Frequent updates |
| **Component** | Specific component data | Single component state | Targeted backup |
| **Archive** | Compressed historical data | Older trajectories and results | Long-term storage |

### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `label` | `string` | ❌ | Optional descriptive label for the backup |
| `includeTrajectories` | `boolean` | ❌ | Include trajectory data in backup (default: true) |

### Request Example

```typescript
const response = await mcpClient.callTool('gepa_create_backup', {
  label: 'pre-major-update-backup',
  includeTrajectories: true
});
```

### Response Example

```markdown
# System Backup Created

## Backup Details
- **ID**: backup_1733140800_abc123def
- **Label**: pre-major-update-backup
- **Timestamp**: 2024-12-02T14:30:00.000Z
- **Type**: full
- **Size**: 15.47 MB
- **Components**: 7
- **Compressed**: Yes

## Components Backed Up
- **evolution_engine** (system): 2.45 KB
- **pareto_frontier** (data): 1.23 MB
- **trajectory_store** (data): 12.15 MB
- **llm_adapter** (config): 1.87 KB
- **prompt_mutator** (config): 3.21 KB
- **reflection_engine** (system): 5.67 KB
- **disaster_recovery** (system): 890 bytes

## Metadata
- Generation: 15
- Population Size: 47
- Pareto Frontier Size: 23
- Total Trajectories: 1,247

The backup is ready for restoration if needed.
```

### Backup Components

Each backup includes multiple component types:

| Component | Type | Description | Critical Level |
|-----------|------|-------------|----------------|
| **Evolution Engine** | System | Evolution state and configuration | High |
| **Pareto Frontier** | Data | Optimal candidate archive | High |
| **Trajectory Store** | Data | Complete execution history | Medium |
| **LLM Adapter** | Config | Model configurations and settings | Medium |
| **Prompt Mutator** | Config | Mutation strategies and parameters | Medium |
| **Reflection Engine** | System | Analysis patterns and learned insights | High |
| **Memory Cache** | Data | Performance optimization cache | Low |

### Automated Backup Strategies

```typescript
// Helper (not part of the GEPA API): ISO 8601 week number for a date.
function getWeekNumber(date: Date = new Date()): number {
  const d = new Date(Date.UTC(date.getFullYear(), date.getMonth(), date.getDate()));
  const day = d.getUTCDay() || 7;          // treat Sunday as day 7
  d.setUTCDate(d.getUTCDate() + 4 - day);  // move to the Thursday of this week
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  return Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7);
}

// Argument presets for gepa_create_backup; pass one to callTool from your scheduler
const scheduleBackups = {
  daily: {
    label: `daily-backup-${new Date().toISOString().split('T')[0]}`,
    includeTrajectories: true
  },
  weekly: {
    label: `weekly-backup-week-${getWeekNumber()}`,
    includeTrajectories: true
  },
  preUpdate: {
    label: 'pre-system-update',
    includeTrajectories: true
  }
};
```

### Error Cases

| Error | Cause | Solution |
|-------|-------|----------|
| `Insufficient disk space` | Storage limit exceeded | Free up space or configure backup location |
| `Component lock failure` | System busy during backup | Retry after current operations complete |
| `Compression failed` | Corrupted data or memory issues | Run integrity check before backup |
| `Permission denied` | File system access restricted | Check backup directory permissions |

---

## gepa_restore_backup

Restores system state from a previously created backup with optional integrity validation and pre-restore backup creation.

### Purpose

Enables recovery from backup snapshots to restore system functionality after failures, corruption, or experimental changes that need to be reverted.

### Restoration Options

| Option | Description | Impact | Recommendation |
|--------|-------------|--------|----------------|
| **Full Restore** | Complete system replacement | All current data lost | Use for disaster recovery |
| **Selective Restore** | Specific component restoration | Targeted data replacement | Use for component failures |
| **Merge Restore** | Combine backup with current state | Partial data preservation | Use for partial recovery |

### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `backupId` | `string` | ✅ | ID of the backup to restore from |
| `validateIntegrity` | `boolean` | ❌ | Perform integrity validation before restore (default: true) |
| `createPreRestoreBackup` | `boolean` | ❌ | Create backup before restoration (default: true) |

### Request Example

```typescript
const response = await mcpClient.callTool('gepa_restore_backup', {
  backupId: 'backup_1733140800_abc123def',
  validateIntegrity: true,
  createPreRestoreBackup: true
});
```

### Response Example

```markdown
# System Restore Completed

## Restore Details
- **Backup ID**: backup_1733140800_abc123def
- **Success**: Yes
- **Restore Time**: 2,347 ms
- **Pre-restore Backup**: backup_1733141000_prerestore_xyz789

## Components Restored
✅ evolution_engine
✅ pareto_frontier
✅ trajectory_store
✅ llm_adapter
✅ prompt_mutator
✅ reflection_engine
✅ disaster_recovery

## Integrity Checks
- evolution_engine: ✅ Valid
- pareto_frontier: ✅ Valid
- trajectory_store: ✅ Valid
- llm_adapter: ✅ Valid
- prompt_mutator: ✅ Valid
- reflection_engine: ✅ Valid
- disaster_recovery: ✅ Valid

System has been successfully restored from backup.
```

### Restoration Process

The restoration follows these steps:

1. **Pre-Validation**: Verify backup integrity and compatibility
2. **Pre-Restore Backup**: Create safety backup of current state
3. **Component Shutdown**: Gracefully stop affected components
4. **Data Restoration**: Replace component data with backup versions
5. **Integrity Verification**: Validate restored data consistency
6. **Component Restart**: Initialize components with restored data
7. **Health Check**: Verify system functionality post-restore

### Error Cases

| Error | Cause | Solution |
|-------|-------|----------|
| `Backup not found` | Invalid backup ID | Verify the backup ID exists with `gepa_list_backups` |
| `Integrity validation failed` | Corrupted backup data | Use different backup or disable validation |
| `Incompatible version` | Backup from different system version | Check compatibility or use migration tools |
| `Restoration interrupted` | System failure during restore | Use pre-restore backup to recover |

---

## gepa_list_backups

Lists all available system backups with filtering and sorting options for backup management.

### Purpose

Provides visibility into backup history, enables backup selection for restoration, and supports backup lifecycle management.
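
Selecting a backup for restoration could be sketched as follows. This is only an illustration: `parseBackupList` and `newestBackup` are hypothetical helpers, not part of the GEPA tools, and they assume the tool's text output follows the markdown layout shown in this tool's Response Example (a `## <label> (<backup id>)` heading followed by a `- **Created**:` line).

```typescript
// Hypothetical helpers: parse the markdown text returned by gepa_list_backups.
interface BackupEntry {
  label: string;
  id: string;
  created: string; // ISO 8601 timestamp as printed in the response
}

function parseBackupList(markdown: string): BackupEntry[] {
  const entries: BackupEntry[] = [];
  // Matches "## <label> (<backup id>)" followed by a "- **Created**: <timestamp>" line.
  const pattern = /^## (.+?) \((backup_[A-Za-z0-9_]+)\)\s*\n- \*\*Created\*\*: (\S+)/gm;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    entries.push({ label: match[1], id: match[2], created: match[3] });
  }
  return entries;
}

function newestBackup(entries: BackupEntry[]): BackupEntry | undefined {
  // ISO 8601 UTC timestamps in the same format sort chronologically as strings.
  return [...entries].sort((a, b) => b.created.localeCompare(a.created))[0];
}

// Sample text in the assumed response layout.
const sample = [
  '# Available System Backups',
  '',
  'Found 2 backup(s):',
  '',
  '## daily-backup-2024-12-01 (backup_1733054400_def456ghi)',
  '- **Created**: 2024-12-01T14:30:00.000Z',
  '- **Type**: full',
  '',
  '## daily-backup-2024-12-02 (backup_1733140800_abc123def)',
  '- **Created**: 2024-12-02T14:30:00.000Z',
  '- **Type**: full',
].join('\n');

const latest = newestBackup(parseBackupList(sample));
```

The resulting `latest.id` could then be passed as `backupId` to `gepa_restore_backup`. If the server's output format differs from the example, the regular expression would need adjusting.
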
### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `limit` | `number` | ❌ | Maximum number of backups to return (default: 20) |
| `filterLabel` | `string` | ❌ | Filter backups by label pattern (optional) |

### Request Example

```typescript
const response = await mcpClient.callTool('gepa_list_backups', {
  limit: 10,
  filterLabel: 'daily'
});
```

### Response Example

```markdown
# Available System Backups

Found 8 backup(s):

## daily-backup-2024-12-02 (backup_1733140800_abc123def)
- **Created**: 2024-12-02T14:30:00.000Z
- **Type**: full
- **Size**: 15.47 MB
- **Components**: 7

## daily-backup-2024-12-01 (backup_1733054400_def456ghi)
- **Created**: 2024-12-01T14:30:00.000Z
- **Type**: full
- **Size**: 14.92 MB
- **Components**: 7

## pre-evolution-experiment (backup_1733050800_ghi789jkl)
- **Created**: 2024-12-01T13:30:00.000Z
- **Type**: full
- **Size**: 14.85 MB
- **Components**: 7

## weekly-backup-week-48 (backup_1732968000_jkl012mno)
- **Created**: 2024-11-30T14:00:00.000Z
- **Type**: full
- **Size**: 13.67 MB
- **Components**: 7

## emergency-backup (backup_1732881600_mno345pqr)
- **Created**: 2024-11-29T14:00:00.000Z
- **Type**: full
- **Size**: 12.98 MB
- **Components**: 6

Use `gepa_restore_backup` with a backup ID to restore the system.
```

### Backup Management Utilities

```typescript
// Find recent backups
const recentBackups = await mcpClient.callTool('gepa_list_backups', {
  limit: 5
});

// Find backups by label pattern
const experimentBackups = await mcpClient.callTool('gepa_list_backups', {
  filterLabel: 'experiment',
  limit: 20
});

// Get comprehensive backup list
const allBackups = await mcpClient.callTool('gepa_list_backups', {
  limit: 100
});
```

### Error Cases

| Error | Cause | Solution |
|-------|-------|----------|
| `Backup directory not found` | Backup storage not configured | Initialize backup system |
| `Permission denied` | Insufficient file access rights | Check directory permissions |
| `Corrupted backup index` | Backup metadata damaged | Rebuild backup index |

---

## gepa_recovery_status

Provides comprehensive disaster recovery status and health information for all system components.

### Purpose

Offers real-time visibility into system health, recovery capabilities, and potential issues before they become critical failures.
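
One way to act on a status report before an issue becomes critical is to compare reported metrics against thresholds such as those in this tool's Monitoring Thresholds section. The sketch below is an assumption-laden illustration, not the server's own logic: `classify` is a hypothetical function, boundaries are treated as inclusive, and only three of the five documented status levels are produced.

```typescript
type Level = 'HEALTHY' | 'WARNING' | 'CRITICAL';

interface Threshold {
  warning: number;
  critical: number;
}

// Values copied from the Monitoring Thresholds block in this document.
const healthThresholds: Record<string, Threshold> = {
  memory_usage: { warning: 500, critical: 1000 },    // MB
  error_rate: { warning: 0.05, critical: 0.1 },      // fraction (0-1)
  response_time: { warning: 5000, critical: 10000 }, // ms
  success_rate: { warning: 0.9, critical: 0.8 },     // fraction (0-1), lower is worse
  disk_usage: { warning: 0.8, critical: 0.95 },      // fraction (0-1)
};

// Hypothetical classifier; inclusive boundaries are an assumption.
function classify(metric: string, value: number): Level {
  const t = healthThresholds[metric];
  if (t === undefined) {
    throw new Error(`No thresholds defined for metric: ${metric}`);
  }
  // When critical < warning (as for success_rate), lower values are worse.
  const lowerIsWorse = t.critical < t.warning;
  const breaches = (limit: number): boolean =>
    lowerIsWorse ? value <= limit : value >= limit;
  if (breaches(t.critical)) return 'CRITICAL';
  if (breaches(t.warning)) return 'WARNING';
  return 'HEALTHY';
}
```

For example, `classify('memory_usage', 156)` and `classify('success_rate', 0.973)` both evaluate to `'HEALTHY'`, which is consistent with the sample status output for this tool.
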
### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `includeMetrics` | `boolean` | ❌ | Include detailed metrics in response (default: true) |

### Request Example

```typescript
const response = await mcpClient.callTool('gepa_recovery_status', {
  includeMetrics: true
});
```

### Response Example

```markdown
# Disaster Recovery Status

## Overall System Health: HEALTHY

### System Components
- **evolution_engine**: HEALTHY (Last check: 2024-12-02T14:35:00.000Z)
- **pareto_frontier**: HEALTHY (Last check: 2024-12-02T14:35:00.000Z)
- **trajectory_store**: HEALTHY (Last check: 2024-12-02T14:35:00.000Z)
- **llm_adapter**: HEALTHY (Last check: 2024-12-02T14:35:00.000Z)
- **prompt_mutator**: HEALTHY (Last check: 2024-12-02T14:35:00.000Z)
- **reflection_engine**: HEALTHY (Last check: 2024-12-02T14:35:00.000Z)

### Recovery Dashboard
- **System Status**: HEALTHY
- **Active Recoveries**: 0
- **Recent Failures**: 1 (24h)
- **Total Executions**: 1,247
- **Success Rate**: 97.3%

### Metrics
- **Backups Available**: 15
- **Quarantined Items**: 0
- **Critical Components**: 6
- **Last Backup Age**: 23 minutes

### Detailed Metrics

#### Evolution Engine
- uptime: 72.5 hours
- memory_usage: 156 MB
- active_processes: 3
- error_rate: 0.002

#### Pareto Frontier
- frontier_size: 23
- total_candidates: 47
- update_frequency: 15.7 per hour
- optimization_efficiency: 0.89

#### Trajectory Store
- total_trajectories: 1,247
- storage_used: 245 MB
- query_performance: 12.3ms avg
- index_health: optimal

#### LLM Adapter
- active_connections: 2
- avg_response_time: 1,850ms
- token_efficiency: 0.76
- rate_limit_status: normal

#### Prompt Mutator
- mutations_per_hour: 48.2
- success_rate: 94.7%
- diversity_score: 0.73
- cache_hit_rate: 67.8%

#### Reflection Engine
- analyses_completed: 89
- pattern_recognition_accuracy: 91.2%
- improvement_suggestions: 267
- confidence_score: 0.84
```

### Health Status Levels

| Status | Description | Action Required |
|--------|-------------|-----------------|
| **HEALTHY** | All systems operating normally | None |
| **WARNING** | Minor issues detected | Monitor closely |
| **DEGRADED** | Reduced functionality | Investigate and repair |
| **CRITICAL** | Severe issues affecting operations | Immediate attention |
| **FAILED** | Component non-functional | Emergency recovery |

### Monitoring Thresholds

```typescript
const healthThresholds = {
  memory_usage: { warning: 500, critical: 1000 },    // MB
  error_rate: { warning: 0.05, critical: 0.1 },      // fraction (0-1)
  response_time: { warning: 5000, critical: 10000 }, // ms
  success_rate: { warning: 0.9, critical: 0.8 },     // fraction (0-1); lower is worse
  disk_usage: { warning: 0.8, critical: 0.95 }       // fraction (0-1)
};
```

### Error Cases

| Error | Cause | Solution |
|-------|-------|----------|
| `Health check timeout` | Component unresponsive | Restart component or run recovery |
| `Metrics collection failed` | Monitoring system issue | Check monitoring configuration |
| `Status unavailable` | Recovery system not initialized | Initialize disaster recovery system |

---

## gepa_recover_component

Recovers a specific GEPA component using configurable recovery strategies when component failures are detected.

### Purpose

Provides targeted recovery for individual components without full system restoration, enabling surgical repairs and minimizing disruption.
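
Because each component supports only some strategies (see the Component Types table for this tool), a client may want to check a request before sending it and line up fallbacks. The following is a minimal sketch under stated assumptions: `planRecovery` is a hypothetical helper, and trying the remaining strategies in the table's listed order is an illustrative choice rather than documented server behavior.

```typescript
type Strategy = 'restart' | 'restore_from_backup' | 'rebuild' | 'reset_to_defaults';

// Copied from the Component Types table; the listed order is preserved.
const supportedStrategies: Record<string, Strategy[]> = {
  evolution_engine: ['restart', 'restore_from_backup', 'rebuild'],
  pareto_frontier: ['restart', 'restore_from_backup', 'reset_to_defaults'],
  llm_adapter: ['restart', 'reset_to_defaults'],
  trajectory_store: ['restart', 'restore_from_backup', 'rebuild'],
  memory_cache: ['restart', 'reset_to_defaults'],
};

// Hypothetical helper: returns the preferred strategy first, followed by the
// component's other supported strategies as fallbacks.
function planRecovery(componentType: string, preferred: Strategy = 'restart'): Strategy[] {
  const supported = supportedStrategies[componentType];
  if (supported === undefined) {
    throw new Error(`Unknown component type: ${componentType}`);
  }
  if (supported.indexOf(preferred) === -1) {
    throw new Error(`Strategy ${preferred} is not listed for ${componentType}`);
  }
  return [preferred, ...supported.filter((s) => s !== preferred)];
}
```

Each entry of the returned plan could be passed in turn as `strategy` to `gepa_recover_component` until one attempt reports success.
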
### Component Types

| Component | Description | Recovery Strategies |
|-----------|-------------|-------------------|
| `evolution_engine` | Core genetic algorithm engine | restart, restore_from_backup, rebuild |
| `pareto_frontier` | Multi-objective optimization frontier | restart, restore_from_backup, reset_to_defaults |
| `llm_adapter` | Language model interface | restart, reset_to_defaults |
| `trajectory_store` | Execution data storage | restart, restore_from_backup, rebuild |
| `memory_cache` | Performance optimization cache | restart, reset_to_defaults |

### Recovery Strategies

| Strategy | Description | Data Impact | Recovery Time |
|----------|-------------|-------------|---------------|
| `restart` | Graceful component restart | None | Fast (1-5s) |
| `restore_from_backup` | Load from recent backup | Partial loss | Medium (10-30s) |
| `rebuild` | Reconstruct from available data | Minimal loss | Slow (30-120s) |
| `reset_to_defaults` | Factory reset configuration | Settings lost | Fast (1-10s) |

### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `componentType` | `string` | ✅ | Type of component to recover |
| `strategy` | `string` | ❌ | Recovery strategy to use (default: 'restart') |

### Request Example

```typescript
const response = await mcpClient.callTool('gepa_recover_component', {
  componentType: 'trajectory_store',
  strategy: 'restore_from_backup'
});
```

### Response Example

```markdown
# Component Recovery Completed

## Recovery Details
- **Component**: trajectory_store
- **Strategy**: restore_from_backup
- **Success**: Yes
- **Duration**: 15,347 ms
- **Start Time**: 2024-12-02T14:45:00.000Z
- **End Time**: 2024-12-02T14:45:15.347Z

## Recovery Logs
[2024-12-02T14:45:00.120Z] Starting trajectory_store recovery
[2024-12-02T14:45:00.245Z] Identifying latest valid backup
[2024-12-02T14:45:01.156Z] Found backup: backup_1733140800_abc123def
[2024-12-02T14:45:01.289Z] Validating backup integrity
[2024-12-02T14:45:02.445Z] Backup validation passed
[2024-12-02T14:45:02.567Z] Stopping trajectory_store component
[2024-12-02T14:45:03.123Z] Backing up current state
[2024-12-02T14:45:05.789Z] Restoring data from backup
[2024-12-02T14:45:12.234Z] Data restoration completed
[2024-12-02T14:45:12.456Z] Rebuilding indexes
[2024-12-02T14:45:14.567Z] Restarting trajectory_store component
[2024-12-02T14:45:15.123Z] Component health check passed
[2024-12-02T14:45:15.347Z] Recovery completed successfully

## Pre-Recovery State
- Status: FAILED
- Last Action: query_trajectories
- Error: Database connection timeout

## Post-Recovery State
- Status: HEALTHY
- Last Action: component_startup
- Performance: optimal

Component trajectory_store has been successfully recovered using restore_from_backup strategy.
```

### Recovery Decision Matrix

| Issue Type | Recommended Strategy | Alternative |
|------------|---------------------|-------------|
| **Memory Leak** | restart | reset_to_defaults |
| **Data Corruption** | restore_from_backup | rebuild |
| **Configuration Error** | reset_to_defaults | restart |
| **Process Crash** | restart | restore_from_backup |
| **Performance Degradation** | restart | rebuild |

### Automated Recovery Triggers

```typescript
// Set up automatic recovery based on health metrics
const recoveryTriggers = {
  evolution_engine: {
    memory_usage_threshold: 800, // MB
    error_rate_threshold: 0.1,
    strategy: 'restart'
  },
  trajectory_store: {
    response_time_threshold: 5000, // ms
    error_rate_threshold: 0.05,
    strategy: 'restore_from_backup'
  }
};
```

### Error Cases

| Error | Cause | Solution |
|-------|-------|----------|
| `Component not found` | Invalid component type | Use valid component name |
| `Recovery strategy failed` | Strategy inappropriate for issue | Try alternative strategy |
| `No backup available` | Backup required but missing | Create backup or use 'restart' strategy |
| `Component in use` | Component locked by active process | Wait for completion or force recovery |

---

## gepa_integrity_check

Performs comprehensive data integrity validation with optional automatic repair for corruption detection and prevention.

### Purpose

Validates data consistency, detects corruption early, and provides automatic repair capabilities to maintain system reliability and data quality.

### Check Scopes

| Scope | Description | Components Checked |
|-------|-------------|-------------------|
| `all` | Complete system validation | All components and data |
| `evolution_state` | Evolution process integrity | Evolution engine, populations |
| `trajectories` | Execution data validation | Trajectory store, indexes |
| `configuration` | Settings and parameters | All component configurations |
| `cache` | Performance cache validation | Memory cache, optimization data |

### Validation Types

| Validation | Description | Detection |
|------------|-------------|-----------|
| **Checksum** | File integrity verification | Data corruption |
| **Size Match** | Expected vs actual file sizes | Truncation, incomplete writes |
| **Dependencies** | Component relationship validation | Missing dependencies |
| **Schema** | Data structure validation | Format inconsistencies |
| **Referential** | Cross-component data consistency | Orphaned references |

### Parameters

| Parameter | Type | Required | Description |
|-----------|------|----------|-------------|
| `component` | `string` | ❌ | Component to check (default: 'all') |
| `autoRepair` | `boolean` | ❌ | Attempt automatic repair (default: false) |

### Request Example

```typescript
const response = await mcpClient.callTool('gepa_integrity_check', {
  component: 'all',
  autoRepair: true
});
```

### Response Example

```markdown
# Data Integrity Check Results

## Overall Status: ✅ PASSED

### Summary
- **Components Checked**: 6
- **Valid Components**: 6
- **Corrupted Components**: 0
- **Auto-Repair**: Enabled

### Detailed Results

#### evolution_engine
- **Overall Valid**: ✅ Yes
- **Checksum Valid**: ✅
- **Size Match**: ✅
- **Dependencies Valid**: ✅

#### pareto_frontier
- **Overall Valid**: ✅ Yes
- **Checksum Valid**: ✅
- **Size Match**: ✅
- **Dependencies Valid**: ✅

#### trajectory_store
- **Overall Valid**: ✅ Yes
- **Checksum Valid**: ✅
- **Size Match**: ✅
- **Dependencies Valid**: ✅

#### llm_adapter
- **Overall Valid**: ✅ Yes
- **Checksum Valid**: ✅
- **Size Match**: ✅
- **Dependencies Valid**: ✅

#### prompt_mutator
- **Overall Valid**: ✅ Yes
- **Checksum Valid**: ✅
- **Size Match**: ✅
- **Dependencies Valid**: ✅

#### reflection_engine
- **Overall Valid**: ✅ Yes
- **Checksum Valid**: ✅
- **Size Match**: ✅
- **Dependencies Valid**: ✅

All checked components passed integrity validation.
```

### Integrity Check Example with Issues

```markdown
# Data Integrity Check Results

## Overall Status: ❌ ISSUES DETECTED

### Summary
- **Components Checked**: 6
- **Valid Components**: 4
- **Corrupted Components**: 2
- **Auto-Repair**: Enabled

### Detailed Results

#### trajectory_store
- **Overall Valid**: ❌ No
- **Checksum Valid**: ❌
- **Size Match**: ✅
- **Dependencies Valid**: ✅
- **Errors**: Checksum mismatch in trajectory_1733140850_def456.json

#### pareto_frontier
- **Overall Valid**: ❌ No
- **Checksum Valid**: ✅
- **Size Match**: ❌
- **Dependencies Valid**: ❌
- **Errors**: Missing reference to candidate evolution_1733140800_candidate_15

### Recommendations
- **trajectory_store**: Consider running with autoRepair enabled or manual restoration
- **pareto_frontier**: Consider running with autoRepair enabled or manual restoration

2 component(s) failed integrity checks. Auto-repair was attempted.
```

### Automated Integrity Monitoring

```typescript
// Schedule regular integrity checks
const integritySchedule = {
  realtime: {
    interval: 300000, // 5 minutes
    scope: 'cache',
    autoRepair: true
  },
  hourly: {
    interval: 3600000, // 1 hour
    scope: 'configuration',
    autoRepair: true
  },
  daily: {
    interval: 86400000, // 24 hours
    scope: 'all',
    autoRepair: false
  }
};
```

### Error Cases

| Error | Cause | Solution |
|-------|-------|----------|
| `Component not accessible` | File system or permission issue | Check file permissions and disk space |
| `Validation timeout` | Large dataset or slow storage | Increase timeout or check specific components |
| `Auto-repair failed` | Corruption too severe | Manual intervention required |
| `Checksum calculation failed` | Missing or corrupted metadata | Rebuild component indexes |

---

## Best Practices

### Backup Strategy

- **Regular Schedules**: Daily for critical systems, weekly for development
- **Retention Policies**: Keep 7 daily, 4 weekly, 12 monthly backups
- **Verification**: Test backup integrity regularly
- **Labeling**: Use descriptive labels for easy identification

### Recovery Planning

- **Risk Assessment**: Identify critical components and failure modes
- **Recovery Priorities**: Establish component recovery order
- **Testing**: Regular disaster recovery drills
- **Documentation**: Maintain recovery procedures and contact information

### Monitoring and Alerts

- **Health Checks**: Continuous component monitoring
- **Threshold Tuning**: Adjust thresholds based on system behavior
- **Alert Fatigue**: Balance sensitivity vs. noise
- **Escalation**: Define alert escalation procedures

### Component Recovery

- **Progressive Strategies**: Start with least disruptive recovery methods
- **Impact Assessment**: Understand data loss implications
- **Rollback Plans**: Prepare rollback procedures for failed recoveries
- **Post-Recovery Validation**: Verify system functionality after recovery

### Integrity Management

- **Proactive Monitoring**: Regular integrity checks before issues occur
- **Auto-Repair Guidelines**: Enable for minor issues, manual for critical
- **Correlation Analysis**: Identify patterns in integrity failures
- **Preventive Measures**: Address root causes of recurring issues
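
The retention policy under Backup Strategy (7 daily, 4 weekly, 12 monthly) could be sketched as a selection function. This is an illustration only: `selectBackupsToKeep` and `BackupInfo` are hypothetical names, weeks are assumed to start on Monday in UTC, and the sketch only decides which backups to keep, since this document does not describe a tool for deleting backups.

```typescript
interface BackupInfo {
  id: string;
  created: string; // ISO 8601 UTC timestamp
}

const DAY_MS = 86400000;
const dayKey = (d: Date): string => d.toISOString().slice(0, 10);  // YYYY-MM-DD
const monthKey = (d: Date): string => d.toISOString().slice(0, 7); // YYYY-MM
// 1970-01-01 was a Thursday, so adding 3 days aligns week boundaries to Monday.
const weekKey = (d: Date): string =>
  `w${Math.floor((Math.floor(d.getTime() / DAY_MS) + 3) / 7)}`;

// Keeps the newest backup in each of the `count` most recent buckets.
function newestPerBucket(
  sortedNewestFirst: BackupInfo[],
  keyOf: (d: Date) => string,
  count: number,
): string[] {
  const seen = new Set<string>();
  const keep: string[] = [];
  for (const backup of sortedNewestFirst) {
    const key = keyOf(new Date(backup.created));
    if (seen.has(key)) continue;   // bucket already has a newer backup
    if (seen.size >= count) break; // all requested buckets are filled
    seen.add(key);
    keep.push(backup.id);
  }
  return keep;
}

function selectBackupsToKeep(backups: BackupInfo[]): Set<string> {
  const sorted = [...backups].sort(
    (a, b) => Date.parse(b.created) - Date.parse(a.created),
  );
  return new Set([
    ...newestPerBucket(sorted, dayKey, 7),
    ...newestPerBucket(sorted, weekKey, 4),
    ...newestPerBucket(sorted, monthKey, 12),
  ]);
}
```

A backup can satisfy more than one tier at once (the newest backup is usually the daily, weekly, and monthly keeper), so the returned set is often smaller than 7 + 4 + 12.
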
