# Prompt Optimization Flow

## Overview

This document describes a prompt optimization architecture that intercepts, analyzes, enhances, and validates user prompts before they reach Claude. The system uses a multi-stage pipeline of local LLMs, MCP tool chains, knowledge graph integration, and cloud-based assembly to maximize prompt quality while minimizing API costs.

## Key Benefits

- **Zero Initial API Cost**: All optimization happens before any paid Claude API call (with self-hosted LLMs and Gemini's free tier, no earlier stage is billed)
- **Intelligent Complexity Routing**: Simple prompts bypass optimization for speed; complex prompts get the full pipeline
- **Knowledge Graph Integration**: Automatically enriches prompts with relevant context from CortexGraph
- **Multi-Model Validation**: Cross-validates optimizations with multiple LLMs to catch low-quality rewrites
- **Flexible Architecture**: Local LLMs can be swapped for cloud providers as needed
- **Metadata Enrichment**: Adds confidence scores, similarity metrics, and processing metadata to prompts

## Architecture Components

### 1. **Proxy Server**

- Central orchestration layer
- Makes routing decisions based on complexity
- Manages communication between all components
- Tracks confidence/similarity thresholds

### 2. **Local LLMs**

- Primary: Prompt optimization and tagging
- Validation: Multiple instances for cross-validation
- Can be replaced with cloud providers (OpenAI, Anthropic, etc.)

### 3. **MCP Tool Chain**

- **CortexGraph**: Knowledge graph for context retrieval
- **STOPPER**: Process control and validation
- **Custom Tools**: User-defined extensions
- **Gemini Optimizer**: Large context window for final assembly

### 4. **Validation Layer**

- Semantic similarity checks
- Confidence scoring
- Iterative refinement when scores fall below thresholds

## Detailed Flow Description

### Phase 1: Initial Intake

1. **User Input**: User enters a prompt in the Claude Code interface
2. **Proxy Intercept**: Proxy captures the prompt before it reaches Claude
3. **Complexity Analysis**: An NLP-based complexity rating determines the routing strategy

### Phase 2: Intelligent Routing

4. **Simple Path** (low complexity):
   - Proxy applies basic formatting rules
   - Routes directly to Claude with minimal processing
   - Optimizes for speed and reduces overhead
5. **Complex Path** (high complexity):
   - Triggers the full optimization pipeline
   - Proceeds to Phase 3

### Phase 3: Prompt Optimization

6. **Local LLM Processing**:
   - Adds semantic tags to categorize intent
   - Restructures the prompt for Claude's comprehension
   - Formats it according to Claude prompting best practices
   - Extracts key entities and concepts

### Phase 4: Validation & Refinement

7. **Multi-Model Validation**:
   - Routes the optimized prompt to additional local LLMs (validators 2 through N)
   - Each validator scores the optimization independently
   - Semantic similarity algorithms can supplement or replace the LLM validators
   - Calculates confidence and similarity metrics
8. **Threshold Check**:
   - If scores meet the thresholds: proceed to Phase 5
   - If scores fall below the thresholds: return to Phase 3 for reprocessing with feedback, up to `MAX_REFINEMENT_ITERATIONS` times
   - Prevents low-quality optimizations from proceeding
9. **Tool Recommendation**:
   - Proxy receives the validated prompt with metadata
   - System suggests relevant MCP tools for the query

### Phase 5: MCP Tool Chain Execution

10. **CortexGraph Search**:
    - Searches the knowledge graph for related concepts
    - Retrieves relevant memories and context
    - Returns similarity-scored results
11. **STOPPER Validation**:
    - Process control checks
    - Safety and constraint validation
    - Prevents out-of-scope operations
12. **Additional Tools**:
    - Routes to any number of other tools based on user preferences
    - Each tool contributes specialized context
    - Tools run in parallel for efficiency

### Phase 6: Final Assembly

13. **Gemini Optimization**:
    - Combines original prompt + optimizations + tool outputs
    - Leverages Gemini's large context window (about 1M tokens for Gemini 2.0 Flash)
    - Uses the free tier to keep assembly cost at zero
    - Assembles a coherent final prompt
14. **Quality Assurance**:
    - Compares the input to the assembled output
    - Generates a similarity score (drift detection)
    - Calculates a final confidence rating
    - Appends metadata to the prompt

### Phase 7: Claude Execution

15. **Final Prompt Delivery**:
    - Proxy sends the optimized prompt to Claude
    - **First Claude API cost incurred at this step**
    - Prompt includes:
      - Original user intent (preserved)
      - Optimization tags and structure
      - Knowledge graph context
      - Tool outputs and recommendations
      - Confidence/similarity metadata
      - Processing history
16. **Normal Operation**:
    - Claude processes the enriched prompt
    - Claude Code continues its standard workflow
    - User receives the response

## Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant Claude Code Interface
    participant Proxy
    participant NLP Complexity Analyzer
    participant Local LLM (Optimizer)
    participant Local LLM 2 (Validator)
    participant Local LLM N (Validator)
    participant Semantic Similarity Engine
    participant MCP Chain
    participant CortexGraph
    participant STOPPER
    participant Custom Tools
    participant Gemini
    participant Claude API

    %% Phase 1: Initial Intake
    User->>Claude Code Interface: Enter prompt
    Claude Code Interface->>Proxy: Forward prompt
    Proxy->>NLP Complexity Analyzer: Analyze complexity
    NLP Complexity Analyzer-->>Proxy: Complexity rating

    %% Phase 2: Routing Decision
    alt Low Complexity (Simple Prompt)
        Proxy->>Proxy: Apply basic rules
        Proxy->>Claude API: Route directly to Claude
        Note over Proxy,Claude API: Fast path for simple queries
    else High Complexity (Complex Prompt)
        Note over Proxy: Trigger full optimization pipeline

        %% Phase 3: Optimization
        Proxy->>Local LLM (Optimizer): Optimize prompt
        Note over Local LLM (Optimizer): Add semantic tags<br/>Format for Claude<br/>Extract entities<br/>Restructure query
        Local LLM (Optimizer)-->>Proxy: Optimized prompt v1

        %% Phase 4: Validation Loop
        rect rgb(240, 240, 240)
            Note over Proxy,Semantic Similarity Engine: Validation & Refinement Loop
            par Parallel Validation
                Proxy->>Local LLM 2 (Validator): Validate optimization
                Proxy->>Local LLM N (Validator): Validate optimization
                Proxy->>Semantic Similarity Engine: Check semantic similarity
            end
            Local LLM 2 (Validator)-->>Proxy: Confidence score 2
            Local LLM N (Validator)-->>Proxy: Confidence score N
            Semantic Similarity Engine-->>Proxy: Similarity score
            Proxy->>Proxy: Aggregate scores

            alt Below Confidence/Similarity Threshold
                Note over Proxy,Local LLM (Optimizer): Quality check failed
                Proxy->>Local LLM (Optimizer): Reprocess with feedback
                Local LLM (Optimizer)-->>Proxy: Optimized prompt v2
                Note over Proxy: Loop until threshold met or iteration limit reached
            else Above Threshold
                Note over Proxy: Quality validated, proceed
            end
        end

        Proxy->>Proxy: Append recommendation metadata

        %% Phase 5: MCP Tool Chain
        Proxy->>MCP Chain: Route validated prompt + metadata

        rect rgb(230, 245, 255)
            Note over MCP Chain,Custom Tools: MCP Tool Execution (Parallel)
            par Tool Execution
                MCP Chain->>CortexGraph: Search knowledge graph
                MCP Chain->>STOPPER: Validate constraints
                MCP Chain->>Custom Tools: Execute user-defined tools
            end
            CortexGraph-->>MCP Chain: Context + memories (similarity scored)
            STOPPER-->>MCP Chain: Validation results
            Custom Tools-->>MCP Chain: Tool outputs
        end

        %% Phase 6: Final Assembly
        MCP Chain->>Gemini: Assemble final prompt
        Note over Gemini: Combine all inputs<br/>Optimize structure<br/>~1M token context<br/>Free tier usage
        Gemini->>Gemini: Compare input vs output
        Gemini->>Gemini: Calculate similarity & confidence
        Gemini-->>MCP Chain: Final prompt + metadata
        MCP Chain-->>Proxy: Return final prompt

        %% Phase 7: Claude Execution
        Note over Proxy,Claude API: 💰 First Claude API cost incurred here
        Proxy->>Claude API: Send final optimized prompt
        Note over Claude API: Prompt includes:<br/>Original intent<br/>Optimizations<br/>Knowledge graph context<br/>Tool outputs<br/>Metadata
    end

    %% Normal Operation
    Claude API-->>Claude Code Interface: Process request
    Claude Code Interface-->>User: Return response
    Note over User,Claude Code Interface: Claude Code continues as normal
```

## Configuration Options

### Complexity Thresholds

```python
# Proxy configuration
# Prompts with complexity > COMPLEX_PROMPT_THRESHOLD follow the complex path;
# all other prompts take the simple path.
COMPLEX_PROMPT_THRESHOLD = 0.4
```

### Validation Settings

```python
# Validation thresholds
CONFIDENCE_THRESHOLD = 0.75      # Minimum confidence to proceed
SIMILARITY_THRESHOLD = 0.80      # Minimum semantic similarity
MAX_REFINEMENT_ITERATIONS = 3    # Prevent infinite loops
```

### Model Selection

```python
# Local LLMs (can be replaced with cloud providers)
OPTIMIZER_MODEL = "llama-3.1-70b"   # Primary optimizer
VALIDATOR_MODELS = [                # Validation ensemble
    "mixtral-8x7b",
    "qwen-2.5-72b",
    "deepseek-v2",
]

# Example using cloud providers (alternative to local)
# OPTIMIZER_MODEL = "openai:gpt-4"
# VALIDATOR_MODELS = ["anthropic:claude-3-opus", "openai:gpt-4"]
```

### MCP Tools

```python
# Tool chain configuration
MCP_TOOLS = {
    "cortex_graph": {
        "enabled": True,
        "similarity_threshold": 0.7,
        "max_results": 10,
    },
    "stopper": {
        "enabled": True,
        "strict_mode": False,
    },
    "custom": {
        "user_preferences": True,
        "context_retrieval": True,
    },
}
```

### Gemini Settings

```python
# Final assembly configuration
GEMINI_MODEL = "gemini-2.0-flash-exp"   # Free tier, large context
GEMINI_MAX_TOKENS = 1_048_576           # Input context budget (~1M tokens for Gemini 2.0 Flash)
GEMINI_TEMPERATURE = 0.3                # Low temperature for consistent assembly
```

## Performance Characteristics

### Latency Profile

| Stage | Estimated Time | Notes |
|-------|---------------|-------|
| Complexity Analysis | 10-50ms | Fast NLP classification |
| Simple Path (total) | 50-100ms | Minimal processing overhead |
| Optimization | 200-500ms | Local LLM inference (per pass) |
| Validation | 150-300ms | Parallel execution (per pass) |
| MCP Tool Chain | 100-400ms | Depends on tool complexity |
| Gemini Assembly | 300-800ms | Large context processing |
| **Complex Path (total)** | **1-3 seconds** | Full pipeline; each extra refinement pass adds roughly 350-800ms |

### Cost Analysis

**Traditional Approach** (direct to Claude):
- Every prompt hits the Claude API immediately
- No optimization or context enrichment
- Cost: $X per request

**Optimized Approach** (this architecture):
- Local LLMs: no per-request fees when self-hosted (hardware and power costs still apply), or low-cost cloud inference
- Gemini: Uses the free tier for final assembly
- Claude API: Called only after full optimization
- Cost: $0 until Claude execution, then roughly $X per request

**Net Effect**:
- Comparable Claude API cost per request, though enriched prompts carry more input tokens and can cost somewhat more
- Expected improvement in prompt quality
- Higher expected success rate (fewer retries needed)
- Lower total cost when the reduction in follow-up iterations outweighs the larger prompts

## Implementation Considerations

### 1. **Local LLM Requirements**

- GPU: a 70B model needs roughly 40GB of VRAM even at 4-bit quantization, more than a single RTX 4090 (24GB) provides; plan for two 24GB GPUs or one 48GB+ card
- RAM: 64GB+ recommended
- Alternative: Use cloud inference APIs (Groq, Together.ai, OpenRouter)

### 2. **Proxy Server**

- Needs to be MCP-compatible
- Should support streaming (the Anthropic Messages API streams responses as server-sent events)
- Must handle concurrent validation requests

### 3. **Knowledge Graph Integration**

- CortexGraph needs to be populated with relevant data
- The index must be kept up to date
- Consider using CortexGraph's temporal memory features

### 4. **Error Handling**

- Fall back to the simple path if optimization fails
- Timeout protection (max 5s total processing)
- Graceful degradation if tools are unavailable

### 5. **Monitoring & Observability**

- Track optimization success rates
- Monitor confidence/similarity distributions
- Log processing times for each stage
- A/B test optimized vs. non-optimized prompts

## Future Enhancements

1. **Adaptive Thresholds**: Learn optimal confidence/similarity thresholds per user
2. **Caching Layer**: Cache optimizations for similar prompts
3. **User Feedback Loop**: Incorporate user ratings to improve optimization
4. **Model Selection**: Automatically choose the best LLM based on prompt type
5. **Streaming Optimization**: Stream partial results during processing
6. **Cost Tracking**: Detailed cost accounting per stage
7. **A/B Testing Framework**: Compare different optimization strategies

## Security Considerations

- **Prompt Injection**: Validate all optimized prompts for injection attempts
- **Data Privacy**: Local LLMs keep sensitive data on-premises, although the assembled prompt is still sent to Gemini (Phase 6) and Claude (Phase 7)
- **Rate Limiting**: Prevent abuse of free tier services
- **Access Control**: Authenticate proxy requests
- **Audit Trail**: Log all prompt transformations

## Related Documentation

- [CortexGraph Architecture](architecture.md) - Integration with temporal memory
- [CortexGraph Documentation](graph_features.md) - Knowledge graph features
- [MCP Specification](https://github.com/modelcontextprotocol/specification) - Tool protocol details
- [Prompt Injection Prevention](prompt_injection.md) - Security best practices

## Example Workflow

### Input Prompt

```
"Help me write a Python function to process user data"
```

### After Optimization

```markdown
## Task: Python Function Development

**User Intent**: Create data processing function

**Context** (from CortexGraph):
- User prefers type hints (from memory: 2025-10-15)
- Uses pytest for testing (from memory: 2025-10-20)
- Prefers dataclasses over dicts (from memory: 2025-10-12)

**Requirements**:
1. Function should process user data
2. Follow user's Python style preferences
3. Include type hints and docstrings
4. Consider testing approach

**Metadata**:
- Confidence: 0.87
- Similarity: 0.92
- Optimization iterations: 1
- Tools used: CortexGraph, STOPPER
- Processing time: 1.2s
```

### Result

Claude receives a rich, contextualized prompt intended to produce higher-quality output on the first try, which reduces the need for follow-up iterations.

---

**Built with** [Claude Code](https://claude.com/claude-code) 🤖
