# ADR-005: Tree of Thoughts as Meta-Orchestration
**Status:** Accepted
**Date:** 2025-12-20
**Deciders:** Delia Core Team
## Context
We have multiple orchestration modes (VOTING, AGENTIC, DEEP_THINKING, COMPARISON) but no way to:
1. Know which mode is best for a given task type
2. Learn from comparative outcomes
3. Handle extremely high-stakes decisions that need maximum confidence
Additionally, TREE_OF_THOUGHTS is defined but unimplemented (currently aliases to DEEP_THINKING).
## Decision
Redefine **Tree of Thoughts (ToT)** as **Meta-Orchestration** that:
- Executes multiple orchestration modes in parallel as "branches"
- Uses Critic to evaluate and select the best outcome
- Feeds results to ACE Framework for meta-learning
- Reserves ToT for high-stakes scenarios (not default)
### Key Principles
1. **ToT is the Explorer**: Tries multiple orchestration approaches on the same problem
2. **ACE is the Learner**: Analyzes why one approach won, updates playbook
3. **Intent Detector is the Exploiter**: Uses learned patterns for future routing
4. **Escalation-based**: ToT triggers on high stakes, not routine tasks
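
Taken together, these roles form a routing loop: explore on high stakes, learn inside the ToT run, exploit the learned pattern afterwards. The sketch below is illustrative only; `recommend_mode` and `_execute_mode` are hypothetical helpers, and the real dispatch is the executor code in the Architecture section.

```python
# Illustrative explore/learn/exploit loop; helper names are hypothetical.
async def route(self, intent: DetectedIntent, message: str) -> OrchestrationResult:
    # Explorer: escalate high-stakes tasks to ToT meta-orchestration
    # (the Learner step runs inside _execute_tree_of_thoughts via ACE).
    if self._check_tot_triggers(message, intent):
        return await self._execute_tree_of_thoughts(intent, message, None, None)
    # Exploiter: reuse a mode the playbook has already learned for this task type
    learned_mode = playbook_manager.recommend_mode(intent.task_type)
    if learned_mode is not None:
        return await self._execute_mode(learned_mode, intent, message)
    # Fallback: whatever mode the intent detector chose directly
    return await self._execute_mode(intent.mode, intent, message)
```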
## Architecture
### ToT Execution Flow
```python
async def _execute_tree_of_thoughts(
    self,
    intent: DetectedIntent,
    message: str,
    backend_type: str | None,
    model_override: str | None,
    messages: list[dict[str, Any]] | None = None,
) -> OrchestrationResult:
    """
    Meta-orchestration: try multiple orchestration modes,
    critic picks the best, ACE learns from the outcome.
    """
    # 1. Define branches (different orchestration modes)
    branches = [
        ("voting", self._execute_voting),
        ("agentic", self._execute_agentic),
        ("deep_thinking", self._execute_deep_thinking),
    ]

    # 2. Execute all branches in parallel
    results = await asyncio.gather(*[
        executor(intent, message, backend_type, model_override, messages)
        for _, executor in branches
    ])

    # 3. Critic evaluates all results with reasoning
    critique = await self._critic_evaluate_branches(
        message=message,
        branches=[(name, res) for (name, _), res in zip(branches, results)],
    )

    # 4. Select winner
    winner_idx = critique["winner_index"]
    winner_mode = branches[winner_idx][0]
    winner_result = results[winner_idx]

    # 5. Feed to ACE for learning
    await self._ace_learn_from_tot(
        task_type=intent.task_type,
        message=message,
        branches=branches,
        results=results,
        winner_mode=winner_mode,
        critic_reasoning=critique["reasoning"],
    )

    # 6. Return winner with ToT metadata
    winner_result.mode = OrchestrationMode.TREE_OF_THOUGHTS
    winner_result.debug_info["tot_branches"] = [b[0] for b in branches]
    winner_result.debug_info["tot_winner"] = winner_mode
    winner_result.debug_info["tot_reasoning"] = critique["reasoning"]
    return winner_result
```
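
One caveat about the flow above: `asyncio.gather` with default settings raises as soon as any branch raises, so a single failed mode aborts the whole ToT run. If partial failure should still produce a winner, a small variation (a sketch, not part of the current flow) is:

```python
# Tolerate individual branch failures instead of aborting the whole ToT run.
raw = await asyncio.gather(
    *[
        executor(intent, message, backend_type, model_override, messages)
        for _, executor in branches
    ],
    return_exceptions=True,
)
# Keep only branches that produced a result; pair them back with their mode names.
survivors = [
    (name, res)
    for (name, _), res in zip(branches, raw)
    if not isinstance(res, Exception)
]
```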
### Critic Evaluation
```python
async def _critic_evaluate_branches(
    self,
    message: str,
    branches: list[tuple[str, OrchestrationResult]],
) -> dict[str, Any]:
    """
    Critic compares all branch outcomes and picks the best one with reasoning.
    """
    import json

    from ..llm import call_llm
    from ..routing import select_model
    from ..text_utils import strip_thinking_tags

    # ROLE_PROMPTS and ModelRole are assumed to be imported at module level.
    comparison_prompt = f"""
Original Task: {message}

Branch Results:
{self._format_branches_for_critic(branches)}

As the Senior Critic, evaluate all branches and pick the BEST result.

Scoring criteria:
1. Correctness (does it solve the task?)
2. Completeness (are all requirements met?)
3. Quality (code quality, reasoning depth, security)
4. Confidence (how certain are we this is right?)

OUTPUT FORMAT (JSON):
{{
    "winner_index": 0,  // Index of best branch (0-based)
    "winner_mode": "voting",  // Name of winning orchestration mode
    "reasoning": "VOTING produced the most thorough security analysis...",
    "scores": [
        {{"mode": "voting", "correctness": 9, "completeness": 9, "quality": 9, "confidence": 8}},
        {{"mode": "agentic", "correctness": 7, "completeness": 8, "quality": 7, "confidence": 6}},
        {{"mode": "deep_thinking", "correctness": 8, "completeness": 7, "quality": 8, "confidence": 7}}
    ],
    "insights": "VOTING's consensus mechanism caught edge cases that single-model approaches missed."
}}
"""

    critic_model = await select_model(task_type="moe", content_size=len(comparison_prompt))
    result = await call_llm(
        model=critic_model,
        prompt=comparison_prompt,
        system=ROLE_PROMPTS[ModelRole.CRITIC],
        enable_thinking=True,
    )
    return json.loads(strip_thinking_tags(result["response"]))
```
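
The prompt above relies on a `_format_branches_for_critic` helper that is not shown in this ADR. A minimal sketch of what it might look like, assuming each `OrchestrationResult` exposes its final text as `result.response` (that field name is an assumption):

```python
def _format_branches_for_critic(
    self, branches: list[tuple[str, OrchestrationResult]]
) -> str:
    """Render each branch as a labelled section the Critic can compare side by side."""
    sections = []
    for idx, (mode, result) in enumerate(branches):
        # `result.response` is assumed to hold the branch's final answer text.
        sections.append(f"--- Branch {idx}: {mode} ---\n{result.response}")
    return "\n\n".join(sections)
```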
### ACE Meta-Learning
```python
async def _ace_learn_from_tot(
    self,
    task_type: str,
    message: str,
    branches: list[tuple[str, Callable]],
    results: list[OrchestrationResult],
    winner_mode: str,
    critic_reasoning: str,
) -> None:
    """
    ACE Framework learns meta-patterns from ToT outcomes.
    Updates the playbook with orchestration selection strategies.
    """
    import json

    from ..llm import call_llm
    from ..text_utils import strip_thinking_tags

    # Callable, config, ACE_REFLECTOR_PROMPT, and log are assumed module-level.

    # Extract task characteristics
    task_features = self._extract_task_features(message)

    # Build learning prompt
    learning_prompt = f"""
Task Type: {task_type}
Task Features: {json.dumps(task_features)}

Orchestration Modes Tried:
{self._format_tot_results(branches, results)}

Winner: {winner_mode}
Critic Reasoning: {critic_reasoning}

As the ACE Reflector, analyze WHY {winner_mode} won for this type of task.

OUTPUT FORMAT (JSON):
{{
    "task_pattern": "security-critical code review with crypto primitives",
    "winning_mode": "{winner_mode}",
    "why_it_won": "Detailed analysis of what properties made this mode optimal",
    "when_to_use": "Generalized rule for when to use {winner_mode}",
    "playbook_update": "Actionable strategy bullet for future tasks",
    "confidence": 0.85
}}
"""

    # ACE Reflector analyzes
    reflector_result = await call_llm(
        model=config.model_moe.default_model,
        prompt=learning_prompt,
        system=ACE_REFLECTOR_PROMPT,
        enable_thinking=True,
    )

    if reflector_result.get("success"):
        lesson = json.loads(strip_thinking_tags(reflector_result["response"]))

        # ACE Curator integrates into playbook
        await self._curate_orchestration_strategy(
            task_type=task_type,
            lesson=lesson,
        )

        log.info(
            "tot_meta_learning_complete",
            task_type=task_type,
            winner_mode=winner_mode,
            confidence=lesson.get("confidence"),
            pattern=lesson.get("task_pattern"),
        )
```
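
`_extract_task_features` (and `_format_tot_results`) are referenced above but not defined in this ADR. A rough sketch of the feature extractor, assuming simple keyword signals are enough for the Reflector prompt:

```python
def _extract_task_features(self, message: str) -> dict[str, Any]:
    """Cheap keyword-based task characteristics for the ACE learning prompt (sketch only)."""
    msg = message.lower()
    return {
        "length_chars": len(message),
        "mentions_code": any(kw in msg for kw in ("function", "class", "refactor", "bug")),
        "mentions_security": any(kw in msg for kw in ("security", "crypto", "auth", "payment")),
        "asks_for_design": any(kw in msg for kw in ("architecture", "design", "trade-off")),
    }
```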
### Intent Detection Enhancement
```python
# In orchestration/intent.py
def _check_tot_triggers(self, message: str, intent: DetectedIntent) -> bool:
    """
    Determine if ToT meta-orchestration should be used.

    Triggers:
    1. Explicit user request ("try multiple approaches", "high stakes")
    2. High-risk keywords (security, crypto, auth, payment, medical)
    3. User frustration level CRITICAL
    4. Previous attempts failed (session history)
    5. Playbook recommends ToT for this task pattern
    """
    msg_lower = message.lower()

    # Explicit ToT request
    if any(kw in msg_lower for kw in [
        "tree of thoughts", "tot", "multiple approaches",
        "try different ways", "meta-orchestrate", "high stakes",
    ]):
        return True

    # High-risk domains (consult playbook)
    high_risk_keywords = [
        "security", "crypto", "auth", "payment", "medical",
        "safety-critical", "production", "mission-critical",
    ]
    if any(kw in msg_lower for kw in high_risk_keywords):
        # Check playbook: does it recommend ToT for this domain?
        playbook_context = playbook_manager.format_for_prompt(intent.task_type)
        if "meta-orchestration" in playbook_context or "tree-of-thoughts" in playbook_context:
            return True

    # User frustration (from frustration detector)
    from ..frustration import get_frustration_detector

    frustration = get_frustration_detector().analyze(message)
    if frustration.level == "CRITICAL":
        log.info("tot_triggered_by_frustration", level=frustration.level)
        return True

    return False
```
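
For context, this is roughly how the trigger check could gate dispatch in the executor; the `execute()` signature and the `_execute_direct_mode` helper shown here are assumptions, not the actual code:

```python
# In orchestration/executor.py -- sketch of wiring the trigger into dispatch.
async def execute(self, intent, message, backend_type=None, model_override=None, messages=None):
    if self.intent_detector._check_tot_triggers(message, intent):
        intent.mode = OrchestrationMode.TREE_OF_THOUGHTS  # escalate to meta-orchestration
    if intent.mode == OrchestrationMode.TREE_OF_THOUGHTS:
        return await self._execute_tree_of_thoughts(
            intent, message, backend_type, model_override, messages
        )
    # Otherwise run the single mode the intent detector picked (hypothetical helper).
    return await self._execute_direct_mode(intent, message, backend_type, model_override, messages)
```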
## Benefits
### 1. Self-Improving System
- ToT explores → ACE learns → Intent detector exploits
- Continuous improvement without manual tuning
### 2. Domain-Specific Optimization
```
Learned patterns:
- "Security audits → VOTING (k=5) for consensus"
- "Refactoring → AGENTIC for tool access"
- "Architecture design → DEEP_THINKING for reasoning"
```
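
Concretely, each learned pattern could be persisted as a playbook entry whose fields mirror the Reflector's output format; the exact storage schema below is an assumption for illustration:

```python
# Hypothetical playbook entry curated from a Reflector lesson.
ORCHESTRATION_STRATEGY_EXAMPLE = {
    "task_pattern": "security-critical code review with crypto primitives",
    "winning_mode": "voting",
    "when_to_use": "Security audits where independent consensus catches subtle flaws",
    "playbook_update": "Security audits -> VOTING (k=5) for consensus",
    "confidence": 0.85,
    "observations": 3,  # number of ToT runs supporting this pattern
}
```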
### 3. Confidence Guarantee
- For high-stakes tasks, ToT provides maximum confidence
- User gets best result from multiple approaches
- Transparent reasoning about why one approach won
### 4. Cost-Effective Escalation
- Normal tasks use direct modes (fast, cheap)
- Only escalate to ToT when needed
- ToT cost justified by learning value
## Trade-offs
### Costs
- **Latency**: even though branches run in parallel, end-to-end time is gated by the slowest mode plus the critic pass (roughly 3-5x a single direct call)
- **Compute**: Multiple LLM calls per request
- **Complexity**: More code to maintain
### Mitigations
- Reserve ToT for high-stakes only (not default)
- Parallel execution keeps latency reasonable
- Learning amortizes cost (fewer ToT calls over time as intent detector improves)
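
A further mitigation, anticipating Phase 5's cost controls, is a per-branch timeout so one slow mode cannot dominate ToT latency. The sketch below uses `asyncio.wait_for`; the 120-second budget is illustrative only.

```python
# Per-branch timeout sketch; timed-out branches are simply dropped before critique.
import asyncio

async def run_branch_with_timeout(executor, *args, budget_s: float = 120.0):
    try:
        return await asyncio.wait_for(executor(*args), timeout=budget_s)
    except asyncio.TimeoutError:
        return None  # excluded from critic evaluation
```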
## Implementation Phases
### Phase 1: Core ToT Execution
- [x] Design architecture
- [x] Implement `_execute_tree_of_thoughts()` in executor.py (lines 342-500)
- [x] Implement `_critic_evaluate_branches()` via Critic class
- [x] Add ToT result metadata (tot_branches, tot_winner, tot_reasoning)
### Phase 2: ACE Meta-Learning Integration
- [x] Implement `_ace_meta_learn()` in executor.py (lines 502-550)
- [x] Implement `learn_from_tot()` in meta_learning.py with Bayesian updates
- [x] Add orchestration playbook support via playbook_manager
- [x] Test learning loop
### Phase 3: Intent Detection Enhancement
- [x] Add `_check_orchestration_learner()` in intent.py (lines 195-256)
- [x] Implement `should_use_tot()` with probabilistic triggering
- [x] Add high-stakes keyword detection (security, crypto, auth, etc.)
- [x] Integrate into executor.execute() dispatch
### Phase 4: MCP Tool Integration
- [x] Add `tot=True` parameter to `delegate()` for explicit triggering
- [x] Create `_delegate_with_tot()` helper function
- [x] Add ToT metadata in response footer
### Phase 5: Optimization (Future)
- [ ] Add branch caching (skip redundant executions)
- [ ] Implement adaptive branch selection
- [ ] Add cost controls (max branches, timeouts)
- [ ] Performance benchmarking
## Success Metrics
1. **Learning Rate**: Playbook accumulates useful orchestration patterns
2. **Accuracy**: Intent detector selects correct mode more often over time
3. **ToT Frequency**: Decreases as system learns (good sign)
4. **User Satisfaction**: High-stakes tasks get better results
## Alternatives Considered
### Alternative 1: Full ToT Implementation
Traditional tree search with explicit branching and pruning of intermediate thoughts. Rejected as too complex for the marginal benefit.
### Alternative 2: Remove ToT
Simpler, but forfeits the meta-learning opportunity.
### Alternative 3: Manual Mode Selection
Users pick the orchestration mode themselves, which shifts the selection burden onto the user.
## References
- ToolOrchestra: Training LLMs to Think Like Tools
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- MDAP: Massively Decomposed Agentic Processes
- ACE Framework (Autonomous Cognitive Entity)
## Decision Outcome
**APPROVED** - Implement ToT as Meta-Orchestration with ACE integration
This creates a complete learning system where:
- ToT explores orchestration space
- ACE learns meta-patterns
- Intent detector exploits learned knowledge
- System continuously improves