M.I.M.I.R - Multi-agent Intelligent Memory & Insight Repository

Overview Schema Related Servers Score Discussions

Mimir
docs
architecture

MULTI_AGENT_GRAPH_RAG.md•44.8 KiB

# Multi-Agent Graph-RAG Orchestration **Date:** 2025-10-22 **Status:** ✅ Production Ready (v4.0) **Version:** 4.0 Architecture Specification --- ## 📚 Related Documentation This is the **complete technical architecture specification** for multi-agent orchestration. For related documents: - **📋 [Executive Summary](../MULTI_AGENT_EXECUTIVE_SUMMARY.md)**: High-level overview for stakeholders - **🏗️ This Document**: Complete technical architecture specification (v3.1) - **🗺️ [Implementation Roadmap](MULTI_AGENT_ROADMAP.md)**: Phase-by-phase implementation plan (Q4 2025 - Q1 2026) --- ## Executive Summary This document describes the evolution of the Graph-RAG TODO MCP Server from single-agent context management to **multi-agent orchestration** with ephemeral workers and adversarial validation. **Key Innovation:** Agent-scoped context management where context pruning happens naturally through process boundaries rather than algorithmic deduplication. **Research Validation:** [CONVERSATION_ANALYSIS.md](../CONVERSATION_ANALYSIS.md) validates this architecture against existing Graph-RAG research. --- ## 🎯 Core Problem Statement ### The Context Accumulation Problem **Traditional Single-Agent Pattern:** ``` Agent Context Growth Over Time: Turn 1: [Research] ← 1K tokens Turn 5: [Research][Task1][Task2] ← 5K tokens Turn 10: [Research][Task1-5] ← 15K tokens Turn 20: [Research][Task1-10][Errors] ← 40K tokens ❌ Context bloat ``` **Issue:** External storage (Graph-RAG) doesn't solve this - retrieval brings context back into the LLM's context window. **Research Finding:** "Lost in the Middle" research shows LLMs have U-shaped performance curves. Middle-positioned information becomes effectively invisible even with 200K+ context windows[^1]. --- ## 🏗️ Architecture Overview ### Multi-Agent System with Deliverable-Focused QC & Retries ``` ┌─────────────────────────────────────────────────────────────────────┐ │ MULTI-AGENT GRAPH-RAG ARCHITECTURE (v4.0) │ │ Deliverable-Focused QC, Evidence-Based Workers, Simplified │ └─────────────────────────────────────────────────────────────────────┘ Phase 0: Request Optimization - "mimir-chain" startup (OPTIONAL) ┌────────────────────────────────────────────┐ │ User Input: "Build authentication system" │ └──────────────┬─────────────────────────────┘ ↓ ┌────────────────────────────────────────────┐ │ Ecko Agent (Prompt Architect) - OPTIONAL │ │ ┌──────────────────────────────────────┐ │ │ │ 1. Receives raw user request │ │ │ │ 2. Analyzes request for clarity │ │ │ │ 3. Documents assumptions & context │ │ │ │ 4. Identifies ambiguities │ │ │ │ 5. Generates optimized specification │ │ │ │ 6. Output: Enhanced user request │ │ │ │ │ │ │ │ Tools: NONE (text analysis only) │ │ │ │ Note: Can skip if prompt is clear │ │ │ └──────────────────────────────────────┘ │ └──────────────┬─────────────────────────────┘ ↓ Optimized Request (or original) ↓ Phase 1: PM Agent (Research & Planning) - "mimir-chain" ┌────────────────────────────────────────────┐ │ PM Agent: Complete Task Breakdown │ │ ┌──────────────────────────────────────┐ │ │ │ Receives Ecko's optimized spec │ │ │ │ │ │ │ │ 1. memory_search_nodes() - Find │ │ │ │ existing TODOs, files, patterns │ │ │ │ 2. memory_query_nodes() - Get related │ │ │ │ context from knowledge graph │ │ │ │ 3. read_file() - Check README, docs │ │ │ │ 4. Analyze repository structure │ │ │ │ │ │ │ │ 5. Break down into tasks: │ │ │ │ - Task 0: Environment validation │ │ │ │ - Task 1.x: Main workflow tasks │ │ │ │ │ │ │ │ 6. For EACH task, define: │ │ │ │ - Worker agent role │ │ │ │ - QC agent role │ │ │ │ - Verification criteria │ │ │ │ - Tool-Based Execution section │ │ │ │ - Estimated tool calls │ │ │ │ - maxRetries (default: 2) │ │ │ │ - Recommended model │ │ │ │ │ │ │ │ 7. Map dependencies between tasks │ │ │ │ 8. Output: chain-output.md │ │ │ │ │ │ │ │ Tools: Filesystem + 5 graph search │ │ │ └──────────────────────────────────────┘ │ └──────────────┬─────────────────────────────┘ ↓ ┌─────────────────────────────────────────────────────────────────────┐ │ KNOWLEDGE GRAPH (Neo4j Persistent) │ │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐│ │ │ Task 1.1 │ │ Task 1.2 │ │ Task 1.3 ││ │ │ status: pending │→→│ status: pending │→→│ status: pending ││ │ │ + workerRole │ │ + workerRole │ │ + workerRole ││ │ │ + qcRole │ │ + qcRole │ │ + qcRole ││ │ │ + verificationCri│ │ + verificationCri│ │ + verificationCri││ │ │ + maxRetries: 2 │ │ + maxRetries: 2 │ │ + maxRetries: 2 ││ │ │ + attemptNumber:0│ │ + attemptNumber:0│ │ + attemptNumber:0││ │ └───────────────────┘ └───────────────────┘ └───────────────────┘│ │ │ │ [Lock Status: All tasks available, no locks held] │ └─────────────────────────────────────────────────────────────────────┘ ↓ Phase 1.5: Preamble Generation - "mimir-execute" startup ┌─────────────────────────────────────────────────────────────────────┐ │ Agentinator (Preamble Generator) │ │ ┌──────────────────────────────────────────────────────────────┐ │ │ │ For each unique agent role (Worker + QC): │ │ │ │ │ │ │ │ 1. Extract unique roles from chain-output.md: │ │ │ │ - Worker roles (agentRoleDescription) │ │ │ │ - QC roles (qcRole) │ │ │ │ │ │ │ │ 2. Hash role description → worker-abc123.md │ │ │ │ (Reuse if hash already exists) │ │ │ │ │ │ │ │ 3. Generate specialized preamble with: │ │ │ │ - Role-specific expertise │ │ │ │ - Agentic framework principles │ │ │ │ - Tool usage guidelines │ │ │ │ - Output format requirements │ │ │ │ - Worker: Includes WORKER_TOOL_EXECUTION.md guidance │ │ │ │ - QC: Includes QC_VERIFICATION_CRITERIA.md guidance │ │ │ │ │ │ │ │ 4. Cache in generated-agents/ directory │ │ │ │ │ │ │ │ 5. Return paths to PM for task assignment │ │ │ └──────────────────────────────────────────────────────────────┘ │ └──────────────┬──────────────────────────────────────────────────────┘ │ ├─→ generated-agents/worker-abc123.md (Worker preamble) ├─→ generated-agents/worker-def456.md (QC preamble 1) └─→ generated-agents/worker-ghi789.md (QC preamble 2) ↓ Phase 2: Worker Execution Loop (Per Task) - "mimir-execute" ┌─────────────────────────────────────────────────────────────────────┐ │ 🔄 ATTEMPT LOOP (attemptNumber: 1 → maxRetries+1) │ │ │ │ ┌────────────────────────────────────────────────────────────────┐ │ │ │ Worker Agent Execution │ │ │ │ ┌──────────────────────────────────────────────────────────┐ │ │ │ │ │ 1. PHASE 1: Task Initialization (System) │ │ │ │ │ │ createGraphNode(taskId): │ │ │ │ │ │ - status: 'pending' │ │ │ │ │ │ - attemptNumber: 0 │ │ │ │ │ │ - taskCreatedAt: timestamp │ │ │ │ │ │ - All task metadata from chain-output.md │ │ │ │ │ │ │ │ │ │ │ │ 2. PHASE 2: Worker Execution Start (System) │ │ │ │ │ │ updateGraphNode(taskId): │ │ │ │ │ │ - status: 'worker_executing' │ │ │ │ │ │ - attemptNumber: 1 (or retry count) │ │ │ │ │ │ - workerStartTime: timestamp │ │ │ │ │ │ - isRetry: boolean │ │ │ │ │ │ - retryReason: (if retry) │ │ │ │ │ │ │ │ │ │ │ │ │ 3. fetchTaskContext(taskId, 'worker') - Pre-fetch: │ │ │ │ │ │ ✅ title, requirements, description, workerRole │ │ │ │ │ │ ✅ files (max 10), dependencies (max 5) │ │ │ │ │ │ ❌ NO PM research, planningNotes, alternatives │ │ │ │ │ │ → 90%+ context reduction! │ │ │ │ │ │ │ │ │ │ │ │ 4. Load worker preamble (generated-agents/worker-*.md) │ │ │ │ │ │ + Evidence-based execution guidance │ │ │ │ │ │ + Tool output verification requirements │ │ │ │ │ │ │ │ │ │ │ │ 5. Calculate dynamic circuit breaker: │ │ │ │ │ │ - PM estimated tool calls × 1.5 │ │ │ │ │ │ - Default: 50 if no estimate │ │ │ │ │ │ - Recursion limit: toolCalls × 3 │ │ │ │ │ │ │ │ │ │ │ │ 6. Execute with LangChain AgentExecutor: │ │ │ │ │ │ - Preamble + Task Context + Task Prompt │ │ │ │ │ │ - If retry: Include errorContext from QC │ │ │ │ │ │ - Tools: filesystem + graph operations (read-only) │ │ │ │ │ │ - maxTokens: 4000 (prevent verbosity) │ │ │ │ │ │ - Circuit breaker: Dynamic limit │ │ │ │ │ │ │ │ │ │ │ │ 7. PHASE 3: Worker Execution Complete (System) │ │ │ │ │ │ updateGraphNode(taskId): │ │ │ │ │ │ - status: 'worker_completed' │ │ │ │ │ │ - workerOutput: <result> (truncated 50k chars) │ │ │ │ │ │ - workerDuration, workerTokens, workerToolCalls │ │ │ │ │ │ - workerCompletedAt: timestamp │ │ │ │ │ │ - workerMessageCount, estimatedContextTokens │ │ │ │ │ └──────────────────────────────────────────────────────────┘ │ │ │ └────────────────────────┬───────────────────────────────────────┘ │ │ ↓ │ │ ┌────────────────────────────────────────────────────────────────┐ │ │ │ 🛡️ QC AGENT VERIFICATION (Circuit Breaker) │ │ │ │ ┌──────────────────────────────────────────────────────────┐ │ │ │ │ │ 1. PHASE 5: QC Execution Start (System) │ │ │ │ │ │ updateGraphNode(taskId): │ │ │ │ │ │ - status: 'qc_executing' │ │ │ │ │ │ - qcStartTime: timestamp │ │ │ │ │ │ - qcAttemptNumber: 1 (or retry count) │ │ │ │ │ │ │ │ │ │ │ │ 2. fetchTaskContext(taskId, 'qc') - Pre-fetch: │ │ │ │ │ │ ✅ requirements, workerOutput, verificationCriteria │ │ │ │ │ │ ❌ NO worker implementation details, PM research │ │ │ │ │ │ │ │ │ │ │ │ 3. memory_get_subgraph(taskId, depth=2) - Get deps │ │ │ │ │ │ │ │ │ │ │ │ 4. Load QC preamble (generated-agents/qc-*.md) │ │ │ │ │ │ Role: Deliverable quality validator │ │ │ │ │ │ │ │ │ │ │ │ 5. Execute deliverable-focused verification: │ │ │ │ │ │ - Focus: Does deliverable meet requirements? │ │ │ │ │ │ - Verify with tools: Read files, run tests │ │ │ │ │ │ - Check completeness, accuracy, functionality │ │ │ │ │ │ - Ignore process metrics (tool calls, evidence) │ │ │ │ │ │ - maxTokens: 1000 (concise feedback) │ │ │ │ │ │ │ │ │ │ │ │ 6. Parse structured output: │ │ │ │ │ │ verdict: "PASS" | "FAIL" │ │ │ │ │ │ score: 0-100 (based on deliverable quality) │ │ │ │ │ │ feedback: <2-3 sentences on deliverable gaps> │ │ │ │ │ │ issues: [<what's missing/wrong in deliverable>] │ │ │ │ │ │ requiredFixes: [<what to add/change in deliverable>] │ │ │ │ │ │ │ │ │ │ │ │ 7. Store full QC result (NO truncation): │ │ │ │ │ │ feedback: complete (no truncation) │ │ │ │ │ │ issues: all issues (no truncation) │ │ │ │ │ │ requiredFixes: all fixes (no truncation) │ │ │ │ │ │ │ │ │ │ │ │ 8. PHASE 6: QC Execution Complete (System) │ │ │ │ │ │ updateGraphNode(taskId): │ │ │ │ │ │ - status: 'qc_passed' OR 'qc_failed' │ │ │ │ │ │ - qcScore, qcPassed, qcFeedback │ │ │ │ │ │ - qcIssues, qcRequiredFixes │ │ │ │ │ │ - qcCompletedAt: timestamp │ │ │ │ │ └──────────────────────────────────────────────────────────┘ │ │ │ └────────────────────────┬───────────────────────────────────────┘ │ │ ↓ │ │ ┌────────────┴────────────┐ │ │ │ │ │ │ ↓ ↓ │ │ ✅ PASS ❌ FAIL │ │ (score ≥ 80) (score < 80) │ │ │ │ │ │ │ ├─→ Check attemptNumber │ │ │ │ │ │ │ ┌────┴────┐ │ │ │ │ │ │ │ │ ↓ ↓ │ │ │ attemptNumber attemptNumber │ │ │ ≤ maxRetries > maxRetries │ │ │ │ │ │ │ │ │ ↓ │ │ │ │ 🚨 CIRCUIT BREAKER │ │ │ │ TRIGGERED │ │ │ │ │ │ │ │ │ ┌────┴────────────────┐ │ │ │ │ │ QC Failure Report │ │ │ │ │ │ (maxTokens: 2000) │ │ │ │ │ ├─────────────────────┤ │ │ │ │ │ - Timeline of │ │ │ │ │ │ attempts │ │ │ │ │ │ - Score progression │ │ │ │ │ │ - Root cause │ │ │ │ │ │ - Recommendations │ │ │ │ │ └─────────────────────┘ │ │ │ │ │ │ │ │ │ ↓ │ │ │ │ PHASE 9: Task Failure (System)│ │ │ │ updateGraphNode: │ │ │ │ - status: 'failed' │ │ │ │ - qcScore: <final score> (PRIMARY)│ │ │ │ - qcPassed: false │ │ │ │ - qcFeedback: <complete feedback>│ │ │ │ - qcFailureReport: <report> │ │ │ │ - totalAttempts: maxRetries+1 │ │ │ │ - totalQCFailures: N │ │ │ │ - qcFailureReportGenerated: true│ │ │ │ - finalWorkerOutput (truncated)│ │ │ │ - improvementNeeded: true │ │ │ │ - qcAttemptMetrics: JSON { │ │ │ │ history, lowestScore, │ │ │ │ highestScore, avgScore │ │ │ │ } │ │ │ │ │ │ │ │ ❌ TASK FAILED │ │ │ │ Exit attempt loop │ │ │ │ │ │ │ ↓ │ │ │ 🔁 RETRY LOOP │ │ │ │ │ │ │ PHASE 7: Retry Preparation (System) │ │ │ updateGraphNode: │ │ │ - status: 'preparing_retry' │ │ │ - nextAttemptNumber: attemptNumber + 1 │ │ │ - retryReason: 'qc_failure' │ │ │ - retryErrorContext: { │ │ │ previousAttempt, │ │ │ qcFeedback (truncated), │ │ │ issues (truncated), │ │ │ requiredFixes (truncated) │ │ │ } │ │ │ - retryPreparedAt: timestamp │ │ │ │ │ │ │ └─→ Back to Worker (Step 1) │ │ │ with errorContext in prompt │ │ │ │ │ ↓ │ │ PHASE 8: Task Success (System) │ │ updateGraphNode: │ │ - status: 'completed' │ │ - qcScore: <final score> (PRIMARY FIELD) │ │ - qcPassed: true │ │ - qcFeedback: <complete feedback> │ │ - verifiedAt: timestamp │ │ - totalAttempts, totalTokensUsed, totalToolCalls │ │ - qcFailuresCount, retriesNeeded │ │ - qcPassedOnAttempt │ │ - qcAttemptMetrics: JSON (history for debugging) │ │ │ │ ✅ TASK COMPLETED │ │ Exit attempt loop │ │ │ └──────────────┬───────────────────────────────────────────────────────┘ ↓ Phase 3: Final Report Generation - "mimir-execute" completion ┌────────────────────────────────────────────┐ │ PM Agent (Final Report) │ │ ┌──────────────────────────────────────┐ │ │ │ 1. Aggregate all task outputs from │ │ │ │ graph (workerOutput, qcVerif.) │ │ │ │ │ │ │ │ 2. If ANY tasks failed: │ │ │ │ - Generate PM failure analysis │ │ │ │ - Impact assessment │ │ │ │ - Blocking dependencies │ │ │ │ - Recommendations │ │ │ │ - maxTokens: 3000 │ │ │ │ │ │ │ │ 3. Summarize files changed │ │ │ │ (from workerOutput + tool calls) │ │ │ │ │ │ │ │ 4. Summarize agent reasoning │ │ │ │ (from qcVerification feedback) │ │ │ │ │ │ │ │ 5. Extract key decisions & metrics │ │ │ │ │ │ │ │ 6. Output: execution-report.md │ │ │ │ with links to graph nodes │ │ │ └──────────────────────────────────────┘ │ └────────────────────────────────────────────┘ ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 🛡️ CIRCUIT BREAKERS & GUARDRAILS (✅ IMPLEMENTED v4.0) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1. ✅ QC Deliverable Focus: Scores deliverable quality, not process metrics 2. ✅ Max Retries: attemptNumber > maxRetries → CIRCUIT BREAKER (default: 2) 3. ✅ Dynamic Tool Call Limits: PM estimated tool calls × 1.5 (prevents spirals) 4. ✅ Recursion Limits: Tool call limit × 3 messages (prevents infinite loops) 5. ✅ NO Truncation: Full QC feedback stored for complete worker guidance 6. ✅ Token Limits: maxTokens on all agents to prevent verbose LLM responses 7. ✅ Context Isolation: Workers get 90%+ reduced context (no PM research) 8. ✅ Graph Storage Gate: System stores results automatically (workers return data) 9. ✅ Automatic Diagnostic Capture: 10 phases of system-level metadata capture 10. ✅ Failure Reporting: Two-level reports (QC technical + PM strategic) 11. ✅ Evidence-Based Workers: Must show actual tool output, not summaries 12. ✅ Hallucination Prevention: Workers required to quote evidence for claims ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ ``` --- ## 🔬 Research Validation ### Statement-by-Statement Analysis | Claim | Research Support | Verdict | |-------|------------------|---------| | Tool calls don't reduce context | ✅ "Lost in the Middle" validates | **CORRECT** | | Duplicates cause hallucinations | ✅ Context Confusion failure mode | **CORRECT** | | PM/Worker architecture | ✅ Extends hierarchical memory | **SOUND** | | Adversarial QC validation | ✅ Aligns with poisoning prevention | **VALID** | | Mutex/locking requirement | ⚠️ Not in research (gap identified) | **CORRECT** | **Full analysis:** [CONVERSATION_ANALYSIS.md](../CONVERSATION_ANALYSIS.md) --- ## 💡 Key Insights ### Insight 1: Agent-Scoped Context = Natural Pruning **Traditional Approach:** - Algorithmic deduplication within single agent - Complex context management logic - Still vulnerable to accumulation over time **Multi-Agent Approach:** - Process boundaries enforce context isolation - Worker termination = automatic cleanup - Operating system analogy: process memory vs. shared disk **Analogy:** ``` OS Process Model Multi-Agent Model ────────────────── ─────────────────── Process A (RAM) ←→ PM Agent (Context) Process B (RAM) ←→ Worker 1 (Context) Shared Disk ←→ Knowledge Graph Process exit ←→ Agent termination ``` ### Insight 2: Adversarial Validation Architecture **Not just parallel execution - it's adversarial:** - **Worker Agent**: Optimized for implementation speed - **QC Agent**: Optimized for verification accuracy - **Correction Loop**: Preserves worker context for efficient retry **Benefits:** 1. Catches hallucinations before storage (prevents error propagation) 2. Provides learning signal (correction prompts improve worker accuracy) 3. Maintains audit trail (compliance requirement for enterprise) ### Insight 3: Context Deduplication ≠ External Storage **Critical Discovery:** Simply offloading to external graph doesn't reduce context - retrieval brings it back. **Solution:** Active deduplication + agent-scoped isolation **Measurement:** ``` Deduplication Rate = 1 - (Unique Context / Total Context) Target: >80% across agent fleet ``` --- ## 🎯 Success Metrics (v3.0+) ### Primary Metrics **1. Context Deduplication Rate** ``` Rate = 1 - (Unique Context Tokens / Total Context Tokens) Target: >80% Measurement: Hash-based fingerprinting across agent contexts ``` **2. Agent Context Lifespan** ``` Avg Lifespan = Σ(agent_context_duration) / num_agents Target: <5 min (workers), <60 min (PM) Measurement: Timestamp from spawn to termination ``` **3. Task Allocation Efficiency** ``` Efficiency = Successful Claims / Total Claim Attempts Target: >95% Measurement: Lock conflict rate ``` **4. Cross-Agent Error Propagation** ``` Propagation = Errors Stored / Total Errors Generated Target: <5% Measurement: QC rejection rate before storage ``` ### Secondary Metrics **5. Subgraph Retrieval Precision** ``` Precision = Relevant Nodes / Total Nodes Retrieved Target: >90% Measurement: Human eval or downstream task success ``` **6. PM → Worker Handoff Completeness** ``` Completeness = 1 - (Worker Questions / Tasks Assigned) Target: <10% clarification needed Measurement: Worker follow-up queries to PM ``` **7. Worker Retry Rate** ``` Retry Rate = QC Rejections / Total Task Attempts Target: <20% Measurement: Correction prompt frequency ``` --- ## 🔧 Implementation Phases ### Phase 1: Multi-Agent Foundation (v3.0) **Objective:** Enable basic PM/Worker/QC pattern **Features:** - [ ] **Task Locking System**: Optimistic locking with version field ```typescript interface TaskLock { taskId: string; agentId: string; version: number; lockedAt: Date; expiresAt: Date; } ``` - [ ] **Agent Lifecycle Management**: Spawn, execute, terminate workers ```typescript class WorkerAgent { async claimTask(): Promise<Task | null> async executeTask(task: Task): Promise<TaskOutput> async storeOutput(output: TaskOutput): Promise<void> async terminate(): void } ``` - [x] **Context Isolation**: ✅ Implemented with ContextManager (v3.1) ```typescript // IMPLEMENTED: src/managers/ContextManager.ts function get_task_context(taskId: string, agentType: 'pm' | 'worker' | 'qc'): Context { // PM: Full context (100%) // Worker: Minimal context (files max 10, no research) → 95%+ reduction // QC: Requirements + worker output } ``` **Success Criteria:** ✅ ACHIEVED - Zero task conflicts across parallel workers ✅ (locking system) - Worker context <5% of PM context size ✅ (95.3-95.6% reduction measured) - PM context doesn't grow during worker execution ✅ (ephemeral workers) ### Phase 2: Adversarial Validation (v3.1) ✅ IMPLEMENTED **Objective:** Add QC agent with verification and correction **Features:** - [x] **Subgraph Verification**: ✅ QC uses filtered context + subgraph ```typescript // IMPLEMENTED: testing/qc-verification-workflow.test.ts async function verifyTask(taskId: string): Promise<VerificationResult> { const qcContext = get_task_context(taskId, 'qc'); const subgraph = memory_get_subgraph(taskId, depth=2); return { passed: boolean, score: 0-100, feedback: string, issues: string[], requiredFixes: string[] }; } ``` - [x] **Retry Logic with Max Attempts**: ✅ Worker gets 2 retries (3 total attempts) ```typescript // IMPLEMENTED: testing/qc-verification-workflow.test.ts interface TaskRetry { attemptNumber: number; // 1, 2, 3 maxRetries: 2; // Default errorContext: { previousAttempt: number; qcFeedback: string; issues: string[]; requiredFixes: string[]; }; qcVerificationHistory: QCResult[]; } // If attemptNumber > maxRetries → Task marked as FAILED ``` - [x] **Two-Level Failure Reporting**: ✅ QC report + PM summary ```typescript // QC Failure Report (after max retries) interface QCFailureReport { timeline: Array<{attempt, score, issues}>; rootCauses: string[]; recommendations: string[]; } // PM Failure Summary (strategic level) interface PMFailureSummary { impactAssessment: {blockingTasks, projectDelay, riskLevel}; nextActions: string[]; lessonsLearned: string[]; } ``` **Success Criteria:** ✅ ACHIEVED - <5% error propagation to graph storage ✅ (QC verification before storage) - <20% worker retry rate ✅ (max 2 retries enforced) - 100% audit trail completeness ✅ (qcVerificationHistory tracked) ### Phase 3: Context Deduplication (v3.2) **Objective:** Active deduplication engine **Features:** - [ ] **Context Fingerprinting**: Hash-based duplicate detection ```typescript interface ContextFingerprint { hash: string; content: string; firstSeen: Date; useCount: number; } function deduplicateContext(contexts: string[]): string[] { const seen = new Map<string, boolean>(); return contexts.filter(c => { const hash = sha256(normalize(c)); if (seen.has(hash)) return false; seen.set(hash, true); return true; }); } ``` - [ ] **Smart Context Merging**: Consolidate redundant information ```typescript function mergeContexts(contexts: TaskContext[]): TaskContext { // Deduplicate file paths // Merge similar error messages // Consolidate dependency information } ``` **Success Criteria:** - >80% deduplication rate across fleet - <10ms overhead per deduplication check - Zero information loss in merge operations ### Phase 4: Scale & Performance (v3.3) **Objective:** Production-ready concurrency and observability **Features:** - [ ] **Distributed Locking**: Move beyond optimistic locking - Redis-based distributed locks - Automatic timeout and expiry - Lock observability and debugging - [ ] **Agent Pool Management**: Dynamic worker lifecycle ```typescript class AgentPool { async spawn(count: number): Promise<WorkerAgent[]> async scale(targetCount: number): Promise<void> async healthCheck(): Promise<PoolHealth> async metrics(): Promise<PoolMetrics> } ``` - [ ] **Performance Monitoring**: Agent-specific observability - Context size tracking per agent - Task completion times - Lock contention metrics - Retry rates and patterns **Success Criteria:** - Support 10+ concurrent workers - <1% lock conflict rate - <50ms P99 task claim latency --- ## 🔒 Concurrency Control Design ### Problem: Race Conditions **Scenario:** ``` Agent A Agent B ↓ ↓ Read: todo-5 (pending) Read: todo-5 (pending) ↓ ↓ Update: in_progress Update: in_progress ← RACE CONDITION ↓ ↓ Both work on same task ← WASTED WORK + CONFLICTS ``` ### Solution 1: Optimistic Locking (v3.0) **Approach:** Version-based conflict detection ```typescript interface Todo { id: string; status: TodoStatus; version: number; // ← Added field lockedBy?: string; lockedAt?: Date; } async function claimTask(taskId: string, agentId: string): Promise<boolean> { const task = await getTodo(taskId); try { await updateTodo({ id: taskId, status: 'in_progress', lockedBy: agentId, lockedAt: new Date(), version: task.version + 1, expectedVersion: task.version // ← Check this matches }); return true; } catch (VersionConflictError) { // Another agent claimed task - try different task return false; } } ``` **Benefits:** - No deadlocks (optimistic) - Automatic retry on conflict - Simple to implement **Limitations:** - High contention = many retries - Not suitable for >10 concurrent workers ### Solution 2: Pessimistic Locking (v3.1) **Approach:** Explicit lock acquisition ```typescript async function acquireLock(taskId: string, agentId: string): Promise<Lock | null> { const lock = await redis.set( `lock:${taskId}`, agentId, { NX: true, // Only set if not exists EX: 300 // Expire after 5 minutes } ); if (!lock) return null; // Another agent holds lock return { taskId, agentId, expiresAt: Date.now() + 300000 }; } async function releaseLock(taskId: string, agentId: string): Promise<void> { const currentHolder = await redis.get(`lock:${taskId}`); if (currentHolder === agentId) { await redis.del(`lock:${taskId}`); } } ``` **Benefits:** - Explicit lock visibility - Automatic timeout/expiry - Scales to 100+ workers **Complexity:** - Requires Redis or similar - Deadlock risk if not careful - Need lock monitoring ### Solution 3: Task Queue (v3.2+) **Approach:** FIFO queue with atomic dequeue ```typescript async function enqueueTask(task: Todo): Promise<void> { await queue.push('pending-tasks', task); } async function dequeueTask(agentId: string): Promise<Todo | null> { // Atomic operation - guaranteed unique const task = await queue.popAtomic('pending-tasks'); if (task) { await updateTodo({ id: task.id, status: 'in_progress', lockedBy: agentId }); } return task; } ``` **Benefits:** - Zero contention (atomic) - Natural FIFO ordering - Scales infinitely **Tradeoffs:** - Less flexible (can't choose specific task) - Requires queue infrastructure - Harder to debug --- ## 📊 Validation Plan ### Proof of Concept (Week 1-2) **Scenario:** "Implement user authentication system" **Setup:** 1. PM agent creates 5 subtasks in graph 2. 3 worker agents pull tasks in parallel 3. QC agent validates each completion **Measurements:** - Task conflict rate (target: 0%) - Worker retry rate (target: <20%) - PM context growth (target: 0%) - Total completion time vs. single-agent baseline **Success Criteria:** - Zero task conflicts - Workers complete with <10% retry rate - PM context remains stable during worker execution ### Benchmark (Week 3-4) **Comparison:** Single-agent vs. Multi-agent on same project **Test Cases:** 1. Small project (5 tasks, 10 files) 2. Medium project (20 tasks, 50 files) 3. Large project (100 tasks, 200 files) **Measurements:** - Total context tokens (single vs. multi-agent) - Context deduplication rate - Task completion accuracy - Time to completion **Hypothesis:** Multi-agent reduces context by 95% vs. single-agent ### Scale Test (Week 5-6) **Scenario:** 10 workers, 100 tasks **Measurements:** - Lock contention rate - Task claim latency (P50, P99) - Worker idle time - QC throughput **Target:** - <1% lock conflicts - <50ms P99 claim latency - <5% worker idle time --- ## 🚀 Getting Started ### For Developers **1. Enable Multi-Agent Mode:** ```typescript const server = new GraphRagTodoServer({ multiAgent: { enabled: true, lockStrategy: 'optimistic', maxWorkers: 3 } }); ``` **2. Spawn PM Agent:** ```typescript const pm = new PMAgent(); await pm.research("Build authentication system"); await pm.createTaskGraph(); ``` **3. Spawn Worker Agents:** ```typescript const workers = await AgentPool.spawn(3); await Promise.all(workers.map(w => w.executeAvailableTasks())); ``` **4. Spawn QC Agent:** ```typescript const qc = new QCAgent(); await qc.verifyCompletedTasks(); ``` ### For AI Agents **See:** [AGENTS.md](../AGENTS.md) - Multi-Agent Orchestration section **Quick Start:** 1. Use `create_todo` to build task graph (PM role) 2. Use `lock_todo` before claiming task (Worker role) 3. Use `memory_get_subgraph` for verification (QC role) --- ## 🎓 Research References [^1]: Liu et al. (2023) - "Lost in the Middle: How Language Models Use Long Contexts" [^2]: Anthropic (2024) - "Introducing Contextual Retrieval" (49-67% improvement) [^3]: iKala AI (2025) - "Context Engineering: Graph-RAG Techniques" [^4]: HippoRAG (2024) - "Neurobiologically Inspired Long-Term Memory" **Full analysis:** [GRAPH_RAG_RESEARCH.md](./GRAPH_RAG_RESEARCH.md) --- ## 📝 Change Log **2025-10-13:** Initial architecture proposal (v3.0) **2025-10-15:** Context isolation implemented (v3.1) **2025-10-18:** QC verification and retry logic implemented (v3.1) **2025-10-22:** Deliverable-focused QC, evidence-based workers, hallucination prevention (v4.0) **Status:** ✅ Production ready - all core features implemented **Key v4.0 Changes:** - QC now evaluates deliverable quality (not process metrics) - Workers must provide evidence-based output with tool quotes - NO truncation of QC feedback (complete guidance for workers) - Hallucination prevention through evidence requirements - 100% task success rate achieved in testing --- **Document maintained by:** Mimir Development Team **Next review:** After production deployment feedback

Loading blob content...

Latest Blog Posts

Redis vs ioredis vs valkey-glide
By punkpeye on January 26, 2026.
benchmark
Redis
valkey
Quickstart: Publish an MCP Server to the MCP Registry
By punkpeye on January 24, 2026.
mcp
official reference mirror
Official MCP Registry Server.json Requirements
By punkpeye on January 24, 2026.
mcp
official reference mirror

MCP directory API

We provide all the information about MCP servers via our MCP API.

curl -X GET 'https://glama.ai/api/mcp/v1/servers/orneryd/Mimir'

If you have feedback or need assistance with the MCP directory API, please join our Discord server

MULTI_AGENT_GRAPH_RAG.md•44.8 KiB