THREAD_ARCHITECTURE_REVIEW.md•24 KiB

# THREAD_AND_STREAMING_ARCHITECTURE.md - Review Report

**Date:** 2026-01-25

**Reviewer:** AI Assistant

**Document Version:** Draft (6053 lines)

---

## Executive Summary

This review evaluates THREAD_AND_STREAMING_ARCHITECTURE.md against core architectural principles and cross-references it with PERMISSION_MODEL.md. Overall, the document is **architecturally sound** and demonstrates strong adherence to the "kernel stays dumb" principle. However, there are several **critical terminology inconsistencies**, **missing clarifications**, and **minor technical issues** that need addressing for production readiness.

**Key Findings:**

- ✅ Core principle "kernel stays dumb" is consistently maintained
- ✅ Data-driven architecture is well-explained and consistent
- ✅ Three-layer thread spawning model is clear
- ⚠️ Critical terminology confusion between tools and directives
- ⚠️ Incomplete explanation of harness instantiation flow
- ⚠️ Thread registry permission model needs stronger emphasis
- ⚠️ Some code examples contain minor inconsistencies

---

## Critical Issues (Must Fix)

### 1. **Terminology Confusion: `run_directive` vs `thread_directive`**

**Lines: 1352-1360, 4321**

The document uses `run_directive` in two conflicting ways:

**Current text (Line 1354):**

> **Terminology Note:** `run_directive` and `thread_directive` are distinct in Kiwi:
>
> 1. **run_directive directive** - `.ai/directives/run_directive.md` - Workflow instructions
> 2. **thread_directive tool** - `.ai/tools/orchestration/thread_directive.yaml` - Executable tool

**Problem:** This creates confusion because:

- Line 1352 section header says "The run_directive Directive"
- Line 4321 mentions "run_directive directive" again
- But the actual tool that LLMs call is `thread_directive` (Layer 3 in Appendix A.1)
- The document sometimes implies LLMs call `run_directive`, other times `thread_directive`

**Evidence from document:**

- Line 1357: "run_directive directive" - workflow instructions (OK)
- Line 1358: "thread_directive tool" - executable tool (OK)
- Line 1438: Example shows calling `thread_directive` tool (Correct)
- Line 4321: "run_directive directive (workflow instructions)" (OK)
- BUT: No clear statement that LLMs should NOT call a tool named `run_directive`

**Impact:** Medium-High. Implementers might create the wrong tool names.

**Recommendation:** Add explicit clarification in Line 1360 section:

```markdown
**Critical distinction:**
- `run_directive` is a DIRECTIVE (.ai/directives/run_directive.md) that provides guidance
- `thread_directive` is the TOOL (.ai/tools/orchestration/thread_directive.yaml) that LLMs actually call
- There is NO tool named `run_directive` - this would cause confusion
- LLMs reference the directive for guidance, but call the thread_directive tool for execution
```

---

### 2. **Harness Instantiation Flow Incomplete**

**Lines: 4380-4432, 1626-1680**

The document explains that `spawn_thread` instantiates the harness, but the **actual flow** is described inconsistently across multiple sections.

**Problem:** Three different explanations of who instantiates `KiwiHarnessRunner`:

**Version 1 (Lines 1709-1714):**

```
kiwi_harness.spawn.spawn_directive_thread():
  1. Generate thread_id
  2. Load directive, mint capability token
  3. Spawn background thread with KiwiHarnessRunner
```

**Version 2 (Lines 4218-4244 - Appendix A.1):**

```python
# kiwi_harness tool instantiates KiwiHarnessRunner
async def spawn_with_kiwi_harness(...):
    runner = KiwiHarnessRunner(...)
    return await execute_tool("spawn_thread", {
        "runner_instance": runner,  # Pass instantiated runner
    })
```

**Version 3 (Lines 4401-4410 - Appendix A.2):**

```
4. spawn_thread tool (Layer 1 - Python runtime tool):
   ├── Imports kiwi_harness package
   ├── Creates KiwiHarnessRunner instance
```

**Contradiction:** Version 2 says the `kiwi_harness` tool instantiates the runner and passes it to `spawn_thread`. Version 3 says `spawn_thread` imports and instantiates it. These are incompatible.

**From PERMISSION_MODEL.md (lines 109-116):** The permission document shows that the harness loads the directive AFTER spawning:

```python
# Inside spawned thread, KiwiHarnessRunner loads directive
directive_data = await mcp.execute(...)
```

**Impact:** High. This is a critical implementation detail that must be correct.

**Correct Flow (based on Layer 2 definition in A.1):**

1. LLM calls `thread_directive` tool (Layer 3 validation/orchestration)
2. `thread_directive` calls `kiwi_harness` tool (Layer 2 Kiwi implementation)
3. `kiwi_harness` instantiates `KiwiHarnessRunner` object
4. `kiwi_harness` calls `spawn_thread` with `runner_instance` parameter
5. `spawn_thread` spawns OS thread/process with the runner
6. Inside spawned thread: `runner.run()` loads directive, mints token, starts loop

**Recommendation:**

- Fix Lines 4401-4410 (Version 3) to match Version 2
- Change Line 4401 from "spawn_thread tool imports and creates" to "spawn_thread tool receives pre-instantiated runner from kiwi_harness"
- Add sequence diagram in Appendix A.1 showing the complete flow with all 3 layers

---

### 3. **Token Minting Location Ambiguity**

**Lines: 4650-4662, 1709-1714**

**Problem:** Multiple contradictory statements about when/where tokens are minted:

**Statement 1 (Line 1712):**

```
kiwi_harness.spawn.spawn_directive_thread():
  2. Load directive, mint capability token  ← Before spawning
  3. Spawn background thread with KiwiHarnessRunner
```

**Statement 2 (Lines 4650-4662):**

```
- thread_directive tool (Layer 3) does NOT mint tokens
- kiwi_harness tool (Layer 2) does NOT mint tokens
- spawn_thread tool (Layer 1) instantiates KiwiHarnessRunner class
- KiwiHarnessRunner.run_directive() method is what actually:
  1. Loads the directive via MCP
  2. Extracts <permissions>
  3. Mints the capability token  ← INSIDE spawned thread
  4. Stores it in ThreadContext
```

**From PERMISSION_MODEL.md (lines 128-150):** Confirms the minting location given in Statement 2: the token is minted INSIDE the spawned thread, after the directive is loaded.

**Impact:** High. This affects implementation order and permission flow understanding.

**Recommendation:**

- Fix Line 1712 to remove "mint capability token" from the pre-spawn phase
- Update to: "2. Auto-generate thread_id, prepare spawn parameters"
- Ensure all references consistently show that token minting happens INSIDE the spawned thread

---

### 4. **Thread Registry Permission Model Insufficient**

**Lines: 4529-4571, 22**

**Problem:** The document states that thread_registry is a "privileged internal tool", but the permission model contradicts this.

**Line 22 (Executive Summary):**

```
5. Thread Registry - Privileged internal tool used by harness, not subject to directive permissions
```

**Lines 4529-4571 (Appendix A.2):**

```yaml
# thread_registry Tool (Regular MCP Tool)
requires:
  - registry.write  # For register, update_status
  - registry.read   # For get_status, query
```

```python
# thread_registry tool implementation
async def register_thread(params: dict) -> dict:
    token = params.get("__auth")
    if not validate_token(token, required_caps=["registry.write"]):
        return {"error": "Missing required capability: registry.write"}
```

**Contradiction:**

- Executive summary says "not subject to directive permissions"
- Implementation shows it REQUIRES permissions and validates tokens
- The implementation is the correct behavior (it matches PERMISSION_MODEL.md); the executive summary is wrong

**From PERMISSION_MODEL.md (lines 4434-4448):** Confirms thread_registry is a regular tool requiring system capabilities, NOT a special bypassed tool.

**Impact:** Medium. Could lead to incorrect assumptions about the security model.

**Recommendation:**

- Fix Executive Summary Lines 22-23
- Change to: "Thread Registry - Regular MCP tool requiring system capabilities (registry.write/read), only accessible to core directives"
- Remove the "not subject to directive permissions" claim
- Add emphasis that thread_registry is NOT special - it's the core directives that call it that are special (they have the required capabilities)

---

## Minor Inconsistencies (Should Fix)

### 5. **Sink Instantiation Timing**

**Lines: 324-376, 451-495**

**Problem:** The document says sinks are instantiated at "chain resolution time" but doesn't clarify whether this is before or during http_client execution.

**Line 328 comment:**

```python
"""Execute streaming HTTP request with destination fan-out.

Sinks are pre-instantiated by the tool executor at chain resolution time.
See tool chain resolution section above for sink instantiation logic.
"""
```

**Line 453 (Sink Architecture):**

```
Where Sinks Are Instantiated:
Sink instantiation happens at **tool chain resolution time** in the tool executor, NOT in http_client.
```

**Ambiguity:** "Chain resolution time" could mean:

- Before http_client is called (sinks passed as parameter) ✓ Correct
- During http_client execution (http_client loads sinks) ✗ Wrong

**Evidence it's pre-instantiation:** Line 333 shows `sinks = params.pop("__sinks", [])` - sinks come FROM params, not loaded by http_client.

**Impact:** Low. Implementation is clear in code examples, but terminology could be clearer.

**Recommendation:** Change Line 453 to: "Sink instantiation happens **before http_client execution** by the tool executor during chain resolution. Sinks are passed to http_client via `__sinks` parameter."

---

### 6. **Code Example Inconsistency: Thread ID Generation**

**Lines: 1712, 4213-4216**

**Problem:** Two different patterns for thread ID generation are shown:

**Version 1 (Line 1712):**

```
"deploy_staging_20260125_103045"
```

**Version 2 (Line 4216):**

```python
def generate_thread_id(directive_name: str) -> str:
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"{directive_name}_{timestamp}"  # Example: "deploy_staging_20260125_103045"
```

**Consistency check:** Both produce the same format. ✅ No actual inconsistency here.

**However:** Line 760 shows the same pattern again, and all three locations must be kept in sync if the format ever changes.

**Impact:** Very Low. Just a maintenance consideration.

**Recommendation:** Add a note in one location (suggest Line 4216) stating: "This format is used consistently throughout documentation examples. Any format changes should update all references."

---

### 7. **Missing Cross-Reference to Permission Model**

**Lines: 2817-2893, 4366-4869**

**Problem:** Permission sections provide **good summaries** but lack consistent cross-references to PERMISSION_MODEL.md for details.

**Current cross-references:**

- Line 153: ✅ Links to PERMISSION_MODEL.md in principles
- Line 2819: ✅ Links with list of topics
- Line 4436: ✅ Links with list of topics
- Line 4606: ✅ Links with complete details list
- Line 4672: ✅ Links for capability system

**Missing cross-references:**

- Line 2815-2893 (Layer 6 section): Explains tokens but no link
- Line 4788-4817 (Token Structure): Detailed example but no link

**Impact:** Very Low. Document already has good cross-references overall.

**Recommendation:** Add one-liner at start of Line 2815 section: "See [PERMISSION_MODEL.md](./PERMISSION_MODEL.md) for complete token lifecycle and enforcement details."

---

## Suggestions for Clarity

### 8. **"Kernel Stays Dumb" - Definition Could Be Earlier**

**Lines: 149, 1246-1254**

**Current structure:**

- Line 149: Listed as principle #1
- Line 1246: First detailed explanation in Layer 4.5

**Suggestion:** Add expanded definition in "Key Architectural Principles" section (around Line 149):

```markdown
1. **Kernel Stays Dumb**:
   - MCP kernel has NO thread logic - it only loads and returns data
   - Thread spawning is a tool call, not kernel behavior
   - Kernel forwards capability tokens opaquely, never interprets them
   - All decision-making happens in harness or tools, not kernel
   - Example: execute(directive, run, X) always returns directive data; LLM decides whether to spawn thread
```

**Impact:** Low. Would improve comprehension on first read.

---

### 9. **Async Spawn Pattern Inconsistently Emphasized**

**Lines: 23, 1677-1679, 3244-3585**

**Problem:** Async spawn behavior is mentioned in multiple places but not consistently emphasized as a critical behavior.

**Line 23 (Executive Summary):** ✅ "Spawn Returns Immediately"

**Line 1677:** ✅ "Both spawn_thread and kiwi_harness are async spawn operations"

**Line 3244:** ✅ Full section on "Thread Monitoring & Async Spawn Pattern"

**Missing:** No async warning in the three-layer tool definitions (Appendix A.1, Lines 4108-4323)

**Recommendation:** Add async behavior note to each layer definition in A.1:

```yaml
# Layer 1: spawn_thread
config:
  background: true  # CRITICAL: Returns immediately, thread runs async
```

**Impact:** Low. Already well-documented, just could use one more reminder.

---

### 10. **Context vs Cost Budget Separation**

**Lines: 23, 826-890, 5212-5404**

**Current explanation:** Line 23 executive summary mentions "Context vs Cost - Separate concerns" but doesn't explain the difference.

**Detailed explanation:** Lines 826-890 (ContextBudget class) and Section A.4 (Line 5212) explain thoroughly.

**Gap:** No quick summary of the difference for readers who just skim the executive summary.

**Suggestion:** Add to Line 23 (or new Line 24):

```
8. Context vs Cost - Separate concerns:
   • Context = current input tokens per call (model limit, e.g., 200k tokens)
   • Cost = cumulative usage across thread lifetime (budget limit, e.g., $10 total)
   • Both tracked independently, either can trigger termination
```

**Impact:** Very Low. Would improve skim-reading comprehension.

---

## Missing Pieces

### 11. **No Diagram for Three-Layer Spawning**

**Lines: 4104-4323**

**Gap:** Appendix A.1 explains three-layer thread spawning in detail, but lacks a visual diagram showing the complete flow.

**Current text-based diagram (Line 4325-4348):** Shows static layer structure but not execution flow.
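
To make the missing execution flow concrete, the call chain from the "Correct Flow" in Issue #2 can be sketched as runnable Python. This is an illustrative sketch under stated assumptions, not the project's implementation: the `KiwiHarnessRunner` constructor signature, the placeholder permission list, the shape of the token, the `events` log, and the `worker` key in the return value are all hypothetical and exist only so the ordering can be observed.

```python
import threading
from datetime import datetime

events: list[str] = []  # ordered log, present only to make the flow visible


class KiwiHarnessRunner:
    """Hypothetical stand-in; the real constructor signature is not known."""

    def __init__(self, directive_name: str, thread_id: str) -> None:
        self.directive_name = directive_name
        self.thread_id = thread_id
        self.token = None

    def run(self) -> None:
        # Everything in this method executes INSIDE the spawned thread.
        events.append("runner: load directive")
        permissions = ["fs.read"]  # placeholder for the directive's <permissions>
        self.token = {"thread_id": self.thread_id, "caps": permissions}
        events.append("runner: mint token")


def spawn_thread(thread_id: str, runner_instance: KiwiHarnessRunner) -> dict:
    """Layer 1: harness-agnostic; receives a pre-instantiated runner."""
    events.append("layer1: spawn")
    worker = threading.Thread(target=runner_instance.run, name=thread_id, daemon=True)
    worker.start()
    # "worker" is returned only so this example can join the thread.
    return {"thread_id": thread_id, "status": "spawned", "worker": worker}


def kiwi_harness(directive_name: str) -> dict:
    """Layer 2: generates the thread_id and instantiates the runner."""
    events.append("layer2: instantiate runner")
    thread_id = f"{directive_name}_{datetime.now().strftime('%Y%m%d_%H%M%S')}"
    runner = KiwiHarnessRunner(directive_name, thread_id)
    return spawn_thread(thread_id, runner)


def thread_directive(directive_name: str) -> dict:
    """Layer 3: validates the request, then delegates to Layer 2."""
    if not directive_name:
        return {"error": "directive_name is required"}
    events.append("layer3: validate")
    return kiwi_harness(directive_name)
```

Calling `thread_directive("deploy")` and joining the returned worker leaves `events` in the order Layer 3, Layer 2, Layer 1, then the two runner steps, which is the ordering that Issues #2 and #3 ask the document to state consistently: no layer mints a token before the thread exists.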

**Recommendation:** Add sequence diagram after Line 4348:

```
THREAD SPAWNING SEQUENCE (Three Layers)

LLM (in current thread)
    │
    │ execute(tool, run, thread_directive, {directive_name: "deploy"})
    ▼
Layer 3: thread_directive tool
    │   ├── Load directive metadata
    │   ├── Validate <cost>, <permissions>, <model>, version
    │   └── Call kiwi_harness
    ▼
Layer 2: kiwi_harness tool
    │   ├── Generate thread_id = "deploy_20260125_103045"
    │   ├── Instantiate: runner = KiwiHarnessRunner(...)
    │   └── Call spawn_thread with runner_instance
    ▼
Layer 1: spawn_thread tool
    │   ├── Validate thread_id uniqueness
    │   ├── Spawn OS thread: threading.Thread(target=runner.run, ...)
    │   └── Return {"thread_id": "...", "status": "spawned"}
    │
    │ [Async boundary - caller returns immediately]
    │
    ▼
INSIDE SPAWNED THREAD: runner.run()
    │   ├── Load directive via MCP
    │   ├── Extract <permissions> from directive
    │   ├── Mint capability token
    │   ├── Build system prompt from AGENTS.md
    │   └── Start agent loop (call LLM → execute tools → repeat)
```

**Impact:** Low. Would significantly improve understanding of the most complex part.

---

### 12. **Error Handling Examples Missing**

**Lines: 4350-4363, 5987-6036**

**Current coverage:**

- Line 4350: Error response format (basic)
- Line 5991: Error wrapping pattern (general)
- Line 6011: Example error response (validation)

**Gap:** No error handling examples for:

- Thread ID collision (spawn_thread returns existing thread error)
- Permission denied during tool execution (token validation fails)
- Cost budget exceeded mid-execution (harness terminates)
- Stream interruption recovery (reconnection logic)

**Recommendation:** Add subsection in Layer 8.5 (Thread Monitoring) showing error handling:

```markdown
### Common Error Scenarios

#### Thread ID Collision
If spawn_thread is called with duplicate thread_id:
- Base layer (spawn_thread): Returns collision error
- kiwi_harness layer: Regenerates thread_id with microseconds
- thread_directive layer: Retries with new ID (max 3 attempts)

#### Permission Denied
If tool requires capability not in token:
- Tool validates token, returns structured error
- Harness formats for LLM: "Cannot execute write_file: missing fs.write capability"
- LLM can search for alternative approach or report to user

#### Cost Budget Exceeded
When max_turns, max_tokens, or max_usd exceeded:
- Cost tracker detects violation after LLM response
- Harness terminates loop immediately
- Final message injected: "Budget exceeded: X of Y used"
- Thread status set to "completed" with budget_exceeded flag
```

**Impact:** Low. Would help implementers handle edge cases correctly.

---

## Strengths

### What Works Well

1. **✅ Consistent "Kernel Stays Dumb" Application**
   - Every layer correctly shows kernel as data router
   - No special cases or kernel thread logic anywhere
   - Clear separation between kernel and harness responsibilities
2. **✅ Three-Layer Spawning Model**
   - Clear separation of concerns across layers
   - Base layer is harness-agnostic (excellent design)
   - Kiwi-specific layer adds opinionated defaults
   - High-level layer adds validation
3. **✅ Data-Driven Architecture**
   - Sinks, capabilities, extractors all follow same pattern
   - Tools are configuration, primitives are code (well-explained)
   - No hardcoded special cases
4. **✅ Permission Model Integration**
   - Capability tokens explained correctly
   - Tool-layer enforcement clearly described
   - Hierarchical permissions (core vs user) well-articulated
   - Path-based scoping integrated properly
5. **✅ Comprehensive Examples**
   - Code examples are detailed and mostly consistent
   - YAML tool definitions are production-ready
   - Python implementations show actual code patterns
6. **✅ Clear Cross-References**
   - Good use of "See Appendix X" links
   - References to PERMISSION_MODEL.md where appropriate
   - Internal section references for complex topics
7. **✅ Implementation Phases**
   - Realistic time estimates
   - Clear dependencies between phases
   - Concrete file paths and deliverables

---

## Architectural Alignment

### ✅ "Kernel Stays Dumb" Verification

**Checked all mentions of kernel behavior:**

- ✅ Line 14: "The MCP stays dumb. Threads are just tools."
- ✅ Line 1248: "The kernel has **no knowledge of threads**"
- ✅ Line 1263-1266: "Kernel: Parse, validate, return directive data (No thread logic...)"
- ✅ Line 2811: "Kernel stays dumb: Forwards opaque capability tokens"
- ✅ Line 4370: "MCP kernel does NOT: Know about 'threads'"
- ✅ Line 4642: "Kernel forwards opaquely (no validation)"

**Verdict:** ✅ Consistently maintained. No violations found.

---

### ✅ Data-Driven Pattern Verification

**Checked for hardcoded logic violations:**

- ✅ Sinks: All except `return` are data-driven tools (Lines 377-450)
- ✅ Capabilities: Data-driven tools in `.ai/tools/capabilities/` (Lines 4670-4710)
- ✅ Extractors: Referenced as parallel pattern to sinks (Lines 422-495)
- ✅ MCPs: Registry-based, not hardcoded (Lines 2966-3127)
- ✅ Thread spawning: Three-layer tool system, not kernel logic (Lines 4104-4323)

**Verdict:** ✅ Consistently applied. All extensibility points are data-driven.
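
As a rough illustration of the data-driven sink pattern verified above, and of the pre-instantiation timing discussed in Issue #5, the following sketch resolves declarative sink entries into instances before the HTTP primitive runs. Only the `params.pop("__sinks", [])` line comes from the reviewed document; the `SINK_TYPES` registry, both sink classes, the config shape, and the `chunks` argument (a stand-in for a streamed response body) are assumptions made for the example.

```python
from typing import Iterable


class ListSink:
    """Hypothetical sink that collects streamed chunks in memory."""

    def __init__(self) -> None:
        self.chunks: list[str] = []

    def write(self, chunk: str) -> None:
        self.chunks.append(chunk)


class NullSink:
    """Hypothetical sink that discards everything it receives."""

    def write(self, chunk: str) -> None:
        pass


# Assumed registry mapping a declarative "type" to a sink class.
SINK_TYPES = {"list": ListSink, "null": NullSink}


def resolve_chain(tool_config: dict) -> dict:
    """Tool-executor step: instantiate sinks BEFORE http_client is called."""
    params = dict(tool_config.get("params", {}))
    params["__sinks"] = [
        SINK_TYPES[entry["type"]]() for entry in tool_config.get("sinks", [])
    ]
    return params


def http_client(params: dict, chunks: Iterable[str]) -> dict:
    """Primitive: receives ready-made sinks and never loads them itself."""
    sinks = params.pop("__sinks", [])
    count = 0
    for chunk in chunks:
        for sink in sinks:  # fan each chunk out to every destination
            sink.write(chunk)
        count += 1
    return {"chunks": count}
```

Adding a new destination in this sketch means adding a registry entry and a config line, with no change to `http_client`, which is the property the verification above is checking for.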

---

### ✅ Permission Model Alignment

**Cross-checked with PERMISSION_MODEL.md:**

| Concept | THREAD doc | PERMISSION doc | Aligned? |
|---------|------------|----------------|----------|
| Token minting location | Inside spawned thread (Line 4656) | Inside spawned thread (Line 109) | ✅ Yes |
| Thread-scoped tokens | One per thread (Line 4617) | One per thread (Line 200) | ✅ Yes |
| Nested execution | Same thread, same token (Line 4838) | Same thread, same token (Line 226) | ✅ Yes |
| Hierarchical permissions | Core vs user (Line 4432) | Core vs user (Line 346) | ✅ Yes |
| Path scoping | Required for fs.* (Line 4751) | Required for fs.* (Line 593) | ✅ Yes |
| Project sandboxing | Default (Line 4522) | Default (Line 876) | ✅ Yes |
| System capabilities | SYSTEM_ONLY (Line 4459) | SYSTEM_ONLY (Line 646) | ✅ Yes |
| Tool enforcement | All tools validate (Line 4564) | All tools validate (Line 1117) | ✅ Yes |

**Verdict:** ✅ Fully aligned with PERMISSION_MODEL.md

---

## Summary Checklist

### Consistency

- ⚠️ **Terminology** - `run_directive` vs `thread_directive` needs clarification (Issue #1)
- ⚠️ **Harness instantiation** - Conflicting explanations need resolution (Issue #2)
- ⚠️ **Token minting** - Ambiguous timing needs fixing (Issue #3)
- ✅ Code examples are mostly consistent (minor issue #6)
- ✅ Diagrams match text descriptions

### Technical Correctness

- ✅ Data flows are logical and correct
- ✅ No circular dependencies found
- ✅ Three-layer spawning aligns with code examples
- ✅ Permission examples are accurate
- ⚠️ **Thread registry** - Permission model needs clarification (Issue #4)
- ✅ Cost/context budget systems clearly separated
- ✅ Streaming architecture is sound

### Clarity & Completeness

- ✅ Complex concepts well-explained
- ✅ No major missing pieces
- ✅ Code examples are complete
- ⚠️ `thread_directive` tool vs directive distinction could be clearer (Issue #1)
- ⚠️ Thread registry "optional" claim needs fixing (Issue #4)
- ✅ Direct vs managed execution clearly differentiated
- ⚠️ Missing visual diagram for three-layer spawning (Issue #11)

### Architectural Alignment

- ✅ "Kernel stays dumb" consistently maintained
- ✅ No special kernel logic violations
- ✅ Data-driven pattern applied throughout
- ✅ Permission enforcement at tool level (not kernel)
- ✅ Infrastructure tools treated as regular tools

### Specific Focus Areas

- ✅ Layer 4.5 (Execute Directive): Verified no thread logic in kernel
- ⚠️ Thread spawning flow: Clear in concept, conflicting in details (Issue #2)
- ⚠️ Harness architecture: Instantiation flow needs clarification (Issue #2)
- ⚠️ Sink architecture: Timing could be clearer (Issue #5)
- ✅ Permission examples show scopes and project sandboxing correctly
- ⚠️ Thread registry: "Privileged" claim incorrect (Issue #4)
- ✅ Package structure clear
- ✅ Streaming vs sync modes well-explained

---

## Priority Fixes

### 🔴 Must Fix Before Implementation

1. **Issue #2** - Harness Instantiation Flow (Lines 4380-4432)
   - Critical for correct implementation
   - Fix conflicting explanations of who instantiates KiwiHarnessRunner
   - Add sequence diagram
2. **Issue #3** - Token Minting Location (Lines 1709-1714, 4650-4662)
   - Critical for security architecture understanding
   - Ensure all references show minting INSIDE spawned thread
3. **Issue #1** - Terminology Clarification (Lines 1352-1360)
   - High priority to prevent implementation errors
   - Make tool vs directive distinction crystal clear

### 🟡 Should Fix Before Release

4. **Issue #4** - Thread Registry Permission Model (Lines 22, 4529-4571)
   - Important for security understanding
   - Fix executive summary claim about "not subject to permissions"
5. **Issue #11** - Add Three-Layer Spawning Diagram (After Line 4348)
   - Would significantly improve comprehension
   - Most complex part of the architecture

### 🟢 Nice to Have

6. **Issue #8** - Earlier "Kernel Stays Dumb" Definition (Around Line 149)
7. **Issue #10** - Context vs Cost Quick Summary (Line 23-24)
8. **Issue #12** - Error Handling Examples (New subsection in Layer 8.5)

---

## Conclusion

The THREAD_AND_STREAMING_ARCHITECTURE.md document presents a **sound and well-designed architecture** that successfully maintains the "kernel stays dumb" principle while enabling sophisticated thread management, streaming, and permission enforcement.

**Strengths:**

- Architecturally correct and principled
- Comprehensive coverage of all layers
- Good integration with permission model
- Realistic implementation plan

**Weaknesses:**

- Several critical inconsistencies in terminology and flow (#1, #2, #3, #4)
- Some ambiguous explanations that could confuse implementers
- Missing visual diagrams for most complex parts

**Recommendation:** ✅ **Approve architecture with required fixes**. Address Issues #1-4 before implementation begins. Issues #5-12 can be addressed during implementation or before final release.

**Estimated fix time:** 2-4 hours for critical issues, 4-6 hours for all issues.
