# Safety Harness Implementation Plan
**Source Documents:**
- `.ai/tmp/harness_and_execution_model.md` - Execution model
- `.ai/tmp/parser_first_principles.md` - Cost/permission structure
**Status:** Planning
**Estimated Time:** 2-3 weeks
---
## Core Principle: Fully Data-Driven Hooks
Following the existing kernel pattern:
- **Primitives** (subprocess, http_client) = only code that executes
- **Everything else** = data/config that gets extracted and passed to primitives
- **Harness** = a tool that wraps thread execution with enforcement
- **Hooks** = expressions evaluated against a generic `event` context, NOT hardcoded trigger types
### Why Expression-Based Hooks?
The previous design used named trigger tags (`<on_permission_denied>`, `<on_cost_exceeded>`). That approach smuggles in an implicit enum, because the harness must know when to emit each trigger name.
**Expression-based hooks are fully data-driven:**
1. Harness builds a generic `event` context at each checkpoint
2. Hooks define `<when>` expressions evaluated against that context
3. First matching hook executes
4. **No hardcoded trigger taxonomy**: a new condition is a new expression, not new code
---
## Architecture (Expression-Based Hooks)
```
┌─────────────────────────────────────────────────────────────────┐
│ Directive Metadata │
│ <hooks> │
│ <hook> │
│ <when>event.code == "permission_denied"</when> │
│ <directive>request_elevated_permissions</directive> │
│ <inputs> │
│ <original_directive>${directive.name}</original_directive>│
│ </inputs> │
│ </hook> │
│ <hook> │
│ <when>cost.turns > limits.turns</when> │
│ <directive>handle_cost_exceeded</directive> │
│ </hook> │
│ </hooks> │
│ │
│ (No named trigger types - expressions match against context) │
└─────────────────────────────────────────────────────────────────┘
│
│ Parser extracts
▼
┌─────────────────────────────────────────────────────────────────┐
│ Extracted Hook Data │
│ [ │
│ { │
│ "when": "event.code == \"permission_denied\"", │
│ "directive": "request_elevated_permissions", │
│ "inputs": {"original_directive": "${directive.name}"} │
│ }, │
│ { │
│ "when": "cost.turns > limits.turns", │
│ "directive": "handle_cost_exceeded" │
│ } │
│ ] │
└─────────────────────────────────────────────────────────────────┘
│
│ Harness evaluates at checkpoints
▼
┌─────────────────────────────────────────────────────────────────┐
│ SafetyHarness Tool │
│ │
│ At each checkpoint (before_step, after_step, on_error, on_limit): │
│ 1. Build context: {event, cost, limits, directive, ...} │
│ 2. For each hook: evaluate "when" expression against context │
│ 3. First match: substitute templates, execute hook directive │
│ 4. Read action from directive output: {action: "retry"} │
│ 5. Do what the directive says │
│ │
│ (Harness is a TOOL, not a kernel primitive) │
└─────────────────────────────────────────────────────────────────┘
```
---
## Design Decisions (Locked In)
| Decision | Answer |
| ---------------------- | ------------------------------------------------------------------- |
| Hook matching | **Expression-based.** `<when>` expressions, NOT named trigger tags |
| Hook execution | **Directives only.** Hooks execute directives, nothing else |
| Hook actions | **Returned by directive.** No `on_success`/`on_failure` in metadata |
| Expression eval | Simple operators only. Evaluated against context dict |
| Event structure | Generic `{name, code, detail}` - no hardcoded event taxonomy |
| Cost limits | From `<limits>` tag: `turns`, `tokens`, `spawns`, `duration`, `spend` |
| First match wins | Stop at first triggered hook, don't run multiple |
| Missing hook directive | Fail with error (directive must exist) |
| Template substitution | `${path.to.value}` replaced from context. Missing = leave as-is |
| Checkpoints | Fixed control-flow points, not semantic triggers |
### Checkpoints vs Triggers
The harness evaluates hooks at **checkpoints** (control-flow points), not at semantic trigger events:
| Checkpoint | When It Runs | Example `event` Value |
| ------------- | -------------------------------- | ----------------------------------------------------------- |
| `before_step` | Before executing a tool/step | `{name: "before_step", step: "deploy"}` |
| `after_step` | After successful step completion | `{name: "after_step", step: "deploy", result: {...}}` |
| `on_error` | After any error occurs | `{name: "error", code: "permission_denied", detail: {...}}` |
| `on_limit` | When a limit is about to be hit | `{name: "limit", code: "turns", current: 10, max: 10}` |
**Key insight:** Checkpoints are control-flow (where evaluation happens). Events are data values (what gets evaluated). This keeps the harness simple while allowing arbitrary hook logic.
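The split between checkpoints and events can be sketched in a few lines. This is a minimal illustration, not the harness itself: `toy_matches` is a hypothetical stand-in that only understands `path == "literal"` (the real evaluator is planned in Phase 2), and `first_matching_hook` is an assumed helper name. The point is that the same hook list is checked at every checkpoint, and only the `event` data decides what fires.

```python
import re
from typing import Any, Callable, Dict, List, Optional


def resolve(path: str, context: Dict[str, Any]) -> Any:
    """Walk a dotted path through nested dicts; None if any part is missing."""
    value: Any = context
    for part in path.split("."):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


# Hypothetical toy matcher: supports only `dotted.path == "literal"`.
_EQ = re.compile(r'^\s*([\w.]+)\s*==\s*"([^"]*)"\s*$')


def toy_matches(when: str, context: Dict[str, Any]) -> bool:
    m = _EQ.match(when)
    return bool(m) and resolve(m.group(1), context) == m.group(2)


def first_matching_hook(
    hooks: List[Dict[str, Any]],
    context: Dict[str, Any],
    matches: Callable[[str, Dict[str, Any]], bool] = toy_matches,
) -> Optional[Dict[str, Any]]:
    """Return the first hook whose `when` matches (first match wins)."""
    for hook in hooks:
        if matches(hook["when"], context):
            return hook
    return None


hooks = [
    {"when": 'event.code == "permission_denied"', "directive": "request_elevated_permissions"},
    {"when": 'event.name == "error"', "directive": "handle_execution_failure"},
]

# Same hooks, different checkpoints: only the event data differs.
on_error_ctx = {"event": {"name": "error", "code": "permission_denied", "detail": {}}}
timeout_ctx = {"event": {"name": "error", "code": "timeout", "detail": {}}}
before_step_ctx = {"event": {"name": "before_step", "step": "deploy"}}
```

Both hooks match `on_error_ctx`, but only the first one is returned, which is the "first match wins" rule from the design table.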
### Event Context Structure
At each checkpoint, the harness builds this context:
```python
context = {
    "event": {
        "name": "error",                # Checkpoint that triggered
        "code": "permission_denied",    # Specific error/limit code
        "detail": {                     # Error-specific details
            "missing": "fs.write",
            "attempted": "filesystem_mcp.write_file"
        }
    },
    "directive": {
        "name": "deploy_staging",
        "inputs": {...}
    },
    "cost": {
        # CURRENT USAGE (tracked by harness at runtime)
        "turns": 5,
        "spawns": 2,
        "duration_seconds": 120,
        "tokens": 3500
    },
    "limits": {
        # FROM METADATA (static, extracted from <limits> tag)
        "turns": 10,
        "tokens": 5000,
        "spawns": 3,
        "duration": 300,
        "spend": 10.00,
        "spend_currency": "USD"
    },
    "permissions": {
        "granted": ["fs.read", "tool.bash"],
        "required": ["fs.read", "fs.write"]
    }
}
```
Hooks can match on any part of this context using expressions.
### Mapping `<limits>` Metadata to Context
The `<limits>` tag in directive metadata maps directly to context:
```xml
<limits>
  <turns>10</turns>
  <tokens>5000</tokens>
  <spawns>3</spawns>
  <duration>300</duration>
  <spend currency="USD">10.00</spend>
</limits>
```
Maps to context as:
| XML Element | Context Path | Purpose |
| ------------------------------------ | ------------------------ | --------------------------- |
| `<turns>10</turns>` | `limits.turns` | Max turns allowed |
| `<tokens>5000</tokens>` | `limits.tokens` | Max tokens allowed |
| `<spawns>3</spawns>` | `limits.spawns` | Max spawns allowed |
| `<duration>300</duration>` | `limits.duration` | Max seconds allowed |
| `<spend currency="USD">10.00</spend>`| `limits.spend` | Max spend allowed |
| `<spend currency="...">` | `limits.spend_currency` | Currency for spend limit |
The `cost` object (current usage) is tracked by the harness at runtime, not extracted from metadata.
**Key distinction:**
- `limits.*` = static values from directive metadata (what you're allowed)
- `cost.*` = dynamic values tracked during execution (what you've used)
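The mapping table can be sketched as a small function. This is an illustration under assumptions: `limits_to_context` is a hypothetical name, it uses the standard library's `xml.etree.ElementTree` rather than the project's parser, and it assumes every limit other than `spend` is an integer.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def limits_to_context(limits_xml: str) -> Dict[str, Any]:
    """Turn a <limits> element into the `limits` dict used in the context."""
    root = ET.fromstring(limits_xml.strip())
    limits: Dict[str, Any] = {}
    for child in root:
        text = (child.text or "").strip()
        if child.tag == "spend":
            # <spend> carries both an amount and a currency attribute.
            limits["spend"] = float(text)
            currency = child.get("currency")
            if currency is not None:
                limits["spend_currency"] = currency
        else:
            # Assumption: turns, tokens, spawns and duration are whole numbers.
            limits[child.tag] = int(text)
    return limits


LIMITS_XML = """
<limits>
  <turns>10</turns>
  <tokens>5000</tokens>
  <spawns>3</spawns>
  <duration>300</duration>
  <spend currency="USD">10.00</spend>
</limits>
"""

limits = limits_to_context(LIMITS_XML)
```

The result has the same keys as the `limits` block of the context shown earlier; `cost` never passes through this function because it is built at runtime.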
### Hooks Execute Directives Only
Simplified hook model:
- Hooks call **directives only** - no arbitrary MCP tool exposure
- Metadata uses `<directive>name</directive>` with optional `<inputs>`
- **No hardcoded actions** in hook metadata
- Hook directives can do whatever they need internally (call tools, etc.)
### Hook Directives Return Actions
The hook directive determines what happens next via its **output**:
```xml
<!-- Hook directive outputs -->
<outputs>
  <action>retry|continue|skip|fail|abort</action>
  <context>Optional context passed back to harness</context>
</outputs>
```
**Flow:**
1. Expression matches → harness executes hook directive
2. Hook directive runs, returns `{action: "retry", context: {...}}`
3. Harness reads action from output, does what directive says
**Valid actions:**
- `retry` - re-execute the original directive
- `continue` - proceed despite the condition
- `skip` - skip current step, continue workflow
- `fail` - return error to caller
- `abort` - terminate entire execution tree
This keeps actions **fully data-driven** - the directive decides, not hardcoded metadata.
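Reading the action from a hook directive's output could look like the sketch below. `read_action` and `VALID_ACTIONS` are hypothetical names; the fallback to `fail` for a missing or unknown action follows the behavior described for the harness in Phase 3.

```python
from typing import Any, Dict, Optional

VALID_ACTIONS = ("retry", "continue", "skip", "fail", "abort")


def read_action(output: Optional[Dict[str, Any]]) -> str:
    """Return the action named in a hook directive's output, or 'fail'."""
    if not isinstance(output, dict):
        return "fail"
    action = output.get("action")
    # Missing or unrecognised actions are treated as a failure.
    return action if action in VALID_ACTIONS else "fail"
```

For example, `read_action({"action": "retry", "context": {"elevated": True}})` yields `"retry"`, while an empty output yields `"fail"`.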
---
## Kernel vs Harness Separation
**CRITICAL: The kernel knows NOTHING about threads.**
```
┌─────────────────────────────────────┐
│ LLM Runtime (Amp, Cursor, etc.) │
│ - Has MCP attached │
│ - Instantiates SafetyHarness │
└─────────────────────────────────────┘
│
│ Harness wraps LLM execution
▼
┌─────────────────────────────────────┐
│ SafetyHarness TOOL │ ← .ai/tools/threads/safety_harness.py
│ - Calls spawn_thread (tool) │
│ - Calls thread_registry (tool) │
│ - Calls inject_message (tool) │
│ - Calls MCP execute (via tool call)│
│ - Evaluates hook expressions │
│ - NOT in kiwi_mcp/ │
└─────────────────────────────────────┘
│
│ Makes MCP tool calls
▼
┌─────────────────────────────────────┐
│ KERNEL (Dumb) │ ← kiwi_mcp/
│ - Just returns JSON │
│ - No thread knowledge │
│ - No harness knowledge │
└─────────────────────────────────────┘
```
**Kernel primitives** (subprocess, http_client) = execution code for tools
**Harness** = a tool that orchestrates other tools, lives in `.ai/tools/`
---
## Phase 1: Hook Extraction (Days 1-2)
**Output:** Update `.ai/parsers/markdown_xml.py` to extract expression-based hooks
### Current Parser Architecture
Parsing is data-driven via:
1. **Parser** (`.ai/parsers/markdown_xml.py`) - parses file, returns structured dict
2. **Extractor** (`.ai/extractors/directive/markdown_xml.py`) - defines extraction rules
3. **SchemaExtractor** (`kiwi_mcp/schemas/tool_schema.py`) - orchestrates
We add hook extraction to the **parser**, not the kernel.
### 1.1 Update markdown_xml.py parser
Add hook extraction inside the `metadata` processing block:
```python
# In .ai/parsers/markdown_xml.py, inside the metadata processing loop
elif meta_tag == "hooks":
    hooks = []
    for hook_elem in meta_child:
        if hook_elem.tag != "hook":
            raise ValueError(f"Expected <hook> inside <hooks>, got <{hook_elem.tag}>")

        # Required: when expression
        when_elem = hook_elem.find("when")
        if when_elem is None or not when_elem.text:
            raise ValueError("Hook missing required <when> element")

        # Required: directive to execute
        directive_elem = hook_elem.find("directive")
        if directive_elem is None or not directive_elem.text:
            raise ValueError("Hook missing required <directive> element")

        hook = {
            "when": when_elem.text.strip(),
            "directive": directive_elem.text.strip(),
        }

        # Optional: inputs (template variables like ${directive.name})
        inputs_elem = hook_elem.find("inputs")
        if inputs_elem is not None:
            hook["inputs"] = _xml_to_dict(inputs_elem)

        hooks.append(hook)
    result["hooks"] = hooks
```
### 1.2 Update extractor to include hooks field
Add to `.ai/extractors/directive/markdown_xml.py`:
```python
EXTRACTION_RULES = {
    # ... existing rules ...
    "hooks": {"type": "path", "key": "hooks"},  # NEW
}
```
### 1.3 Result structure
After parsing, hooks are available at:
```python
parsed["hooks"]  # List of hook definitions
# [
#     {
#         "when": "event.code == \"permission_denied\"",
#         "directive": "request_elevated_permissions",
#         "inputs": {"original_directive": "${directive.name}"}
#     },
#     {
#         "when": "cost.turns > limits.turns",
#         "directive": "handle_cost_exceeded"
#     }
# ]
```
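The extraction rules above can be tried in isolation with the standard library. This is a sketch, not the project's parser: `extract_hooks` is a hypothetical standalone function, and the flat tag-to-text dict used for `<inputs>` is only a stand-in for the parser's `_xml_to_dict`, whose actual behavior is not shown in this plan.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict, List


def extract_hooks(hooks_elem: ET.Element) -> List[Dict[str, Any]]:
    """Convert a <hooks> element into a list of hook dicts."""
    hooks: List[Dict[str, Any]] = []
    for hook_elem in hooks_elem:
        if hook_elem.tag != "hook":
            raise ValueError(f"Expected <hook> inside <hooks>, got <{hook_elem.tag}>")
        when = (hook_elem.findtext("when") or "").strip()
        directive = (hook_elem.findtext("directive") or "").strip()
        if not when:
            raise ValueError("Hook missing required <when> element")
        if not directive:
            raise ValueError("Hook missing required <directive> element")
        hook: Dict[str, Any] = {"when": when, "directive": directive}
        inputs_elem = hook_elem.find("inputs")
        if inputs_elem is not None:
            # Stand-in for _xml_to_dict: one level of child tag -> text.
            hook["inputs"] = {c.tag: (c.text or "").strip() for c in inputs_elem}
        hooks.append(hook)
    return hooks


SAMPLE = """
<hooks>
  <hook>
    <when>event.code == "permission_denied"</when>
    <directive>request_elevated_permissions</directive>
    <inputs>
      <original_directive>${directive.name}</original_directive>
    </inputs>
  </hook>
  <hook>
    <when>cost.turns > limits.turns</when>
    <directive>handle_cost_exceeded</directive>
  </hook>
</hooks>
"""

hooks = extract_hooks(ET.fromstring(SAMPLE.strip()))
```

One detail worth keeping in mind: a bare `>` is legal in XML text, but a `<` inside a `<when>` expression must be written as `&lt;` (or wrapped in CDATA) for the document to stay well-formed.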
### 1.4 Example directive with hooks
```xml
<directive name="deploy_staging" version="1.0.0">
  <metadata>
    <description>Deploy to staging environment</description>
    <category>deployment</category>
    <author>devops</author>
    <model tier="balanced" fallback_id="gpt-4o-mini">
      Deployment orchestration with shell commands
    </model>
    <limits>
      <turns>20</turns>
      <tokens>50000</tokens>
      <spawns>3</spawns>
      <duration>600</duration>
      <spend currency="USD">5.00</spend>
    </limits>
    <permissions>
      <read resource="filesystem" path="src/**" />
      <write resource="filesystem" path="dist/**" />
      <execute resource="tool" id="bash" />
    </permissions>
    <hooks>
      <hook>
        <when>event.code == "permission_denied"</when>
        <directive>request_elevated_permissions</directive>
        <inputs>
          <original_directive>${directive.name}</original_directive>
          <missing_cap>${event.detail.missing}</missing_cap>
        </inputs>
      </hook>
      <hook>
        <when>cost.turns > limits.turns * 0.9</when>
        <directive>warn_approaching_limit</directive>
      </hook>
      <hook>
        <when>event.name == "error" and event.code == "timeout"</when>
        <directive>handle_timeout</directive>
      </hook>
    </hooks>
  </metadata>
  ...
</directive>
```
### Tasks:
- [ ] Update `.ai/parsers/markdown_xml.py` to extract `<hooks>` with expression-based `<when>`
- [ ] Validate `<when>` and `<directive>` are present (required)
- [ ] Extract optional `<inputs>`
- [ ] Update `.ai/extractors/directive/markdown_xml.py` to include hooks field
- [ ] Add tests for hook extraction
---
## Phase 2: Expression Evaluator (Days 3-4)
**Output:** `.ai/tools/threads/expression_evaluator.py`
### 2.1 Safe expression evaluator
Support only (non-Turing-complete):
- Comparison: `>`, `<`, `==`, `!=`, `>=`, `<=`
- Logical: `and`, `or`, `not`
- Membership: `in`, `not in`
- Arithmetic: `+`, `-`, `*`, `/` (for limit calculations)
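- Literals: numbers, strings, booleans, `null`, and list literals (needed for `in` checks such as `event.code in ["timeout", "rate_limit"]`)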
- Property access: `event.code`, `cost.turns`, `limits.turns`
- Variables: identifiers resolved from context
**NOT supported** (security):
- Function calls
- Attribute access to methods
- Imports
- Arbitrary Python execution
```python
# .ai/tools/threads/expression_evaluator.py
from typing import Any, Dict


def evaluate_expression(expr: str, context: Dict) -> bool:
    """
    Safely evaluate expression against context.

    Examples:
        evaluate_expression("cost.turns > limits.turns", context)  # True/False
        evaluate_expression("event.code == \"permission_denied\"", context)
        evaluate_expression("\"fs.write\" in permissions.required", context)
    """
    # tokenize, parse_expression, and evaluate_ast are the pieces
    # to be implemented in this phase (see Tasks below).
    tokens = tokenize(expr)
    ast = parse_expression(tokens)
    return evaluate_ast(ast, context)


def resolve_path(path: str, context: Dict) -> Any:
    """
    Resolve dotted path like 'event.detail.missing' from context.
    Returns None if path doesn't exist (rather than raising).
    """
    parts = path.split(".")
    value = context
    for part in parts:
        if isinstance(value, dict):
            value = value.get(part)
        else:
            # Cannot descend into a non-dict value
            return None
        if value is None:
            return None
    return value
```
### 2.2 Template substitution
```python
def substitute_templates(obj: Any, context: Dict) -> Any:
    """
    Replace ${path.to.value} with values from context.

    Handles:
    - Strings: "${directive.name}" → "deploy_staging"
    - Nested dicts: recursively substitute
    - Lists: substitute each element
    - Missing: leave ${...} as-is
    """
```
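One way the stub above might be filled in is sketched below. The regex, the `_lookup` helper, and the `_MISSING` sentinel are assumptions for illustration; the sketch also assumes resolved values are converted with `str()`, which the plan does not specify.

```python
import re
from typing import Any, Dict

_TEMPLATE = re.compile(r"\$\{([^}]+)\}")
_MISSING = object()  # Sentinel: distinguishes "path not found" from a None value


def _lookup(path: str, context: Dict[str, Any]) -> Any:
    """Follow a dotted path through nested dicts, or return _MISSING."""
    value: Any = context
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return _MISSING
        value = value[part]
    return value


def substitute_templates(obj: Any, context: Dict[str, Any]) -> Any:
    """Replace ${path.to.value} in strings, recursing into dicts and lists."""
    if isinstance(obj, dict):
        return {k: substitute_templates(v, context) for k, v in obj.items()}
    if isinstance(obj, list):
        return [substitute_templates(v, context) for v in obj]
    if isinstance(obj, str):
        def replace(match: "re.Match[str]") -> str:
            value = _lookup(match.group(1).strip(), context)
            # Missing paths are left as the original ${...} text.
            return match.group(0) if value is _MISSING else str(value)
        return _TEMPLATE.sub(replace, obj)
    return obj  # Numbers, booleans, None pass through unchanged


context = {
    "directive": {"name": "deploy_staging"},
    "event": {"detail": {"missing": "fs.write"}},
}
inputs = {
    "original_directive": "${directive.name}",
    "missing_cap": "${event.detail.missing}",
    "note": "${not.there}",
    "tags": ["${directive.name}", 3],
}
resolved = substitute_templates(inputs, context)
```

Leaving unresolved `${...}` text in place, as the design table specifies, makes a typo in a path visible in the hook directive's inputs instead of silently becoming an empty string.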
### 2.3 Expression grammar
```
expression := or_expr
or_expr := and_expr ("or" and_expr)*
and_expr := not_expr ("and" not_expr)*
not_expr := "not" not_expr | comparison
comparison := additive (comp_op additive)?
comp_op := "==" | "!=" | "<" | ">" | "<=" | ">=" | "in" | "not in"
additive := term (("+"|"-") term)*
term := factor (("*"|"/") factor)*
factor := literal | list | path | "(" expression ")"
list := "[" (expression ("," expression)*)? "]"
path := IDENT ("." IDENT)*
literal := NUMBER | STRING | "true" | "false" | "null"
```
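For quick experiments before the tokenizer and parser exist, the same subset can be approximated with Python's standard `ast` module. To be clear, this is a swapped-in technique and not the hand-written parser this plan calls for: it parses with Python's own expression syntax and then walks only a whitelist of node types, rejecting everything else (including function calls). The names `_eval`, `_BIN_OPS`, `_CMP_OPS`, and `_CONSTANTS` are hypothetical.

```python
import ast
import operator
from typing import Any, Dict

_BIN_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
            ast.Mult: operator.mul, ast.Div: operator.truediv}
_CMP_OPS = {
    ast.Eq: operator.eq, ast.NotEq: operator.ne,
    ast.Lt: operator.lt, ast.LtE: operator.le,
    ast.Gt: operator.gt, ast.GtE: operator.ge,
    ast.In: lambda a, b: a in b,
    ast.NotIn: lambda a, b: a not in b,
}
_CONSTANTS = {"true": True, "false": False, "null": None}


def _eval(node: ast.AST, ctx: Dict[str, Any]) -> Any:
    if isinstance(node, ast.Constant):          # numbers, strings
        return node.value
    if isinstance(node, ast.Name):              # true/false/null or a context key
        if node.id in _CONSTANTS:
            return _CONSTANTS[node.id]
        return ctx.get(node.id)
    if isinstance(node, ast.Attribute):         # dotted path; dict lookup only
        base = _eval(node.value, ctx)
        return base.get(node.attr) if isinstance(base, dict) else None
    if isinstance(node, ast.List):
        return [_eval(e, ctx) for e in node.elts]
    if isinstance(node, ast.BoolOp):            # and / or
        values = (_eval(v, ctx) for v in node.values)
        return all(values) if isinstance(node.op, ast.And) else any(values)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
        return not _eval(node.operand, ctx)
    if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval(node.left, ctx), _eval(node.right, ctx))
    if isinstance(node, ast.Compare) and len(node.ops) == 1:
        op = type(node.ops[0])
        if op in _CMP_OPS:
            return _CMP_OPS[op](_eval(node.left, ctx), _eval(node.comparators[0], ctx))
    # Calls, subscripts, lambdas, chained comparisons, etc. all end up here.
    raise ValueError(f"Unsupported syntax: {type(node).__name__}")


def evaluate_expression(expr: str, context: Dict[str, Any]) -> bool:
    tree = ast.parse(expr, mode="eval")
    return bool(_eval(tree.body, context))


ctx = {
    "event": {"name": "error", "code": "timeout"},
    "cost": {"turns": 10},
    "limits": {"turns": 10},
    "permissions": {"required": ["fs.read", "fs.write"]},
}
```

Attribute nodes are resolved with `dict.get` only, so something like `event.keys` evaluates to `None` instead of reaching a method. Comparing a missing path with `>` raises `TypeError`; the harness in Phase 3 catches evaluation errors and moves on to the next hook.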
### Tasks:
- [ ] Implement tokenizer for safe expressions
- [ ] Implement AST parser (only allowed operators)
- [ ] Implement evaluator with path resolution
- [ ] Implement template substitution
- [ ] Add comprehensive tests
- [ ] Document supported syntax
---
## Phase 3: Safety Harness Tool (Days 5-8)
**Output:** `.ai/tools/threads/safety_harness.py`
### Key Principle: Harness is a TOOL, NOT a Kernel Primitive
The kernel is dumb. It knows nothing about threads. The harness:
- Lives in `.ai/tools/threads/` (NOT in `kiwi_mcp/`)
- Uses other tools: `spawn_thread`, `thread_registry`, `inject_message`
- Calls MCP via standard tool interface
- Is instantiated by the LLM runtime, not the kernel
### 3.1 SafetyHarness class
```python
# .ai/tools/threads/safety_harness.py
__tool_type__ = "python"
__version__ = "1.0.0"
__executor_id__ = "python_runtime"
__category__ = "threads"

"""
Safety Harness: Wraps directive execution with enforcement.

This is a TOOL that orchestrates other tools. It:
- Calls spawn_thread, thread_registry, inject_message (other tools)
- Calls MCP via tool interface
- Evaluates hook expressions at checkpoints
- Is NOT part of the kernel
"""

import json
import logging
from pathlib import Path
from typing import Dict, List, Optional
from uuid import uuid4

from expression_evaluator import evaluate_expression, substitute_templates

# Imports for spawn_thread, inject_message, thread_registry, CapabilityToken,
# HarnessResult, and AbortExecution are omitted from this plan.

logger = logging.getLogger(__name__)


class SafetyHarness:
    """
    Wraps directive execution with permission, cost, and hook enforcement.
    Uses expression-based hooks evaluated at checkpoints.
    """

    # Checkpoints where hooks are evaluated (control-flow points, not semantic events)
    CHECKPOINTS = ["before_step", "after_step", "on_error", "on_limit"]

    def __init__(self, project_path: Path, parent_token: Optional[CapabilityToken] = None):
        self.project_path = project_path
        self.parent_token = parent_token

    async def execute_directive(
        self,
        directive_name: str,
        inputs: Dict,
        hooks: List[Dict],   # Extracted from directive metadata
        limits: Dict,        # From <limits> tag
        model_config: Dict,  # From <model> tag
    ) -> HarnessResult:
        """Execute directive with full harness enforcement."""
        # 1. Build initial context
        context = self._build_context(directive_name, inputs, limits)

        # 2. Check permissions (using capability token)
        perm_result = self._check_permissions(context)
        if not perm_result.ok:
            # Set event for permission failure
            context["event"] = {
                "name": "error",
                "code": "permission_denied",
                "detail": {
                    "missing": perm_result.missing_caps,
                    "granted": list(self.parent_token.caps) if self.parent_token else []
                }
            }
            return await self._evaluate_hooks(hooks, context)

        # 3. Spawn thread using spawn_thread TOOL
        thread_result = await spawn_thread.execute(
            thread_id=f"{directive_name}_{uuid4().hex[:8]}",
            directive_name=directive_name,
            project_path=str(self.project_path),
        )

        # 4. Inject MCP call + response using inject_message TOOL
        await inject_message.execute(
            thread_id=thread_result["thread_id"],
            role="assistant",
            content=json.dumps({
                "tool_call": "kiwi-mcp",
                "action": "execute",
                "params": {"item_type": "directive", "action": "run", "item_id": directive_name}
            }),
        )
        directive_content = await self._load_directive(directive_name)
        await inject_message.execute(
            thread_id=thread_result["thread_id"],
            role="tool_result",
            content=json.dumps(directive_content),
        )

        # 5. Execute with enforcement loop
        return await self._execute_with_enforcement(
            thread_result["thread_id"], context, hooks, limits
        )

    async def _execute_with_enforcement(self, thread_id, context, hooks, limits):
        """Execute turns with checkpoint-based hook evaluation."""
        while not self._is_thread_complete(thread_id):
            # CHECKPOINT: on_limit - check before each turn
            if context["cost"]["turns"] >= limits["turns"]:
                context["event"] = {
                    "name": "limit",
                    "code": "turns",
                    "current": context["cost"]["turns"],
                    "max": limits["turns"]
                }
                result = await self._evaluate_hooks(hooks, context)
                if result.action != "continue":
                    return result

            if context["cost"]["spawns"] >= limits["spawns"]:
                context["event"] = {
                    "name": "limit",
                    "code": "spawns",
                    "current": context["cost"]["spawns"],
                    "max": limits["spawns"]
                }
                result = await self._evaluate_hooks(hooks, context)
                if result.action != "continue":
                    return result

            # CHECKPOINT: before_step
            context["event"] = {"name": "before_step", "turn": context["cost"]["turns"]}
            result = await self._evaluate_hooks(hooks, context)
            if result.action == "skip":
                # NOTE: skipping does not advance cost.turns
                continue
            if result.action not in ("continue", None):
                return result

            # Let LLM execute one turn
            await self._wait_for_turn(thread_id)
            context["cost"]["turns"] += 1

            # Check for errors via thread_registry TOOL
            status = await thread_registry.get_status(thread_id)
            if status.get("error"):
                # CHECKPOINT: on_error
                context["event"] = {
                    "name": "error",
                    "code": status["error"].get("code", "unknown"),
                    "detail": status["error"]
                }
                result = await self._evaluate_hooks(hooks, context)
                if result.action != "continue":
                    return result

            # CHECKPOINT: after_step
            context["event"] = {
                "name": "after_step",
                "turn": context["cost"]["turns"],
                "result": status.get("result")
            }
            await self._evaluate_hooks(hooks, context)

        return HarnessResult(success=True, output=self._get_thread_result(thread_id))

    async def _evaluate_hooks(self, hooks: List[Dict], context: Dict) -> HarnessResult:
        """Evaluate all hooks against context. First match wins."""
        for hook in hooks:
            when_expr = hook["when"]
            # Evaluate expression against context
            try:
                if evaluate_expression(when_expr, context):
                    # First match - execute hook directive
                    return await self._execute_hook(hook, context)
            except Exception as e:
                # Expression evaluation error - log and continue
                logger.warning(f"Hook expression error: {when_expr} - {e}")
                continue

        # No hook matched - return continue (default behavior)
        return HarnessResult(action="continue")

    async def _execute_hook(self, hook: Dict, context: Dict) -> HarnessResult:
        """Execute hook directive and read action from its output."""
        # Substitute templates in hook inputs
        hook_directive_name = hook["directive"]
        inputs = substitute_templates(hook.get("inputs", {}), context)

        # Create child harness for hook execution (recursive enforcement)
        child_harness = SafetyHarness(
            project_path=self.project_path,
            parent_token=self._attenuate_token(context),
        )

        # Load hook directive to get its hooks/limits
        hook_directive = await self._load_directive(hook_directive_name)

        # Execute hook directive with child harness
        result = await child_harness.execute_directive(
            directive_name=hook_directive_name,
            inputs=inputs,
            hooks=hook_directive["metadata"].get("hooks", []),
            limits=hook_directive["metadata"]["limits"],
            model_config=hook_directive["metadata"]["model"],
        )

        # Read action from directive output (data-driven)
        action = result.output.get("action", "fail")  # Default to fail if no action
        return self._handle_action(action, context, result)

    def _handle_action(self, action: str, context, result):
        """Handle action returned by hook directive."""
        if action == "retry":
            return HarnessResult(action="retry")
        elif action == "continue":
            return HarnessResult(action="continue", success=True)
        elif action == "skip":
            return HarnessResult(action="skip", success=True)
        elif action == "fail":
            # Set action explicitly: the before_step check treats a missing
            # (None) action as "continue", which would swallow the failure.
            return HarnessResult(action="fail", success=False, error=result.output.get("error"))
        elif action == "abort":
            raise AbortExecution(result.output.get("error"))
        else:
            # Unknown action - treat as fail
            return HarnessResult(action="fail", success=False, error=f"Unknown action: {action}")
```
### 3.2 Standardized error envelope
All tools/directives must return errors in this format for consistent hook matching:
```python
# Success
{"ok": True, "output": {...}}
# Error
{"ok": False, "error": {"code": "permission_denied", "detail": {...}}}
```
This allows hooks to match on `event.code` without harness changes for new error types.
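A small adapter from this envelope to the `event` structure could look like the sketch below. `event_from_result` is a hypothetical helper name, and it assumes `detail` is copied from the envelope's `error.detail` field, matching the generic `{name, code, detail}` event shape.

```python
from typing import Any, Dict, Optional


def event_from_result(result: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Build an error event from a tool/directive result, or None on success."""
    if result.get("ok"):
        return None
    error = result.get("error") or {}
    return {
        "name": "error",
        # The code is passed through untouched, so new codes need no harness change.
        "code": error.get("code", "unknown"),
        "detail": error.get("detail", {}),
    }
```

A result such as `{"ok": False, "error": {"code": "quota_exceeded", "detail": {}}}` becomes an event that a hook can match with `event.code == "quota_exceeded"`, even though the harness has never seen that code before.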
### Tasks:
- [x] Create `SafetyHarness` class in `.ai/tools/threads/`
- [x] Implement execution context building with `event` structure
- [x] Implement permission checking using CapabilityToken
- [x] Implement checkpoint-based hook evaluation loop
- [x] Implement expression-based hook matching (first wins)
- [x] Implement hook execution with child harness (recursive)
- [x] Read action from directive output (not hardcoded metadata)
- [x] Wire to use other tools (spawn_thread, thread_registry, inject_message)
- [x] Add tests (102 harness tests passing)
---
## Phase 4: thread_directive Tool (Days 9-10)
**Output:** `.ai/tools/threads/thread_directive.py`
### 4.1 User-facing tool
This is the tool LLMs call to spawn a directive on a new thread. It uses `SafetyHarness` internally.
```python
# .ai/tools/threads/thread_directive.py
__tool_type__ = "python"
__version__ = "1.0.0"
__executor_id__ = "python_runtime"
__category__ = "threads"

"""
Thread Directive Tool: Spawn a thread and execute a directive with harness enforcement.

This tool:
1. Loads the directive to get metadata (hooks, limits, permissions)
2. Creates a SafetyHarness instance
3. Calls harness.execute_directive()
4. Returns the result
"""

from pathlib import Path
from typing import Dict, Optional

from safety_harness import SafetyHarness


async def execute(
    directive_name: str,
    inputs: Optional[Dict] = None,
    **params
) -> Dict:
    """
    Spawn a thread and execute a directive on it.
    """
    project_path = Path(params.get("_project_path", Path.cwd()))

    # Load directive to get hooks, limits, permissions, model
    # (Uses MCP load call internally)
    directive = await _load_directive(directive_name, project_path)

    # Create harness with parent token (if we're in a child thread)
    harness = SafetyHarness(
        project_path=project_path,
        parent_token=params.get("_token"),
    )

    # Execute with full enforcement
    result = await harness.execute_directive(
        directive_name=directive_name,
        inputs=inputs or {},
        hooks=directive["metadata"].get("hooks", []),
        limits=directive["metadata"]["limits"],
        model_config=directive["metadata"]["model"],
    )
    return result.to_dict()


async def _load_directive(name: str, project_path: Path) -> Dict:
    """Load directive using MCP (as a tool would)."""
    from kiwi_mcp.handlers.directive.handler import DirectiveHandler

    handler = DirectiveHandler(str(project_path))
    return await handler.load(name, source="project")
```
### 4.2 YAML sidecar
```yaml
# .ai/tools/threads/thread_directive.yaml
tool_id: thread_directive
tool_type: python
version: "1.0.0"
executor_id: python_runtime
description: "Spawn a thread and execute a directive with full harness enforcement"
requires:
  - spawn.thread
  - kiwi-mcp.execute
parameters:
  - name: directive_name
    type: string
    required: true
    description: "Name of the directive to execute"
  - name: inputs
    type: object
    required: false
    description: "Input parameters for the directive"
```
### Tasks:
- [x] Create tool Python file
- [x] Create YAML sidecar
- [x] Wire to SafetyHarness class
- [x] Add tests
---
## Phase 5: Integration & Tests (Days 11-13)
### 5.1 End-to-end tests
```python
# tests/harness/test_thread_harness.py

async def test_permission_denied_triggers_hook():
    """Permission denied → event.code == 'permission_denied' → hook matches → retry."""

async def test_cost_exceeded_triggers_hook():
    """cost.turns > limits.turns → hook matches → handle_cost_exceeded runs."""

async def test_first_hook_wins():
    """Multiple hooks match → only first executes."""

async def test_no_hook_matches():
    """No hook expression matches → default continue behavior."""

async def test_recursive_harness():
    """Hook directive spawns child → child also has harness with own hooks."""

async def test_template_substitution():
    """${directive.name}, ${event.detail.missing} replaced correctly in hook inputs."""

async def test_complex_expression():
    """event.name == 'error' and event.code in ['timeout', 'rate_limit']"""

async def test_arithmetic_in_expression():
    """cost.turns > limits.turns * 0.9 evaluates correctly."""
```
### 5.2 Hook directive examples
Create example hook directives that get called:
```
.ai/directives/hooks/
├── request_elevated_permissions.md
├── handle_cost_exceeded.md
├── handle_timeout.md
├── warn_approaching_limit.md
└── handle_execution_failure.md
```
Each is just a normal directive with its own `<metadata>`, `<inputs>`, `<outputs>`, and content.
### 5.3 Example hook directive
````xml
<directive name="request_elevated_permissions" version="1.0.0">
  <metadata>
    <description>Request elevated permissions when access is denied</description>
    <category>hooks</category>
    <author>system</author>
    <model tier="fast">
      Simple user interaction for permission request
    </model>
    <limits>
      <turns>5</turns>
      <tokens>5000</tokens>
      <spawns>0</spawns>
      <duration>60</duration>
      <spend currency="USD">0.10</spend>
    </limits>
    <permissions>
      <!-- Hook directives have minimal permissions -->
    </permissions>
  </metadata>
  <inputs>
    <original_directive type="string" required="true" />
    <missing_cap type="string" required="true" />
  </inputs>
  <outputs>
    <action type="string">retry|fail</action>
    <context type="object" />
  </outputs>
  <process>
    <step name="request_permission">
      Ask the user for permission to use capability: ${missing_cap}

      If granted, return:
      ```json
      {"action": "retry", "context": {"elevated": true}}
      ```

      If denied, return:
      ```json
      {"action": "fail", "error": "Permission denied by user"}
      ```
    </step>
  </process>
</directive>
````
### Tasks:
- [ ] Write integration tests
- [ ] Create example hook directives
- [ ] Test full flow end-to-end
- [ ] Document expression syntax
- [ ] Document hook patterns
---
## File Structure (Final)
```
kiwi_mcp/ # KERNEL (dumb, no thread knowledge, UNCHANGED)
├── primitives/ # Unchanged
├── handlers/ # Unchanged
├── schemas/ # Unchanged (uses data-driven extractors)
└── utils/ # Unchanged
.ai/parsers/ # DATA-DRIVEN PARSERS
└── markdown_xml.py # UPDATE: Add expression-based hook extraction
.ai/extractors/directive/ # DATA-DRIVEN EXTRACTORS
└── markdown_xml.py # UPDATE: Add hooks field
.ai/tools/threads/ # HARNESS + THREAD TOOLS (outside kernel)
├── safety_harness.py # NEW: SafetyHarness class with checkpoint-based evaluation
├── thread_directive.py # NEW: User-facing tool
├── thread_directive.yaml # NEW: Tool discovery
├── expression_evaluator.py # NEW: Safe expression eval with path resolution
├── spawn_thread.py # Existing
├── thread_registry.py # Existing
├── inject_message.py # Existing
├── pause_thread.py # Existing
└── resume_thread.py # Existing
.ai/directives/hooks/ # NEW: Example hook directives
├── request_elevated_permissions.md
├── handle_cost_exceeded.md
├── handle_timeout.md
├── warn_approaching_limit.md
└── handle_execution_failure.md
```
**Key principles:**
- Kernel (`kiwi_mcp/`) is UNCHANGED - stays dumb
- Hook extraction added to data-driven parser (`.ai/parsers/`)
- Harness is a tool in `.ai/tools/` - uses other tools
- **Hooks are expression-based** - no hardcoded trigger taxonomy
---
## Summary
| Layer | What It Does | Code Location |
| ---------------------- | ---------------------------------------------- | ------------------------------------------- |
| **Directive metadata** | Defines hooks with `<when>` expressions | `.ai/directives/*.md` |
| **Parser** | Extracts hooks to JSON (no trigger enum) | `.ai/parsers/markdown_xml.py` |
| **Expression eval** | Safely evaluates expressions against context | `.ai/tools/threads/expression_evaluator.py` |
| **SafetyHarness tool** | Builds context, evaluates hooks at checkpoints | `.ai/tools/threads/safety_harness.py` |
| **thread_directive** | User-facing tool, uses SafetyHarness | `.ai/tools/threads/thread_directive.py` |
**Key principles:**
1. **Kernel is dumb** - just extracts and returns JSON, no thread knowledge
2. **Harness is a tool** - lives in `.ai/tools/`, NOT in `kiwi_mcp/`
3. **Hooks are expression-based** - `<when>` expressions, NOT named trigger tags
4. **Events are data** - generic `{name, code, detail}` structure, no hardcoded taxonomy
5. **Checkpoints are control-flow** - fixed points where hooks are evaluated
6. **New conditions = new expressions** - no code changes needed for new hook types
7. **Harness uses other tools** - spawn_thread, thread_registry, inject_message
---
## Appendix: Expression Examples
```python
# Permission checks
"event.code == \"permission_denied\""
"\"fs.write\" in permissions.required"
"event.detail.missing == \"spawn.thread\""
# Cost checks
"cost.turns > limits.turns"
"cost.turns > limits.turns * 0.9" # 90% warning
"cost.spawns >= limits.spawns"
# Error handling
"event.name == \"error\""
"event.name == \"error\" and event.code == \"timeout\""
"event.code in [\"timeout\", \"rate_limit\", \"network_error\"]"
# Complex conditions
"event.name == \"error\" and (event.code == \"permission_denied\" or event.code == \"quota_exceeded\")"
"cost.turns > 5 and event.name == \"before_step\""
```